NASA Astrophysics Data System (ADS)
Zhao, Shaoshuai; Ni, Chen; Cao, Jing; Li, Zhengqiang; Chen, Xingfeng; Ma, Yan; Yang, Leiku; Hou, Weizhen; Qie, Lili; Ge, Bangyu; Liu, Li; Xing, Jin
2018-03-01
The remote sensing image is usually polluted by atmosphere components especially like aerosol particles. For the quantitative remote sensing applications, the radiative transfer model based atmospheric correction is used to get the reflectance with decoupling the atmosphere and surface by consuming a long computational time. The parallel computing is a solution method for the temporal acceleration. The parallel strategy which uses multi-CPU to work simultaneously is designed to do atmospheric correction for a multispectral remote sensing image. The parallel framework's flow and the main parallel body of atmospheric correction are described. Then, the multispectral remote sensing image of the Chinese Gaofen-2 satellite is used to test the acceleration efficiency. When the CPU number is increasing from 1 to 8, the computational speed is also increasing. The biggest acceleration rate is 6.5. Under the 8 CPU working mode, the whole image atmospheric correction costs 4 minutes.
Otazo, Ricardo; Lin, Fa-Hsuan; Wiggins, Graham; Jordan, Ramiro; Sodickson, Daniel; Posse, Stefan
2009-01-01
Standard parallel magnetic resonance imaging (MRI) techniques suffer from residual aliasing artifacts when the coil sensitivities vary within the image voxel. In this work, a parallel MRI approach known as Superresolution SENSE (SURE-SENSE) is presented in which acceleration is performed by acquiring only the central region of k-space instead of increasing the sampling distance over the complete k-space matrix and reconstruction is explicitly based on intra-voxel coil sensitivity variation. In SURE-SENSE, parallel MRI reconstruction is formulated as a superresolution imaging problem where a collection of low resolution images acquired with multiple receiver coils are combined into a single image with higher spatial resolution using coil sensitivities acquired with high spatial resolution. The effective acceleration of conventional gradient encoding is given by the gain in spatial resolution, which is dictated by the degree of variation of the different coil sensitivity profiles within the low resolution image voxel. Since SURE-SENSE is an ill-posed inverse problem, Tikhonov regularization is employed to control noise amplification. Unlike standard SENSE, for which acceleration is constrained to the phase-encoding dimension/s, SURE-SENSE allows acceleration along all encoding directions — for example, two-dimensional acceleration of a 2D echo-planar acquisition. SURE-SENSE is particularly suitable for low spatial resolution imaging modalities such as spectroscopic imaging and functional imaging with high temporal resolution. Application to echo-planar functional and spectroscopic imaging in human brain is presented using two-dimensional acceleration with a 32-channel receiver coil. PMID:19341804
Ji, Jim; Wright, Steven
2005-01-01
Parallel imaging using multiple phased-array coils and receiver channels has become an effective approach to high-speed magnetic resonance imaging (MRI). To obtain high spatiotemporal resolution, the k-space is subsampled and later interpolated using multiple channel data. Higher subsampling factors result in faster image acquisition. However, the subsampling factors are upper-bounded by the number of parallel channels. Phase constraints have been previously proposed to overcome this limitation with some success. In this paper, we demonstrate that in certain applications it is possible to obtain acceleration factors potentially up to twice the channel numbers by using a real image constraint. Data acquisition and processing methods to manipulate and estimate of the image phase information are presented for improving image reconstruction. In-vivo brain MRI experimental results show that accelerations up to 6 are feasible with 4-channel data.
Accelerating EPI distortion correction by utilizing a modern GPU-based parallel computation.
Yang, Yao-Hao; Huang, Teng-Yi; Wang, Fu-Nien; Chuang, Tzu-Chao; Chen, Nan-Kuei
2013-04-01
The combination of phase demodulation and field mapping is a practical method to correct echo planar imaging (EPI) geometric distortion. However, since phase dispersion accumulates in each phase-encoding step, the calculation complexity of phase modulation is Ny-fold higher than conventional image reconstructions. Thus, correcting EPI images via phase demodulation is generally a time-consuming task. Parallel computing by employing general-purpose calculations on graphics processing units (GPU) can accelerate scientific computing if the algorithm is parallelized. This study proposes a method that incorporates the GPU-based technique into phase demodulation calculations to reduce computation time. The proposed parallel algorithm was applied to a PROPELLER-EPI diffusion tensor data set. The GPU-based phase demodulation method reduced the EPI distortion correctly, and accelerated the computation. The total reconstruction time of the 16-slice PROPELLER-EPI diffusion tensor images with matrix size of 128 × 128 was reduced from 1,754 seconds to 101 seconds by utilizing the parallelized 4-GPU program. GPU computing is a promising method to accelerate EPI geometric correction. The resulting reduction in computation time of phase demodulation should accelerate postprocessing for studies performed with EPI, and should effectuate the PROPELLER-EPI technique for clinical practice. Copyright © 2011 by the American Society of Neuroimaging.
Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing
Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin
2016-01-01
With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO). However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX) method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate. PMID:27070606
Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.
Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin
2016-04-07
With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO). However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX) method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate.
Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P.
2016-01-01
Purpose Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. Theory and Methods The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly-accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely-used calibrationless uniformly-undersampled trajectories. Results Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high-acceleration rates relative to existing state-of-the-art reconstruction approaches. Conclusion The SENSE-LORAKS framework provides promising new opportunities for highly-accelerated MRI. PMID:27037836
Accelerating separable footprint (SF) forward and back projection on GPU
NASA Astrophysics Data System (ADS)
Xie, Xiaobin; McGaffin, Madison G.; Long, Yong; Fessler, Jeffrey A.; Wen, Minhua; Lin, James
2017-03-01
Statistical image reconstruction (SIR) methods for X-ray CT can improve image quality and reduce radiation dosages over conventional reconstruction methods, such as filtered back projection (FBP). However, SIR methods require much longer computation time. The separable footprint (SF) forward and back projection technique simplifies the calculation of intersecting volumes of image voxels and finite-size beams in a way that is both accurate and efficient for parallel implementation. We propose a new method to accelerate the SF forward and back projection on GPU with NVIDIA's CUDA environment. For the forward projection, we parallelize over all detector cells. For the back projection, we parallelize over all 3D image voxels. The simulation results show that the proposed method is faster than the acceleration method of the SF projectors proposed by Wu and Fessler.13 We further accelerate the proposed method using multiple GPUs. The results show that the computation time is reduced approximately proportional to the number of GPUs.
Filli, Lukas; Piccirelli, Marco; Kenkel, David; Guggenberger, Roman; Andreisek, Gustav; Beck, Thomas; Runge, Val M; Boss, Andreas
2015-07-01
The aim of this study was to investigate the feasibility of accelerated diffusion tensor imaging (DTI) of skeletal muscle using echo planar imaging (EPI) applying simultaneous multislice excitation with a blipped controlled aliasing in parallel imaging results in higher acceleration unaliasing technique. After federal ethics board approval, the lower leg muscles of 8 healthy volunteers (mean [SD] age, 29.4 [2.9] years) were examined in a clinical 3-T magnetic resonance scanner using a 15-channel knee coil. The EPI was performed at a b value of 500 s/mm2 without slice acceleration (conventional DTI) as well as with 2-fold and 3-fold acceleration. Fractional anisotropy (FA) and mean diffusivity (MD) were measured in all 3 acquisitions. Fiber tracking performance was compared between the acquisitions regarding the number of tracks, average track length, and anatomical precision using multivariate analysis of variance and Mann-Whitney U tests. Acquisition time was 7:24 minutes for conventional DTI, 3:53 minutes for 2-fold acceleration, and 2:38 minutes for 3-fold acceleration. Overall FA and MD values ranged from 0.220 to 0.378 and 1.595 to 1.829 mm2/s, respectively. Two-fold acceleration yielded similar FA and MD values (P ≥ 0.901) and similar fiber tracking performance compared with conventional DTI. Three-fold acceleration resulted in comparable MD (P = 0.199) but higher FA values (P = 0.006) and significantly impaired fiber tracking in the soleus and tibialis anterior muscles (number of tracks, P < 0.001; anatomical precision, P ≤ 0.005). Simultaneous multislice EPI with blipped controlled aliasing in parallel imaging results in higher acceleration can remarkably reduce acquisition time in DTI of skeletal muscle with similar image quality and quantification accuracy of diffusion parameters. This may increase the clinical applicability of muscle anisotropy measurements.
Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P
2017-03-01
Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely used calibrationless uniformly undersampled trajectories. Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high-acceleration rates relative to existing state-of-the-art reconstruction approaches. The SENSE-LORAKS framework provides promising new opportunities for highly accelerated MRI. Magn Reson Med 77:1021-1035, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
NASA Astrophysics Data System (ADS)
Akil, Mohamed
2017-05-01
The real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processor with shared memory. To achieve an efficient parallel implementation, it's necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelization of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare the OpenMP (an application programming interface for multi-Processing) with Ptheads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
Robson, Philip M; Grant, Aaron K; Madhuranthakam, Ananth J; Lattanzi, Riccardo; Sodickson, Daniel K; McKenzie, Charles A
2008-10-01
Parallel imaging reconstructions result in spatially varying noise amplification characterized by the g-factor, precluding conventional measurements of noise from the final image. A simple Monte Carlo based method is proposed for all linear image reconstruction algorithms, which allows measurement of signal-to-noise ratio and g-factor and is demonstrated for SENSE and GRAPPA reconstructions for accelerated acquisitions that have not previously been amenable to such assessment. Only a simple "prescan" measurement of noise amplitude and correlation in the phased-array receiver, and a single accelerated image acquisition are required, allowing robust assessment of signal-to-noise ratio and g-factor. The "pseudo multiple replica" method has been rigorously validated in phantoms and in vivo, showing excellent agreement with true multiple replica and analytical methods. This method is universally applicable to the parallel imaging reconstruction techniques used in clinical applications and will allow pixel-by-pixel image noise measurements for all parallel imaging strategies, allowing quantitative comparison between arbitrary k-space trajectories, image reconstruction, or noise conditioning techniques. (c) 2008 Wiley-Liss, Inc.
Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease
Shamonin, Denis P.; Bron, Esther E.; Lelieveldt, Boudewijn P. F.; Smits, Marion; Klein, Stefan; Staring, Marius
2013-01-01
Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4–5x on an 8-core machine. Using OpenCL a speedup factor of 2 was realized for computation of the Gaussian pyramids, and 15–60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88 and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license. PMID:24474917
Parallel Reconstruction Using Null Operations (PRUNO)
Zhang, Jian; Liu, Chunlei; Moseley, Michael E.
2011-01-01
A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD). And an iterative conjugate- gradient approach is proposed to efficiently solve missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high accelerating rates. With the aid of PRUO reconstruction, ultra high accelerating parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
Sodickson, Daniel K.
2010-01-01
Cardiovascular magnetic resonance imaging (CVMRI) is of proven clinical value in the non-invasive imaging of cardiovascular diseases. CVMRI requires rapid image acquisition, but acquisition speed is fundamentally limited in conventional MRI. Parallel imaging provides a means for increasing acquisition speed and efficiency. However, signal-to-noise (SNR) limitations and the limited number of receiver channels available on most MR systems have in the past imposed practical constraints, which dictated the use of moderate accelerations in CVMRI. High levels of acceleration, which were unattainable previously, have become possible with many-receiver MR systems and many-element, cardiac-optimized RF-coil arrays. The resulting imaging speed improvements can be exploited in a number of ways, ranging from enhancement of spatial and temporal resolution to efficient whole heart coverage to streamlining of CVMRI work flow. In this review, examples of these strategies are provided, following an outline of the fundamentals of the highly accelerated imaging approaches employed in CVMRI. Topics discussed include basic principles of parallel imaging; key requirements for MR systems and RF-coil design; practical considerations of SNR management, supported by multi-dimensional accelerations, 3D noise averaging and high field imaging; highly accelerated clinical state-of-the art cardiovascular imaging applications spanning the range from SNR-rich to SNR-limited; and current trends and future directions. PMID:17562047
Accelerated Slice Encoding for Metal Artifact Correction
Hargreaves, Brian A.; Chen, Weitian; Lu, Wenmiao; Alley, Marcus T.; Gold, Garry E.; Brau, Anja C. S.; Pauly, John M.; Pauly, Kim Butts
2010-01-01
Purpose To demonstrate accelerated imaging with artifact reduction near metallic implants and different contrast mechanisms. Materials and Methods Slice-encoding for metal artifact correction (SEMAC) is a modified spin echo sequence that uses view-angle tilting and slice-direction phase encoding to correct both in-plane and through-plane artifacts. Standard spin echo trains and short-TI inversion recovery (STIR) allow efficient PD-weighted imaging with optional fat suppression. A completely linear reconstruction allows incorporation of parallel imaging and partial Fourier imaging. The SNR effects of all reconstructions were quantified in one subject. 10 subjects with different metallic implants were scanned using SEMAC protocols, all with scan times below 11 minutes, as well as with standard spin echo methods. Results The SNR using standard acceleration techniques is unaffected by the linear SEMAC reconstruction. In all cases with implants, accelerated SEMAC significantly reduced artifacts compared with standard imaging techniques, with no additional artifacts from acceleration techniques. The use of different contrast mechanisms allowed differentiation of fluid from other structures in several subjects. Conclusion SEMAC imaging can be combined with standard echo-train imaging, parallel imaging, partial-Fourier imaging and inversion recovery techniques to offer flexible image contrast with a dramatic reduction of metal-induced artifacts in scan times under 11 minutes. PMID:20373445
Accelerated slice encoding for metal artifact correction.
Hargreaves, Brian A; Chen, Weitian; Lu, Wenmiao; Alley, Marcus T; Gold, Garry E; Brau, Anja C S; Pauly, John M; Pauly, Kim Butts
2010-04-01
To demonstrate accelerated imaging with both artifact reduction and different contrast mechanisms near metallic implants. Slice-encoding for metal artifact correction (SEMAC) is a modified spin echo sequence that uses view-angle tilting and slice-direction phase encoding to correct both in-plane and through-plane artifacts. Standard spin echo trains and short-TI inversion recovery (STIR) allow efficient PD-weighted imaging with optional fat suppression. A completely linear reconstruction allows incorporation of parallel imaging and partial Fourier imaging. The signal-to-noise ratio (SNR) effects of all reconstructions were quantified in one subject. Ten subjects with different metallic implants were scanned using SEMAC protocols, all with scan times below 11 minutes, as well as with standard spin echo methods. The SNR using standard acceleration techniques is unaffected by the linear SEMAC reconstruction. In all cases with implants, accelerated SEMAC significantly reduced artifacts compared with standard imaging techniques, with no additional artifacts from acceleration techniques. The use of different contrast mechanisms allowed differentiation of fluid from other structures in several subjects. SEMAC imaging can be combined with standard echo-train imaging, parallel imaging, partial-Fourier imaging, and inversion recovery techniques to offer flexible image contrast with a dramatic reduction of metal-induced artifacts in scan times under 11 minutes. (c) 2010 Wiley-Liss, Inc.
Pandit, Prachi; Rivoire, Julien; King, Kevin; Li, Xiaojuan
2016-03-01
Quantitative T1ρ imaging is beneficial for early detection for osteoarthritis but has seen limited clinical use due to long scan times. In this study, we evaluated the feasibility of accelerated T1ρ mapping for knee cartilage quantification using a combination of compressed sensing (CS) and data-driven parallel imaging (ARC-Autocalibrating Reconstruction for Cartesian sampling). A sequential combination of ARC and CS, both during data acquisition and reconstruction, was used to accelerate the acquisition of T1ρ maps. Phantom, ex vivo (porcine knee), and in vivo (human knee) imaging was performed on a GE 3T MR750 scanner. T1ρ quantification after CS-accelerated acquisition was compared with non CS-accelerated acquisition for various cartilage compartments. Accelerating image acquisition using CS did not introduce major deviations in quantification. The coefficient of variation for the root mean squared error increased with increasing acceleration, but for in vivo measurements, it stayed under 5% for a net acceleration factor up to 2, where the acquisition was 25% faster than the reference (only ARC). To the best of our knowledge, this is the first implementation of CS for in vivo T1ρ quantification. These early results show that this technique holds great promise in making quantitative imaging techniques more accessible for clinical applications. © 2015 Wiley Periodicals, Inc.
An L1-norm phase constraint for half-Fourier compressed sensing in 3D MR imaging.
Li, Guobin; Hennig, Jürgen; Raithel, Esther; Büchert, Martin; Paul, Dominik; Korvink, Jan G; Zaitsev, Maxim
2015-10-01
In most half-Fourier imaging methods, explicit phase replacement is used. In combination with parallel imaging, or compressed sensing, half-Fourier reconstruction is usually performed in a separate step. The purpose of this paper is to report that integration of half-Fourier reconstruction into iterative reconstruction minimizes reconstruction errors. The L1-norm phase constraint for half-Fourier imaging proposed in this work is compared with the L2-norm variant of the same algorithm, with several typical half-Fourier reconstruction methods. Half-Fourier imaging with the proposed phase constraint can be seamlessly combined with parallel imaging and compressed sensing to achieve high acceleration factors. In simulations and in in-vivo experiments half-Fourier imaging with the proposed L1-norm phase constraint enables superior performance both reconstruction of image details and with regard to robustness against phase estimation errors. The performance and feasibility of half-Fourier imaging with the proposed L1-norm phase constraint is reported. Its seamless combination with parallel imaging and compressed sensing enables use of greater acceleration in 3D MR imaging.
Klix, Sabrina; Hezel, Fabian; Fuchs, Katharina; Ruff, Jan; Dieringer, Matthias A.; Niendorf, Thoralf
2014-01-01
Purpose Design, validation and application of an accelerated fast spin-echo (FSE) variant that uses a split-echo approach for self-calibrated parallel imaging. Methods For self-calibrated, split-echo FSE (SCSE-FSE), extra displacement gradients were incorporated into FSE to decompose odd and even echo groups which were independently phase encoded to derive coil sensitivity maps, and to generate undersampled data (reduction factor up to R = 3). Reference and undersampled data were acquired simultaneously. SENSE reconstruction was employed. Results The feasibility of SCSE-FSE was demonstrated in phantom studies. Point spread function performance of SCSE-FSE was found to be competitive with traditional FSE variants. The immunity of SCSE-FSE for motion induced mis-registration between reference and undersampled data was shown using a dynamic left ventricular model and cardiac imaging. The applicability of black blood prepared SCSE-FSE for cardiac imaging was demonstrated in healthy volunteers including accelerated multi-slice per breath-hold imaging and accelerated high spatial resolution imaging. Conclusion SCSE-FSE obviates the need of external reference scans for SENSE reconstructed parallel imaging with FSE. SCSE-FSE reduces the risk for mis-registration between reference scans and accelerated acquisitions. SCSE-FSE is feasible for imaging of the heart and of large cardiac vessels but also meets the needs of brain, abdominal and liver imaging. PMID:24728341
Gai, Jiading; Obeid, Nady; Holtrop, Joseph L.; Wu, Xiao-Long; Lam, Fan; Fu, Maojing; Haldar, Justin P.; Hwu, Wen-mei W.; Liang, Zhi-Pei; Sutton, Bradley P.
2013-01-01
Several recent methods have been proposed to obtain significant speed-ups in MRI image reconstruction by leveraging the computational power of GPUs. Previously, we implemented a GPU-based image reconstruction technique called the Illinois Massively Parallel Acquisition Toolkit for Image reconstruction with ENhanced Throughput in MRI (IMPATIENT MRI) for reconstructing data collected along arbitrary 3D trajectories. In this paper, we improve IMPATIENT by removing computational bottlenecks by using a gridding approach to accelerate the computation of various data structures needed by the previous routine. Further, we enhance the routine with capabilities for off-resonance correction and multi-sensor parallel imaging reconstruction. Through implementation of optimized gridding into our iterative reconstruction scheme, speed-ups of more than a factor of 200 are provided in the improved GPU implementation compared to the previous accelerated GPU code. PMID:23682203
Schnaiter, Johannes Walter; Roemer, Frank; McKenna-Kuettner, Axel; Patzak, Hans-Joachim; May, Matthias Stefan; Janka, Rolf; Uder, Michael; Wuest, Wolfgang
2018-03-01
Parallel imaging allows for a considerable shortening of examination times. Limited data is available about the diagnostic accuracy of an accelerated knee MRI protocol based on parallel imaging evaluating all knee joint compartments in a large patient population compared to arthroscopy. 162 consecutive patients with a knee MRI (1.5 T, Siemens Aera) and arthroscopy were included. The total MRI scan time was less than 9 minutes. Meniscus and cartilage injuries, cruciate ligament lesions, loose joint bodies and medial patellar plicae were evaluated. Sensitivity (SE), specificity (SP), positive predictive value (PPV), and negative predictive value (NPV), as well as diagnostic accuracy were determined. For the medial meniscus, the values were: SE 97 %, SP 88 %, PPV 94 %, and NPV 94 %. For the lateral meniscus the values were: SE 77 %, SP 99 %, PPV 98 %, and NPV 89 %. For cartilage injuries the values were: SE 72 %, SP 80 %, PPV 86 %, and NPV 61 %. For the anterior cruciate ligament the values were: SE 90 %, SP 94 %, PPV 77 %, and NPV 98 %, while all values were 100 % for the posterior cruciate ligament. For loose bodies the values were: SE 48 %, SP 96 %, PPV 62 %, and NPV 93 %, and for the medial patellar plicae the values were: SE 57 %, SP 88 %, PPV 18 %, and NPV 98 %. A knee MRI examination with parallel imaging and a scan time of less than 9 minutes delivers reliable results with high diagnostic accuracy. · An accelerated knee MRI protocol with parallel imaging allows for high diagnostic accuracy.. · Especially meniscal and cruciate ligament injuries are well depicted.. · Cartilage injuries seem to be overestimated.. · Schnaiter JW, Roemer F, McKenna-Kuettner A et al. Diagnostic Accuracy of an MRI Protocol of the Knee Accelerated Through Parallel Imaging in Correlation to Arthroscopy. Fortschr Röntgenstr 2018; 190: 265 - 272. © Georg Thieme Verlag KG Stuttgart · New York.
Xu, Jian; Kim, Daniel; Otazo, Ricardo; Srichai, Monvadi B; Lim, Ruth P; Axel, Leon; Mcgorty, Kelly Anne; Niendorf, Thoralf; Sodickson, Daniel K
2013-07-01
To evaluate the feasibility and perform initial comparative evaluations of a 5-minute comprehensive whole-heart magnetic resonance imaging (MRI) protocol with four image acquisition types: perfusion (PERF), function (CINE), coronary artery imaging (CAI), and late gadolinium enhancement (LGE). This study protocol was Health Insurance Portability and Accountability Act (HIPAA)-compliant and Institutional Review Board-approved. A 5-minute comprehensive whole-heart MRI examination protocol (Accelerated) using 6-8-fold-accelerated volumetric parallel imaging was incorporated into and compared with a standard 2D clinical routine protocol (Standard). Following informed consent, 20 patients were imaged with both protocols. Datasets were reviewed for image quality using a 5-point Likert scale (0 = non-diagnostic, 4 = excellent) in blinded fashion by two readers. Good image quality with full whole-heart coverage was achieved using the accelerated protocol, particularly for CAI, although significant degradations in quality, as compared with traditional lengthy examinations, were observed for the other image types. Mean total scan time was significantly lower for the Accelerated as compared to Standard protocols (28.99 ± 4.59 min vs. 1.82 ± 0.05 min, P < 0.05). Overall image quality for the Standard vs. Accelerated protocol was 3.67 ± 0.29 vs. 1.5 ± 0.51 (P < 0.005) for PERF, 3.48 ± 0.64 vs. 2.6 ± 0.68 (P < 0.005) for CINE, 2.35 ± 1.01 vs. 2.48 ± 0.68 (P = 0.75) for CAI, and 3.67 ± 0.42 vs. 2.67 ± 0.84 (P < 0.005) for LGE. Diagnostic image quality for Standard vs. Accelerated protocols was 20/20 (100%) vs. 10/20 (50%) for PERF, 20/20 (100%) vs. 18/20 (90%) for CINE, 18/20 (90%) vs. 18/20 (90%) for CAI, and 20/20 (100%) vs. 18/20 (90%) for LGE. This study demonstrates the technical feasibility and promising image quality of 5-minute comprehensive whole-heart cardiac examinations, with simplified scan prescription and high spatial and temporal resolution enabled by highly parallel imaging technology. The study also highlights technical hurdles that remain to be addressed. Although image quality remained diagnostic for most scan types, the reduced image quality of PERF, CINE, and LGE scans in the Accelerated protocol remain a concern. Copyright © 2012 Wiley Periodicals, Inc.
3D hyperpolarized C-13 EPI with calibrationless parallel imaging
NASA Astrophysics Data System (ADS)
Gordon, Jeremy W.; Hansen, Rie B.; Shin, Peter J.; Feng, Yesu; Vigneron, Daniel B.; Larson, Peder E. Z.
2018-04-01
With the translation of metabolic MRI with hyperpolarized 13C agents into the clinic, imaging approaches will require large volumetric FOVs to support clinical applications. Parallel imaging techniques will be crucial to increasing volumetric scan coverage while minimizing RF requirements and temporal resolution. Calibrationless parallel imaging approaches are well-suited for this application because they eliminate the need to acquire coil profile maps or auto-calibration data. In this work, we explored the utility of a calibrationless parallel imaging method (SAKE) and corresponding sampling strategies to accelerate and undersample hyperpolarized 13C data using 3D blipped EPI acquisitions and multichannel receive coils, and demonstrated its application in a human study of [1-13C]pyruvate metabolism.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Chuan, E-mail: chuan.huang@stonybrookmedicine.edu; Department of Radiology, Harvard Medical School, Boston, Massachusetts 02115; Departments of Radiology, Psychiatry, Stony Brook Medicine, Stony Brook, New York 11794
2015-02-15
Purpose: Degradation of image quality caused by cardiac and respiratory motions hampers the diagnostic quality of cardiac PET. It has been shown that improved diagnostic accuracy of myocardial defect can be achieved by tagged MR (tMR) based PET motion correction using simultaneous PET-MR. However, one major hurdle for the adoption of tMR-based PET motion correction in the PET-MR routine is the long acquisition time needed for the collection of fully sampled tMR data. In this work, the authors propose an accelerated tMR acquisition strategy using parallel imaging and/or compressed sensing and assess the impact on the tMR-based motion corrected PETmore » using phantom and patient data. Methods: Fully sampled tMR data were acquired simultaneously with PET list-mode data on two simultaneous PET-MR scanners for a cardiac phantom and a patient. Parallel imaging and compressed sensing were retrospectively performed by GRAPPA and kt-FOCUSS algorithms with various acceleration factors. Motion fields were estimated using nonrigid B-spline image registration from both the accelerated and fully sampled tMR images. The motion fields were incorporated into a motion corrected ordered subset expectation maximization reconstruction algorithm with motion-dependent attenuation correction. Results: Although tMR acceleration introduced image artifacts into the tMR images for both phantom and patient data, motion corrected PET images yielded similar image quality as those obtained using the fully sampled tMR images for low to moderate acceleration factors (<4). Quantitative analysis of myocardial defect contrast over ten independent noise realizations showed similar results. It was further observed that although the image quality of the motion corrected PET images deteriorates for high acceleration factors, the images were still superior to the images reconstructed without motion correction. Conclusions: Accelerated tMR images obtained with more than 4 times acceleration can still provide relatively accurate motion fields and yield tMR-based motion corrected PET images with similar image quality as those reconstructed using fully sampled tMR data. The reduction of tMR acquisition time makes it more compatible with routine clinical cardiac PET-MR studies.« less
A New Joint-Blade SENSE Reconstruction for Accelerated PROPELLER MRI
Lyu, Mengye; Liu, Yilong; Xie, Victor B.; Feng, Yanqiu; Guo, Hua; Wu, Ed X.
2017-01-01
PROPELLER technique is widely used in MRI examinations for being motion insensitive, but it prolongs scan time and is restricted mainly to T2 contrast. Parallel imaging can accelerate PROPELLER and enable more flexible contrasts. Here, we propose a multi-step joint-blade (MJB) SENSE reconstruction to reduce the noise amplification in parallel imaging accelerated PROPELLER. MJB SENSE utilizes the fact that PROPELLER blades contain sharable information and blade-combined images can serve as regularization references. It consists of three steps. First, conventional blade-combined images are obtained using the conventional simple single-blade (SSB) SENSE, which reconstructs each blade separately. Second, the blade-combined images are employed as regularization for blade-wise noise reduction. Last, with virtual high-frequency data resampled from the previous step, all blades are jointly reconstructed to form the final images. Simulations were performed to evaluate the proposed MJB SENSE for noise reduction and motion correction. MJB SENSE was also applied to both T2-weighted and T1-weighted in vivo brain data. Compared to SSB SENSE, MJB SENSE greatly reduced the noise amplification at various acceleration factors, leading to increased image SNR in all simulation and in vivo experiments, including T1-weighted imaging with short echo trains. Furthermore, it preserved motion correction capability and was computationally efficient. PMID:28205602
A New Joint-Blade SENSE Reconstruction for Accelerated PROPELLER MRI.
Lyu, Mengye; Liu, Yilong; Xie, Victor B; Feng, Yanqiu; Guo, Hua; Wu, Ed X
2017-02-16
PROPELLER technique is widely used in MRI examinations for being motion insensitive, but it prolongs scan time and is restricted mainly to T2 contrast. Parallel imaging can accelerate PROPELLER and enable more flexible contrasts. Here, we propose a multi-step joint-blade (MJB) SENSE reconstruction to reduce the noise amplification in parallel imaging accelerated PROPELLER. MJB SENSE utilizes the fact that PROPELLER blades contain sharable information and blade-combined images can serve as regularization references. It consists of three steps. First, conventional blade-combined images are obtained using the conventional simple single-blade (SSB) SENSE, which reconstructs each blade separately. Second, the blade-combined images are employed as regularization for blade-wise noise reduction. Last, with virtual high-frequency data resampled from the previous step, all blades are jointly reconstructed to form the final images. Simulations were performed to evaluate the proposed MJB SENSE for noise reduction and motion correction. MJB SENSE was also applied to both T2-weighted and T1-weighted in vivo brain data. Compared to SSB SENSE, MJB SENSE greatly reduced the noise amplification at various acceleration factors, leading to increased image SNR in all simulation and in vivo experiments, including T1-weighted imaging with short echo trains. Furthermore, it preserved motion correction capability and was computationally efficient.
Novel 16-channel receive coil array for accelerated upper airway MRI at 3 Tesla.
Kim, Yoon-Chul; Hayes, Cecil E; Narayanan, Shrikanth S; Nayak, Krishna S
2011-06-01
Upper airway MRI can provide a noninvasive assessment of speech and swallowing disorders and sleep apnea. Recent work has demonstrated the value of high-resolution three-dimensional imaging and dynamic two-dimensional imaging and the importance of further improvements in spatio-temporal resolution. The purpose of the study was to describe a novel 16-channel 3 Tesla receive coil that is highly sensitive to the human upper airway and investigate the performance of accelerated upper airway MRI with the coil. In three-dimensional imaging of the upper airway during static posture, 6-fold acceleration is demonstrated using parallel imaging, potentially leading to capturing a whole three-dimensional vocal tract with 1.25 mm isotropic resolution within 9 sec of sustained sound production. Midsagittal spiral parallel imaging of vocal tract dynamics during natural speech production is demonstrated with 2 × 2 mm(2) in-plane spatial and 84 ms temporal resolution. Copyright © 2010 Wiley-Liss, Inc.
Accelerated Adaptive MGS Phase Retrieval
NASA Technical Reports Server (NTRS)
Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang
2011-01-01
The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm to allow parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations in nVidia graphic cards. The graphical processor unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model is used, called CUDA, to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies, to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphic cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
Parallel image registration with a thin client interface
NASA Astrophysics Data System (ADS)
Saiprasad, Ganesh; Lo, Yi-Jung; Plishker, William; Lei, Peng; Ahmad, Tabassum; Shekhar, Raj
2010-03-01
Despite its high significance, the clinical utilization of image registration remains limited because of its lengthy execution time and a lack of easy access. The focus of this work was twofold. First, we accelerated our course-to-fine, volume subdivision-based image registration algorithm by a novel parallel implementation that maintains the accuracy of our uniprocessor implementation. Second, we developed a thin-client computing model with a user-friendly interface to perform rigid and nonrigid image registration. Our novel parallel computing model uses the message passing interface model on a 32-core cluster. The results show that, compared with the uniprocessor implementation, the parallel implementation of our image registration algorithm is approximately 5 times faster for rigid image registration and approximately 9 times faster for nonrigid registration for the images used. To test the viability of such systems for clinical use, we developed a thin client in the form of a plug-in in OsiriX, a well-known open source PACS workstation and DICOM viewer, and used it for two applications. The first application registered the baseline and follow-up MR brain images, whose subtraction was used to track progression of multiple sclerosis. The second application registered pretreatment PET and intratreatment CT of radiofrequency ablation patients to demonstrate a new capability of multimodality imaging guidance. The registration acceleration coupled with the remote implementation using a thin client should ultimately increase accuracy, speed, and access of image registration-based interpretations in a number of diagnostic and interventional applications.
Hardware implementation of hierarchical volume subdivision-based elastic registration.
Dandekar, Omkar; Walimbe, Vivek; Shekhar, Raj
2006-01-01
Real-time, elastic and fully automated 3D image registration is critical to the efficiency and effectiveness of many image-guided diagnostic and treatment procedures relying on multimodality image fusion or serial image comparison. True, real-time performance will make many 3D image registration-based techniques clinically viable. Hierarchical volume subdivision-based image registration techniques are inherently faster than most elastic registration techniques, e.g. free-form deformation (FFD)-based techniques, and are more amenable for achieving real-time performance through hardware acceleration. Our group has previously reported an FPGA-based architecture for accelerating FFD-based image registration. In this article we show how our existing architecture can be adapted to support hierarchical volume subdivision-based image registration. A proof-of-concept implementation of the architecture achieved speedups of 100 for elastic registration against an optimized software implementation on a 3.2 GHz Pentium III Xeon workstation. Due to inherent parallel nature of the hierarchical volume subdivision-based image registration techniques further speedup can be achieved by using several computing modules in parallel.
Xu, Jian; Mcgorty, Kelly Anne; Lim, Ruth. P.; Bruno, Mary; Babb, James S.; Srichai, Monvadi B.; Kim, Daniel; Sodickson, Daniel K.
2011-01-01
OBJECTIVE To evaluate the feasibility of performing single breath-hold 3D thoracic non-contrast magnetic resonance angiography (NC-MRA) using highly-accelerated parallel imaging. MATERIALS AND METHODS We developed a single breath-hold NC MRA pulse sequence using balanced steady state free precession (SSFP) readout and highly-accelerated parallel imaging. In 17 subjects, highly-accelerated non-contrast MRA was compared against electrocardiogram (ECG)-triggered contrast-enhanced MRA. Anonymized images were randomized for blinded review by two independent readers for image quality, artifact severity in 8 defined vessel segments and aortic dimensions in 6 standard sites. NC-MRA and CE-MRA were compared in terms of these measures using paired sample t and Wilcoxon tests. RESULTS The overall image quality (3.21±0.68 for NC-MRA vs. 3.12±0.71 for CE-MRA) and artifact (2.87±1.01 for NC-MRA vs. 2.92±0.87 for CE-MRA) scores were not significantly different, but there were significant differences for the great vessel and coronary artery origins. NC-MRA demonstrated significantly lower aortic diameter measurements compared to CE-MRA; however, this difference was not considered clinically relevant (>3 mm difference) for less than 12% of segments, most commonly at the sinotubular junction. Mean total scan time was significantly lower for NC-MRA compared to CE-MRA (18.2 ± 6.0s vs. 28.1 ± 5.4s, respectively; p < 0.05). CONCLUSION Single breath-hold NC-MRA is feasible and can be a useful alternative for evaluation and follow-up of thoracic aortic diseases. PMID:22147589
Benz, Matthias R; Bongartz, Georg; Froehlich, Johannes M; Winkel, David; Boll, Daniel T; Heye, Tobias
2018-07-01
The aim was to investigate the variation of the arterial input function (AIF) within and between various DCE MRI sequences. A dynamic flow-phantom and steady signal reference were scanned on a 3T MRI using fast low angle shot (FLASH) 2d, FLASH3d (parallel imaging factor (P) = P0, P2, P4), volumetric interpolated breath-hold examination (VIBE) (P = P0, P3, P2 × 2, P2 × 3, P3 × 2), golden-angle radial sparse parallel imaging (GRASP), and time-resolved imaging with stochastic trajectories (TWIST). Signal over time curves were normalized and quantitatively analyzed by full width half maximum (FWHM) measurements to assess variation within and between sequences. The coefficient of variation (CV) for the steady signal reference ranged from 0.07-0.8%. The non-accelerated gradient echo FLASH2d, FLASH3d, and VIBE sequences showed low within sequence variation with 2.1%, 1.0%, and 1.6%. The maximum FWHM CV was 3.2% for parallel imaging acceleration (VIBE P2 × 3), 2.7% for GRASP and 9.1% for TWIST. The FWHM CV between sequences ranged from 8.5-14.4% for most non-accelerated/accelerated gradient echo sequences except 6.2% for FLASH3d P0 and 0.3% for FLASH3d P2; GRASP FWHM CV was 9.9% versus 28% for TWIST. MRI acceleration techniques vary in reproducibility and quantification of the AIF. Incomplete coverage of the k-space with TWIST as a representative of view-sharing techniques showed the highest variation within sequences and might be less suited for reproducible quantification of the AIF. Copyright © 2018 Elsevier B.V. All rights reserved.
Single-Shot MR Spectroscopic Imaging with Partial Parallel Imaging
Posse, Stefan; Otazo, Ricardo; Tsai, Shang-Yueh; Yoshimoto, Akio Ernesto; Lin, Fa-Hsuan
2010-01-01
An MR spectroscopic imaging (MRSI) pulse sequence based on Proton-Echo-Planar-Spectroscopic-Imaging (PEPSI) is introduced that measures 2-dimensional metabolite maps in a single excitation. Echo-planar spatial-spectral encoding was combined with interleaved phase encoding and parallel imaging using SENSE to reconstruct absorption mode spectra. The symmetrical k-space trajectory compensates phase errors due to convolution of spatial and spectral encoding. Single-shot MRSI at short TE was evaluated in phantoms and in vivo on a 3 T whole body scanner equipped with 12-channel array coil. Four-step interleaved phase encoding and 4-fold SENSE acceleration were used to encode a 16×16 spatial matrix with 390 Hz spectral width. Comparison with conventional PEPSI and PEPSI with 4-fold SENSE acceleration demonstrated comparable sensitivity per unit time when taking into account g-factor related noise increases and differences in sampling efficiency. LCModel fitting enabled quantification of Inositol, Choline, Creatine and NAA in vivo with concentration values in the ranges measured with conventional PEPSI and SENSE-accelerated PEPSI. Cramer-Rao lower bounds were comparable to those obtained with conventional SENSE-accelerated PEPSI at the same voxel size and measurement time. This single-shot MRSI method is therefore suitable for applications that require high temporal resolution to monitor temporal dynamics or to reduce sensitivity to tissue movement. PMID:19097245
NASA Astrophysics Data System (ADS)
Lyu, Jingyuan; Nakarmi, Ukash; Zhang, Chaoyi; Ying, Leslie
2016-05-01
This paper presents a new approach to highly accelerated dynamic parallel MRI using low rank matrix completion, partial separability (PS) model. In data acquisition, k-space data is moderately randomly undersampled at the center kspace navigator locations, but highly undersampled at the outer k-space for each temporal frame. In reconstruction, the navigator data is reconstructed from undersampled data using structured low-rank matrix completion. After all the unacquired navigator data is estimated, the partial separable model is used to obtain partial k-t data. Then the parallel imaging method is used to acquire the entire dynamic image series from highly undersampled data. The proposed method has shown to achieve high quality reconstructions with reduction factors up to 31, and temporal resolution of 29ms, when the conventional PS method fails.
Calibrationless parallel magnetic resonance imaging: a joint sparsity model.
Majumdar, Angshul; Chaudhury, Kunal Narayan; Ward, Rabab
2013-12-05
State-of-the-art parallel MRI techniques either explicitly or implicitly require certain parameters to be estimated, e.g., the sensitivity map for SENSE, SMASH and interpolation weights for GRAPPA, SPIRiT. Thus all these techniques are sensitive to the calibration (parameter estimation) stage. In this work, we have proposed a parallel MRI technique that does not require any calibration but yields reconstruction results that are at par with (or even better than) state-of-the-art methods in parallel MRI. Our proposed method required solving non-convex analysis and synthesis prior joint-sparsity problems. This work also derives the algorithms for solving them. Experimental validation was carried out on two datasets-eight channel brain and eight channel Shepp-Logan phantom. Two sampling methods were used-Variable Density Random sampling and non-Cartesian Radial sampling. For the brain data, acceleration factor of 4 was used and for the other an acceleration factor of 6 was used. The reconstruction results were quantitatively evaluated based on the Normalised Mean Squared Error between the reconstructed image and the originals. The qualitative evaluation was based on the actual reconstructed images. We compared our work with four state-of-the-art parallel imaging techniques; two calibrated methods-CS SENSE and l1SPIRiT and two calibration free techniques-Distributed CS and SAKE. Our method yields better reconstruction results than all of them.
NASA Technical Reports Server (NTRS)
Diner, Daniel B. (Inventor)
1991-01-01
Methods for providing stereoscopic image presentation and stereoscopic configurations using stereoscopic viewing systems having converged or parallel cameras may be set up to reduce or eliminate erroneously perceived accelerations and decelerations by proper selection of parameters, such as an image magnification factor, q, and intercamera distance, 2w. For converged cameras, q is selected to be equal to Ve - qwl = 0, where V is the camera distance, e is half the interocular distance of an observer, w is half the intercamera distance, and l is the actual distance from the first nodal point of each camera to the convergence point, and for parallel cameras, q is selected to be equal to e/w. While converged cameras cannot be set up to provide fully undistorted three-dimensional views, they can be set up to provide a linear relationship between real and apparent depth and thus minimize erroneously perceived accelerations and decelerations for three sagittal planes, x = -w, x = 0, and x = +w which are indicated to the observer. Parallel cameras can be set up to provide fully undistorted three-dimensional views by controlling the location of the observer and by magnification and shifting of left and right images. In addition, the teachings of this disclosure can be used to provide methods of stereoscopic image presentation and stereoscopic camera configurations to produce a nonlinear relation between perceived and real depth, and erroneously produce or enhance perceived accelerations and decelerations in order to provide special effects for entertainment, training, or educational purposes.
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-06-01
We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
GPU accelerated fuzzy connected image segmentation by using CUDA.
Zhuge, Ying; Cao, Yong; Miller, Robert W
2009-01-01
Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem of these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays commodity graphics hardware provides high parallel computing power. In this paper, we present a parallel fuzzy connected image segmentation algorithm on Nvidia's Compute Unified Device Architecture (CUDA) platform for segmenting large medical image data sets. Our experiments based on three data sets with small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 7.2x, 7.3x, and 14.4x, correspondingly, for the three data sets over the sequential implementation of fuzzy connected image segmentation algorithm on CPU.
Single-shot magnetic resonance spectroscopic imaging with partial parallel imaging.
Posse, Stefan; Otazo, Ricardo; Tsai, Shang-Yueh; Yoshimoto, Akio Ernesto; Lin, Fa-Hsuan
2009-03-01
A magnetic resonance spectroscopic imaging (MRSI) pulse sequence based on proton-echo-planar-spectroscopic-imaging (PEPSI) is introduced that measures two-dimensional metabolite maps in a single excitation. Echo-planar spatial-spectral encoding was combined with interleaved phase encoding and parallel imaging using SENSE to reconstruct absorption mode spectra. The symmetrical k-space trajectory compensates phase errors due to convolution of spatial and spectral encoding. Single-shot MRSI at short TE was evaluated in phantoms and in vivo on a 3-T whole-body scanner equipped with a 12-channel array coil. Four-step interleaved phase encoding and fourfold SENSE acceleration were used to encode a 16 x 16 spatial matrix with a 390-Hz spectral width. Comparison with conventional PEPSI and PEPSI with fourfold SENSE acceleration demonstrated comparable sensitivity per unit time when taking into account g-factor-related noise increases and differences in sampling efficiency. LCModel fitting enabled quantification of inositol, choline, creatine, and N-acetyl-aspartate (NAA) in vivo with concentration values in the ranges measured with conventional PEPSI and SENSE-accelerated PEPSI. Cramer-Rao lower bounds were comparable to those obtained with conventional SENSE-accelerated PEPSI at the same voxel size and measurement time. This single-shot MRSI method is therefore suitable for applications that require high temporal resolution to monitor temporal dynamics or to reduce sensitivity to tissue movement.
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.
2014-01-01
The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-01-01
We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
Parallel imaging of knee cartilage at 3 Tesla.
Zuo, Jin; Li, Xiaojuan; Banerjee, Suchandrima; Han, Eric; Majumdar, Sharmila
2007-10-01
To evaluate the feasibility and reproducibility of quantitative cartilage imaging with parallel imaging at 3T and to determine the impact of the acceleration factor (AF) on morphological and relaxation measurements. An eight-channel phased-array knee coil was employed for conventional and parallel imaging on a 3T scanner. The imaging protocol consisted of a T2-weighted fast spin echo (FSE), a 3D-spoiled gradient echo (SPGR), a custom 3D-SPGR T1rho, and a 3D-SPGR T2 sequence. Parallel imaging was performed with an array spatial sensitivity technique (ASSET). The left knees of six healthy volunteers were scanned with both conventional and parallel imaging (AF = 2). Morphological parameters and relaxation maps from parallel imaging methods (AF = 2) showed comparable results with conventional method. The intraclass correlation coefficient (ICC) of the two methods for cartilage volume, mean cartilage thickness, T1rho, and T2 were 0.999, 0.977, 0.964, and 0.969, respectively, while demonstrating excellent reproducibility. No significant measurement differences were found when AF reached 3 despite the low signal-to-noise ratio (SNR). The study demonstrated that parallel imaging can be applied to current knee cartilage quantification at AF = 2 without degrading measurement accuracy with good reproducibility while effectively reducing scan time. Shorter imaging times can be achieved with higher AF at the cost of SNR. (c) 2007 Wiley-Liss, Inc.
Evaluation of slice accelerations using multiband echo planar imaging at 3 Tesla
Xu, Junqian; Moeller, Steen; Auerbach, Edward J.; Strupp, John; Smith, Stephen M.; Feinberg, David A.; Yacoub, Essa; Uğurbil, Kâmil
2013-01-01
We evaluate residual aliasing among simultaneously excited and acquired slices in slice accelerated multiband (MB) echo planar imaging (EPI). No in-plane accelerations were used in order to maximize and evaluate achievable slice acceleration factors at 3 Tesla. We propose a novel leakage (L-) factor to quantify the effects of signal leakage between simultaneously acquired slices. With a standard 32-channel receiver coil at 3 Tesla, we demonstrate that slice acceleration factors of up to eight (MB = 8) with blipped controlled aliasing in parallel imaging (CAIPI), in the absence of in-plane accelerations, can be used routinely with acceptable image quality and integrity for whole brain imaging. Spectral analyses of single-shot fMRI time series demonstrate that temporal fluctuations due to both neuronal and physiological sources were distinguishable and comparable up to slice-acceleration factors of nine (MB = 9). The increased temporal efficiency could be employed to achieve, within a given acquisition period, higher spatial resolution, increased fMRI statistical power, multiple TEs, faster sampling of temporal events in a resting state fMRI time series, increased sampling of q-space in diffusion imaging, or more quiet time during a scan. PMID:23899722
Medical image processing on the GPU - past, present and future.
Eklund, Anders; Dufort, Paul; Forsberg, Daniel; LaConte, Stephen M
2013-12-01
Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computing, are affordable and energy efficient. In the field of medical imaging, GPUs are in some cases crucial for enabling practical use of computationally demanding algorithms. This review presents the past and present work on GPU accelerated medical image processing, and is meant to serve as an overview and introduction to existing GPU implementations. The review covers GPU acceleration of basic image processing operations (filtering, interpolation, histogram estimation and distance transforms), the most commonly used algorithms in medical imaging (image registration, image segmentation and image denoising) and algorithms that are specific to individual modalities (CT, PET, SPECT, MRI, fMRI, DTI, ultrasound, optical imaging and microscopy). The review ends by highlighting some future possibilities and challenges. Copyright © 2013 Elsevier B.V. All rights reserved.
Kadlecek, Stephen; Hamedani, Hooman; Xu, Yinan; Emami, Kiarash; Xin, Yi; Ishii, Masaru; Rizi, Rahim
2013-10-01
Alveolar oxygen tension (Pao2) is sensitive to the interplay between local ventilation, perfusion, and alveolar-capillary membrane permeability, and thus reflects physiologic heterogeneity of healthy and diseased lung function. Several hyperpolarized helium ((3)He) magnetic resonance imaging (MRI)-based Pao2 mapping techniques have been reported, and considerable effort has gone toward reducing Pao2 measurement error. We present a new Pao2 imaging scheme, using parallel accelerated MRI, which significantly reduces measurement error. The proposed Pao2 mapping scheme was computer-simulated and was tested on both phantoms and five human subjects. Where possible, correspondence between actual local oxygen concentration and derived values was assessed for both bias (deviation from the true mean) and imaging artifact (deviation from the true spatial distribution). Phantom experiments demonstrated a significantly reduced coefficient of variation using the accelerated scheme. Simulation results support this observation and predict that correspondence between the true spatial distribution and the derived map is always superior using the accelerated scheme, although the improvement becomes less significant as the signal-to-noise ratio increases. Paired measurements in the human subjects, comparing accelerated and fully sampled schemes, show a reduced Pao2 distribution width for 41 of 46 slices. In contrast to proton MRI, acceleration of hyperpolarized imaging has no signal-to-noise penalty; its use in Pao2 measurement is therefore always beneficial. Comparison of multiple schemes shows that the benefit arises from a longer time-base during which oxygen-induced depolarization modifies the signal strength. Demonstration of the accelerated technique in human studies shows the feasibility of the method and suggests that measurement error is reduced here as well, particularly at low signal-to-noise levels. Copyright © 2013 AUR. Published by Elsevier Inc. All rights reserved.
Accelerated Fractional Ventilation Imaging with Hyperpolarized Gas MRI
Emami, Kiarash; Xu, Yinan; Hamedani, Hooman; Profka, Harrilla; Kadlecek, Stephen; Xin, Yi; Ishii, Masaru; Rizi, Rahim R.
2013-01-01
PURPOSE To investigate the utility of accelerated imaging to enhance multi-breath fractional ventilation (r) measurement accuracy using HP gas MRI. Undersampling shortens the breath-hold time, thereby reducing the O2-induced signal decay and allows subjects to maintain a more physiologically relevant breathing pattern. Additionally it may improve r estimation accuracy by reducing RF destruction of HP gas. METHODS Image acceleration was achieved by using an 8-channel phased array coil. Undersampled image acquisition was simulated in a series of ventilation images and images were reconstructed for various matrix sizes (48–128) using GRAPPA. Parallel accelerated r imaging was also performed on five mechanically ventilated pigs. RESULTS Optimal acceleration factor was fairly invariable (2.0–2.2×) over the range of simulated resolutions. Estimation accuracy progressively improved with higher resolutions (39–51% error reduction). In vivo r values were not significantly different between the two methods: 0.27±0.09, 0.35±0.06, 0.40±0.04 (standard) versus 0.23±0.05, 0.34±0.03, 0.37±0.02 (accelerated); for anterior, medial and posterior slices, respectively, whereas the corresponding vertical r gradients were significant (P < 0.001): 0.021±0.007 (standard) versus 0.019±0.005 (accelerated) [cm−1]. CONCLUSION Quadruple phased array coil simulations resulted in an optimal acceleration factor of ~2× independent of imaging resolution. Results advocate undersampled image acceleration to improve accuracy of fractional ventilation measurement with HP gas MRI. PMID:23400938
Bayer image parallel decoding based on GPU
NASA Astrophysics Data System (ADS)
Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua
2012-11-01
In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.
Jeong, Kyeong-Min; Kim, Hee-Seung; Hong, Sung-In; Lee, Sung-Keun; Jo, Na-Young; Kim, Yong-Soo; Lim, Hong-Gi; Park, Jae-Hyeung
2012-10-08
Speed enhancement of integral imaging based incoherent Fourier hologram capture using a graphic processing unit is reported. Integral imaging based method enables exact hologram capture of real-existing three-dimensional objects under regular incoherent illumination. In our implementation, we apply parallel computation scheme using the graphic processing unit, accelerating the processing speed. Using enhanced speed of hologram capture, we also implement a pseudo real-time hologram capture and optical reconstruction system. The overall operation speed is measured to be 1 frame per second.
Hingerl, Lukas; Moser, Philipp; Považan, Michal; Hangel, Gilbert; Heckova, Eva; Gruber, Stephan; Trattnig, Siegfried; Strasser, Bernhard
2017-01-01
Purpose Full‐slice magnetic resonance spectroscopic imaging at ≥7 T is especially vulnerable to lipid contaminations arising from regions close to the skull. This contamination can be mitigated by improving the point spread function via higher spatial resolution sampling and k‐space filtering, but this prolongs scan times and reduces the signal‐to‐noise ratio (SNR) efficiency. Currently applied parallel imaging methods accelerate magnetic resonance spectroscopic imaging scans at 7T, but increase lipid artifacts and lower SNR‐efficiency further. In this study, we propose an SNR‐efficient spatial‐spectral sampling scheme using concentric circle echo planar trajectories (CONCEPT), which was adapted to intrinsically acquire a Hamming‐weighted k‐space, thus termed density‐weighted‐CONCEPT. This minimizes voxel bleeding, while preserving an optimal SNR. Theory and Methods Trajectories were theoretically derived and verified in phantoms as well as in the human brain via measurements of five volunteers (single‐slice, field‐of‐view 220 × 220 mm2, matrix 64 × 64, scan time 6 min) with free induction decay magnetic resonance spectroscopic imaging. Density‐weighted‐CONCEPT was compared to (a) the originally proposed CONCEPT with equidistant circles (here termed e‐CONCEPT), (b) elliptical phase‐encoding, and (c) 5‐fold Controlled Aliasing In Parallel Imaging Results IN Higher Acceleration accelerated elliptical phase‐encoding. Results By intrinsically sampling a Hamming‐weighted k‐space, density‐weighted‐CONCEPT removed Gibbs‐ringing artifacts and had in vivo +9.5%, +24.4%, and +39.7% higher SNR than e‐CONCEPT, elliptical phase‐encoding, and the Controlled Aliasing In Parallel Imaging Results IN Higher Acceleration accelerated elliptical phase‐encoding (all P < 0.05), respectively, which lead to improved metabolic maps. Conclusion Density‐weighted‐CONCEPT provides clinically attractive full‐slice high‐resolution magnetic resonance spectroscopic imaging with optimal SNR at 7T. Magn Reson Med 79:2874–2885, 2018. © 2017 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. PMID:29106742
Kamesh Iyer, Srikant; Tasdizen, Tolga; Burgon, Nathan; Kholmovski, Eugene; Marrouche, Nassir; Adluru, Ganesh; DiBella, Edward
2016-09-01
Current late gadolinium enhancement (LGE) imaging of left atrial (LA) scar or fibrosis is relatively slow and requires 5-15min to acquire an undersampled (R=1.7) 3D navigated dataset. The GeneRalized Autocalibrating Partially Parallel Acquisitions (GRAPPA) based parallel imaging method is the current clinical standard for accelerating 3D LGE imaging of the LA and permits an acceleration factor ~R=1.7. Two compressed sensing (CS) methods have been developed to achieve higher acceleration factors: a patch based collaborative filtering technique tested with acceleration factor R~3, and a technique that uses a 3D radial stack-of-stars acquisition pattern (R~1.8) with a 3D total variation constraint. The long reconstruction time of these CS methods makes them unwieldy to use, especially the patch based collaborative filtering technique. In addition, the effect of CS techniques on the quantification of percentage of scar/fibrosis is not known. We sought to develop a practical compressed sensing method for imaging the LA at high acceleration factors. In order to develop a clinically viable method with short reconstruction time, a Split Bregman (SB) reconstruction method with 3D total variation (TV) constraints was developed and implemented. The method was tested on 8 atrial fibrillation patients (4 pre-ablation and 4 post-ablation datasets). Blur metric, normalized mean squared error and peak signal to noise ratio were used as metrics to analyze the quality of the reconstructed images, Quantification of the extent of LGE was performed on the undersampled images and compared with the fully sampled images. Quantification of scar from post-ablation datasets and quantification of fibrosis from pre-ablation datasets showed that acceleration factors up to R~3.5 gave good 3D LGE images of the LA wall, using a 3D TV constraint and constrained SB methods. This corresponds to reducing the scan time by half, compared to currently used GRAPPA methods. Reconstruction of 3D LGE images using the SB method was over 20 times faster than standard gradient descent methods. Copyright © 2016 Elsevier Inc. All rights reserved.
Liao, Congyu; Chen, Ying; Cao, Xiaozhi; Chen, Song; He, Hongjian; Mani, Merry; Jacob, Mathews; Magnotta, Vincent; Zhong, Jianhui
2017-03-01
To propose a novel reconstruction method using parallel imaging with low rank constraint to accelerate high resolution multishot spiral diffusion imaging. The undersampled high resolution diffusion data were reconstructed based on a low rank (LR) constraint using similarities between the data of different interleaves from a multishot spiral acquisition. The self-navigated phase compensation using the low resolution phase data in the center of k-space was applied to correct shot-to-shot phase variations induced by motion artifacts. The low rank reconstruction was combined with sensitivity encoding (SENSE) for further acceleration. The efficiency of the proposed joint reconstruction framework, dubbed LR-SENSE, was evaluated through error quantifications and compared with ℓ1 regularized compressed sensing method and conventional iterative SENSE method using the same datasets. It was shown that with a same acceleration factor, the proposed LR-SENSE method had the smallest normalized sum-of-squares errors among all the compared methods in all diffusion weighted images and DTI-derived index maps, when evaluated with different acceleration factors (R = 2, 3, 4) and for all the acquired diffusion directions. Robust high resolution diffusion weighted image can be efficiently reconstructed from highly undersampled multishot spiral data with the proposed LR-SENSE method. Magn Reson Med 77:1359-1366, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
Huellner, Martin W; Appenzeller, Philippe; Kuhn, Félix P; Husmann, Lars; Pietsch, Carsten M; Burger, Irene A; Porto, Miguel; Delso, Gaspar; von Schulthess, Gustav K; Veit-Haibach, Patrick
2014-12-01
To assess the diagnostic performance of whole-body non-contrast material-enhanced positron emission tomography (PET)/magnetic resonance (MR) imaging and PET/computed tomography (CT) for staging and restaging of cancers and provide guidance for modality and sequence selection. This study was approved by the institutional review board and national government authorities. One hundred six consecutive patients (median age, 68 years; 46 female and 60 male patients) referred for staging or restaging of oncologic malignancies underwent whole-body imaging with a sequential trimodality PET/CT/MR system. The MR protocol included short inversion time inversion-recovery ( STIR short inversion time inversion-recovery ), Dixon-type liver accelerated volume acquisition ( LAVA liver accelerated volume acquisition ; GE Healthcare, Waukesha, Wis), and respiratory-gated periodically rotated overlapping parallel lines with enhanced reconstruction ( PROPELLER periodically rotated overlapping parallel lines with enhanced reconstruction ; GE Healthcare) sequences. Primary tumors (n = 43), local lymph node metastases (n = 74), and distant metastases (n = 66) were evaluated for conspicuity (scored 0-4), artifacts (scored 0-2), and reader confidence on PET/CT and PET/MR images. Subanalysis for lung lesions (n = 46) was also performed. Relevant incidental findings with both modalities were compared. Interreader agreement was analyzed with intraclass correlation coefficients and κ statistics. Lesion conspicuity, image artifacts, and incidental findings were analyzed with nonparametric tests. Primary tumors were less conspicuous on STIR short inversion time inversion-recovery (3.08, P = .016) and LAVA liver accelerated volume acquisition (2.64, P = .002) images than on CT images (3.49), while findings with the PROPELLER periodically rotated overlapping parallel lines with enhanced reconstruction sequence (3.70, P = .436) were comparable to those at CT. In distant metastases, the PROPELLER periodically rotated overlapping parallel lines with enhanced reconstruction sequence (3.84) yielded better results than CT (2.88, P < .001). Subanalysis for lung lesions yielded similar results (primary lung tumors: CT, 3.71; STIR short inversion time inversion-recovery , 3.32 [P = .014]; LAVA liver accelerated volume acquisition , 2.52 [P = .002]; PROPELLER periodically rotated overlapping parallel lines with enhanced reconstruction , 3.64 [P = .546]). Readers classified lesions more confidently with PET/MR than PET/CT. However, PET/CT showed more incidental findings than PET/MR (P = .039), especially in the lung (P < .001). MR images had more artifacts than CT images. PET/MR performs comparably to PET/CT in whole-body oncology and neoplastic lung disease, with the use of appropriate sequences. Further studies are needed to define regionalized PET/MR protocols with sequences tailored to specific tumor entities. © RSNA, 2014 Online supplemental material is available for this article.
Interleaved EPI diffusion imaging using SPIRiT-based reconstruction with virtual coil compression.
Dong, Zijing; Wang, Fuyixue; Ma, Xiaodong; Zhang, Zhe; Dai, Erpeng; Yuan, Chun; Guo, Hua
2018-03-01
To develop a novel diffusion imaging reconstruction framework based on iterative self-consistent parallel imaging reconstruction (SPIRiT) for multishot interleaved echo planar imaging (iEPI), with computation acceleration by virtual coil compression. As a general approach for autocalibrating parallel imaging, SPIRiT improves the performance of traditional generalized autocalibrating partially parallel acquisitions (GRAPPA) methods in that the formulation with self-consistency is better conditioned, suggesting SPIRiT to be a better candidate in k-space-based reconstruction. In this study, a general SPIRiT framework is adopted to incorporate both coil sensitivity and phase variation information as virtual coils and then is applied to 2D navigated iEPI diffusion imaging. To reduce the reconstruction time when using a large number of coils and shots, a novel shot-coil compression method is proposed for computation acceleration in Cartesian sampling. Simulations and in vivo experiments were conducted to evaluate the performance of the proposed method. Compared with the conventional coil compression, the shot-coil compression achieved higher compression rates with reduced errors. The simulation and in vivo experiments demonstrate that the SPIRiT-based reconstruction outperformed the existing method, realigned GRAPPA, and provided superior images with reduced artifacts. The SPIRiT-based reconstruction with virtual coil compression is a reliable method for high-resolution iEPI diffusion imaging. Magn Reson Med 79:1525-1531, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Ardekani, Siamak; Selva, Luis; Sayre, James; Sinha, Usha
2006-11-01
Single-shot echo-planar based diffusion tensor imaging is prone to geometric and intensity distortions. Parallel imaging is a means of reducing these distortions while preserving spatial resolution. A quantitative comparison at 3 T of parallel imaging for diffusion tensor images (DTI) using k-space (generalized auto-calibrating partially parallel acquisitions; GRAPPA) and image domain (sensitivity encoding; SENSE) reconstructions at different acceleration factors, R, is reported here. Images were evaluated using 8 human subjects with repeated scans for 2 subjects to estimate reproducibility. Mutual information (MI) was used to assess the global changes in geometric distortions. The effects of parallel imaging techniques on random noise and reconstruction artifacts were evaluated by placing 26 regions of interest and computing the standard deviation of apparent diffusion coefficient and fractional anisotropy along with the error of fitting the data to the diffusion model (residual error). The larger positive values in mutual information index with increasing R values confirmed the anticipated decrease in distortions. Further, the MI index of GRAPPA sequences for a given R factor was larger than the corresponding mSENSE images. The residual error was lowest in the images acquired without parallel imaging and among the parallel reconstruction methods, the R = 2 acquisitions had the least error. The standard deviation, accuracy, and reproducibility of the apparent diffusion coefficient and fractional anisotropy in homogenous tissue regions showed that GRAPPA acquired with R = 2 had the least amount of systematic and random noise and of these, significant differences with mSENSE, R = 2 were found only for the fractional anisotropy index. Evaluation of the current implementation of parallel reconstruction algorithms identified GRAPPA acquired with R = 2 as optimal for diffusion tensor imaging.
Concentric Rings K-Space Trajectory for Hyperpolarized 13C MR Spectroscopic Imaging
Jiang, Wenwen; Lustig, Michael; Larson, Peder E.Z.
2014-01-01
Purpose To develop a robust and rapid imaging technique for hyperpolarized 13C MR Spectroscopic Imaging (MRSI) and investigate its performance. Methods A concentric rings readout trajectory with constant angular velocity is proposed for hyperpolarized 13C spectroscopic imaging and its properties are analyzed. Quantitative analyses of design tradeoffs are presented for several imaging scenarios. The first application of concentric rings on 13C phantoms and in vivo animal hyperpolarized 13C MRSI studies were performed to demonstrate the feasibility of the proposed method. Finally, a parallel imaging accelerated concentric rings study is presented. Results The concentric rings MRSI trajectory has the advantages of acquisition timesaving compared to echo-planar spectroscopic imaging (EPSI). It provides sufficient spectral bandwidth with relatively high SNR efficiency compared to EPSI and spiral techniques. Phantom and in vivo animal studies showed good image quality with half the scan time and reduced pulsatile flow artifacts compared to EPSI. Parallel imaging accelerated concentric rings showed advantages over Cartesian sampling in g-factor simulations and demonstrated aliasing-free image quality in a hyperpolarized 13C in vivo study. Conclusion The concentric rings trajectory is a robust and rapid imaging technique that fits very well with the speed, bandwidth, and resolution requirements of hyperpolarized 13C MRSI. PMID:25533653
Three Element Phased Array Coil for Imaging of Rat Spinal Cord at 7T
Mogatadakala, Kishore V.; Bankson, James A.; Narayana, Ponnada A.
2008-01-01
In order to overcome some of the limitations of an implantable coil, including its invasive nature and limited spatial coverage, a three element phased array coil is described for high resolution magnetic resonance imaging (MRI) of rat spinal cord. This coil allows imaging both thoracic and cervical segments of rat spinal cord. In the current design, coupling between the nearest neighbors was minimized by overlapping the coil elements. A simple capacitive network was used for decoupling the next neighbor elements. The dimensions of individual coils in the array were determined based on the signal-to-noise ratio (SNR) measurements performed on a phantom with three different surface coils. SNR measurements on a phantom demonstrated higher SNR of the phased array coil relative to two different volume coils. In-vivo images acquired on rat spinal cord with our coil demonstrated excellent gray and white matter contrast. To evaluate the performance of the phased array coil under parallel imaging, g-factor maps were obtained for two different acceleration factors of 2 and 3. These simulations indicate that parallel imaging with acceleration factor of 2 would be possible without significant image reconstruction related noise amplifications. PMID:19025892
Self-calibrated correlation imaging with k-space variant correlation functions.
Li, Yu; Edalati, Masoud; Du, Xingfu; Wang, Hui; Cao, Jie J
2018-03-01
Correlation imaging is a previously developed high-speed MRI framework that converts parallel imaging reconstruction into the estimate of correlation functions. The presented work aims to demonstrate this framework can provide a speed gain over parallel imaging by estimating k-space variant correlation functions. Because of Fourier encoding with gradients, outer k-space data contain higher spatial-frequency image components arising primarily from tissue boundaries. As a result of tissue-boundary sparsity in the human anatomy, neighboring k-space data correlation varies from the central to the outer k-space. By estimating k-space variant correlation functions with an iterative self-calibration method, correlation imaging can benefit from neighboring k-space data correlation associated with both coil sensitivity encoding and tissue-boundary sparsity, thereby providing a speed gain over parallel imaging that relies only on coil sensitivity encoding. This new approach is investigated in brain imaging and free-breathing neonatal cardiac imaging. Correlation imaging performs better than existing parallel imaging techniques in simulated brain imaging acceleration experiments. The higher speed enables real-time data acquisition for neonatal cardiac imaging in which physiological motion is fast and non-periodic. With k-space variant correlation functions, correlation imaging gives a higher speed than parallel imaging and offers the potential to image physiological motion in real-time. Magn Reson Med 79:1483-1494, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Parallel transformation of K-SVD solar image denoising algorithm
NASA Astrophysics Data System (ADS)
Liang, Youwen; Tian, Yu; Li, Mei
2017-02-01
The images obtained by observing the sun through a large telescope always suffered with noise due to the low SNR. K-SVD denoising algorithm can effectively remove Gauss white noise. Training dictionaries for sparse representations is a time consuming task, due to the large size of the data involved and to the complexity of the training algorithms. In this paper, an OpenMP parallel programming language is proposed to transform the serial algorithm to the parallel version. Data parallelism model is used to transform the algorithm. Not one atom but multiple atoms updated simultaneously is the biggest change. The denoising effect and acceleration performance are tested after completion of the parallel algorithm. Speedup of the program is 13.563 in condition of using 16 cores. This parallel version can fully utilize the multi-core CPU hardware resources, greatly reduce running time and easily to transplant in multi-core platform.
ACCELERATING MR PARAMETER MAPPING USING SPARSITY-PROMOTING REGULARIZATION IN PARAMETRIC DIMENSION
Velikina, Julia V.; Alexander, Andrew L.; Samsonov, Alexey
2013-01-01
MR parameter mapping requires sampling along additional (parametric) dimension, which often limits its clinical appeal due to a several-fold increase in scan times compared to conventional anatomic imaging. Data undersampling combined with parallel imaging is an attractive way to reduce scan time in such applications. However, inherent SNR penalties of parallel MRI due to noise amplification often limit its utility even at moderate acceleration factors, requiring regularization by prior knowledge. In this work, we propose a novel regularization strategy, which utilizes smoothness of signal evolution in the parametric dimension within compressed sensing framework (p-CS) to provide accurate and precise estimation of parametric maps from undersampled data. The performance of the method was demonstrated with variable flip angle T1 mapping and compared favorably to two representative reconstruction approaches, image space-based total variation regularization and an analytical model-based reconstruction. The proposed p-CS regularization was found to provide efficient suppression of noise amplification and preservation of parameter mapping accuracy without explicit utilization of analytical signal models. The developed method may facilitate acceleration of quantitative MRI techniques that are not suitable to model-based reconstruction because of complex signal models or when signal deviations from the expected analytical model exist. PMID:23213053
Parallel computing in experimental mechanics and optical measurement: A review (II)
NASA Astrophysics Data System (ADS)
Wang, Tianyi; Kemao, Qian
2018-05-01
With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have successfully integrated into various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computation burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composing of hardware such as CPUs and GPUs, have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications in a broad scope, which include digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.
Denoising Sparse Images from GRAPPA using the Nullspace Method (DESIGN)
Weller, Daniel S.; Polimeni, Jonathan R.; Grady, Leo; Wald, Lawrence L.; Adalsteinsson, Elfar; Goyal, Vivek K
2011-01-01
To accelerate magnetic resonance imaging using uniformly undersampled (nonrandom) parallel imaging beyond what is achievable with GRAPPA alone, the Denoising of Sparse Images from GRAPPA using the Nullspace method (DESIGN) is developed. The trade-off between denoising and smoothing the GRAPPA solution is studied for different levels of acceleration. Several brain images reconstructed from uniformly undersampled k-space data using DESIGN are compared against reconstructions using existing methods in terms of difference images (a qualitative measure), PSNR, and noise amplification (g-factors) as measured using the pseudo-multiple replica method. Effects of smoothing, including contrast loss, are studied in synthetic phantom data. In the experiments presented, the contrast loss and spatial resolution are competitive with existing methods. Results for several brain images demonstrate significant improvements over GRAPPA at high acceleration factors in denoising performance with limited blurring or smoothing artifacts. In addition, the measured g-factors suggest that DESIGN mitigates noise amplification better than both GRAPPA and L1 SPIR-iT (the latter limited here by uniform undersampling). PMID:22213069
Li, Mingyan; Zuo, Zhentao; Jin, Jin; Xue, Rong; Trakic, Adnan; Weber, Ewald; Liu, Feng; Crozier, Stuart
2014-03-01
Parallel imaging (PI) is widely used for imaging acceleration by means of coil spatial sensitivities associated with phased array coils (PACs). By employing a time-division multiplexing technique, a single-channel rotating radiofrequency coil (RRFC) provides an alternative method to reduce scan time. Strategically combining these two concepts could provide enhanced acceleration and efficiency. In this work, the imaging acceleration ability and homogeneous image reconstruction strategy of 4-element rotating radiofrequency coil array (RRFCA) was numerically investigated and experimental validated at 7T with a homogeneous phantom. Each coil of RRFCA was capable of acquiring a large number of sensitivity profiles, leading to a better acceleration performance illustrated by the improved geometry-maps that have lower maximum values and more uniform distributions compared to 4- and 8-element stationary arrays. A reconstruction algorithm, rotating SENSitivity Encoding (rotating SENSE), was proposed to provide image reconstruction. Additionally, by optimally choosing the angular sampling positions and transmit profiles under the rotating scheme, phantom images could be faithfully reconstructed. The results indicate that, the proposed technique is able to provide homogeneous reconstructions with overall higher and more uniform signal-to-noise ratio (SNR) distributions at high reduction factors. It is hoped that, by employing the high imaging acceleration and homogeneous imaging reconstruction ability of RRFCA, the proposed method will facilitate human imaging for ultra high field MRI. Copyright © 2013 Elsevier Inc. All rights reserved.
Hielscher, Andreas H; Bartel, Sebastian
2004-02-01
Optical tomography (OT) is a fast developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computational-intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.
GPU Accelerated Ultrasonic Tomography Using Propagation and Back Propagation Method
2015-09-28
the medical imaging field using GPUs has been done for many years. In [1], Copeland et al. used 2D images , obtained by X - ray projections, to...Index Terms— Medical Imaging , Ultrasonic Tomography, GPU, CUDA, Parallel Computing I. INTRODUCTION GRAPHIC Processing Units (GPUs) are computation... Imaging Algorithm The process of reconstructing images from ultrasonic infor- mation starts with the following acoustical wave equation: ∂2 ∂t2 u ( x
Performance enhancement of various real-time image processing techniques via speculative execution
NASA Astrophysics Data System (ADS)
Younis, Mohamed F.; Sinha, Purnendu; Marlowe, Thomas J.; Stoyenko, Alexander D.
1996-03-01
In real-time image processing, an application must satisfy a set of timing constraints while ensuring the semantic correctness of the system. Because of the natural structure of digital data, pure data and task parallelism have been used extensively in real-time image processing to accelerate the handling time of image data. These types of parallelism are based on splitting the execution load performed by a single processor across multiple nodes. However, execution of all parallel threads is mandatory for correctness of the algorithm. On the other hand, speculative execution is an optimistic execution of part(s) of the program based on assumptions on program control flow or variable values. Rollback may be required if the assumptions turn out to be invalid. Speculative execution can enhance average, and sometimes worst-case, execution time. In this paper, we target various image processing techniques to investigate applicability of speculative execution. We identify opportunities for safe and profitable speculative execution in image compression, edge detection, morphological filters, and blob recognition.
A survey of GPU-based acceleration techniques in MRI reconstructions
Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou
2018-01-01
Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly more complicated. However, diagnostic and treatment require very fast computational procedure. Modern competitive platforms of graphics processing unit (GPU) have been used to make high-performance parallel computations available, and attractive to common consumers for computing massively parallel reconstruction problems at commodity price. GPUs have also become more and more important for reconstruction computations, especially when deep learning starts to be applied into MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in MRI community. PMID:29675361
A survey of GPU-based acceleration techniques in MRI reconstructions.
Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou; Liang, Dong
2018-03-01
Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly more complicated. However, diagnostic and treatment require very fast computational procedure. Modern competitive platforms of graphics processing unit (GPU) have been used to make high-performance parallel computations available, and attractive to common consumers for computing massively parallel reconstruction problems at commodity price. GPUs have also become more and more important for reconstruction computations, especially when deep learning starts to be applied into MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in MRI community.
Watanabe, Yuuki; Takahashi, Yuhei; Numazawa, Hiroshi
2014-02-01
We demonstrate intensity-based optical coherence tomography (OCT) angiography using the squared difference of two sequential frames with bulk-tissue-motion (BTM) correction. This motion correction was performed by minimization of the sum of the pixel values using axial- and lateral-pixel-shifted structural OCT images. We extract the BTM-corrected image from a total of 25 calculated OCT angiographic images. Image processing was accelerated by a graphics processing unit (GPU) with many stream processors to optimize the parallel processing procedure. The GPU processing rate was faster than that of a line scan camera (46.9 kHz). Our OCT system provides the means of displaying structural OCT images and BTM-corrected OCT angiographic images in real time.
GPU-based parallel algorithm for blind image restoration using midfrequency-based methods
NASA Astrophysics Data System (ADS)
Xie, Lang; Luo, Yi-han; Bao, Qi-liang
2013-08-01
GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specially designed for GPU hardware architecture is of great significance. In order to solve the problem of high computational complexity and poor real-time performance in blind image restoration, the midfrequency-based algorithm for blind image restoration was analyzed and improved in this paper. Furthermore, a midfrequency-based filtering method is also used to restore the image hardly with any recursion or iteration. Combining the algorithm with data intensiveness, data parallel computing and GPU execution model of single instruction and multiple threads, a new parallel midfrequency-based algorithm for blind image restoration is proposed in this paper, which is suitable for stream computing of GPU. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and midfrequency-based filtering. Aiming at better management of the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in frequency domain after the optimization of data access and the communication between the host and the device. The kernel parallelism structure is determined by the decomposition of the filtering data to ensure the transmission rate to get around the memory bandwidth limitation. The results show that, with the new algorithm, the operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.
Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems
Wang, Kaibo; Huai, Yin; Lee, Rubao; Wang, Fusheng; Zhang, Xiaodong; Saltz, Joel H.
2012-01-01
As an important application of spatial databases in pathology imaging analysis, cross-comparing the spatial boundaries of a huge amount of segmented micro-anatomic objects demands extremely data- and compute-intensive operations, requiring high throughput at an affordable cost. However, the performance of spatial database systems has not been satisfactory since their implementations of spatial operations cannot fully utilize the power of modern parallel hardware. In this paper, we provide a customized software solution that exploits GPUs and multi-core CPUs to accelerate spatial cross-comparison in a cost-effective way. Our solution consists of an efficient GPU algorithm and a pipelined system framework with task migration support. Extensive experiments with real-world data sets demonstrate the effectiveness of our solution, which improves the performance of spatial cross-comparison by over 18 times compared with a parallelized spatial database approach. PMID:23355955
Tsai, Shang-Yueh; Otazo, Ricardo; Posse, Stefan; Lin, Yi-Ru; Chung, Hsiao-Wen; Wald, Lawrence L; Wiggins, Graham C; Lin, Fa-Hsuan
2008-05-01
Parallel imaging has been demonstrated to reduce the encoding time of MR spectroscopic imaging (MRSI). Here we investigate up to 5-fold acceleration of 2D proton echo planar spectroscopic imaging (PEPSI) at 3T using generalized autocalibrating partial parallel acquisition (GRAPPA) with a 32-channel coil array, 1.5 cm(3) voxel size, TR/TE of 15/2000 ms, and 2.1 Hz spectral resolution. Compared to an 8-channel array, the smaller RF coil elements in this 32-channel array provided a 3.1-fold and 2.8-fold increase in signal-to-noise ratio (SNR) in the peripheral region and the central region, respectively, and more spatial modulated information. Comparison of sensitivity-encoding (SENSE) and GRAPPA reconstruction using an 8-channel array showed that both methods yielded similar quantitative metabolite measures (P > 0.1). Concentration values of N-acetyl-aspartate (NAA), total creatine (tCr), choline (Cho), myo-inositol (mI), and the sum of glutamate and glutamine (Glx) for both methods were consistent with previous studies. Using the 32-channel array coil the mean Cramer-Rao lower bounds (CRLB) were less than 8% for NAA, tCr, and Cho and less than 15% for mI and Glx at 2-fold acceleration. At 4-fold acceleration the mean CRLB for NAA, tCr, and Cho was less than 11%. In conclusion, the use of a 32-channel coil array and GRAPPA reconstruction can significantly reduce the measurement time for mapping brain metabolites. (c) 2008 Wiley-Liss, Inc.
The Role of Nonlinear Gradients in Parallel Imaging: A k-Space Based Analysis.
Galiana, Gigi; Stockmann, Jason P; Tam, Leo; Peters, Dana; Tagare, Hemant; Constable, R Todd
2012-09-01
Sequences that encode the spatial information of an object using nonlinear gradient fields are a new frontier in MRI, with potential to provide lower peripheral nerve stimulation, windowed fields of view, tailored spatially-varying resolution, curved slices that mirror physiological geometry, and, most importantly, very fast parallel imaging with multichannel coils. The acceleration for multichannel images is generally explained by the fact that curvilinear gradient isocontours better complement the azimuthal spatial encoding provided by typical receiver arrays. However, the details of this complementarity have been more difficult to specify. We present a simple and intuitive framework for describing the mechanics of image formation with nonlinear gradients, and we use this framework to review some the main classes of nonlinear encoding schemes.
NASA Astrophysics Data System (ADS)
Xue, Xinwei; Cheryauka, Arvi; Tubbs, David
2006-03-01
CT imaging in interventional and minimally-invasive surgery requires high-performance computing solutions that meet operational room demands, healthcare business requirements, and the constraints of a mobile C-arm system. The computational requirements of clinical procedures using CT-like data are increasing rapidly, mainly due to the need for rapid access to medical imagery during critical surgical procedures. The highly parallel nature of Radon transform and CT algorithms enables embedded computing solutions utilizing a parallel processing architecture to realize a significant gain of computational intensity with comparable hardware and program coding/testing expenses. In this paper, using a sample 2D and 3D CT problem, we explore the programming challenges and the potential benefits of embedded computing using commodity hardware components. The accuracy and performance results obtained on three computational platforms: a single CPU, a single GPU, and a solution based on FPGA technology have been analyzed. We have shown that hardware-accelerated CT image reconstruction can be achieved with similar levels of noise and clarity of feature when compared to program execution on a CPU, but gaining a performance increase at one or more orders of magnitude faster. 3D cone-beam or helical CT reconstruction and a variety of volumetric image processing applications will benefit from similar accelerations.
NASA Astrophysics Data System (ADS)
Fiandrotti, Attilio; Fosson, Sophie M.; Ravazzi, Chiara; Magli, Enrico
2018-04-01
Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting GPUs parallel computation capabilities to speedup the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signals recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.
Tao, Shengzhen; Trzasko, Joshua D; Shu, Yunhong; Weavers, Paul T; Huston, John; Gray, Erin M; Bernstein, Matt A
2016-06-01
To describe how integrated gradient nonlinearity (GNL) correction can be used within noniterative partial Fourier (homodyne) and parallel (SENSE and GRAPPA) MR image reconstruction strategies, and demonstrate that performing GNL correction during, rather than after, these routines mitigates the image blurring and resolution loss caused by postreconstruction image domain based GNL correction. Starting from partial Fourier and parallel magnetic resonance imaging signal models that explicitly account for GNL, noniterative image reconstruction strategies for each accelerated acquisition technique are derived under the same core mathematical assumptions as their standard counterparts. A series of phantom and in vivo experiments on retrospectively undersampled data were performed to investigate the spatial resolution benefit of integrated GNL correction over conventional postreconstruction correction. Phantom and in vivo results demonstrate that the integrated GNL correction reduces the image blurring introduced by the conventional GNL correction, while still correcting GNL-induced coarse-scale geometrical distortion. Images generated from undersampled data using the proposed integrated GNL strategies offer superior depiction of fine image detail, for example, phantom resolution inserts and anatomical tissue boundaries. Noniterative partial Fourier and parallel imaging reconstruction methods with integrated GNL correction reduce the resolution loss that occurs during conventional postreconstruction GNL correction while preserving the computational efficiency of standard reconstruction techniques. Magn Reson Med 75:2534-2544, 2016. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
UNFOLD-SENSE: a parallel MRI method with self-calibration and artifact suppression.
Madore, Bruno
2004-08-01
This work aims at improving the performance of parallel imaging by using it with our "unaliasing by Fourier-encoding the overlaps in the temporal dimension" (UNFOLD) temporal strategy. A self-calibration method called "self, hybrid referencing with UNFOLD and GRAPPA" (SHRUG) is presented. SHRUG combines the UNFOLD-based sensitivity mapping strategy introduced in the TSENSE method by Kellman et al. (5), with the strategy introduced in the GRAPPA method by Griswold et al. (10). SHRUG merges the two approaches to alleviate their respective limitations, and provides fast self-calibration at any given acceleration factor. UNFOLD-SENSE further includes an UNFOLD artifact suppression scheme to significantly suppress artifacts and amplified noise produced by parallel imaging. This suppression scheme, which was published previously (4), is related to another method that was presented independently as part of TSENSE. While the two are equivalent at accelerations < or = 2.0, the present approach is shown here to be significantly superior at accelerations > 2.0, with up to double the artifact suppression at high accelerations. Furthermore, a slight modification of Cartesian SENSE is introduced, which allows departures from purely Cartesian sampling grids. This technique, termed variable-density SENSE (vdSENSE), allows the variable-density data required by SHRUG to be reconstructed with the simplicity and fast processing of Cartesian SENSE. UNFOLD-SENSE is given by the combination of SHRUG for sensitivity mapping, vdSENSE for reconstruction, and UNFOLD for artifact/amplified noise suppression. The method was implemented, with online reconstruction, on both an SSFP and a myocardium-perfusion sequence. The results from six patients scanned with UNFOLD-SENSE are presented.
High resolution human diffusion tensor imaging using 2-D navigated multi-shot SENSE EPI at 7 Tesla
Jeong, Ha-Kyu; Gore, John C.; Anderson, Adam W.
2012-01-01
The combination of parallel imaging with partial Fourier acquisition has greatly improved the performance of diffusion-weighted single-shot EPI and is the preferred method for acquisitions at low to medium magnetic field strength such as 1.5 or 3 Tesla. Increased off-resonance effects and reduced transverse relaxation times at 7 Tesla, however, generate more significant artifacts than at lower magnetic field strength and limit data acquisition. Additional acceleration of k-space traversal using a multi-shot approach, which acquires a subset of k-space data after each excitation, reduces these artifacts relative to conventional single-shot acquisitions. However, corrections for motion-induced phase errors are not straightforward in accelerated, diffusion-weighted multi-shot EPI because of phase aliasing. In this study, we introduce a simple acquisition and corresponding reconstruction method for diffusion-weighted multi-shot EPI with parallel imaging suitable for use at high field. The reconstruction uses a simple modification of the standard SENSE algorithm to account for shot-to-shot phase errors; the method is called Image Reconstruction using Image-space Sampling functions (IRIS). Using this approach, reconstruction from highly aliased in vivo image data using 2-D navigator phase information is demonstrated for human diffusion-weighted imaging studies at 7 Tesla. The final reconstructed images show submillimeter in-plane resolution with no ghosts and much reduced blurring and off-resonance artifacts. PMID:22592941
NASA Astrophysics Data System (ADS)
Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie
2018-01-01
The wearing degree of the wheel set tread is one of the main factors that influence the safety and stability of running train. Geometrical parameters mainly include flange thickness and flange height. Line structure laser light was projected on the wheel tread surface. The geometrical parameters can be deduced from the profile image. An online image acquisition system was designed based on asynchronous reset of CCD and CUDA parallel processing unit. The image acquisition was fulfilled by hardware interrupt mode. A high efficiency parallel segmentation algorithm based on CUDA was proposed. The algorithm firstly divides the image into smaller squares, and extracts the squares of the target by fusion of k_means and STING clustering image segmentation algorithm. Segmentation time is less than 0.97ms. A considerable acceleration ratio compared with the CPU serial calculation was obtained, which greatly improved the real-time image processing capacity. When wheel set was running in a limited speed, the system placed alone railway line can measure the geometrical parameters automatically. The maximum measuring speed is 120km/h.
GPU accelerated dynamic functional connectivity analysis for functional MRI data.
Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu
2015-07-01
Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
Chemical Shift Encoded Water–Fat Separation Using Parallel Imaging and Compressed Sensing
Sharma, Samir D.; Hu, Houchun H.; Nayak, Krishna S.
2013-01-01
Chemical shift encoded techniques have received considerable attention recently because they can reliably separate water and fat in the presence of off-resonance. The insensitivity to off-resonance requires that data be acquired at multiple echo times, which increases the scan time as compared to a single echo acquisition. The increased scan time often requires that a compromise be made between the spatial resolution, the volume coverage, and the tolerance to artifacts from subject motion. This work describes a combined parallel imaging and compressed sensing approach for accelerated water–fat separation. In addition, the use of multiscale cubic B-splines for B0 field map estimation is introduced. The water and fat images and the B0 field map are estimated via an alternating minimization. Coil sensitivity information is derived from a calculated k-space convolution kernel and l1-regularization is imposed on the coil-combined water and fat image estimates. Uniform water–fat separation is demonstrated from retrospectively undersampled data in the liver, brachial plexus, ankle, and knee as well as from a prospectively undersampled acquisition of the knee at 8.6x acceleration. PMID:22505285
Coil Compression for Accelerated Imaging with Cartesian Sampling
Zhang, Tao; Pauly, John M.; Vasanawala, Shreyas S.; Lustig, Michael
2012-01-01
MRI using receiver arrays with many coil elements can provide high signal-to-noise ratio and increase parallel imaging acceleration. At the same time, the growing number of elements results in larger datasets and more computation in the reconstruction. This is of particular concern in 3D acquisitions and in iterative reconstructions. Coil compression algorithms are effective in mitigating this problem by compressing data from many channels into fewer virtual coils. In Cartesian sampling there often are fully sampled k-space dimensions. In this work, a new coil compression technique for Cartesian sampling is presented that exploits the spatially varying coil sensitivities in these non-subsampled dimensions for better compression and computation reduction. Instead of directly compressing in k-space, coil compression is performed separately for each spatial location along the fully-sampled directions, followed by an additional alignment process that guarantees the smoothness of the virtual coil sensitivities. This important step provides compatibility with autocalibrating parallel imaging techniques. Its performance is not susceptible to artifacts caused by a tight imaging fieldof-view. High quality compression of in-vivo 3D data from a 32 channel pediatric coil into 6 virtual coils is demonstrated. PMID:22488589
Weavers, Paul T; Borisch, Eric A; Hulshizer, Tom C; Rossman, Phillip J; Young, Phillip M; Johnson, Casey P; McKay, Jessica; Cline, Christopher C; Riederer, Stephen J
2016-04-01
Three-station stepping-table time-resolved 3D contrast-enhanced magnetic resonance angiography has conflicting demands in the need to limit acquisition time in proximal stations to match the speed of the advancing contrast bolus and in the distal-most station to avoid venous contamination while still providing clinically useful spatial resolution. This work describes improved receiver coil arrays which address this issue by allowing increased acceleration factors, providing increased spatial resolution per unit time. Receiver coil arrays were constructed for each station (pelvis, thigh, calf) and then integrated into a 48-element array for three-station peripheral CE-MRA. Coil element sizes and array configurations for these three stations were designed to improve SENSE-type parallel imaging taking advantage of an increase in coil count for all stations versus the previous 32 channel capability. At each station either acceleration apportionment or optimal CAIPIRINHA selection was used to choose the optimum acceleration parameters for each subject. Results were evaluated in both single- and multi-station studies. Single-station studies showed that SENSE acceleration in the thigh station could be readily increased from R=8 to R=10, allowing reduction of the frame time from 2.5 to 2.1 s to better image the typically rapidly advancing bolus at this station. Similarly, the improved coil array for the calf station permitted acceleration increase from R=8 to R=12, providing a 4.0 vs. 5.2 s frame time. Results in three-station studies suggest an improved ability to track the contrast bolus in peripheral CE-MRA. Modified receiver coil arrays and individualized parameter optimization have been used to provide improved acceleration at all stations in multi-station peripheral CE-MRA and provide high spatial resolution with frame times as short as 2.1 s. Copyright © 2015 Elsevier Inc. All rights reserved.
Parallel processing approach to transform-based image coding
NASA Astrophysics Data System (ADS)
Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.
1991-06-01
This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links, each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, results ar presented with image statistics and data rates. Finally techniques for accelerating the system throughput are analyzed and results from the application of one such modification described.
Smith, Matthew R.; Artz, Nathan S.; Koch, Kevin M.; Samsonov, Alexey; Reeder, Scott B.
2014-01-01
Purpose To demonstrate feasibility of exploiting the spatial distribution of off-resonance surrounding metallic implants for accelerating multispectral imaging techniques. Theory Multispectral imaging (MSI) techniques perform time-consuming independent 3D acquisitions with varying RF frequency offsets to address the extreme off-resonance from metallic implants. Each off-resonance bin provides a unique spatial sensitivity that is analogous to the sensitivity of a receiver coil, and therefore provides a unique opportunity for acceleration. Methods Fully sampled MSI was performed to demonstrate retrospective acceleration. A uniform sampling pattern across off-resonance bins was compared to several adaptive sampling strategies using a total hip replacement phantom. Monte Carlo simulations were performed to compare noise propagation of two of these strategies. With a total knee replacement phantom, positive and negative off-resonance bins were strategically sampled with respect to the B0 field to minimize aliasing. Reconstructions were performed with a parallel imaging framework to demonstrate retrospective acceleration. Results An adaptive sampling scheme dramatically improved reconstruction quality, which was supported by the noise propagation analysis. Independent acceleration of negative and positive off-resonance bins demonstrated reduced overlapping of aliased signal to improve the reconstruction. Conclusion This work presents the feasibility of acceleration in the presence of metal by exploiting the spatial sensitivities of off-resonance bins. PMID:24431210
NASA Astrophysics Data System (ADS)
O'Connor, A. S.; Justice, B.; Harris, A. T.
2013-12-01
Graphics Processing Units (GPUs) are high-performance multiple-core processors capable of very high computational speeds and large data throughput. Modern GPUs are inexpensive and widely available commercially. These are general-purpose parallel processors with support for a variety of programming interfaces, including industry standard languages such as C. GPU implementations of algorithms that are well suited for parallel processing can often achieve speedups of several orders of magnitude over optimized CPU codes. Significant improvements in speeds for imagery orthorectification, atmospheric correction, target detection and image transformations like Independent Components Analsyis (ICA) have been achieved using GPU-based implementations. Additional optimizations, when factored in with GPU processing capabilities, can provide 50x - 100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA based GPU processing frame work for accelerating ENVI and IDL processes that can best take advantage of parallelization. Testing Exelis VIS has performed shows that orthorectification can take as long as two hours with a WorldView1 35,0000 x 35,000 pixel image. With GPU orthorecification, the same orthorectification process takes three minutes. By speeding up image processing, imagery can successfully be used by first responders, scientists making rapid discoveries with near real time data, and provides an operational component to data centers needing to quickly process and disseminate data.
Benkert, Thomas; Tian, Ye; Huang, Chenchan; DiBella, Edward V R; Chandarana, Hersh; Feng, Li
2018-07-01
Golden-angle radial sparse parallel (GRASP) MRI reconstruction requires gridding and regridding to transform data between radial and Cartesian k-space. These operations are repeatedly performed in each iteration, which makes the reconstruction computationally demanding. This work aimed to accelerate GRASP reconstruction using self-calibrating GRAPPA operator gridding (GROG) and to validate its performance in clinical imaging. GROG is an alternative gridding approach based on parallel imaging, in which k-space data acquired on a non-Cartesian grid are shifted onto a Cartesian k-space grid using information from multicoil arrays. For iterative non-Cartesian image reconstruction, GROG is performed only once as a preprocessing step. Therefore, the subsequent iterative reconstruction can be performed directly in Cartesian space, which significantly reduces computational burden. Here, a framework combining GROG with GRASP (GROG-GRASP) is first optimized and then compared with standard GRASP reconstruction in 22 prostate patients. GROG-GRASP achieved approximately 4.2-fold reduction in reconstruction time compared with GRASP (∼333 min versus ∼78 min) while maintaining image quality (structural similarity index ≈ 0.97 and root mean square error ≈ 0.007). Visual image quality assessment by two experienced radiologists did not show significant differences between the two reconstruction schemes. With a graphics processing unit implementation, image reconstruction time can be further reduced to approximately 14 min. The GRASP reconstruction can be substantially accelerated using GROG. This framework is promising toward broader clinical application of GRASP and other iterative non-Cartesian reconstruction methods. Magn Reson Med 80:286-293, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
NASA Astrophysics Data System (ADS)
Yu, H.; Wang, Z.; Zhang, C.; Chen, N.; Zhao, Y.; Sawchuk, A. P.; Dalsing, M. C.; Teague, S. D.; Cheng, Y.
2014-11-01
Existing research of patient-specific computational hemodynamics (PSCH) heavily relies on software for anatomical extraction of blood arteries. Data reconstruction and mesh generation have to be done using existing commercial software due to the gap between medical image processing and CFD, which increases computation burden and introduces inaccuracy during data transformation thus limits the medical applications of PSCH. We use lattice Boltzmann method (LBM) to solve the level-set equation over an Eulerian distance field and implicitly and dynamically segment the artery surfaces from radiological CT/MRI imaging data. The segments seamlessly feed to the LBM based CFD computation of PSCH thus explicit mesh construction and extra data management are avoided. The LBM is ideally suited for GPU (graphic processing unit)-based parallel computing. The parallel acceleration over GPU achieves excellent performance in PSCH computation. An application study will be presented which segments an aortic artery from a chest CT dataset and models PSCH of the segmented artery.
2017-04-13
modelling code, a parallel benchmark , and a communication avoiding version of the QR algorithm. Further, several improvements to the OmpSs model were...movement; and a port of the dynamic load balancing library to OmpSs. Finally, several updates to the tools infrastructure were accomplished, including: an...OmpSs: a basic algorithm on image processing applications, a mini application representative of an ocean modelling code, a parallel benchmark , and a
Sharif, Behzad; Derbyshire, J. Andrew; Faranesh, Anthony Z.; Bresler, Yoram
2010-01-01
MR imaging of the human heart without explicit cardiac synchronization promises to extend the applicability of cardiac MR to a larger patient population and potentially expand its diagnostic capabilities. However, conventional non-gated imaging techniques typically suffer from low image quality or inadequate spatio-temporal resolution and fidelity. Patient-Adaptive Reconstruction and Acquisition in Dynamic Imaging with Sensitivity Encoding (PARADISE) is a highly-accelerated non-gated dynamic imaging method that enables artifact-free imaging with high spatio-temporal resolutions by utilizing novel computational techniques to optimize the imaging process. In addition to using parallel imaging, the method gains acceleration from a physiologically-driven spatio-temporal support model; hence, it is doubly accelerated. The support model is patient-adaptive, i.e., its geometry depends on dynamics of the imaged slice, e.g., subject’s heart-rate and heart location within the slice. The proposed method is also doubly adaptive as it adapts both the acquisition and reconstruction schemes. Based on the theory of time-sequential sampling, the proposed framework explicitly accounts for speed limitations of gradient encoding and provides performance guarantees on achievable image quality. The presented in-vivo results demonstrate the effectiveness and feasibility of the PARADISE method for high resolution non-gated cardiac MRI during a short breath-hold. PMID:20665794
Localized Spatio-Temporal Constraints for Accelerated CMR Perfusion
Akçakaya, Mehmet; Basha, Tamer A.; Pflugi, Silvio; Foppa, Murilo; Kissinger, Kraig V.; Hauser, Thomas H.; Nezafat, Reza
2013-01-01
Purpose To develop and evaluate an image reconstruction technique for cardiac MRI (CMR)perfusion that utilizes localized spatio-temporal constraints. Methods CMR perfusion plays an important role in detecting myocardial ischemia in patients with coronary artery disease. Breath-hold k-t based image acceleration techniques are typically used in CMR perfusion for superior spatial/temporal resolution, and improved coverage. In this study, we propose a novel compressed sensing based image reconstruction technique for CMR perfusion, with applicability to free-breathing examinations. This technique uses local spatio-temporal constraints by regularizing image patches across a small number of dynamics. The technique is compared to conventional dynamic-by-dynamic reconstruction, and sparsity regularization using a temporal principal-component (pc) basis, as well as zerofilled data in multi-slice 2D and 3D CMR perfusion. Qualitative image scores are used (1=poor, 4=excellent) to evaluate the technique in 3D perfusion in 10 patients and 5 healthy subjects. On 4 healthy subjects, the proposed technique was also compared to a breath-hold multi-slice 2D acquisition with parallel imaging in terms of signal intensity curves. Results The proposed technique results in images that are superior in terms of spatial and temporal blurring compared to the other techniques, even in free-breathing datasets. The image scores indicate a significant improvement compared to other techniques in 3D perfusion (2.8±0.5 vs. 2.3±0.5 for x-pc regularization, 1.7±0.5 for dynamic-by-dynamic, 1.1±0.2 for zerofilled). Signal intensity curves indicate similar dynamics of uptake between the proposed method with a 3D acquisition and the breath-hold multi-slice 2D acquisition with parallel imaging. Conclusion The proposed reconstruction utilizes sparsity regularization based on localized information in both spatial and temporal domains for highly-accelerated CMR perfusion with potential utility in free-breathing 3D acquisitions. PMID:24123058
QR-decomposition based SENSE reconstruction using parallel architecture.
Ullah, Irfan; Nisar, Habab; Raza, Haseeb; Qasim, Malik; Inam, Omair; Omer, Hammad
2018-04-01
Magnetic Resonance Imaging (MRI) is a powerful medical imaging technique that provides essential clinical information about the human body. One major limitation of MRI is its long scan time. Implementation of advance MRI algorithms on a parallel architecture (to exploit inherent parallelism) has a great potential to reduce the scan time. Sensitivity Encoding (SENSE) is a Parallel Magnetic Resonance Imaging (pMRI) algorithm that utilizes receiver coil sensitivities to reconstruct MR images from the acquired under-sampled k-space data. At the heart of SENSE lies inversion of a rectangular encoding matrix. This work presents a novel implementation of GPU based SENSE algorithm, which employs QR decomposition for the inversion of the rectangular encoding matrix. For a fair comparison, the performance of the proposed GPU based SENSE reconstruction is evaluated against single and multicore CPU using openMP. Several experiments against various acceleration factors (AFs) are performed using multichannel (8, 12 and 30) phantom and in-vivo human head and cardiac datasets. Experimental results show that GPU significantly reduces the computation time of SENSE reconstruction as compared to multi-core CPU (approximately 12x speedup) and single-core CPU (approximately 53x speedup) without any degradation in the quality of the reconstructed images. Copyright © 2018 Elsevier Ltd. All rights reserved.
Afshar, Yaser; Sbalzarini, Ivo F.
2016-01-01
Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collectively solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 1010 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. PMID:27046144
Afshar, Yaser; Sbalzarini, Ivo F
2016-01-01
Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collectively solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10(10) pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
IceT users' guide and reference.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.
2011-01-01
The Image Composition Engine for Tiles (IceT) is a high-performance sort-last parallel rendering library. In addition to providing accelerated rendering for a standard display, IceT provides the unique ability to generate images for tiled displays. The overall resolution of the display may be several times larger than any viewport that may be rendered by a single machine. This document is an overview of the user interface to IceT.
Choi, Hyungsuk; Choi, Woohyuk; Quan, Tran Minh; Hildebrand, David G C; Pfister, Hanspeter; Jeong, Won-Ki
2014-12-01
As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and even experienced programmers often fail to fully leverage the computing power of new parallel architectures due to their steep learning curve and programming complexity. In this paper, we propose Vivaldi, a new domain-specific language for volume processing and visualization on distributed heterogeneous computing systems. Vivaldi's Python-like grammar and parallel processing abstractions provide flexible programming tools for non-experts to easily write high-performance parallel computing code. Vivaldi provides commonly used functions and numerical operators for customized visualization and high-throughput image processing applications. We demonstrate the performance and usability of Vivaldi on several examples ranging from volume rendering to image segmentation.
CUDA-based acceleration of collateral filtering in brain MR images
NASA Astrophysics Data System (ADS)
Li, Cheng-Yuan; Chang, Herng-Hua
2017-02-01
Image denoising is one of the fundamental and essential tasks within image processing. In medical imaging, finding an effective algorithm that can remove random noise in MR images is important. This paper proposes an effective noise reduction method for brain magnetic resonance (MR) images. Our approach is based on the collateral filter which is a more powerful method than the bilateral filter in many cases. However, the computation of the collateral filter algorithm is quite time-consuming. To solve this problem, we improved the collateral filter algorithm with parallel computing using GPU. We adopted CUDA, an application programming interface for GPU by NVIDIA, to accelerate the computation. Our experimental evaluation on an Intel Xeon CPU E5-2620 v3 2.40GHz with a NVIDIA Tesla K40c GPU indicated that the proposed implementation runs dramatically faster than the traditional collateral filter. We believe that the proposed framework has established a general blueprint for achieving fast and robust filtering in a wide variety of medical image denoising applications.
Super-resolved Parallel MRI by Spatiotemporal Encoding
Schmidt, Rita; Baishya, Bikash; Ben-Eliezer, Noam; Seginer, Amir; Frydman, Lucio
2016-01-01
Recent studies described an alternative “ultrafast” scanning method based on spatiotemporal (SPEN) principles. SPEN demonstrates numerous potential advantages over EPI-based alternatives, at no additional expense in experimental complexity. An important aspect that SPEN still needs to achieve for providing a competitive acquisition alternative entails exploiting parallel imaging algorithms, without compromising its proven capabilities. The present work introduces a combination of multi-band frequency-swept pulses simultaneously encoding multiple, partial fields-of-view; together with a new algorithm merging a Super-Resolved SPEN image reconstruction and SENSE multiple-receiving methods. The ensuing approach enables one to reduce both the excitation and acquisition times of ultrafast SPEN acquisitions by the customary acceleration factor R, without compromises in either the ensuing spatial resolution, SAR deposition, or the capability to operate in multi-slice mode. The performance of these new single-shot imaging sequences and their ancillary algorithms were explored on phantoms and human volunteers at 3T. The gains of the parallelized approach were particularly evident when dealing with heterogeneous systems subject to major T2/T2* effects, as is the case upon single-scan imaging near tissue/air interfaces. PMID:24120293
The effects of SENSE on PROPELLER imaging.
Chang, Yuchou; Pipe, James G; Karis, John P; Gibbs, Wende N; Zwart, Nicholas R; Schär, Michael
2015-12-01
To study how sensitivity encoding (SENSE) impacts periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) image quality, including signal-to-noise ratio (SNR), robustness to motion, precision of motion estimation, and image quality. Five volunteers were imaged by three sets of scans. A rapid method for generating the g-factor map was proposed and validated via Monte Carlo simulations. Sensitivity maps were extrapolated to increase the area over which SENSE can be performed and therefore enhance the robustness to head motion. The precision of motion estimation of PROPELLER blades that are unfolded with these sensitivity maps was investigated. An interleaved R-factor PROPELLER sequence was used to acquire data with similar amounts of motion with and without SENSE acceleration. Two neuroradiologists independently and blindly compared 214 image pairs. The proposed method of g-factor calculation was similar to that provided by the Monte Carlo methods. Extrapolation and rotation of the sensitivity maps allowed for continued robustness of SENSE unfolding in the presence of motion. SENSE-widened blades improved the precision of rotation and translation estimation. PROPELLER images with a SENSE factor of 3 outperformed the traditional PROPELLER images when reconstructing the same number of blades. SENSE not only accelerates PROPELLER but can also improve robustness and precision of head motion correction, which improves overall image quality even when SNR is lost due to acceleration. The reduction of SNR, as a penalty of acceleration, is characterized by the proposed g-factor method. © 2014 Wiley Periodicals, Inc.
Hernández, Moisés; Guerrero, Ginés D.; Cecilia, José M.; García, José M.; Inuggi, Alberto; Jbabdi, Saad; Behrens, Timothy E. J.; Sotiropoulos, Stamatios N.
2013-01-01
With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation. PMID:23658616
Autocalibrating motion-corrected wave-encoding for highly accelerated free-breathing abdominal MRI.
Chen, Feiyu; Zhang, Tao; Cheng, Joseph Y; Shi, Xinwei; Pauly, John M; Vasanawala, Shreyas S
2017-11-01
To develop a motion-robust wave-encoding technique for highly accelerated free-breathing abdominal MRI. A comprehensive 3D wave-encoding-based method was developed to enable fast free-breathing abdominal imaging: (a) auto-calibration for wave-encoding was designed to avoid extra scan for coil sensitivity measurement; (b) intrinsic butterfly navigators were used to track respiratory motion; (c) variable-density sampling was included to enable compressed sensing; (d) golden-angle radial-Cartesian hybrid view-ordering was incorporated to improve motion robustness; and (e) localized rigid motion correction was combined with parallel imaging compressed sensing reconstruction to reconstruct the highly accelerated wave-encoded datasets. The proposed method was tested on six subjects and image quality was compared with standard accelerated Cartesian acquisition both with and without respiratory triggering. Inverse gradient entropy and normalized gradient squared metrics were calculated, testing whether image quality was improved using paired t-tests. For respiratory-triggered scans, wave-encoding significantly reduced residual aliasing and blurring compared with standard Cartesian acquisition (metrics suggesting P < 0.05). For non-respiratory-triggered scans, the proposed method yielded significantly better motion correction compared with standard motion-corrected Cartesian acquisition (metrics suggesting P < 0.01). The proposed methods can reduce motion artifacts and improve overall image quality of highly accelerated free-breathing abdominal MRI. Magn Reson Med 78:1757-1766, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
Coristine, Andrew J.; Yerly, Jerome; Stuber, Matthias
2016-01-01
Background Two-dimensional (2D) spatially selective radiofrequency (RF) pulses may be used to excite restricted volumes. By incorporating a "pencil beam" 2D pulse into a T2-Prep, one may create a "2D-T2-Prep" that combines T2-weighting with an intrinsic outer volume suppression. This may particularly benefit parallel imaging techniques, where artefacts typically originate from residual foldover signal. By suppressing foldover signal with a 2D-T2-Prep, image quality may therefore improve. We present numerical simulations, phantom and in vivo validations to address this hypothesis. Methods A 2D-T2-Prep and a conventional T2-Prep were used with GRAPPA-accelerated MRI (R = 1.6). The techniques were first compared in numerical phantoms, where per pixel maps of SNR (SNRmulti), noise, and g-factor were predicted for idealized sequences. Physical phantoms, with compartments doped to mimic blood, myocardium, fat, and coronary vasculature, were scanned with both T2-Preparation techniques to determine the actual SNRmulti and vessel sharpness. For in vivo experiments, the right coronary artery (RCA) was imaged in 10 healthy adults, using accelerations of R = 1,3, and 6, and vessel sharpness was measured for each. Results In both simulations and phantom experiments, the 2D-T2-Prep improved SNR relative to the conventional T2-Prep, by an amount that depended on both the acceleration factor and the degree of outer volume suppression. For in vivo images of the RCA, vessel sharpness improved most at higher acceleration factors, demonstrating that the 2D-T2-Prep especially benefits accelerated coronary MRA. Conclusion Suppressing outer volume signal with a 2D-T2-Prep improves image quality particularly well in GRAPPA-accelerated acquisitions in simulations, phantoms, and volunteers, demonstrating that it should be considered when performing accelerated coronary MRA. PMID:27736866
A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms
Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein
2017-01-01
Real-time image processing is used in a wide variety of applications like those in medical care and industrial processes. This technique in medical care has the ability to display important patient information graphi graphically, which can supplement and help the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to the recent researches, graphic processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one of the ways of real-time image processing. Edge detection is an early stage in most of the image processing methods for the extraction of features and object segments from a raw image. The Canny method, Sobel and Prewitt filters, and the Roberts’ Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for Canny approach has been modified further to run in a fully parallel manner. This has been achieved by replacing the breadth- first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2–100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms. PMID:28487831
A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms.
Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein
2017-01-01
Real-time image processing is used in a wide variety of applications like those in medical care and industrial processes. This technique in medical care has the ability to display important patient information graphi graphically, which can supplement and help the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to the recent researches, graphic processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one of the ways of real-time image processing. Edge detection is an early stage in most of the image processing methods for the extraction of features and object segments from a raw image. The Canny method, Sobel and Prewitt filters, and the Roberts' Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for Canny approach has been modified further to run in a fully parallel manner. This has been achieved by replacing the breadth- first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2-100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms.
Reese, Timothy G.; Jackowski, Marcel P.; Cauley, Stephen F.; Setsompop, Kawin; Bhat, Himanshu; Sosnovik, David E.
2017-01-01
Purpose To develop a clinically feasible whole-heart free-breathing diffusion-tensor (DT) magnetic resonance (MR) imaging approach with an imaging time of approximately 15 minutes to enable three-dimensional (3D) tractography. Materials and Methods The study was compliant with HIPAA and the institutional review board and required written consent from the participants. DT imaging was performed in seven healthy volunteers and three patients with pulmonary hypertension by using a stimulated echo sequence. Twelve contiguous short-axis sections and six four-chamber sections that covered the entire left ventricle were acquired by using simultaneous multisection (SMS) excitation with a blipped-controlled aliasing in parallel imaging readout. Rate 2 and rate 3 SMS excitation was defined as two and three times accelerated in the section axis, respectively. Breath-hold and free-breathing images with and without SMS acceleration were acquired. Diffusion-encoding directions were acquired sequentially, spatiotemporally registered, and retrospectively selected by using an entropy-based approach. Myofiber helix angle, mean diffusivity, fractional anisotropy, and 3D tractograms were analyzed by using paired t tests and analysis of variance. Results No significant differences (P > .63) were seen between breath-hold rate 3 SMS and free-breathing rate 2 SMS excitation in transmural myofiber helix angle, mean diffusivity (mean ± standard deviation, [0.89 ± 0.09] × 10−3 mm2/sec vs [0.9 ± 0.09] × 10−3 mm2/sec), or fractional anisotropy (0.43 ± 0.05 vs 0.42 ± 0.06). Three-dimensional tractograms of the left ventricle with no SMS and rate 2 and rate 3 SMS excitation were qualitatively similar. Conclusion Free-breathing DT imaging of the entire human heart can be performed in approximately 15 minutes without section gaps by using SMS excitation with a blipped-controlled aliasing in parallel imaging readout, followed by spatiotemporal registration and entropy-based retrospective image selection. This method may lead to clinical translation of whole-heart DT imaging, enabling broad application in patients with cardiac disease. © RSNA, 2016 Online supplemental material is available for this article. PMID:27681278
Mekkaoui, Choukri; Reese, Timothy G; Jackowski, Marcel P; Cauley, Stephen F; Setsompop, Kawin; Bhat, Himanshu; Sosnovik, David E
2017-03-01
Purpose To develop a clinically feasible whole-heart free-breathing diffusion-tensor (DT) magnetic resonance (MR) imaging approach with an imaging time of approximately 15 minutes to enable three-dimensional (3D) tractography. Materials and Methods The study was compliant with HIPAA and the institutional review board and required written consent from the participants. DT imaging was performed in seven healthy volunteers and three patients with pulmonary hypertension by using a stimulated echo sequence. Twelve contiguous short-axis sections and six four-chamber sections that covered the entire left ventricle were acquired by using simultaneous multisection (SMS) excitation with a blipped-controlled aliasing in parallel imaging readout. Rate 2 and rate 3 SMS excitation was defined as two and three times accelerated in the section axis, respectively. Breath-hold and free-breathing images with and without SMS acceleration were acquired. Diffusion-encoding directions were acquired sequentially, spatiotemporally registered, and retrospectively selected by using an entropy-based approach. Myofiber helix angle, mean diffusivity, fractional anisotropy, and 3D tractograms were analyzed by using paired t tests and analysis of variance. Results No significant differences (P > .63) were seen between breath-hold rate 3 SMS and free-breathing rate 2 SMS excitation in transmural myofiber helix angle, mean diffusivity (mean ± standard deviation, [0.89 ± 0.09] × 10 -3 mm 2 /sec vs [0.9 ± 0.09] × 10 -3 mm 2 /sec), or fractional anisotropy (0.43 ± 0.05 vs 0.42 ± 0.06). Three-dimensional tractograms of the left ventricle with no SMS and rate 2 and rate 3 SMS excitation were qualitatively similar. Conclusion Free-breathing DT imaging of the entire human heart can be performed in approximately 15 minutes without section gaps by using SMS excitation with a blipped-controlled aliasing in parallel imaging readout, followed by spatiotemporal registration and entropy-based retrospective image selection. This method may lead to clinical translation of whole-heart DT imaging, enabling broad application in patients with cardiac disease. © RSNA, 2016 Online supplemental material is available for this article.
Specialized Computer Systems for Environment Visualization
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Bashkov, Evgeniy A.; Zori, Sergii A.
2018-06-01
The need for real time image generation of landscapes arises in various fields as part of tasks solved by virtual and augmented reality systems, as well as geographic information systems. Such systems provide opportunities for collecting, storing, analyzing and graphically visualizing geographic data. Algorithmic and hardware software tools for increasing the realism and efficiency of the environment visualization in 3D visualization systems are proposed. This paper discusses a modified path tracing algorithm with a two-level hierarchy of bounding volumes and finding intersections with Axis-Aligned Bounding Box. The proposed algorithm eliminates the branching and hence makes the algorithm more suitable to be implemented on the multi-threaded CPU and GPU. A modified ROAM algorithm is used to solve the qualitative visualization of reliefs' problems and landscapes. The algorithm is implemented on parallel systems—cluster and Compute Unified Device Architecture-networks. Results show that the implementation on MPI clusters is more efficient than Graphics Processing Unit/Graphics Processing Clusters and allows real-time synthesis. The organization and algorithms of the parallel GPU system for the 3D pseudo stereo image/video synthesis are proposed. With realizing possibility analysis on a parallel GPU-architecture of each stage, 3D pseudo stereo synthesis is performed. An experimental prototype of a specialized hardware-software system 3D pseudo stereo imaging and video was developed on the CPU/GPU. The experimental results show that the proposed adaptation of 3D pseudo stereo imaging to the architecture of GPU-systems is efficient. Also it accelerates the computational procedures of 3D pseudo-stereo synthesis for the anaglyph and anamorphic formats of the 3D stereo frame without performing optimization procedures. The acceleration is on average 11 and 54 times for test GPUs.
Mitra, Ayan; Politte, David G; Whiting, Bruce R; Williamson, Jeffrey F; O'Sullivan, Joseph A
2017-01-01
Model-based image reconstruction (MBIR) techniques have the potential to generate high quality images from noisy measurements and a small number of projections which can reduce the x-ray dose in patients. These MBIR techniques rely on projection and backprojection to refine an image estimate. One of the widely used projectors for these modern MBIR based technique is called branchless distance driven (DD) projection and backprojection. While this method produces superior quality images, the computational cost of iterative updates keeps it from being ubiquitous in clinical applications. In this paper, we provide several new parallelization ideas for concurrent execution of the DD projectors in multi-GPU systems using CUDA programming tools. We have introduced some novel schemes for dividing the projection data and image voxels over multiple GPUs to avoid runtime overhead and inter-device synchronization issues. We have also reduced the complexity of overlap calculation of the algorithm by eliminating the common projection plane and directly projecting the detector boundaries onto image voxel boundaries. To reduce the time required for calculating the overlap between the detector edges and image voxel boundaries, we have proposed a pre-accumulation technique to accumulate image intensities in perpendicular 2D image slabs (from a 3D image) before projection and after backprojection to ensure our DD kernels run faster in parallel GPU threads. For the implementation of our iterative MBIR technique we use a parallel multi-GPU version of the alternating minimization (AM) algorithm with penalized likelihood update. The time performance using our proposed reconstruction method with Siemens Sensation 16 patient scan data shows an average of 24 times speedup using a single TITAN X GPU and 74 times speedup using 3 TITAN X GPUs in parallel for combined projection and backprojection.
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significant reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Reconstruction of multiple-pinhole micro-SPECT data using origin ensembles.
Lyon, Morgan C; Sitek, Arkadiusz; Metzler, Scott D; Moore, Stephen C
2016-10-01
The authors are currently developing a dual-resolution multiple-pinhole microSPECT imaging system based on three large NaI(Tl) gamma cameras. Two multiple-pinhole tungsten collimator tubes will be used sequentially for whole-body "scout" imaging of a mouse, followed by high-resolution (hi-res) imaging of an organ of interest, such as the heart or brain. Ideally, the whole-body image will be reconstructed in real time such that data need only be acquired until the area of interest can be visualized well-enough to determine positioning for the hi-res scan. The authors investigated the utility of the origin ensemble (OE) algorithm for online and offline reconstructions of the scout data. This algorithm operates directly in image space, and can provide estimates of image uncertainty, along with reconstructed images. Techniques for accelerating the OE reconstruction were also introduced and evaluated. System matrices were calculated for our 39-pinhole scout collimator design. SPECT projections were simulated for a range of count levels using the MOBY digital mouse phantom. Simulated data were used for a comparison of OE and maximum-likelihood expectation maximization (MLEM) reconstructions. The OE algorithm convergence was evaluated by calculating the total-image entropy and by measuring the counts in a volume-of-interest (VOI) containing the heart. Total-image entropy was also calculated for simulated MOBY data reconstructed using OE with various levels of parallelization. For VOI measurements in the heart, liver, bladder, and soft-tissue, MLEM and OE reconstructed images agreed within 6%. Image entropy converged after ∼2000 iterations of OE, while the counts in the heart converged earlier at ∼200 iterations of OE. An accelerated version of OE completed 1000 iterations in <9 min for a 6.8M count data set, with some loss of image entropy performance, whereas the same dataset required ∼79 min to complete 1000 iterations of conventional OE. A combination of the two methods showed decreased reconstruction time and no loss of performance when compared to conventional OE alone. OE-reconstructed images were found to be quantitatively and qualitatively similar to MLEM, yet OE also provided estimates of image uncertainty. Some acceleration of the reconstruction can be gained through the use of parallel computing. The OE algorithm is useful for reconstructing multiple-pinhole SPECT data and can be easily modified for real-time reconstruction.
Fenchel, Michael; Nael, Kambiz; Deshpande, Vibhas S; Finn, J Paul; Kramer, Ulrich; Miller, Stephan; Ruehm, Stefan; Laub, Gerhard
2006-09-01
The aim of the present study was to assess the feasibility of renal magnetic resonance angiography at 3.0 T using a phased-array coil system with 32-coil elements. Specifically, high parallel imaging factors were used for an increased spatial resolution and anatomic coverage of the whole abdomen. Signal-to-noise values and the g-factor distribution of the 32 element coil were examined in phantom studies for the magnetic resonance angiography (MRA) sequence. Eleven volunteers (6 men, median age of 30.0 years) were examined on a 3.0-T MR scanner (Magnetom Trio, Siemens Medical Solutions, Malvern, PA) using a 32-element phased-array coil (prototype from In vivo Corp.). Contrast-enhanced 3D-MRA (TR 2.95 milliseconds, TE 1.12 milliseconds, flip angle 25-30 degrees , bandwidth 650 Hz/pixel) was acquired with integrated generalized autocalibrating partially parallel acquisition (GRAPPA), in both phase- and slice-encoding direction. Images were assessed by 2 independent observers with regard to image quality, noise and presence of artifacts. Signal-to-noise levels of 22.2 +/- 22.0 and 57.9 +/- 49.0 were measured with (GRAPPAx6) and without parallel-imaging, respectively. The mean g-factor of the 32-element coil for GRAPPA with an acceleration of 3 and 2 in the phase-encoding and slice-encoding direction, respectively, was 1.61. High image quality was found in 9 of 11 volunteers (2.6 +/- 0.8) with good overall interobserver agreement (k = 0.87). Relatively low image quality with higher noise levels were encountered in 2 volunteers. MRA at 3.0 T using a 32-element phased-array coil is feasible in healthy volunteers. High diagnostic image quality and extended anatomic coverage could be achieved with application of high parallel imaging factors.
Lingala, Sajan Goud; Zhu, Yinghua; Lim, Yongwan; Toutios, Asterios; Ji, Yunhua; Lo, Wei-Ching; Seiberlich, Nicole; Narayanan, Shrikanth; Nayak, Krishna S
2017-12-01
To evaluate the feasibility of through-time spiral generalized autocalibrating partial parallel acquisition (GRAPPA) for low-latency accelerated real-time MRI of speech. Through-time spiral GRAPPA (spiral GRAPPA), a fast linear reconstruction method, is applied to spiral (k-t) data acquired from an eight-channel custom upper-airway coil. Fully sampled data were retrospectively down-sampled to evaluate spiral GRAPPA at undersampling factors R = 2 to 6. Pseudo-golden-angle spiral acquisitions were used for prospective studies. Three subjects were imaged while performing a range of speech tasks that involved rapid articulator movements, including fluent speech and beat-boxing. Spiral GRAPPA was compared with view sharing, and a parallel imaging and compressed sensing (PI-CS) method. Spiral GRAPPA captured spatiotemporal dynamics of vocal tract articulators at undersampling factors ≤4. Spiral GRAPPA at 18 ms/frame and 2.4 mm 2 /pixel outperformed view sharing in depicting rapidly moving articulators. Spiral GRAPPA and PI-CS provided equivalent temporal fidelity. Reconstruction latency per frame was 14 ms for view sharing and 116 ms for spiral GRAPPA, using a single processor. Spiral GRAPPA kept up with the MRI data rate of 18ms/frame with eight processors. PI-CS required 17 minutes to reconstruct 5 seconds of dynamic data. Spiral GRAPPA enabled 4-fold accelerated real-time MRI of speech with a low reconstruction latency. This approach is applicable to wide range of speech RT-MRI experiments that benefit from real-time feedback while visualizing rapid articulator movement. Magn Reson Med 78:2275-2282, 2017. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Observational Study of Particle Acceleration in the 2006 December 13 Flare
NASA Astrophysics Data System (ADS)
Minoshima, T.; Morimoto, T.; Kawate, T.; Imada, S.; Koshiishi, H.; Masuda, S.; Kubo, M.; Inoue, S.; Isobe, H.; Krucker, S.; Yokoyama, T.
2008-12-01
We study the particle acceleration in a flare on 2006 December 13, by using the Hinode, RHESSI, Nobeyama Radio Polarimeters (NoRP) and Nobeyama Radioheliograph (NoRH) observations. For technical reasons, both RHESSI and NoRH have a problem in imaging in this flare. Since we have succeeded in solving the problem, it is now possible to discuss the particle acceleration mechanism from an image analysis. This flare shows very long-lasting (1 hour) non-thermal emissions, consisting of many spikes. We focus on the second major spike at 02:29 UT, because the RHESSI image is available only in this period. The RHESSI 35-100 keV HXR image shows double sources located at the footpoints of the western soft X-ray (SXR) loop seen by the Hinode/XRT. The non-linear force-free (NLFF) modeling based on a magnetogram data by Inoue et al. shows the NLFF to potential magnetic transition of the loop, which would induce the electric field and then accelerate particles. Overlaying the HXR image on the photospheric three-dimensional magnetic field map taken by the Hinode Spectro-Polarimeter, we find that the HXR sources are located at the region where the horizontal magnetic fields invert. The NoRH 34 GHz microwave images show the loop structure connecting the HXR sources. The microwave peaks do not located at the top of the loop but between the loop top and the footpoints. The NoRP microwave spectrum shows the soft-hard-soft pattern in the period, same as the HXR spectrum (Ning 2008). From these observational results we suggest that the electrons were accelerated parallel to the magnetic field line near the magnetic separatrix.
High-resolution whole-brain diffusion MRI at 7T using radiofrequency parallel transmission.
Wu, Xiaoping; Auerbach, Edward J; Vu, An T; Moeller, Steen; Lenglet, Christophe; Schmitter, Sebastian; Van de Moortele, Pierre-François; Yacoub, Essa; Uğurbil, Kâmil
2018-03-30
Investigating the utility of RF parallel transmission (pTx) for Human Connectome Project (HCP)-style whole-brain diffusion MRI (dMRI) data at 7 Tesla (7T). Healthy subjects were scanned in pTx and single-transmit (1Tx) modes. Multiband (MB), single-spoke pTx pulses were designed to image sagittal slices. HCP-style dMRI data (i.e., 1.05-mm resolutions, MB2, b-values = 1000/2000 s/mm 2 , 286 images and 40-min scan) and data with higher accelerations (MB3 and MB4) were acquired with pTx. pTx significantly improved flip-angle detected signal uniformity across the brain, yielding ∼19% increase in temporal SNR (tSNR) averaged over the brain relative to 1Tx. This allowed significantly enhanced estimation of multiple fiber orientations (with ∼21% decrease in dispersion) in HCP-style 7T dMRI datasets. Additionally, pTx pulses achieved substantially lower power deposition, permitting higher accelerations, enabling collection of the same data in 2/3 and 1/2 the scan time or of more data in the same scan time. pTx provides a solution to two major limitations for slice-accelerated high-resolution whole-brain dMRI at 7T; it improves flip-angle uniformity, and enables higher slice acceleration relative to current state-of-the-art. As such, pTx provides significant advantages for rapid acquisition of high-quality, high-resolution truly whole-brain dMRI data. © 2018 International Society for Magnetic Resonance in Medicine.
A 128-channel receive-only cardiac coil for highly accelerated cardiac MRI at 3 Tesla.
Schmitt, Melanie; Potthast, Andreas; Sosnovik, David E; Polimeni, Jonathan R; Wiggins, Graham C; Triantafyllou, Christina; Wald, Lawrence L
2008-06-01
A 128-channel receive-only array coil is described and tested for cardiac imaging at 3T. The coil is closely contoured to the body with a "clam-shell" geometry with 68 posterior and 60 anterior elements, each 75 mm in diameter, and arranged in a continuous overlapped array of hexagonal symmetry to minimize nearest neighbor coupling. Signal-to-noise ratio (SNR) and noise amplification for parallel imaging (G-factor) were evaluated in phantom and volunteer experiments. These results were compared to those of commercially available 24-channel and 32-channel coils in routine use for cardiac imaging. The in vivo measurements with the 128-channel coil resulted in SNR gains compared to the 24-channel coil (up to 2.2-fold in the apex). The 128- and 32-channel coils showed similar SNR in the heart, likely dominated by the similar element diameters of these coils. The maximum G-factor values were up to seven times better for a seven-fold acceleration factor (R=7) compared to the 24-channel coil and up to two-fold improved compared to the 32-channel coil. The ability of the 128-channel coil to facilitate highly accelerated cardiac imaging was demonstrated in four volunteers using acceleration factors up to seven-fold (R=7) in a single spatial dimension. Copyright (c) 2008 Wiley-Liss, Inc.
Li, Jian; Bloch, Pavel; Xu, Jing; Sarunic, Marinko V; Shannon, Lesley
2011-05-01
Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not "share" memory with their host processors, requiring additional data transfers between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that the maximum line rate is dictated by the memory transfer time and not the processing time due to the GPU platform's memory model. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.
Li, Bingyi; Chen, Liang; Yu, Wenyue; Xie, Yizhuang; Bian, Mingming; Zhang, Qingjun; Pang, Long
2018-01-01
With the development of satellite load technology and very large-scale integrated (VLSI) circuit technology, on-board real-time synthetic aperture radar (SAR) imaging systems have facilitated rapid response to disasters. A key goal of the on-board SAR imaging system design is to achieve high real-time processing performance under severe size, weight, and power consumption constraints. This paper presents a multi-node prototype system for real-time SAR imaging processing. We decompose the commonly used chirp scaling (CS) SAR imaging algorithm into two parts according to the computing features. The linearization and logic-memory optimum allocation methods are adopted to realize the nonlinear part in a reconfigurable structure, and the two-part bandwidth balance method is used to realize the linear part. Thus, float-point SAR imaging processing can be integrated into a single Field Programmable Gate Array (FPGA) chip instead of relying on distributed technologies. A single-processing node requires 10.6 s and consumes 17 W to focus on 25-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384. The design methodology of the multi-FPGA parallel accelerating system under the real-time principle is introduced. As a proof of concept, a prototype with four processing nodes and one master node is implemented using a Xilinx xc6vlx315t FPGA. The weight and volume of one single machine are 10 kg and 32 cm × 24 cm × 20 cm, respectively, and the power consumption is under 100 W. The real-time performance of the proposed design is demonstrated on Chinese Gaofen-3 stripmap continuous imaging. PMID:29495637
Zhang, Tao; Yousaf, Ufra; Hsiao, Albert; Cheng, Joseph Y; Alley, Marcus T; Lustig, Michael; Pauly, John M; Vasanawala, Shreyas S
2015-10-01
Pediatric contrast-enhanced MR angiography is often limited by respiration, other patient motion and compromised spatiotemporal resolution. To determine the reliability of a free-breathing spatiotemporally accelerated 3-D time-resolved contrast-enhanced MR angiography method for depicting abdominal arterial anatomy in young children. With IRB approval and informed consent, we retrospectively identified 27 consecutive children (16 males and 11 females; mean age: 3.8 years, range: 14 days to 8.4 years) referred for contrast-enhanced MR angiography at our institution, who had undergone free-breathing spatiotemporally accelerated time-resolved contrast-enhanced MR angiography studies. A radio-frequency-spoiled gradient echo sequence with Cartesian variable density k-space sampling and radial view ordering, intrinsic motion navigation and intermittent fat suppression was developed. Images were reconstructed with soft-gated parallel imaging locally low-rank method to achieve both motion correction and high spatiotemporal resolution. Quality of delineation of 13 abdominal arteries in the reconstructed images was assessed independently by two radiologists on a five-point scale. Ninety-five percent confidence intervals of the proportion of diagnostically adequate cases were calculated. Interobserver agreements were also analyzed. Eleven out of 13 arteries achieved acceptable image quality (mean score range: 3.9-5.0) for both readers. Fair to substantial interobserver agreement was reached on nine arteries. Free-breathing spatiotemporally accelerated 3-D time-resolved contrast-enhanced MR angiography frequently yields diagnostic image quality for most abdominal arteries in young children.
Pryor, Alan; Ophus, Colin; Miao, Jianwei
2017-10-25
Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. In this paper, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000 × for PRISM and 15 × for multislice are achieved relative to traditionalmore » multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic.« less
Pryor, Alan; Ophus, Colin; Miao, Jianwei
2017-01-01
Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. Here, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000 × for PRISM and 15 × for multislice are achieved relative to traditional multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic , using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic .
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pryor, Alan; Ophus, Colin; Miao, Jianwei
Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. In this paper, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000 × for PRISM and 15 × for multislice are achieved relative to traditionalmore » multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Qiaofeng; Sawatzky, Alex; Anastasio, Mark A., E-mail: anastasio@wustl.edu
Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that ismore » solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated accelerated FISTAs for use with two nonsmooth penalty functions that will lead to further reductions in image reconstruction times while preserving image quality. Moreover, with the help of a mixed sparsity-regularization, better preservation of soft-tissue structures can be potentially obtained. The algorithms were systematically evaluated by use of computer-simulated and clinical data sets.« less
Xu, Qiaofeng; Yang, Deshan; Tan, Jun; Sawatzky, Alex; Anastasio, Mark A
2016-04-01
The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated accelerated FISTAs for use with two nonsmooth penalty functions that will lead to further reductions in image reconstruction times while preserving image quality. Moreover, with the help of a mixed sparsity-regularization, better preservation of soft-tissue structures can be potentially obtained. The algorithms were systematically evaluated by use of computer-simulated and clinical data sets.
Xu, Qiaofeng; Yang, Deshan; Tan, Jun; Sawatzky, Alex; Anastasio, Mark A.
2016-01-01
Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated accelerated FISTAs for use with two nonsmooth penalty functions that will lead to further reductions in image reconstruction times while preserving image quality. Moreover, with the help of a mixed sparsity-regularization, better preservation of soft-tissue structures can be potentially obtained. The algorithms were systematically evaluated by use of computer-simulated and clinical data sets. PMID:27036582
Arcmancer: Geodesics and polarized radiative transfer library
NASA Astrophysics Data System (ADS)
Pihajoki, Pauli; Mannerkoski, Matias; Nättilä, Joonas; Johansson, Peter H.
2018-05-01
Arcmancer computes geodesics and performs polarized radiative transfer in user-specified spacetimes. The library supports Riemannian and semi-Riemannian spaces of any dimension and metric; it also supports multiple simultaneous coordinate charts, embedded geometric shapes, local coordinate systems, and automatic parallel propagation. Arcmancer can be used to solve various problems in numerical geometry, such as solving the curve equation of motion using adaptive integration with configurable tolerances and differential equations along precomputed curves. It also provides support for curves with an arbitrary acceleration term and generic tools for generating ray initial conditions and performing parallel computation over the image, among other tools.
Whole-brain background-suppressed pCASL MRI with 1D-accelerated 3D RARE Stack-Of-Spirals readout
Vidorreta, Marta; Wang, Ze; Chang, Yulin V.; Wolk, David A.; Fernández-Seara, María A.; Detre, John A.
2017-01-01
Arterial Spin Labeled (ASL) perfusion MRI enables non-invasive, quantitative measurements of tissue perfusion, and has a broad range of applications including brain functional imaging. However, ASL suffers from low signal-to-noise ratio (SNR), limiting image resolution. Acquisitions using 3D readouts are optimal for background-suppression of static signals, but can be SAR intensive and typically suffer from through-plane blurring. In this study, we investigated the use of accelerated 3D readouts to obtain whole-brain, high-SNR ASL perfusion maps and reduce SAR deposition. Parallel imaging was implemented along the partition-encoding direction in a pseudo-continuous ASL sequence with background-suppression and 3D RARE Stack-Of-Spirals readout, and its performance was evaluated in three small cohorts. First, both non-accelerated and two-fold accelerated single-shot versions of the sequence were evaluated in healthy volunteers during a motor-photic task, and the performance was compared in terms of temporal SNR, GM-WM contrast, and statistical significance of the detected activation. Secondly, single-shot 1D-accelerated imaging was compared to a two-shot accelerated version to assess benefits of SNR and spatial resolution for applications in which temporal resolution is not paramount. Third, the efficacy of this approach in clinical populations was assessed by applying the single-shot 1D-accelerated version to a larger cohort of elderly volunteers. Accelerated data demonstrated the ability to detect functional activation at the subject level, including cerebellar activity, without loss in the perfusion signal temporal stability and the statistical power of the activations. The use of acceleration also resulted in increased GM-WM contrast, likely due to reduced through-plane partial volume effects, that were further attenuated with the use of two-shot readouts. In a clinical cohort, image quality remained excellent, and expected effects of age and sex on cerebral blood flow could be detected. The sequence is freely available upon request for academic use and could benefit a broad range of cognitive and clinical neuroscience research. PMID:28837640
Lu, Xiaofeng; Song, Li; Shen, Sumin; He, Kang; Yu, Songyu; Ling, Nam
2013-01-01
Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from execution time and resource requirements. Field Programmable Gate Arrays (FPGA) provide a competitive alternative for hardware acceleration to reap tremendous computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and FPGA architecture-associated framework for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most possible false edges and obtain more accurate candidate edge pixels for subsequent accelerated computation. Then, a novel PHT algorithm exploiting spatial angle-level parallelism is proposed to upgrade computational accuracy by improving the minimum computational step. Moreover, the FPGA based multi-level pipelined PHT architecture optimized by spatial parallelism ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. This framework is evaluated on ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it can calculate straight line parameters in 15.59 ms on the average for one frame. Qualitative and quantitative evaluation results have validated the system performance regarding data throughput, memory bandwidth, resource, speed and robustness. PMID:23867746
Lu, Xiaofeng; Song, Li; Shen, Sumin; He, Kang; Yu, Songyu; Ling, Nam
2013-07-17
Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from execution time and resource requirements. Field Programmable Gate Arrays (FPGA) provide a competitive alternative for hardware acceleration to reap tremendous computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and FPGA architecture-associated framework for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most possible false edges and obtain more accurate candidate edge pixels for subsequent accelerated computation. Then, a novel PHT algorithm exploiting spatial angle-level parallelism is proposed to upgrade computational accuracy by improving the minimum computational step. Moreover, the FPGA based multi-level pipelined PHT architecture optimized by spatial parallelism ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. This framework is evaluated on ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it can calculate straight line parameters in 15.59 ms on the average for one frame. Qualitative and quantitative evaluation results have validated the system performance regarding data throughput, memory bandwidth, resource, speed and robustness.
GPU-Accelerated Hybrid Algorithm for 3D Localization of Fluorescent Emitters in Dense Clusters
NASA Astrophysics Data System (ADS)
Jung, Yoon; Barsic, Anthony; Piestun, Rafael; Fakhri, Nikta
In stochastic switching-based super-resolution imaging, a random subset of fluorescent emitters are imaged and localized for each frame to construct a single high resolution image. However, the condition of non-overlapping point spread functions (PSFs) imposes constraints on experimental parameters. Recent development in post processing methods such as dictionary-based sparse support recovery using compressive sensing has shown up to an order of magnitude higher recall rate than single emitter fitting methods. However, the computational complexity of this approach scales poorly with the grid size and requires long runtime. Here, we introduce a fast and accurate compressive sensing algorithm for localizing fluorescent emitters in high density in 3D, namely sparse support recovery using Orthogonal Matching Pursuit (OMP) and L1-Homotopy algorithm for reconstructing STORM images (SOLAR STORM). SOLAR STORM combines OMP with L1-Homotopy to reduce computational complexity, which is further accelerated by parallel implementation using GPUs. This method can be used in a variety of experimental conditions for both in vitro and live cell fluorescence imaging.
Ion acceleration and heating by kinetic Alfvén waves associated with magnetic reconnection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liang, Ji; Lin, Yu; Johnson, Jay R.
In a previous study on the generation and signatures of kinetic Alfv en waves (KAWs) associated with magnetic reconnection in a current sheet revealed that KAWs are a common feature during reconnection [Liang et al. J. Geophys. Res.: Space Phys. 121, 6526 (2016)]. In this paper, ion acceleration and heating by the KAWs generated during magnetic reconnection are investigated with a three-dimensional (3-D) hybrid model. It is found that in the outflow region, a fraction of inflow ions are accelerated by the KAWs generated in the leading bulge region of reconnection, and their parallel velocities gradually increase up to slightly super-Alfv enic. As a result of waveparticle interactions, an accelerated ion beam forms in the direction of the anti-parallel magnetic field, in addition to the core ion population, leading to the development of non-Maxwellian velocity distributions, which include a trapped population with parallel velocities consistent with the wave speed. We then heat ions in both parallel and perpendicular directions. In the parallel direction, the heating results from nonlinear Landau resonance of trapped ions. In the perpendicular direction, however, evidence of stochastic heating by the KAWs is found during the acceleration stage, with an increase of magnetic moment μ. The coherence in the T more » $$\\perp$$ ion temperature and the perpendicular electric and magnetic fields of KAWs also provides evidence for perpendicular heating by KAWs. The parallel and perpendicular heating of the accelerated beam occur simultaneously, leading to the development of temperature anisotropy with the perpendicular temperature T $$\\perp$$>T $$\\parallel$$ temperature. The heating rate agrees with the damping rate of the KAWs, and the heating is dominated by the accelerated ion beam. In the later stage, with the increase of the fraction of the accelerated ions, interaction between the accelerated beam and the core population also contributes to the ion heating, ultimately leading to overlap of the beams and an overall anisotropy with T $$\\perp$$>T $$\\parallel$$.« less
Ion acceleration and heating by kinetic Alfvén waves associated with magnetic reconnection
Liang, Ji; Lin, Yu; Johnson, Jay R.; ...
2017-09-19
In a previous study on the generation and signatures of kinetic Alfv en waves (KAWs) associated with magnetic reconnection in a current sheet revealed that KAWs are a common feature during reconnection [Liang et al. J. Geophys. Res.: Space Phys. 121, 6526 (2016)]. In this paper, ion acceleration and heating by the KAWs generated during magnetic reconnection are investigated with a three-dimensional (3-D) hybrid model. It is found that in the outflow region, a fraction of inflow ions are accelerated by the KAWs generated in the leading bulge region of reconnection, and their parallel velocities gradually increase up to slightly super-Alfv enic. As a result of waveparticle interactions, an accelerated ion beam forms in the direction of the anti-parallel magnetic field, in addition to the core ion population, leading to the development of non-Maxwellian velocity distributions, which include a trapped population with parallel velocities consistent with the wave speed. We then heat ions in both parallel and perpendicular directions. In the parallel direction, the heating results from nonlinear Landau resonance of trapped ions. In the perpendicular direction, however, evidence of stochastic heating by the KAWs is found during the acceleration stage, with an increase of magnetic moment μ. The coherence in the T more » $$\\perp$$ ion temperature and the perpendicular electric and magnetic fields of KAWs also provides evidence for perpendicular heating by KAWs. The parallel and perpendicular heating of the accelerated beam occur simultaneously, leading to the development of temperature anisotropy with the perpendicular temperature T $$\\perp$$>T $$\\parallel$$ temperature. The heating rate agrees with the damping rate of the KAWs, and the heating is dominated by the accelerated ion beam. In the later stage, with the increase of the fraction of the accelerated ions, interaction between the accelerated beam and the core population also contributes to the ion heating, ultimately leading to overlap of the beams and an overall anisotropy with T $$\\perp$$>T $$\\parallel$$.« less
(2 + 1)D-CAIPIRINHA accelerated MR spectroscopic imaging of the brain at 7T.
Strasser, B; Považan, M; Hangel, G; Hingerl, L; Chmelik, M; Gruber, S; Trattnig, S; Bogner, W
2017-08-01
To compare a new parallel imaging (PI) method for multislice proton magnetic resonance spectroscopic imaging ( 1 H-MRSI), termed (2 + 1)D-CAIPIRINHA, with two standard PI methods: 2D-GRAPPA and 2D-CAIPIRINHA at 7 Tesla (T). (2 + 1)D-CAIPIRINHA is a combination of 2D-CAIPIRINHA and slice-CAIPIRINHA. Eight healthy volunteers were measured on a 7T MR scanner using a 32-channel head coil. The best undersampling patterns were estimated for all three PI methods. The artifact powers, g-factors, Cramér-Rao lower bounds (CRLB), and root mean square errors (RMSE) were compared quantitatively among the three PI methods. Metabolic maps and spectra were compared qualitatively. (2 + 1)D-CAIPIRINHA allows acceleration in three spatial dimensions in contrast to 2D-GRAPPA and 2D-CAIPIRINHA. Thus, this sequence significantly decreased the RMSE of the metabolic maps by 12.1 and 6.9%, on average, for 4 < R < 11, compared with 2D-GRAPPA and 2D-CAIPIRINHA, respectively. The artifact power was 22.6 and 8.4% lower, and the CRLB were 3.4 and 0.6% lower, respectively. (2 + 1)-CAIPIRINHA can be implemented for multislice MRSI in the brain, enabling higher accelerations than possible with two-dimensional (2D) parallel imaging methods. An eight-fold acceleration was still feasible in vivo with negligible PI artifacts with lipid decontamination, thus decreasing the measurement time from 120 to 15 min for a 64 × 64 × 4 matrix. Magn Reson Med 78:429-440, 2017. © 2016 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine. This is an open access article under the terms of the Creative Commons Attribution NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. © 2016 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine.
Li, Bingyi; Chen, Liang; Wei, Chunpeng; Xie, Yizhuang; Chen, He; Yu, Wenyue
2017-01-01
With the development of satellite load technology and very large scale integrated (VLSI) circuit technology, onboard real-time synthetic aperture radar (SAR) imaging systems have become a solution for allowing rapid response to disasters. A key goal of the onboard SAR imaging system design is to achieve high real-time processing performance with severe size, weight, and power consumption constraints. In this paper, we analyse the computational burden of the commonly used chirp scaling (CS) SAR imaging algorithm. To reduce the system hardware cost, we propose a partial fixed-point processing scheme. The fast Fourier transform (FFT), which is the most computation-sensitive operation in the CS algorithm, is processed with fixed-point, while other operations are processed with single precision floating-point. With the proposed fixed-point processing error propagation model, the fixed-point processing word length is determined. The fidelity and accuracy relative to conventional ground-based software processors is verified by evaluating both the point target imaging quality and the actual scene imaging quality. As a proof of concept, a field- programmable gate array—application-specific integrated circuit (FPGA-ASIC) hybrid heterogeneous parallel accelerating architecture is designed and realized. The customized fixed-point FFT is implemented using the 130 nm complementary metal oxide semiconductor (CMOS) technology as a co-processor of the Xilinx xc6vlx760t FPGA. A single processing board requires 12 s and consumes 21 W to focus a 50-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384. PMID:28672813
Yang, Chen; Li, Bingyi; Chen, Liang; Wei, Chunpeng; Xie, Yizhuang; Chen, He; Yu, Wenyue
2017-06-24
With the development of satellite load technology and very large scale integrated (VLSI) circuit technology, onboard real-time synthetic aperture radar (SAR) imaging systems have become a solution for allowing rapid response to disasters. A key goal of the onboard SAR imaging system design is to achieve high real-time processing performance with severe size, weight, and power consumption constraints. In this paper, we analyse the computational burden of the commonly used chirp scaling (CS) SAR imaging algorithm. To reduce the system hardware cost, we propose a partial fixed-point processing scheme. The fast Fourier transform (FFT), which is the most computation-sensitive operation in the CS algorithm, is processed with fixed-point, while other operations are processed with single precision floating-point. With the proposed fixed-point processing error propagation model, the fixed-point processing word length is determined. The fidelity and accuracy relative to conventional ground-based software processors is verified by evaluating both the point target imaging quality and the actual scene imaging quality. As a proof of concept, a field- programmable gate array-application-specific integrated circuit (FPGA-ASIC) hybrid heterogeneous parallel accelerating architecture is designed and realized. The customized fixed-point FFT is implemented using the 130 nm complementary metal oxide semiconductor (CMOS) technology as a co-processor of the Xilinx xc6vlx760t FPGA. A single processing board requires 12 s and consumes 21 W to focus a 50-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384.
NASA Astrophysics Data System (ADS)
Lee, Minsuk; Won, Youngjae; Park, Byungjun; Lee, Seungrag
2017-02-01
Not only static characteristics but also dynamic characteristics of the red blood cell (RBC) contains useful information for the blood diagnosis. Quantitative phase imaging (QPI) can capture sample images with subnanometer scale depth resolution and millisecond scale temporal resolution. Various researches have been used QPI for the RBC diagnosis, and recently many researches has been developed to decrease the process time of RBC information extraction using QPI by the parallel computing algorithm, however previous studies are interested in the static parameters such as morphology of the cells or simple dynamic parameters such as root mean square (RMS) of the membrane fluctuations. Previously, we presented a practical blood test method using the time series correlation analysis of RBC membrane flickering with QPI. However, this method has shown that there is a limit to the clinical application because of the long computation time. In this study, we present an accelerated time series correlation analysis of RBC membrane flickering using the parallel computing algorithm. This method showed consistent fractal scaling exponent results of the surrounding medium and the normal RBC with our previous research.
Supercomputer algorithms for efficient linear octree encoding of three-dimensional brain images.
Berger, S B; Reis, D J
1995-02-01
We designed and implemented algorithms for three-dimensional (3-D) reconstruction of brain images from serial sections using two important supercomputer architectures, vector and parallel. These architectures were represented by the Cray YMP and Connection Machine CM-2, respectively. The programs operated on linear octree representations of the brain data sets, and achieved 500-800 times acceleration when compared with a conventional laboratory workstation. As the need for higher resolution data sets increases, supercomputer algorithms may offer a means of performing 3-D reconstruction well above current experimental limits.
A Simple Application of Compressed Sensing to Further Accelerate Partially Parallel Imaging
Miao, Jun; Guo, Weihong; Narayan, Sreenath; Wilson, David L.
2012-01-01
Compressed Sensing (CS) and partially parallel imaging (PPI) enable fast MR imaging by reducing the amount of k-space data required for reconstruction. Past attempts to combine these two have been limited by the incoherent sampling requirement of CS, since PPI routines typically sample on a regular (coherent) grid. Here, we developed a new method, “CS+GRAPPA,” to overcome this limitation. We decomposed sets of equidistant samples into multiple random subsets. Then, we reconstructed each subset using CS, and averaging the results to get a final CS k-space reconstruction. We used both a standard CS, and an edge and joint-sparsity guided CS reconstruction. We tested these intermediate results on both synthetic and real MR phantom data, and performed a human observer experiment to determine the effectiveness of decomposition, and to optimize the number of subsets. We then used these CS reconstructions to calibrate the GRAPPA complex coil weights. In vivo parallel MR brain and heart data sets were used. An objective image quality evaluation metric, Case-PDM, was used to quantify image quality. Coherent aliasing and noise artifacts were significantly reduced using two decompositions. More decompositions further reduced coherent aliasing and noise artifacts but introduced blurring. However, the blurring was effectively minimized using our new edge and joint-sparsity guided CS using two decompositions. Numerical results on parallel data demonstrated that the combined method greatly improved image quality as compared to standard GRAPPA, on average halving Case-PDM scores across a range of sampling rates. The proposed technique allowed the same Case-PDM scores as standard GRAPPA, using about half the number of samples. We conclude that the new method augments GRAPPA by combining it with CS, allowing CS to work even when the k-space sampling pattern is equidistant. PMID:22902065
Bollache, Emilie; Barker, Alex J; Dolan, Ryan Scott; Carr, James C; van Ooij, Pim; Ahmadian, Rouzbeh; Powell, Alex; Collins, Jeremy D; Geiger, Julia; Markl, Michael
2018-01-01
To assess the performance of highly accelerated free-breathing aortic four-dimensional (4D) flow MRI acquired in under 2 minutes compared to conventional respiratory gated 4D flow. Eight k-t accelerated nongated 4D flow MRI (parallel MRI with extended and averaged generalized autocalibrating partially parallel acquisition kernels [PEAK GRAPPA], R = 5, TRes = 67.2 ms) using four k y -k z Cartesian sampling patterns (linear, center-out, out-center-out, random) and two spatial resolutions (SRes1 = 3.5 × 2.3 × 2.6 mm 3 , SRes2 = 4.5 × 2.3 × 2.6 mm 3 ) were compared in vitro (aortic coarctation flow phantom) and in 10 healthy volunteers, to conventional 4D flow (16 mm-navigator acceptance window; R = 2; TRes = 39.2 ms; SRes = 3.2 × 2.3 × 2.4 mm 3 ). The best k-t accelerated approach was further assessed in 10 patients with aortic disease. The k-t accelerated in vitro aortic peak flow (Qmax), net flow (Qnet), and peak velocity (Vmax) were lower than conventional 4D flow indices by ≤4.7%, ≤ 11%, and ≤22%, respectively. In vivo k-t accelerated acquisitions were significantly shorter but showed a trend to lower image quality compared to conventional 4D flow. Hemodynamic indices for linear and out-center-out k-space samplings were in agreement with conventional 4D flow (Qmax ≤ 13%, Qnet ≤ 13%, Vmax ≤ 17%, P > 0.05). Aortic 4D flow MRI in under 2 minutes is feasible with moderate underestimation of flow indices. Differences in k-space sampling patterns suggest an opportunity to mitigate image artifacts by an optimal trade-off between scan time, acceleration, and k-space sampling. Magn Reson Med 79:195-207, 2018. © 2018 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
96-Channel receive-only head coil for 3 Tesla: design optimization and evaluation.
Wiggins, Graham C; Polimeni, Jonathan R; Potthast, Andreas; Schmitt, Melanie; Alagappan, Vijay; Wald, Lawrence L
2009-09-01
The benefits and challenges of highly parallel array coils for head imaging were investigated through the development of a 3T receive-only phased-array head coil with 96 receive elements constructed on a close-fitting helmet-shaped former. We evaluated several designs for the coil elements and matching circuitry, with particular attention to sources of signal-to-noise ratio (SNR) loss, including various sources of coil loading and coupling between the array elements. The SNR and noise amplification (g-factor) in accelerated imaging were quantitatively evaluated in phantom and human imaging and compared to a 32-channel array built on an identical helmet-shaped former and to a larger commercial 12-channel head coil. The 96-channel coil provided substantial SNR gains in the distal cortex compared to the 12- and 32-channel coils. The central SNR for the 96-channel coil was similar to the 32-channel coil for optimum SNR combination and 20% lower for root-sum-of-squares combination. There was a significant reduction in the maximum g-factor for 96 channels compared to 32; for example, the 96-channel maximum g-factor was 65% of the 32-channel value for acceleration rate 4. The performance of the array is demonstrated in highly accelerated brain images.
Yousaf, Ufra; Hsiao, Albert; Cheng, Joseph Y.; Alley, Marcus T.; Lustig, Michael; Pauly, John M.; Vasanawala, Shreyas S.
2015-01-01
Background Pediatric contrast-enhanced MR angiography is often limited by respiration, other patient motion and compromised spatiotemporal resolution. Objective To determine the reliability of a free-breathing spatiotemporally accelerated 3-D time-resolved contrast enhanced MR angiography method for depicting abdominal arterial anatomy in young children. Materials and methods With IRB approval and informed consent, we retrospectively identified 27 consecutive children (16 males and 11 females; mean age: 3.8 years, range: 14 days to 8.4 years) referred for contrast enhanced MR angiography at our institution, who had undergone free-breathing spatiotemporally accelerated time-resolved contrast enhanced MR angiography studies. An radio-frequency-spoiled gradient echo sequence with Cartesian variable density k-space sampling and radial view ordering, intrinsic motion navigation and intermittent fat suppression was developed. Images were reconstructed with soft-gated parallel imaging locally low-rank method to achieve both motion correction and high spatiotemporal resolution. Quality of delineation of 13 abdominal arteries in the reconstructed images was assessed independently by two radiologists on a five-point scale. Ninety-five percent confidence intervals of the proportion of diagnostically adequate cases were calculated. Interobserver agreements were also analyzed. Results Eleven out of 13 arteries achieved acceptable image quality (mean score range: 3.9–5.0) for both readers. Fair to substantial interobserver agreement was reached on nine arteries. Conclusion Free-breathing spatiotemporally accelerated 3-D time-resolved contrast enhanced MR angiography frequently yields diagnostic image quality for most abdominal arteries for pediatric contrast enhanced MR angiography. PMID:26040509
Otazo, Ricardo; Tsai, Shang-Yueh; Lin, Fa-Hsuan; Posse, Stefan
2007-12-01
MR spectroscopic imaging (MRSI) with whole brain coverage in clinically feasible acquisition times still remains a major challenge. A combination of MRSI with parallel imaging has shown promise to reduce the long encoding times and 2D acceleration with a large array coil is expected to provide high acceleration capability. In this work a very high-speed method for 3D-MRSI based on the combination of proton echo planar spectroscopic imaging (PEPSI) with regularized 2D-SENSE reconstruction is developed. Regularization was performed by constraining the singular value decomposition of the encoding matrix to reduce the effect of low-value and overlapped coil sensitivities. The effects of spectral heterogeneity and discontinuities in coil sensitivity across the spectroscopic voxels were minimized by unaliasing the point spread function. As a result the contamination from extracranial lipids was reduced 1.6-fold on average compared to standard SENSE. We show that the acquisition of short-TE (15 ms) 3D-PEPSI at 3 T with a 32 x 32 x 8 spatial matrix using a 32-channel array coil can be accelerated 8-fold (R = 4 x 2) along y-z to achieve a minimum acquisition time of 1 min. Maps of the concentrations of N-acetyl-aspartate, creatine, choline, and glutamate were obtained with moderate reduction in spatial-spectral quality. The short acquisition time makes the method suitable for volumetric metabolite mapping in clinical studies. (c) 2007 Wiley-Liss, Inc.
Stirnberg, Rüdiger; Huijbers, Willem; Brenner, Daniel; Poser, Benedikt A; Breteler, Monique; Stöcker, Tony
2017-12-01
State-of-the-art simultaneous-multi-slice (SMS-)EPI and 3D-EPI share several properties that benefit functional MRI acquisition. Both sequences employ equivalent parallel imaging undersampling with controlled aliasing to achieve high temporal sampling rates. As a volumetric imaging sequence, 3D-EPI offers additional means of acceleration complementary to 2D-CAIPIRINHA sampling, such as fast water excitation and elliptical sampling. We performed an application-oriented comparison between a tailored, six-fold CAIPIRINHA-accelerated 3D-EPI protocol at 530 ms temporal and 2.4 mm isotropic spatial resolution and an SMS-EPI protocol with identical spatial and temporal resolution for whole-brain resting-state fMRI at 3 T. The latter required eight-fold slice acceleration to compensate for the lack of elliptical sampling and fast water excitation. Both sequences used vendor-supplied on-line image reconstruction. We acquired test/retest resting-state fMRI scans in ten volunteers, with simultaneous acquisition of cardiac and respiration data, subsequently used for optional physiological noise removal (nuisance regression). We found that the 3D-EPI protocol has significantly increased temporal signal-to-noise ratio throughout the brain as compared to the SMS-EPI protocol, especially when employing motion and nuisance regression. Both sequence types reliably identified known functional networks with stronger functional connectivity values for the 3D-EPI protocol. We conclude that the more time-efficient 3D-EPI primarily benefits from reduced parallel imaging noise due to a higher, actual k-space sampling density compared to SMS-EPI. The resultant BOLD sensitivity increase makes 3D-EPI a valuable alternative to SMS-EPI for whole-brain fMRI at 3 T, with voxel sizes well below 3 mm isotropic and sampling rates high enough to separate dominant cardiac signals from BOLD signals in the frequency domain. Copyright © 2017 Elsevier Inc. All rights reserved.
Wu, Zhe; Bilgic, Berkin; He, Hongjian; Tong, Qiqi; Sun, Yi; Du, Yiping; Setsompop, Kawin; Zhong, Jianhui
2018-09-01
This study introduces a highly accelerated whole-brain direct visualization of short transverse relaxation time component (ViSTa) imaging using a wave controlled aliasing in parallel imaging (CAIPI) technique, for acquisition within a clinically acceptable scan time, with the preservation of high image quality and sufficient spatial resolution, and reduced residual point spread function artifacts. Double inversion RF pulses were applied to preserve the signal from short T 1 components for directly extracting myelin water signal in ViSTa imaging. A 2D simultaneous multislice and a 3D acquisition of ViSTa images incorporating wave-encoding were used for data acquisition. Improvements brought by a zero-padding method in wave-CAIPI reconstruction were also investigated. The zero-padding method in wave-CAIPI reconstruction reduced the root-mean-square errors between the wave-encoded and Cartesian gradient echoes for all wave gradient configurations in simulation, and reduced the side-main lobe intensity ratio from 34.5 to 16% in the thin-slab in vivo ViSTa images. In a 4 × acceleration simultaneous-multislice scenario, wave-CAIPI ViSTa achieved negligible g-factors (g mean /g max = 1.03/1.10), while retaining minimal interslice artifacts. An 8 × accelerated acquisition of 3D wave-CAIPI ViSTa imaging covering the whole brain with 1.1 × 1.1 × 3 mm 3 voxel size was achieved within 15 minutes, and only incurred a small g-factor penalty (g mean /g max = 1.05/1.16). Whole-brain ViSTa images were obtained within 15 minutes with negligible g-factor penalty by using wave-CAIPI acquisition and zero-padding reconstruction. The proposed zero-padding method was shown to be effective in reducing residual point spread function for wave-encoded images, particularly for ViSTa. © 2018 International Society for Magnetic Resonance in Medicine.
Parallelization and checkpointing of GPU applications through program transformation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Solano-Quinde, Lizandro Damian
2012-01-01
GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solvemore » the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.« less
An extended algebraic reconstruction technique (E-ART) for dual spectral CT.
Zhao, Yunsong; Zhao, Xing; Zhang, Peng
2015-03-01
Compared with standard computed tomography (CT), dual spectral CT (DSCT) has many advantages for object separation, contrast enhancement, artifact reduction, and material composition assessment. But it is generally difficult to reconstruct images from polychromatic projections acquired by DSCT, because of the nonlinear relation between the polychromatic projections and the images to be reconstructed. This paper first models the DSCT reconstruction problem as a nonlinear system problem; and then extend the classic ART method to solve the nonlinear system. One feature of the proposed method is its flexibility. It fits for any scanning configurations commonly used and does not require consistent rays for different X-ray spectra. Another feature of the proposed method is its high degree of parallelism, which means that the method is suitable for acceleration on GPUs (graphic processing units) or other parallel systems. The method is validated with numerical experiments from simulated noise free and noisy data. High quality images are reconstructed with the proposed method from the polychromatic projections of DSCT. The reconstructed images are still satisfactory even if there are certain errors in the estimated X-ray spectra.
Speculation and replication in temperature accelerated dynamics
Zamora, Richard J.; Perez, Danny; Voter, Arthur F.
2018-02-12
Accelerated Molecular Dynamics (AMD) is a class of MD-based algorithms for the long-time scale simulation of atomistic systems that are characterized by rare-event transitions. Temperature-Accelerated Dynamics (TAD), a traditional AMD approach, hastens state-to-state transitions by performing MD at an elevated temperature. Recently, Speculatively-Parallel TAD (SpecTAD) was introduced, allowing the TAD procedure to exploit parallel computing systems by concurrently executing in a dynamically generated list of speculative future states. Although speculation can be very powerful, it is not always the most efficient use of parallel resources. In this paper, we compare the performance of speculative parallelism with a replica-based technique, similarmore » to the Parallel Replica Dynamics method. A hybrid SpecTAD approach is also presented, in which each speculation process is further accelerated by a local set of replicas. Finally and overall, this work motivates the use of hybrid parallelism whenever possible, as some combination of speculation and replication is typically most efficient.« less
Speculation and replication in temperature accelerated dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zamora, Richard J.; Perez, Danny; Voter, Arthur F.
Accelerated Molecular Dynamics (AMD) is a class of MD-based algorithms for the long-time scale simulation of atomistic systems that are characterized by rare-event transitions. Temperature-Accelerated Dynamics (TAD), a traditional AMD approach, hastens state-to-state transitions by performing MD at an elevated temperature. Recently, Speculatively-Parallel TAD (SpecTAD) was introduced, allowing the TAD procedure to exploit parallel computing systems by concurrently executing in a dynamically generated list of speculative future states. Although speculation can be very powerful, it is not always the most efficient use of parallel resources. In this paper, we compare the performance of speculative parallelism with a replica-based technique, similarmore » to the Parallel Replica Dynamics method. A hybrid SpecTAD approach is also presented, in which each speculation process is further accelerated by a local set of replicas. Finally and overall, this work motivates the use of hybrid parallelism whenever possible, as some combination of speculation and replication is typically most efficient.« less
Sensitivity-encoded (SENSE) proton echo-planar spectroscopic imaging (PEPSI) in the human brain.
Lin, Fa-Hsuan; Tsai, Shang-Yueh; Otazo, Ricardo; Caprihan, Arvind; Wald, Lawrence L; Belliveau, John W; Posse, Stefan
2007-02-01
Magnetic resonance spectroscopic imaging (MRSI) provides spatially resolved metabolite information that is invaluable for both neuroscience studies and clinical applications. However, lengthy data acquisition times, which are a result of time-consuming phase encoding, represent a major challenge for MRSI. Fast MRSI pulse sequences that use echo-planar readout gradients, such as proton echo-planar spectroscopic imaging (PEPSI), are capable of fast spectral-spatial encoding and thus enable acceleration of image acquisition times. Combining PEPSI with recent advances in parallel MRI utilizing RF coil arrays can further accelerate MRSI data acquisition. Here we investigate the feasibility of ultrafast spectroscopic imaging at high field (3T and 4T) by combining PEPSI with sensitivity-encoded (SENSE) MRI using eight-channel head coil arrays. We show that the acquisition of single-average SENSE-PEPSI data at a short TE (15 ms) can be accelerated to 32 s or less, depending on the field strength, to obtain metabolic images of choline (Cho), creatine (Cre), N-acetyl-aspartate (NAA), and J-coupled metabolites (e.g., glutamate (Glu) and inositol (Ino)) with acceptable spectral quality and localization. The experimentally measured reductions in signal-to-noise ratio (SNR) and Cramer-Rao lower bounds (CRLBs) of metabolite resonances were well explained by both the g-factor and reduced measurement times. Thus, this technology is a promising means of reducing the scan times of 3D acquisitions and time-resolved 2D measurements. Copyright (c) 2007 Wiley-Liss, Inc.
González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J
2017-02-01
The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Simultaneous multi-slice combined with PROPELLER.
Norbeck, Ola; Avventi, Enrico; Engström, Mathias; Rydén, Henric; Skare, Stefan
2018-08-01
Simultaneous multi-slice (SMS) imaging is an advantageous method for accelerating MRI scans, allowing reduced scan time, increased slice coverage, or high temporal resolution with limited image quality penalties. In this work we combine the advantages of SMS acceleration with the motion correction and artifact reduction capabilities of the PROPELLER technique. A PROPELLER sequence was developed with support for CAIPIRINHA and phase optimized multiband radio frequency pulses. To minimize the time spent on acquiring calibration data, both in-plane-generalized autocalibrating partial parallel acquisition (GRAPPA) and slice-GRAPPA weights for all PROPELLER blade angles were calibrated on a single fully sampled PROPELLER blade volume. Therefore, the proposed acquisition included a single fully sampled blade volume, with the remaining blades accelerated in both the phase and slice encoding directions without additional auto calibrating signal lines. Comparison to 3D RARE was performed as well as demonstration of 3D motion correction performance on the SMS PROPELLER data. We show that PROPELLER acquisitions can be efficiently accelerated with SMS using a short embedded calibration. The potential in combining these two techniques was demonstrated with a high quality 1.0 × 1.0 × 1.0 mm 3 resolution T 2 -weighted volume, free from banding artifacts, and capable of 3D retrospective motion correction, with higher effective resolution compared to 3D RARE. With the combination of SMS acceleration and PROPELLER imaging, thin-sliced reformattable T 2 -weighted image volumes with 3D retrospective motion correction capabilities can be rapidly acquired with low sensitivity to flow and head motion. Magn Reson Med 80:496-506, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods
Smith, David S.; Gore, John C.; Yankeelov, Thomas E.; Welch, E. Brian
2012-01-01
Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 40962 or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 10242 and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images. PMID:22481908
Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods.
Smith, David S; Gore, John C; Yankeelov, Thomas E; Welch, E Brian
2012-01-01
Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 4096(2) or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 1024(2) and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images.
NASA Astrophysics Data System (ADS)
Doulgerakis, Matthaios; Eggebrecht, Adam; Wojtkiewicz, Stanislaw; Culver, Joseph; Dehghani, Hamid
2017-12-01
Parameter recovery in diffuse optical tomography is a computationally expensive algorithm, especially when used for large and complex volumes, as in the case of human brain functional imaging. The modeling of light propagation, also known as the forward problem, is the computational bottleneck of the recovery algorithm, whereby the lack of a real-time solution is impeding practical and clinical applications. The objective of this work is the acceleration of the forward model, within a diffusion approximation-based finite-element modeling framework, employing parallelization to expedite the calculation of light propagation in realistic adult head models. The proposed methodology is applicable for modeling both continuous wave and frequency-domain systems with the results demonstrating a 10-fold speed increase when GPU architectures are available, while maintaining high accuracy. It is shown that, for a very high-resolution finite-element model of the adult human head with ˜600,000 nodes, consisting of heterogeneous layers, light propagation can be calculated at ˜0.25 s/excitation source.
Goerner, Frank L.; Duong, Timothy; Stafford, R. Jason; Clarke, Geoffrey D.
2013-01-01
Purpose: To investigate the utility of five different standard measurement methods for determining image uniformity for partially parallel imaging (PPI) acquisitions in terms of consistency across a variety of pulse sequences and reconstruction strategies. Methods: Images were produced with a phantom using a 12-channel head matrix coil in a 3T MRI system (TIM TRIO, Siemens Medical Solutions, Erlangen, Germany). Images produced using echo-planar, fast spin echo, gradient echo, and balanced steady state free precession pulse sequences were evaluated. Two different PPI reconstruction methods were investigated, generalized autocalibrating partially parallel acquisition algorithm (GRAPPA) and modified sensitivity-encoding (mSENSE) with acceleration factors (R) of 2, 3, and 4. Additionally images were acquired with conventional, two-dimensional Fourier imaging methods (R = 1). Five measurement methods of uniformity, recommended by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) were considered. The methods investigated were (1) an ACR method and a (2) NEMA method for calculating the peak deviation nonuniformity, (3) a modification of a NEMA method used to produce a gray scale uniformity map, (4) determining the normalized absolute average deviation uniformity, and (5) a NEMA method that focused on 17 areas of the image to measure uniformity. Changes in uniformity as a function of reconstruction method at the same R-value were also investigated. Two-way analysis of variance (ANOVA) was used to determine whether R-value or reconstruction method had a greater influence on signal intensity uniformity measurements for partially parallel MRI. Results: Two of the methods studied had consistently negative slopes when signal intensity uniformity was plotted against R-value. The results obtained comparing mSENSE against GRAPPA found no consistent difference between GRAPPA and mSENSE with regard to signal intensity uniformity. The results of the two-way ANOVA analysis suggest that R-value and pulse sequence type produce the largest influences on uniformity and PPI reconstruction method had relatively little effect. Conclusions: Two of the methods of measuring signal intensity uniformity, described by the (NEMA) MRI standards, consistently indicated a decrease in uniformity with an increase in R-value. Other methods investigated did not demonstrate consistent results for evaluating signal uniformity in MR images obtained by partially parallel methods. However, because the spatial distribution of noise affects uniformity, it is recommended that additional uniformity quality metrics be investigated for partially parallel MR images. PMID:23927345
Chang, Gregory; Friedrich, Klaus M; Wang, Ligong; Vieira, Renata L R; Schweitzer, Mark E; Recht, Michael P; Wiggins, Graham C; Regatte, Ravinder R
2010-03-01
To determine the feasibility of performing MRI of the wrist at 7 Tesla (T) with parallel imaging and to evaluate how acceleration factors (AF) affect signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), and image quality. This study had institutional review board approval. A four-transmit eight-receive channel array coil was constructed in-house. Nine healthy subjects were scanned on a 7T whole-body MR scanner. Coronal and axial images of cartilage and trabecular bone micro-architecture (3D-Fast Low Angle Shot (FLASH) with and without fat suppression, repetition time/echo time = 20 ms/4.5 ms, flip angle = 10 degrees , 0.169-0.195 x 0.169-0.195 mm, 0.5-1 mm slice thickness) were obtained with AF 1, 2, 3, 4. T1-weighted fast spin-echo (FSE), proton density-weighted FSE, and multiple-echo data image combination (MEDIC) sequences were also performed. SNR and CNR were measured. Three musculoskeletal radiologists rated image quality. Linear correlation analysis and paired t-tests were performed. At higher AF, SNR and CNR decreased linearly for cartilage, muscle, and trabecular bone (r < -0.98). At AF 4, reductions in SNR/CNR were:52%/60% (cartilage), 72%/63% (muscle), 45%/50% (trabecular bone). Radiologists scored images with AF 1 and 2 as near-excellent, AF 3 as good-to-excellent (P = 0.075), and AF 4 as average-to-good (P = 0.11). It is feasible to perform high resolution 7T MRI of the wrist with parallel imaging. SNR and CNR decrease with higher AF, but image quality remains above-average.
Muckley, Matthew J; Noll, Douglas C; Fessler, Jeffrey A
2015-02-01
Sparsity-promoting regularization is useful for combining compressed sensing assumptions with parallel MRI for reducing scan time while preserving image quality. Variable splitting algorithms are the current state-of-the-art algorithms for SENSE-type MR image reconstruction with sparsity-promoting regularization. These methods are very general and have been observed to work with almost any regularizer; however, the tuning of associated convergence parameters is a commonly-cited hindrance in their adoption. Conversely, majorize-minimize algorithms based on a single Lipschitz constant have been observed to be slow in shift-variant applications such as SENSE-type MR image reconstruction since the associated Lipschitz constants are loose bounds for the shift-variant behavior. This paper bridges the gap between the Lipschitz constant and the shift-variant aspects of SENSE-type MR imaging by introducing majorizing matrices in the range of the regularizer matrix. The proposed majorize-minimize methods (called BARISTA) converge faster than state-of-the-art variable splitting algorithms when combined with momentum acceleration and adaptive momentum restarting. Furthermore, the tuning parameters associated with the proposed methods are unitless convergence tolerances that are easier to choose than the constraint penalty parameters required by variable splitting algorithms.
Noll, Douglas C.; Fessler, Jeffrey A.
2014-01-01
Sparsity-promoting regularization is useful for combining compressed sensing assumptions with parallel MRI for reducing scan time while preserving image quality. Variable splitting algorithms are the current state-of-the-art algorithms for SENSE-type MR image reconstruction with sparsity-promoting regularization. These methods are very general and have been observed to work with almost any regularizer; however, the tuning of associated convergence parameters is a commonly-cited hindrance in their adoption. Conversely, majorize-minimize algorithms based on a single Lipschitz constant have been observed to be slow in shift-variant applications such as SENSE-type MR image reconstruction since the associated Lipschitz constants are loose bounds for the shift-variant behavior. This paper bridges the gap between the Lipschitz constant and the shift-variant aspects of SENSE-type MR imaging by introducing majorizing matrices in the range of the regularizer matrix. The proposed majorize-minimize methods (called BARISTA) converge faster than state-of-the-art variable splitting algorithms when combined with momentum acceleration and adaptive momentum restarting. Furthermore, the tuning parameters associated with the proposed methods are unitless convergence tolerances that are easier to choose than the constraint penalty parameters required by variable splitting algorithms. PMID:25330484
Rapid anatomical brain imaging using spiral acquisition and an expanded signal model.
Kasper, Lars; Engel, Maria; Barmet, Christoph; Haeberlin, Maximilian; Wilm, Bertram J; Dietrich, Benjamin E; Schmid, Thomas; Gross, Simon; Brunner, David O; Stephan, Klaas E; Pruessmann, Klaas P
2018-03-01
We report the deployment of spiral acquisition for high-resolution structural imaging at 7T. Long spiral readouts are rendered manageable by an expanded signal model including static off-resonance and B 0 dynamics along with k-space trajectories and coil sensitivity maps. Image reconstruction is accomplished by inversion of the signal model using an extension of the iterative non-Cartesian SENSE algorithm. Spiral readouts up to 25 ms are shown to permit whole-brain 2D imaging at 0.5 mm in-plane resolution in less than a minute. A range of options is explored, including proton-density and T 2 * contrast, acceleration by parallel imaging, different readout orientations, and the extraction of phase images. Results are shown to exhibit competitive image quality along with high geometric consistency. Copyright © 2017 Elsevier Inc. All rights reserved.
Single-shot EPI with Nyquist ghost compensation: Interleaved Dual-Echo with Acceleration (IDEA) EPI
Poser, Benedikt A; Barth, Markus; Goa, Pål-Erik; Deng, Weiran; Stenger, V Andrew
2012-01-01
Echo planar imaging is most commonly used for BOLD fMRI, owing to its sensitivity and acquisition speed. A major problem with EPI is Nyquist (N/2) ghosting, most notably at high field. EPI data are acquired under an oscillating readout gradient and hence vulnerable to gradient imperfections such as eddy current delays and off-resonance effects, as these cause inconsistencies between odd and even k-space lines after time reversal. We propose a straightforward and pragmatic method herein termed Interleaved Dual Echo with Acceleration (IDEA) EPI: Two k-spaces (echoes) are acquired under the positive and negative readout lobes, respectively, by performing phase blips only before alternate readout gradients. From these two k-spaces, two almost entirely ghost free images per shot can be constructed, without need for phase correction. The doubled echo train length can be compensated by parallel imaging and/or partial Fourier acquisition. The two k-spaces can either be complex-averaged during reconstruction, which results in near-perfect cancellation of residual phase errors, or reconstructed into separate images. We demonstrate the efficacy of IDEA EPI and show phantom and in vivo images at both 3 and 7 Tesla. PMID:22411762
NASA Astrophysics Data System (ADS)
Choi, Sunghoon; Lee, Haenghwa; Lee, Donghoon; Choi, Seungyeon; Shin, Jungwook; Jang, Woojin; Seo, Chang-Woo; Kim, Hee-Joung
2017-03-01
A compressed-sensing (CS) technique has been rapidly applied in medical imaging field for retrieving volumetric data from highly under-sampled projections. Among many variant forms, CS technique based on a total-variation (TV) regularization strategy shows fairly reasonable results in cone-beam geometry. In this study, we implemented the TV-based CS image reconstruction strategy in our prototype chest digital tomosynthesis (CDT) R/F system. Due to the iterative nature of time consuming processes in solving a cost function, we took advantage of parallel computing using graphics processing units (GPU) by the compute unified device architecture (CUDA) programming to accelerate our algorithm. In order to compare the algorithmic performance of our proposed CS algorithm, conventional filtered back-projection (FBP) and simultaneous algebraic reconstruction technique (SART) reconstruction schemes were also studied. The results indicated that the CS produced better contrast-to-noise ratios (CNRs) in the physical phantom images (Teflon region-of-interest) by factors of 3.91 and 1.93 than FBP and SART images, respectively. The resulted human chest phantom images including lung nodules with different diameters also showed better visual appearance in the CS images. Our proposed GPU-accelerated CS reconstruction scheme could produce volumetric data up to 80 times than CPU programming. Total elapsed time for producing 50 coronal planes with 1024×1024 image matrix using 41 projection views were 216.74 seconds for proposed CS algorithms on our GPU programming, which could match the clinically feasible time ( 3 min). Consequently, our results demonstrated that the proposed CS method showed a potential of additional dose reduction in digital tomosynthesis with reasonable image quality in a fast time.
A high-performance spatial database based approach for pathology imaging algorithm evaluation
Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.
2013-01-01
Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
Rowe, Daniel B; Bruce, Iain P; Nencka, Andrew S; Hyde, James S; Kociuba, Mary C
2016-04-01
Achieving a reduction in scan time with minimal inter-slice signal leakage is one of the significant obstacles in parallel MR imaging. In fMRI, multiband-imaging techniques accelerate data acquisition by simultaneously magnetizing the spatial frequency spectrum of multiple slices. The SPECS model eliminates the consequential inter-slice signal leakage from the slice unaliasing, while maintaining an optimal reduction in scan time and activation statistics in fMRI studies. When the combined k-space array is inverse Fourier reconstructed, the resulting aliased image is separated into the un-aliased slices through a least squares estimator. Without the additional spatial information from a phased array of receiver coils, slice separation in SPECS is accomplished with acquired aliased images in shifted FOV aliasing pattern, and a bootstrapping approach of incorporating reference calibration images in an orthogonal Hadamard pattern. The aliased slices are effectively separated with minimal expense to the spatial and temporal resolution. Functional activation is observed in the motor cortex, as the number of aliased slices is increased, in a bilateral finger tapping fMRI experiment. The SPECS model incorporates calibration reference images together with coefficients of orthogonal polynomials into an un-aliasing estimator to achieve separated images, with virtually no residual artifacts and functional activation detection in separated images. Copyright © 2015 Elsevier Inc. All rights reserved.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kononenko, Oleksiy
2015-03-26
ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.
NASA Astrophysics Data System (ADS)
Chernousov, Yu. D.; Shebolaev, I. V.; Ikryanov, I. M.
2018-01-01
An electron beam with a high (close to 100%) coefficient of electron capture into the regime of acceleration has been obtained in a linear electron accelerator based on a parallel coupled slow-wave structure, electron gun with microwave-controlled injection current, and permanent-magnet beam-focusing system. The high capture coefficient was due to the properties of the accelerating structure, beam-focusing system, and electron-injection system. Main characteristics of the proposed systems are presented.
On the relationship between collisionless shock structure and energetic particle acceleration
NASA Technical Reports Server (NTRS)
Kennel, C. F.
1983-01-01
Recent experimental research on bow shock structure and theoretical studies of quasi-parallel shock structure and shock acceleration of energetic particles were reviewed, to point out the relationship between structure and particle acceleration. The phenomenological distinction between quasi-parallel and quasi-perpendicular shocks that has emerged from bow shock research; present efforts to extend this work to interplanetary shocks; theories of particle acceleration by shocks; and particle acceleration to shock structures using multiple fluid models were discussed.
Density-based parallel skin lesion border detection with webCL
2015-01-01
Background Dermoscopy is a highly effective and noninvasive imaging technique used in diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Methods Fast density based skin lesion border detection method has been implemented in parallel with a new parallel technology called WebCL. WebCL utilizes client side computing capabilities to use available hardware resources such as multi cores and GPUs. Developed WebCL-parallel density based skin lesion border detection method runs efficiently from internet browsers. Results Previous research indicates that one of the highest accuracy rates can be achieved using density based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect could be mitigated when implemented in parallel. In this study, density based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on the heterogeneous platforms (e.g. tablets, SmartPhones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in the clinical settings. For this, we used WebCL, an emerging technology that enables a HTML5 Web browser to execute code in parallel for heterogeneous platforms. We depicted WebCL and our parallel algorithm design. In addition, we tested parallel code on 100 dermoscopy images and showed the execution speedups with respect to the serial version. Results indicate that parallel (WebCL) version and serial version of density based lesion border detection methods generate the same accuracy rates for 100 dermoscopy images, in which mean of border error is 6.94%, mean of recall is 76.66%, and mean of precision is 99.29% respectively. Moreover, WebCL version's speedup factor for 100 dermoscopy images' lesion border detection averages around ~491.2. Conclusions When large amount of high resolution dermoscopy images considered in a usual clinical setting along with the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing dermoscopy images become obvious. In this paper, we introduce WebCL and the use of it for biomedical image processing applications. WebCL is a javascript binding of OpenCL, which takes advantage of GPU computing from a web browser. Therefore, WebCL parallel version of density based skin lesion border detection introduced in this study can supplement expert dermatologist, and aid them in early diagnosis of skin lesions. While WebCL is currently an emerging technology, a full adoption of WebCL into the HTML5 standard would allow for this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources including multi-cores and GPUs on a local machine, and allows for compiled code to run directly from the Web Browser. PMID:26423836
Density-based parallel skin lesion border detection with webCL.
Lemon, James; Kockara, Sinan; Halic, Tansel; Mete, Mutlu
2015-01-01
Dermoscopy is a highly effective and noninvasive imaging technique used in diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Fast density based skin lesion border detection method has been implemented in parallel with a new parallel technology called WebCL. WebCL utilizes client side computing capabilities to use available hardware resources such as multi cores and GPUs. Developed WebCL-parallel density based skin lesion border detection method runs efficiently from internet browsers. Previous research indicates that one of the highest accuracy rates can be achieved using density based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect could be mitigated when implemented in parallel. In this study, density based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on the heterogeneous platforms (e.g. tablets, SmartPhones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in the clinical settings. For this, we used WebCL, an emerging technology that enables a HTML5 Web browser to execute code in parallel for heterogeneous platforms. We depicted WebCL and our parallel algorithm design. In addition, we tested parallel code on 100 dermoscopy images and showed the execution speedups with respect to the serial version. Results indicate that parallel (WebCL) version and serial version of density based lesion border detection methods generate the same accuracy rates for 100 dermoscopy images, in which mean of border error is 6.94%, mean of recall is 76.66%, and mean of precision is 99.29% respectively. Moreover, WebCL version's speedup factor for 100 dermoscopy images' lesion border detection averages around ~491.2. When large amount of high resolution dermoscopy images considered in a usual clinical setting along with the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing dermoscopy images become obvious. In this paper, we introduce WebCL and the use of it for biomedical image processing applications. WebCL is a javascript binding of OpenCL, which takes advantage of GPU computing from a web browser. Therefore, WebCL parallel version of density based skin lesion border detection introduced in this study can supplement expert dermatologist, and aid them in early diagnosis of skin lesions. While WebCL is currently an emerging technology, a full adoption of WebCL into the HTML5 standard would allow for this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources including multi-cores and GPUs on a local machine, and allows for compiled code to run directly from the Web Browser.
Ion acceleration and heating by kinetic Alfvén waves associated with magnetic reconnection
NASA Astrophysics Data System (ADS)
Liang, Ji; Lin, Yu; Johnson, Jay R.; Wang, Zheng-Xiong; Wang, Xueyi
2017-10-01
Our previous study on the generation and signatures of kinetic Alfvén waves (KAWs) associated with magnetic reconnection in a current sheet revealed that KAWs are a common feature during reconnection [Liang et al. J. Geophys. Res.: Space Phys. 121, 6526 (2016)]. In this paper, ion acceleration and heating by the KAWs generated during magnetic reconnection are investigated with a three-dimensional (3-D) hybrid model. It is found that in the outflow region, a fraction of inflow ions are accelerated by the KAWs generated in the leading bulge region of reconnection, and their parallel velocities gradually increase up to slightly super-Alfvénic. As a result of wave-particle interactions, an accelerated ion beam forms in the direction of the anti-parallel magnetic field, in addition to the core ion population, leading to the development of non-Maxwellian velocity distributions, which include a trapped population with parallel velocities consistent with the wave speed. The ions are heated in both parallel and perpendicular directions. In the parallel direction, the heating results from nonlinear Landau resonance of trapped ions. In the perpendicular direction, however, evidence of stochastic heating by the KAWs is found during the acceleration stage, with an increase of magnetic moment μ. The coherence in the perpendicular ion temperature T⊥ and the perpendicular electric and magnetic fields of KAWs also provides evidence for perpendicular heating by KAWs. The parallel and perpendicular heating of the accelerated beam occur simultaneously, leading to the development of temperature anisotropy with T⊥>T∥ . The heating rate agrees with the damping rate of the KAWs, and the heating is dominated by the accelerated ion beam. In the later stage, with the increase of the fraction of the accelerated ions, interaction between the accelerated beam and the core population also contributes to the ion heating, ultimately leading to overlap of the beams and an overall anisotropy with T∥>T⊥ .
Accelerated Compressed Sensing Based CT Image Reconstruction.
Hashemi, SayedMasoud; Beheshti, Soosan; Gill, Patrick R; Paul, Narinder S; Cobbold, Richard S C
2015-01-01
In X-ray computed tomography (CT) an important objective is to reduce the radiation dose without significantly degrading the image quality. Compressed sensing (CS) enables the radiation dose to be reduced by producing diagnostic images from a limited number of projections. However, conventional CS-based algorithms are computationally intensive and time-consuming. We propose a new algorithm that accelerates the CS-based reconstruction by using a fast pseudopolar Fourier based Radon transform and rebinning the diverging fan beams to parallel beams. The reconstruction process is analyzed using a maximum-a-posterior approach, which is transformed into a weighted CS problem. The weights involved in the proposed model are calculated based on the statistical characteristics of the reconstruction process, which is formulated in terms of the measurement noise and rebinning interpolation error. Therefore, the proposed method not only accelerates the reconstruction, but also removes the rebinning and interpolation errors. Simulation results are shown for phantoms and a patient. For example, a 512 × 512 Shepp-Logan phantom when reconstructed from 128 rebinned projections using a conventional CS method had 10% error, whereas with the proposed method the reconstruction error was less than 1%. Moreover, computation times of less than 30 sec were obtained using a standard desktop computer without numerical optimization.
Accelerated Compressed Sensing Based CT Image Reconstruction
Hashemi, SayedMasoud; Beheshti, Soosan; Gill, Patrick R.; Paul, Narinder S.; Cobbold, Richard S. C.
2015-01-01
In X-ray computed tomography (CT) an important objective is to reduce the radiation dose without significantly degrading the image quality. Compressed sensing (CS) enables the radiation dose to be reduced by producing diagnostic images from a limited number of projections. However, conventional CS-based algorithms are computationally intensive and time-consuming. We propose a new algorithm that accelerates the CS-based reconstruction by using a fast pseudopolar Fourier based Radon transform and rebinning the diverging fan beams to parallel beams. The reconstruction process is analyzed using a maximum-a-posterior approach, which is transformed into a weighted CS problem. The weights involved in the proposed model are calculated based on the statistical characteristics of the reconstruction process, which is formulated in terms of the measurement noise and rebinning interpolation error. Therefore, the proposed method not only accelerates the reconstruction, but also removes the rebinning and interpolation errors. Simulation results are shown for phantoms and a patient. For example, a 512 × 512 Shepp-Logan phantom when reconstructed from 128 rebinned projections using a conventional CS method had 10% error, whereas with the proposed method the reconstruction error was less than 1%. Moreover, computation times of less than 30 sec were obtained using a standard desktop computer without numerical optimization. PMID:26167200
Liu, Jing; Koskas, Louise; Faraji, Farshid; Kao, Evan; Wang, Yan; Haraldsson, Henrik; Kefayati, Sarah; Zhu, Chengcheng; Ahn, Sinyeob; Laub, Gerhard; Saloner, David
2018-04-01
To evaluate an accelerated 4D flow MRI method that provides high temporal resolution in a clinically feasible acquisition time for intracranial velocity imaging. Accelerated 4D flow MRI was developed by using a pseudo-random variable-density Cartesian undersampling strategy (CIRCUS) with the combination of k-t, parallel imaging and compressed sensing image reconstruction techniques (k-t SPARSE-SENSE). Four-dimensional flow data were acquired on five healthy volunteers and eight patients with intracranial aneurysms using CIRCUS (acceleration factor of R = 4, termed CIRCUS4) and GRAPPA (R = 2, termed GRAPPA2) as the reference method. Images with three times higher temporal resolution (R = 12, CIRCUS12) were also reconstructed from the same acquisition as CIRCUS4. Qualitative and quantitative image assessment was performed on the images acquired with different methods, and complex flow patterns in the aneurysms were identified and compared. Four-dimensional flow MRI with CIRCUS was achieved in 5 min and allowed further improved temporal resolution of <30 ms. Volunteer studies showed similar qualitative and quantitative evaluation obtained with the proposed approach compared to the reference (overall image scores: GRAPPA2 3.2 ± 0.6; CIRCUS4 3.1 ± 0.7; CIRCUS12 3.3 ± 0.4; difference of the peak velocities: -3.83 ± 7.72 cm/s between CIRCUS4 and GRAPPA2, -1.72 ± 8.41 cm/s between CIRCUS12 and GRAPPA2). In patients with intracranial aneurysms, the higher temporal resolution improved capturing of the flow features in intracranial aneurysms (pathline visualization scores: GRAPPA2 2.2 ± 0.2; CIRCUS4 2.5 ± 0.5; CIRCUS12 2.7 ± 0.6). The proposed rapid 4D flow MRI with a high temporal resolution is a promising tool for evaluating intracranial aneurysms in a clinically feasible acquisition time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Masters, A.; Dougherty, M. K.; Sulaiman, A. H.
A leading explanation for the origin of Galactic cosmic rays is acceleration at high-Mach number shock waves in the collisionless plasma surrounding young supernova remnants. Evidence for this is provided by multi-wavelength non-thermal emission thought to be associated with ultrarelativistic electrons at these shocks. However, the dependence of the electron acceleration process on the orientation of the upstream magnetic field with respect to the local normal to the shock front (quasi-parallel/quasi-perpendicular) is debated. Cassini spacecraft observations at Saturn’s bow shock have revealed examples of electron acceleration under quasi-perpendicular conditions, and the first in situ evidence of electron acceleration at amore » quasi-parallel shock. Here we use Cassini data to make the first comparison between energy spectra of locally accelerated electrons under these differing upstream magnetic field regimes. We present data taken during a quasi-perpendicular shock crossing on 2008 March 8 and during a quasi-parallel shock crossing on 2007 February 3, highlighting that both were associated with electron acceleration to at least MeV energies. The magnetic signature of the quasi-perpendicular crossing has a relatively sharp upstream–downstream transition, and energetic electrons were detected close to the transition and immediately downstream. The magnetic transition at the quasi-parallel crossing is less clear, energetic electrons were encountered upstream and downstream, and the electron energy spectrum is harder above ∼100 keV. We discuss whether the acceleration is consistent with diffusive shock acceleration theory in each case, and suggest that the quasi-parallel spectral break is due to an energy-dependent interaction between the electrons and short, large-amplitude magnetic structures.« less
Feature Clustering for Accelerating Parallel Coordinate Descent
DOE Office of Scientific and Technical Information (OSTI.GOV)
Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh
2012-12-06
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
Lin, Jyh-Miin; Patterson, Andrew J; Chang, Hing-Chiu; Gillard, Jonathan H; Graves, Martin J
2015-10-01
To propose a new reduced field-of-view (rFOV) strategy for iterative reconstructions in a clinical environment. Iterative reconstructions can incorporate regularization terms to improve the image quality of periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) MRI. However, the large amount of calculations required for full FOV iterative reconstructions has posed a huge computational challenge for clinical usage. By subdividing the entire problem into smaller rFOVs, the iterative reconstruction can be accelerated on a desktop with a single graphic processing unit (GPU). This rFOV strategy divides the iterative reconstruction into blocks, based on the block-diagonal dominant structure. A near real-time reconstruction system was developed for the clinical MR unit, and parallel computing was implemented using the object-oriented model. In addition, the Toeplitz method was implemented on the GPU to reduce the time required for full interpolation. Using the data acquired from the PROPELLER MRI, the reconstructed images were then saved in the digital imaging and communications in medicine format. The proposed rFOV reconstruction reduced the gridding time by 97%, as the total iteration time was 3 s even with multiple processes running. A phantom study showed that the structure similarity index for rFOV reconstruction was statistically superior to conventional density compensation (p < 0.001). In vivo study validated the increased signal-to-noise ratio, which is over four times higher than with density compensation. Image sharpness index was improved using the regularized reconstruction implemented. The rFOV strategy permits near real-time iterative reconstruction to improve the image quality of PROPELLER images. Substantial improvements in image quality metrics were validated in the experiments. The concept of rFOV reconstruction may potentially be applied to other kinds of iterative reconstructions for shortened reconstruction duration.
NASA Astrophysics Data System (ADS)
Song, Y.; Lysak, R. L.
2015-12-01
Parallel E-fields play a crucial role for the acceleration of charged particles, creating discrete aurorae. However, once the parallel electric fields are produced, they will disappear right away, unless the electric fields can be continuously generated and sustained for a fairly long time. Thus, the crucial question in auroral physics is how to generate such a powerful and self-sustained parallel electric fields which can effectively accelerate charge particles to high energy during a fairly long time. We propose that nonlinear interaction of incident and reflected Alfven wave packets in inhomogeneous auroral acceleration region can produce quasi-stationary non-propagating electromagnetic plasma structures, such as Alfvenic double layers (DLs) and Charge Holes. Such Alfvenic quasi-static structures often constitute powerful high energy particle accelerators. The Alfvenic DL consists of localized self-sustained powerful electrostatic electric fields nested in a low density cavity and surrounded by enhanced magnetic and mechanical stresses. The enhanced magnetic and velocity fields carrying the free energy serve as a local dynamo, which continuously create the electrostatic parallel electric field for a fairly long time. The generated parallel electric fields will deepen the seed low density cavity, which then further quickly boosts the stronger parallel electric fields creating both Alfvenic and quasi-static discrete aurorae. The parallel electrostatic electric field can also cause ion outflow, perpendicular ion acceleration and heating, and may excite Auroral Kilometric Radiation.
A comparison of energetic ions in the plasma depletion layer and the quasi-parallel magnetosheath
NASA Technical Reports Server (NTRS)
Fuselier, Stephen A.
1994-01-01
Energetic ion spectra measured by the Active Magnetospheric Particle Tracer Explorers/Charge Composition Explorer (AMPTE/CCE) downstream from the Earth's quasi-parallel bow shock (in the quasi-parallel magnetosheath) and in the plasma depletion layer are compared. In the latter region, energetic ions are from a single source, leakage of magnetospheric ions across the magnetopause and into the plasma depletion layer. In the former region, both the magnetospheric source and shock acceleration of the thermal solar wind population at the quasi-parallel shock can contribute to the energetic ion spectra. The relative strengths of these two energetic ion sources are determined through the comparison of spectra from the two regions. It is found that magnetospheric leakage can provide an upper limit of 35% of the total energetic H(+) population in the quasi-parallel magnetosheath near the magnetopause in the energy range from approximately 10 to approximately 80 keV/e and substantially less than this limit for the energetic He(2+) population. The rest of the energetic H(+) population and nearly all of the energetic He(2+) population are accelerated out of the thermal solar wind population through shock acceleration processes. By comparing the energetic and thermal He(2+) and H(+) populations in the quasi-parallel magnetosheath, it is found that the quasi-parallel bow shock is 2 to 3 times more efficient at accelerating He(2+) than H(+). This result is consistent with previous estimates from shock acceleration theory and simulati ons.
NASA Astrophysics Data System (ADS)
Wang, Yihan; Lu, Tong; Wan, Wenbo; Liu, Lingling; Zhang, Songhe; Li, Jiao; Zhao, Huijuan; Gao, Feng
2018-02-01
To fully realize the potential of photoacoustic tomography (PAT) in preclinical and clinical applications, rapid measurements and robust reconstructions are needed. Sparse-view measurements have been adopted effectively to accelerate the data acquisition. However, since the reconstruction from the sparse-view sampling data is challenging, both of the effective measurement and the appropriate reconstruction should be taken into account. In this study, we present an iterative sparse-view PAT reconstruction scheme where a virtual parallel-projection concept matching for the proposed measurement condition is introduced to help to achieve the "compressive sensing" procedure of the reconstruction, and meanwhile the spatially adaptive filtering fully considering the a priori information of the mutually similar blocks existing in natural images is introduced to effectively recover the partial unknown coefficients in the transformed domain. Therefore, the sparse-view PAT images can be reconstructed with higher quality compared with the results obtained by the universal back-projection (UBP) algorithm in the same sparse-view cases. The proposed approach has been validated by simulation experiments, which exhibits desirable performances in image fidelity even from a small number of measuring positions.
High-Frequency Subband Compressed Sensing MRI Using Quadruplet Sampling
Sung, Kyunghyun; Hargreaves, Brian A
2013-01-01
Purpose To presents and validates a new method that formalizes a direct link between k-space and wavelet domains to apply separate undersampling and reconstruction for high- and low-spatial-frequency k-space data. Theory and Methods High- and low-spatial-frequency regions are defined in k-space based on the separation of wavelet subbands, and the conventional compressed sensing (CS) problem is transformed into one of localized k-space estimation. To better exploit wavelet-domain sparsity, CS can be used for high-spatial-frequency regions while parallel imaging can be used for low-spatial-frequency regions. Fourier undersampling is also customized to better accommodate each reconstruction method: random undersampling for CS and regular undersampling for parallel imaging. Results Examples using the proposed method demonstrate successful reconstruction of both low-spatial-frequency content and fine structures in high-resolution 3D breast imaging with a net acceleration of 11 to 12. Conclusion The proposed method improves the reconstruction accuracy of high-spatial-frequency signal content and avoids incoherent artifacts in low-spatial-frequency regions. This new formulation also reduces the reconstruction time due to the smaller problem size. PMID:23280540
Kalman filter techniques for accelerated Cartesian dynamic cardiac imaging.
Feng, Xue; Salerno, Michael; Kramer, Christopher M; Meyer, Craig H
2013-05-01
In dynamic MRI, spatial and temporal parallel imaging can be exploited to reduce scan time. Real-time reconstruction enables immediate visualization during the scan. Commonly used view-sharing techniques suffer from limited temporal resolution, and many of the more advanced reconstruction methods are either retrospective, time-consuming, or both. A Kalman filter model capable of real-time reconstruction can be used to increase the spatial and temporal resolution in dynamic MRI reconstruction. The original study describing the use of the Kalman filter in dynamic MRI was limited to non-Cartesian trajectories because of a limitation intrinsic to the dynamic model used in that study. Here the limitation is overcome, and the model is applied to the more commonly used Cartesian trajectory with fast reconstruction. Furthermore, a combination of the Kalman filter model with Cartesian parallel imaging is presented to further increase the spatial and temporal resolution and signal-to-noise ratio. Simulations and experiments were conducted to demonstrate that the Kalman filter model can increase the temporal resolution of the image series compared with view-sharing techniques and decrease the spatial aliasing compared with TGRAPPA. The method requires relatively little computation, and thus is suitable for real-time reconstruction. Copyright © 2012 Wiley Periodicals, Inc.
Kalman Filter Techniques for Accelerated Cartesian Dynamic Cardiac Imaging
Feng, Xue; Salerno, Michael; Kramer, Christopher M.; Meyer, Craig H.
2012-01-01
In dynamic MRI, spatial and temporal parallel imaging can be exploited to reduce scan time. Real-time reconstruction enables immediate visualization during the scan. Commonly used view-sharing techniques suffer from limited temporal resolution, and many of the more advanced reconstruction methods are either retrospective, time-consuming, or both. A Kalman filter model capable of real-time reconstruction can be used to increase the spatial and temporal resolution in dynamic MRI reconstruction. The original study describing the use of the Kalman filter in dynamic MRI was limited to non-Cartesian trajectories, because of a limitation intrinsic to the dynamic model used in that study. Here the limitation is overcome and the model is applied to the more commonly used Cartesian trajectory with fast reconstruction. Furthermore, a combination of the Kalman filter model with Cartesian parallel imaging is presented to further increase the spatial and temporal resolution and SNR. Simulations and experiments were conducted to demonstrate that the Kalman filter model can increase the temporal resolution of the image series compared with view sharing techniques and decrease the spatial aliasing compared with TGRAPPA. The method requires relatively little computation, and thus is suitable for real-time reconstruction. PMID:22926804
Kramer, Harald; Michaely, Henrik J; Matschl, Volker; Schmitt, Peter; Reiser, Maximilian F; Schoenberg, Stefan O
2007-06-01
Recent developments in hard- and software help to significantly increase image quality of magnetic resonance angiography (MRA). Parallel acquisition techniques (PAT) help to increase spatial resolution and to decrease acquisition time but also suffer from a decrease in signal-to-noise ratio (SNR). The movement to higher field strength and the use of dedicated angiography coils can further increase spatial resolution while decreasing acquisition times at the same SNR as it is known from contemporary exams. The goal of our study was to compare the image quality of MRA datasets acquired with a standard matrix coil in comparison to MRA datasets acquired with a dedicated peripheral angio matrix coil and higher factors of parallel imaging. Before the first volunteer examination, unaccelerated phantom measurements were performed with the different coils. After institutional review board approval, 15 healthy volunteers underwent MRA of the lower extremity on a 32 channel 3.0 Tesla MR System. In 5 of them MRA of the calves was performed with a PAT acceleration factor of 2 and a standard body-matrix surface coil placed at the legs. Ten volunteers underwent MRA of the calves with a dedicated 36-element angiography matrix coil: 5 with a PAT acceleration of 3 and 5 with a PAT acceleration factor of 4, respectively. The acquired volume and acquisition time was approximately the same in all examinations, only the spatial resolution was increased with the acceleration factor. The acquisition time per voxel was calculated. Image quality was rated independently by 2 readers in terms of vessel conspicuity, venous overlay, and occurrence of artifacts. The inter-reader agreement was calculated by the kappa-statistics. SNR and contrast-to-noise ratios from the different examinations were evaluated. All 15 volunteers completed the examination, no adverse events occurred. None of the examinations showed venous overlay; 70% of the examinations showed an excellent vessel conspicuity, whereas in 50% of the examinations artifacts occurred. All of these artifacts were judged as none disturbing. Inter-reader agreement was good with kappa values ranging between 0.65 and 0.74. SNR and contrast-to-noise ratios did not show significant differences. Implementation of a dedicated coil for peripheral MRA at 3.0 Tesla helps to increase spatial resolution and to decrease acquisition time while the image quality could be kept equal. Venous overlay can be effectively avoided despite the use of high-resolution scans.
NASA Astrophysics Data System (ADS)
Dey, T.; Rodrigue, P.
2015-07-01
We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computational intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel and the 103 to 104 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector and in a second step the sum of the radiological path taking into account attenuation is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization either on the Xeon Phi and the Host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware specific intrinsic instructions (so-called `intrinsics') to allow manually-optimized vectorization. For parallelization either OpenMP and ISPC tasking (based on pthreads) are evaluated.Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The examination showed that a reasonable speedup of sensitivity map calculation could be achieved on the Xeon Phi either by a portable or a hardware specific implementation.
Nael, Kambiz; Fenchel, Michael; Krishnam, Mayil; Finn, J Paul; Laub, Gerhard; Ruehm, Stefan G
2007-06-01
To evaluate the technical feasibility of high spatial resolution contrast-enhanced magnetic resonance angiography (CE-MRA) with highly accelerated parallel acquisition at 3.0 T using a 32-channel phased array coil, and a high relaxivity contrast agent. Ten adult healthy volunteers (5 men, 5 women, aged 21-66 years) underwent high spatial resolution CE-MRA of the pulmonary circulation. Imaging was performed at 3 T using a 32-channel phase array coil. After intravenous injection of 1 mL of gadobenate dimeglumine (Gd-BOPTA) at 1.5 mL/s, a timing bolus was used to measure the transit time from the arm vein to the main pulmonary artery. Subsequently following intravenous injection of 0.1 mmol/kg of Gd-BOPTA at the same rate, isotropic high spatial resolution data sets (1 x 1 x 1 mm3) CE-MRA of the entire pulmonary circulation were acquired using a fast gradient-recalled echo sequence (TR/TE 3/1.2 milliseconds, FA 18 degrees) and highly accelerated parallel acquisition (GRAPPA x 6) during a 20-second breath hold. The presence of artifact, noise, and image quality of the pulmonary arterial segments were evaluated independently by 2 radiologists. Phantom measurements were performed to assess the signal-to-noise ratio (SNR). Statistical analysis of data was performed by using Wilcoxon rank sum test and 2-sample Student t test. The interobserver variability was tested by kappa coefficient. All studies were of diagnostic quality as determined by both observers. The pulmonary arteries were routinely identified up to fifth-order branches, with definition in the diagnostic range and excellent interobserver agreement (kappa = 0.84, 95% confidence interval 0.77-0.90). Phantom measurements showed significantly lower SNR (P < 0.01) using GRAPPA (17.3 +/- 18.8) compared with measurements without parallel acquisition (58 +/- 49.4). The described 3 T CE-MRA protocol in addition to high T1 relaxivity of Gd-BOPTA provides sufficient SNR to support highly accelerated parallel acquisition (GRAPPA x 6), resulting in acquisition of isotopic (1 x 1 x 1 mm3) voxels over the entire pulmonary circulation in 20 seconds.
Accelerating image recognition on mobile devices using GPGPU
NASA Astrophysics Data System (ADS)
Bordallo López, Miguel; Nykänen, Henri; Hannuksela, Jari; Silvén, Olli; Vehviläinen, Markku
2011-01-01
The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally costly image analysis techniques. The use of Graphic Processing Units for computing is very well suited for parallel processing and the addition of programmable stages and high precision arithmetic provide for opportunities to implement energy-efficient complete algorithms. At the moment the first mobile graphics accelerators with programmable pipelines are available, enabling the GPGPU implementation of several image processing algorithms. In this context, we consider a face tracking approach that uses efficient gray-scale invariant texture features and boosting. The solution is based on the Local Binary Pattern (LBP) features and makes use of the GPU on the pre-processing and feature extraction phase. We have implemented a series of image processing techniques in the shader language of OpenGL ES 2.0, compiled them for a mobile graphics processing unit and performed tests on a mobile application processor platform (OMAP3530). In our contribution, we describe the challenges of designing on a mobile platform, present the performance achieved and provide measurement results for the actual power consumption in comparison to using the CPU (ARM) on the same platform.
Jiansen Li; Jianqi Sun; Ying Song; Yanran Xu; Jun Zhao
2014-01-01
An effective way to improve the data acquisition speed of magnetic resonance imaging (MRI) is using under-sampled k-space data, and dictionary learning method can be used to maintain the reconstruction quality. Three-dimensional dictionary trains the atoms in dictionary in the form of blocks, which can utilize the spatial correlation among slices. Dual-dictionary learning method includes a low-resolution dictionary and a high-resolution dictionary, for sparse coding and image updating respectively. However, the amount of data is huge for three-dimensional reconstruction, especially when the number of slices is large. Thus, the procedure is time-consuming. In this paper, we first utilize the NVIDIA Corporation's compute unified device architecture (CUDA) programming model to design the parallel algorithms on graphics processing unit (GPU) to accelerate the reconstruction procedure. The main optimizations operate in the dictionary learning algorithm and the image updating part, such as the orthogonal matching pursuit (OMP) algorithm and the k-singular value decomposition (K-SVD) algorithm. Then we develop another version of CUDA code with algorithmic optimization. Experimental results show that more than 324 times of speedup is achieved compared with the CPU-only codes when the number of MRI slices is 24.
GPU-based Branchless Distance-Driven Projection and Backprojection
Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong
2017-01-01
Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to be implemented on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved 137-fold speedup for forward projection and 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. GPU based branchless DD method was evaluated by iterative reconstruction algorithms with both simulation and real datasets. It obtained visually identical images as the CPU reference algorithm. PMID:29333480
GPU-based Branchless Distance-Driven Projection and Backprojection.
Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong
2017-12-01
Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to be implemented on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved 137-fold speedup for forward projection and 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. GPU based branchless DD method was evaluated by iterative reconstruction algorithms with both simulation and real datasets. It obtained visually identical images as the CPU reference algorithm.
NASA Astrophysics Data System (ADS)
Takashima, Ichiro; Kajiwara, Riichi; Murano, Kiyo; Iijima, Toshio; Morinaka, Yasuhiro; Komobuchi, Hiroyoshi
2001-04-01
We have designed and built a high-speed CCD imaging system for monitoring neural activity in an exposed animal cortex stained with a voltage-sensitive dye. Two types of custom-made CCD sensors were developed for this system. The type I chip has a resolution of 2664 (H) X 1200 (V) pixels and a wide imaging area of 28.1 X 13.8 mm, while the type II chip has 1776 X 1626 pixels and an active imaging area of 20.4 X 18.7 mm. The CCD arrays were constructed with multiple output amplifiers in order to accelerate the readout rate. The two chips were divided into either 24 (I) or 16 (II) distinct areas that were driven in parallel. The parallel CCD outputs were digitized by 12-bit A/D converters and then stored in the frame memory. The frame memory was constructed with synchronous DRAM modules, which provided a capacity of 128 MB per channel. On-chip and on-memory binning methods were incorporated into the system, e.g., this enabled us to capture 444 X 200 pixel-images for periods of 36 seconds at a rate of 500 frames/second. This system was successfully used to visualize neural activity in the cortices of rats, guinea pigs, and monkeys.
Three-dimensional through-time radial GRAPPA for renal MR angiography.
Wright, Katherine L; Lee, Gregory R; Ehses, Philipp; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole
2014-10-01
To achieve high temporal and spatial resolution for contrast-enhanced time-resolved MR angiography exams (trMRAs), fast imaging techniques such as non-Cartesian parallel imaging must be used. In this study, the three-dimensional (3D) through-time radial generalized autocalibrating partially parallel acquisition (GRAPPA) method is used to reconstruct highly accelerated stack-of-stars data for time-resolved renal MRAs. Through-time radial GRAPPA has been recently introduced as a method for non-Cartesian GRAPPA weight calibration, and a similar concept can also be used in 3D acquisitions. By combining different sources of calibration information, acquisition time can be reduced. Here, different GRAPPA weight calibration schemes are explored in simulation, and the results are applied to reconstruct undersampled stack-of-stars data. Simulations demonstrate that an accurate and efficient approach to 3D calibration is to combine a small number of central partitions with as many temporal repetitions as exam time permits. These findings were used to reconstruct renal trMRA data with an in-plane acceleration factor as high as 12.6 with respect to the Nyquist sampling criterion, where the lowest root mean squared error value of 16.4% was achieved when using a calibration scheme with 8 partitions, 16 repetitions, and a 4 projection × 8 read point segment size. 3D through-time radial GRAPPA can be used to successfully reconstruct highly accelerated non-Cartesian data. By using in-plane radial undersampling, a trMRA can be acquired with a temporal footprint less than 4s/frame with a spatial resolution of approximately 1.5 mm × 1.5 mm × 3 mm. © 2014 Wiley Periodicals, Inc.
F3D Image Processing and Analysis for Many - and Multi-core Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
F3D is written in OpenCL, so it achieve[sic] platform-portable parallelism on modern mutli-core CPUs and many-core GPUs. The interface and mechanims to access F3D core are written in Java as a plugin for Fiji/ImageJ to deliver several key image-processing algorithms necessary to remove artifacts from micro-tomography data. The algorithms consist of data parallel aware filters that can efficiently utilizes[sic] resources and can work on out of core datasets and scale efficiently across multiple accelerators. Optimizing for data parallel filters, streaming out of core datasets, and efficient resource and memory and data managements over complex execution sequence of filters greatly expeditesmore » any scientific workflow with image processing requirements. F3D performs several different types of 3D image processing operations, such as non-linear filtering using bilateral filtering and/or median filtering and/or morphological operators (MM). F3D gray-level MM operators are one-pass constant time methods that can perform morphological transformations with a line-structuring element oriented in discrete directions. Additionally, MM operators can be applied to gray-scale images, and consist of two parts: (a) a reference shape or structuring element, which is translated over the image, and (b) a mechanism, or operation, that defines the comparisons to be performed between the image and the structuring element. This tool provides a critical component within many complex pipelines such as those for performing automated segmentation of image stacks. F3D is also called a "descendent" of Quant-CT, another software we developed in the past. These two modules are to be integrated in a next version. Further details were reported in: D.M. Ushizima, T. Perciano, H. Krishnan, B. Loring, H. Bale, D. Parkinson, and J. Sethian. Structure recognition from high-resolution images of ceramic composites. IEEE International Conference on Big Data, October 2014.« less
Electric currents and voltage drops along auroral field lines
NASA Technical Reports Server (NTRS)
Stern, D. P.
1983-01-01
An assessment is presented of the current state of knowledge concerning Birkeland currents and the parallel electric field, with discussions focusing on the Birkeland primary region 1 sheets, the region 2 sheets which parallel them and appear to close in the partial ring current, the cusp currents (which may be correlated with the interplanetary B(y) component), and the Harang filament. The energy required by the parallel electric field and the associated particle acceleration processes appears to be derived from the Birkeland currents, for which evidence is adduced from particles, inverted V spectra, rising ion beams and expanded loss cones. Conics may on the other hand signify acceleration by electrostatic ion cyclotron waves associated with beams accelerated by the parallel electric field.
Doulgerakis, Matthaios; Eggebrecht, Adam; Wojtkiewicz, Stanislaw; Culver, Joseph; Dehghani, Hamid
2017-12-01
Parameter recovery in diffuse optical tomography is a computationally expensive algorithm, especially when used for large and complex volumes, as in the case of human brain functional imaging. The modeling of light propagation, also known as the forward problem, is the computational bottleneck of the recovery algorithm, whereby the lack of a real-time solution is impeding practical and clinical applications. The objective of this work is the acceleration of the forward model, within a diffusion approximation-based finite-element modeling framework, employing parallelization to expedite the calculation of light propagation in realistic adult head models. The proposed methodology is applicable for modeling both continuous wave and frequency-domain systems with the results demonstrating a 10-fold speed increase when GPU architectures are available, while maintaining high accuracy. It is shown that, for a very high-resolution finite-element model of the adult human head with ∼600,000 nodes, consisting of heterogeneous layers, light propagation can be calculated at ∼0.25 s/excitation source. (2017) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
NASA Astrophysics Data System (ADS)
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke
2018-01-01
Hydrological model calibration has been a hot issue for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has been proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly when the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.
Turbulence Evolution and Shock Acceleration of Solar Energetic Particles
NASA Technical Reports Server (NTRS)
Chee, Ng K.
2007-01-01
We model the effects of self-excitation/damping and shock transmission of Alfven waves on solar-energetic-particle (SEP) acceleration at a coronal-mass-ejection (CME) driven parallel shock. SEP-excited outward upstream waves speedily bootstrap acceleration. Shock transmission further raises the SEP-excited wave intensities at high wavenumbers but lowers them at low wavenumbers through wavenumber shift. Downstream, SEP excitation of inward waves and damping of outward waves tend to slow acceleration. Nevertheless, > 2000 km/s parallel shocks at approx. 3.5 solar radii can accelerate SEPs to 100 MeV in < 5 minutes.
GPU accelerated FDTD solver and its application in MRI.
Chi, J; Liu, F; Jin, J; Mason, D G; Crozier, S
2010-01-01
The finite difference time domain (FDTD) method is a popular technique for computational electromagnetics (CEM). The large computational power often required, however, has been a limiting factor for its applications. In this paper, we will present a graphics processing unit (GPU)-based parallel FDTD solver and its successful application to the investigation of a novel B1 shimming scheme for high-field magnetic resonance imaging (MRI). The optimized shimming scheme exhibits considerably improved transmit B(1) profiles. The GPU implementation dramatically shortened the runtime of FDTD simulation of electromagnetic field compared with its CPU counterpart. The acceleration in runtime has made such investigation possible, and will pave the way for other studies of large-scale computational electromagnetic problems in modern MRI which were previously impractical.
Gyrokinetic theory of turbulent acceleration and momentum conservation in tokamak plasmas
NASA Astrophysics Data System (ADS)
Lu, WANG; Shuitao, PENG; P, H. DIAMOND
2018-07-01
Understanding the generation of intrinsic rotation in tokamak plasmas is crucial for future fusion reactors such as ITER. We proposed a new mechanism named turbulent acceleration for the origin of the intrinsic parallel rotation based on gyrokinetic theory. The turbulent acceleration acts as a local source or sink of parallel rotation, i.e., volume force, which is different from the divergence of residual stress, i.e., surface force. However, the order of magnitude of turbulent acceleration can be comparable to that of the divergence of residual stress for electrostatic ion temperature gradient (ITG) turbulence. A possible theoretical explanation for the experimental observation of electron cyclotron heating induced decrease of co-current rotation was also proposed via comparison between the turbulent acceleration driven by ITG turbulence and that driven by collisionless trapped electron mode turbulence. We also extended this theory to electromagnetic ITG turbulence and investigated the electromagnetic effects on intrinsic parallel rotation drive. Finally, we demonstrated that the presence of turbulent acceleration does not conflict with momentum conservation.
Miao, Jun; Wong, Wilbur C K; Narayan, Sreenath; Wilson, David L
2011-11-01
Partially parallel imaging (PPI) greatly accelerates MR imaging by using surface coil arrays and under-sampling k-space. However, the reduction factor (R) in PPI is theoretically constrained by the number of coils (N(C)). A symmetrically shaped kernel is typically used, but this often prevents even the theoretically possible R from being achieved. Here, the authors propose a kernel design method to accelerate PPI faster than R = N(C). K-space data demonstrates an anisotropic pattern that is correlated with the object itself and to the asymmetry of the coil sensitivity profile, which is caused by coil placement and B(1) inhomogeneity. From spatial analysis theory, reconstruction of such pattern is best achieved by a signal-dependent anisotropic shape kernel. As a result, the authors propose the use of asymmetric kernels to improve k-space reconstruction. The authors fit a bivariate Gaussian function to the local signal magnitude of each coil, then threshold this function to extract the kernel elements. A perceptual difference model (Case-PDM) was employed to quantitatively evaluate image quality. A MR phantom experiment showed that k-space anisotropy increased as a function of magnetic field strength. The authors tested a K-spAce Reconstruction with AnisOtropic KErnel support ("KARAOKE") algorithm with both MR phantom and in vivo data sets, and compared the reconstructions to those produced by GRAPPA, a popular PPI reconstruction method. By exploiting k-space anisotropy, KARAOKE was able to better preserve edges, which is particularly useful for cardiac imaging and motion correction, while GRAPPA failed at a high R near or exceeding N(C). KARAOKE performed comparably to GRAPPA at low Rs. As a rule of thumb, KARAOKE reconstruction should always be used for higher quality k-space reconstruction, particularly when PPI data is acquired at high Rs and/or high field strength.
Miao, Jun; Wong, Wilbur C. K.; Narayan, Sreenath; Wilson, David L.
2011-01-01
Purpose: Partially parallel imaging (PPI) greatly accelerates MR imaging by using surface coil arrays and under-sampling k-space. However, the reduction factor (R) in PPI is theoretically constrained by the number of coils (NC). A symmetrically shaped kernel is typically used, but this often prevents even the theoretically possible R from being achieved. Here, the authors propose a kernel design method to accelerate PPI faster than R = NC. Methods: K-space data demonstrates an anisotropic pattern that is correlated with the object itself and to the asymmetry of the coil sensitivity profile, which is caused by coil placement and B1 inhomogeneity. From spatial analysis theory, reconstruction of such pattern is best achieved by a signal-dependent anisotropic shape kernel. As a result, the authors propose the use of asymmetric kernels to improve k-space reconstruction. The authors fit a bivariate Gaussian function to the local signal magnitude of each coil, then threshold this function to extract the kernel elements. A perceptual difference model (Case-PDM) was employed to quantitatively evaluate image quality. Results: A MR phantom experiment showed that k-space anisotropy increased as a function of magnetic field strength. The authors tested a K-spAce Reconstruction with AnisOtropic KErnel support (“KARAOKE”) algorithm with both MR phantom and in vivo data sets, and compared the reconstructions to those produced by GRAPPA, a popular PPI reconstruction method. By exploiting k-space anisotropy, KARAOKE was able to better preserve edges, which is particularly useful for cardiac imaging and motion correction, while GRAPPA failed at a high R near or exceeding NC. KARAOKE performed comparably to GRAPPA at low Rs. Conclusions: As a rule of thumb, KARAOKE reconstruction should always be used for higher quality k-space reconstruction, particularly when PPI data is acquired at high Rs and∕or high field strength. PMID:22047378
Efficient Scalable Median Filtering Using Histogram-Based Operations.
Green, Oded
2018-05-01
Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm for rearranging the values within a filtering window and taking the median of the sorted value. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow. This makes such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filtering that is non-sorting-based. The new algorithm uses efficient histogram-based operations. These reduce the computational requirements of the new algorithm while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA supported graphics processing unit (GPU). The new algorithm is compared with several other leading CPU and GPU implementations. The CPU implementation has near perfect linear scaling with a speedup on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, and , comparison-based approaches are preferable as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
Towards real-time thermometry using simultaneous multislice MRI
NASA Astrophysics Data System (ADS)
Borman, P. T. S.; Bos, C.; de Boorder, T.; Raaymakers, B. W.; Moonen, C. T. W.; Crijns, S. P. M.
2016-09-01
MR-guided thermal therapies, such as high-intensity focused ultrasound (MRgHIFU) and laser-induced thermal therapy (MRgLITT) are increasingly being applied in oncology and neurology. MRI is used for guidance since it can measure temperature noninvasively based on the proton resonance frequency shift (PRFS). For therapy guidance using PRFS thermometry, high temporal resolution and large spatial coverage are desirable. We propose to use the parallel imaging technique simultaneous multislice (SMS) in combination with controlled aliasing (CAIPIRINHA) to accelerate the acquisition. We compare this with the sensitivity encoding (SENSE) acceleration technique. Two experiments were performed to validate that SMS can be used to increase the spatial coverage or the temporal resolution. The first was performed in agar gel using LITT heating and a gradient-echo sequence with echo-planar imaging (EPI), and the second was performed in bovine muscle using HIFU heating and a gradient-echo sequence without EPI. In both experiments temperature curves from an unaccelerated scan and from SMS, SENSE, and SENSE/SMS accelerated scans were compared. The precision was quantified by a standard deviation analysis of scans without heating. Both experiments showed a good agreement between the temperature curves obtained from the unaccelerated, and SMS accelerated scans, confirming that accuracy was maintained during SMS acceleration. The standard deviations of the temperature measurements obtained with SMS were significantly smaller than when SENSE was used, implying that SMS allows for higher acceleration. In the LITT and HIFU experiments SMS factors up to 4 and 3 were reached, respectively, with a loss of precision of less than a factor of 3. Based on these results we conclude that SMS acceleration of PRFS thermometry is a valuable addition to SENSE, because it allows for a higher temporal resolution or bigger spatial coverage, with a higher precision.
Are supernova remnants quasi-parallel or quasi-perpendicular accelerators
NASA Technical Reports Server (NTRS)
Spangler, S. R.; Leckband, J. A.; Cairns, I. H.
1989-01-01
Observations of shock waves in the solar system which show a pronounced difference in the plasma wave and particle environment depending on whether the shock is propagating along or perpendicular to the interplanetary magnetic field are discussed. Theories for particle acceleration developed for quasi-parallel and quasi-perpendicular shocks, when extended to the interstellar medium suggest that the relativistic electrons in radio supernova remnants are accelerated by either the Q parallel or Q perpendicular mechanisms. A model for the galactic magnetic field and published maps of supernova remnants were used to search for a dependence of structure on the angle Phi. Results show no tendency for the remnants as a whole to favor the relationship expected for either mechanism, although individual sources resemble model remnants of one or the other acceleration process.
New Imaging Strategies Using a Motion-Resistant Liver Sequence in Uncooperative Patients
Kim, Bong Soo; Lee, Kyung Ryeol; Goh, Myeng Ju
2014-01-01
MR imaging has unique benefits for evaluating the liver because of its high-resolution capability and ability to permit detailed assessment of anatomic lesions. In uncooperative patients, motion artifacts can impair the image quality and lead to the loss of diagnostic information. In this setting, the recent advances in motion-resistant liver MR techniques, including faster imaging protocols (e.g., dual-echo magnetization-prepared rapid-acquisition gradient echo (MP-RAGE), view-sharing technique), the data under-sampling (e.g., gradient recalled echo (GRE) with controlled aliasing in parallel imaging results in higher acceleration (CAIPIRINHA), single-shot echo-train spin-echo (SS-ETSE)), and motion-artifact minimization method (e.g., radial GRE with/without k-space-weighted image contrast (KWIC)), can provide consistent, artifact-free images with adequate image quality and can lead to promising diagnostic performance. Understanding of the different motion-resistant options allows radiologists to adopt the most appropriate technique for their clinical practice and thereby significantly improve patient care. PMID:25243115
Self-Calibrating Wave-Encoded Variable-Density Single-Shot Fast Spin Echo Imaging.
Chen, Feiyu; Taviani, Valentina; Tamir, Jonathan I; Cheng, Joseph Y; Zhang, Tao; Song, Qiong; Hargreaves, Brian A; Pauly, John M; Vasanawala, Shreyas S
2018-04-01
It is highly desirable in clinical abdominal MR scans to accelerate single-shot fast spin echo (SSFSE) imaging and reduce blurring due to T 2 decay and partial-Fourier acquisition. To develop and investigate the clinical feasibility of wave-encoded variable-density SSFSE imaging for improved image quality and scan time reduction. Prospective controlled clinical trial. With Institutional Review Board approval and informed consent, the proposed method was assessed on 20 consecutive adult patients (10 male, 10 female, range, 24-84 years). A wave-encoded variable-density SSFSE sequence was developed for clinical 3.0T abdominal scans to enable high acceleration (3.5×) with full-Fourier acquisitions by: 1) introducing wave encoding with self-refocusing gradient waveforms to improve acquisition efficiency; 2) developing self-calibrated estimation of wave-encoding point-spread function and coil sensitivity to improve motion robustness; and 3) incorporating a parallel imaging and compressed sensing reconstruction to reconstruct highly accelerated datasets. Image quality was compared pairwise with standard Cartesian acquisition independently and blindly by two radiologists on a scale from -2 to 2 for noise, contrast, confidence, sharpness, and artifacts. The average ratio of scan time between these two approaches was also compared. A Wilcoxon signed-rank tests with a P value under 0.05 considered statistically significant. Wave-encoded variable-density SSFSE significantly reduced the perceived noise level and improved the sharpness of the abdominal wall and the kidneys compared with standard acquisition (mean scores 0.8, 1.2, and 0.8, respectively, P < 0.003). No significant difference was observed in relation to other features (P = 0.11). An average of 21% decrease in scan time was achieved using the proposed method. Wave-encoded variable-density sampling SSFSE achieves improved image quality with clinically relevant echo time and reduced scan time, thus providing a fast and robust approach for clinical SSFSE imaging. 1 Technical Efficacy: Stage 6 J. Magn. Reson. Imaging 2018;47:954-966. © 2017 International Society for Magnetic Resonance in Medicine.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Kwon, Heejin; Reid, Scott; Kim, Dongeun; Lee, Sangyun; Cho, Jinhan; Oh, Jongyeong
2018-01-04
This study aimed to evaluate image quality and diagnostic performance of a recently developed navigated three-dimensional magnetic resonance cholangiopancreatography (3D-MRCP) with compressed sensing (CS) based on parallel imaging (PI) and conventional 3D-MRCP with PI only in patients with abnormal bile duct dilatation. This institutional review board-approved study included 45 consecutive patients [non-malignant common bile duct lesions (n = 21) and malignant common bile duct lesions (n = 24)] who underwent MRCP of the abdomen to evaluate bile duct dilatation. All patients were imaged at 3T (MR 750, GE Healthcare, Waukesha, WI) including two kinds of 3D-MRCP using 352 × 288 matrices with and without CS based on PI. Two radiologists independently and blindly assessed randomized images. CS acceleration reduced the acquisition time on average 5 min and 6 s to a total of 2 min and 56 s. The all CS cine image quality was significantly higher than standard cine MR image for all quantitative measurements. Diagnostic accuracy for benign and malignant lesions is statistically different between standard and CS 3D-MRCP. Total image quality and diagnostic accuracy at biliary obstruction evaluation demonstrates that CS-accelerated 3D-MRCP sequences can provide superior quality of diagnostic information in 42.5% less time. This has the potential to reduce motion-related artifacts and improve diagnostic efficacy.
Utilizing GPUs to Accelerate Turbomachinery CFD Codes
NASA Technical Reports Server (NTRS)
MacCalla, Weylin; Kulkarni, Sameer
2016-01-01
GPU computing has established itself as a way to accelerate parallel codes in the high performance computing world. This work focuses on speeding up APNASA, a legacy CFD code used at NASA Glenn Research Center, while also drawing conclusions about the nature of GPU computing and the requirements to make GPGPU worthwhile on legacy codes. Rewriting and restructuring of the source code was avoided to limit the introduction of new bugs. The code was profiled and investigated for parallelization potential, then OpenACC directives were used to indicate parallel parts of the code. The use of OpenACC directives was not able to reduce the runtime of APNASA on either the NVIDIA Tesla discrete graphics card, or the AMD accelerated processing unit. Additionally, it was found that in order to justify the use of GPGPU, the amount of parallel work being done within a kernel would have to greatly exceed the work being done by any one portion of the APNASA code. It was determined that in order for an application like APNASA to be accelerated on the GPU, it should not be modular in nature, and the parallel portions of the code must contain a large portion of the code's computation time.
NASA Astrophysics Data System (ADS)
Hollingsworth, Kieren Grant
2015-11-01
MRI is often the most sensitive or appropriate technique for important measurements in clinical diagnosis and research, but lengthy acquisition times limit its use due to cost and considerations of patient comfort and compliance. Once an image field of view and resolution is chosen, the minimum scan acquisition time is normally fixed by the amount of raw data that must be acquired to meet the Nyquist criteria. Recently, there has been research interest in using the theory of compressed sensing (CS) in MR imaging to reduce scan acquisition times. The theory argues that if our target MR image is sparse, having signal information in only a small proportion of pixels (like an angiogram), or if the image can be mathematically transformed to be sparse then it is possible to use that sparsity to recover a high definition image from substantially less acquired data. This review starts by considering methods of k-space undersampling which have already been incorporated into routine clinical imaging (partial Fourier imaging and parallel imaging), and then explains the basis of using compressed sensing in MRI. The practical considerations of applying CS to MRI acquisitions are discussed, such as designing k-space undersampling schemes, optimizing adjustable parameters in reconstructions and exploiting the power of combined compressed sensing and parallel imaging (CS-PI). A selection of clinical applications that have used CS and CS-PI prospectively are considered. The review concludes by signposting other imaging acceleration techniques under present development before concluding with a consideration of the potential impact and obstacles to bringing compressed sensing into routine use in clinical MRI.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing.
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-08
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4_ speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration.
Freiberger, Manuel; Egger, Herbert; Liebmann, Manfred; Scharfetter, Hermann
2011-11-01
Image reconstruction in fluorescence optical tomography is a three-dimensional nonlinear ill-posed problem governed by a system of partial differential equations. In this paper we demonstrate that a combination of state of the art numerical algorithms and a careful hardware optimized implementation allows to solve this large-scale inverse problem in a few seconds on standard desktop PCs with modern graphics hardware. In particular, we present methods to solve not only the forward but also the non-linear inverse problem by massively parallel programming on graphics processors. A comparison of optimized CPU and GPU implementations shows that the reconstruction can be accelerated by factors of about 15 through the use of the graphics hardware without compromising the accuracy in the reconstructed images.
Flight Results from the HST SM4 Relative Navigation Sensor System
NASA Technical Reports Server (NTRS)
Naasz, Bo; Eepoel, John Van; Queen, Steve; Southward, C. Michael; Hannah, Joel
2010-01-01
On May 11, 2009, Space Shuttle Atlantis roared off of Launch Pad 39A enroute to the Hubble Space Telescope (HST) to undertake its final servicing of HST, Servicing Mission 4. Onboard Atlantis was a small payload called the Relative Navigation Sensor experiment, which included three cameras of varying focal ranges, avionics to record images and estimate, in real time, the relative position and attitude (aka "pose") of the telescope during rendezvous and deploy. The avionics package, known as SpaceCube and developed at the Goddard Space Flight Center, performed image processing using field programmable gate arrays to accelerate this process, and in addition executed two different pose algorithms in parallel, the Goddard Natural Feature Image Recognition and the ULTOR Passive Pose and Position Engine (P3E) algorithms
Measuring signal-to-noise ratio in partially parallel imaging MRI
Goerner, Frank L.; Clarke, Geoffrey D.
2011-01-01
Purpose: To assess five different methods of signal-to-noise ratio (SNR) measurement for partially parallel imaging (PPI) acquisitions. Methods: Measurements were performed on a spherical phantom and three volunteers using a multichannel head coil a clinical 3T MRI system to produce echo planar, fast spin echo, gradient echo, and balanced steady state free precession image acquisitions. Two different PPI acquisitions, generalized autocalibrating partially parallel acquisition algorithm and modified sensitivity encoding with acceleration factors (R) of 2–4, were evaluated and compared to nonaccelerated acquisitions. Five standard SNR measurement techniques were investigated and Bland–Altman analysis was used to determine agreement between the various SNR methods. The estimated g-factor values, associated with each method of SNR calculation and PPI reconstruction method, were also subjected to assessments that considered the effects on SNR due to reconstruction method, phase encoding direction, and R-value. Results: Only two SNR measurement methods produced g-factors in agreement with theoretical expectations (g ≥ 1). Bland–Altman tests demonstrated that these two methods also gave the most similar results relative to the other three measurements. R-value was the only factor of the three we considered that showed significant influence on SNR changes. Conclusions: Non-signal methods used in SNR evaluation do not produce results consistent with expectations in the investigated PPI protocols. Two of the methods studied provided the most accurate and useful results. Of these two methods, it is recommended, when evaluating PPI protocols, the image subtraction method be used for SNR calculations due to its relative accuracy and ease of implementation. PMID:21978049
Zhou, Lili; Clifford Chao, K S; Chang, Jenghwa
2012-11-01
Simulated projection images of digital phantoms constructed from CT scans have been widely used for clinical and research applications but their quality and computation speed are not optimal for real-time comparison with the radiography acquired with an x-ray source of different energies. In this paper, the authors performed polyenergetic forward projections using open computing language (OpenCL) in a parallel computing ecosystem consisting of CPU and general purpose graphics processing unit (GPGPU) for fast and realistic image formation. The proposed polyenergetic forward projection uses a lookup table containing the NIST published mass attenuation coefficients (μ∕ρ) for different tissue types and photon energies ranging from 1 keV to 20 MeV. The CT images of interested sites are first segmented into different tissue types based on the CT numbers and converted to a three-dimensional attenuation phantom by linking each voxel to the corresponding tissue type in the lookup table. The x-ray source can be a radioisotope or an x-ray generator with a known spectrum described as weight w(n) for energy bin E(n). The Siddon method is used to compute the x-ray transmission line integral for E(n) and the x-ray fluence is the weighted sum of the exponential of line integral for all energy bins with added Poisson noise. To validate this method, a digital head and neck phantom constructed from the CT scan of a Rando head phantom was segmented into three (air, gray∕white matter, and bone) regions for calculating the polyenergetic projection images for the Mohan 4 MV energy spectrum. To accelerate the calculation, the authors partitioned the workloads using the task parallelism and data parallelism and scheduled them in a parallel computing ecosystem consisting of CPU and GPGPU (NVIDIA Tesla C2050) using OpenCL only. The authors explored the task overlapping strategy and the sequential method for generating the first and subsequent DRRs. A dispatcher was designed to drive the high-degree parallelism of the task overlapping strategy. Numerical experiments were conducted to compare the performance of the OpenCL∕GPGPU-based implementation with the CPU-based implementation. The projection images were similar to typical portal images obtained with a 4 or 6 MV x-ray source. For a phantom size of 512 × 512 × 223, the time for calculating the line integrals for a 512 × 512 image panel was 16.2 ms on GPGPU for one energy bin in comparison to 8.83 s on CPU. The total computation time for generating one polyenergetic projection image of 512 × 512 was 0.3 s (141 s for CPU). The relative difference between the projection images obtained with the CPU-based and OpenCL∕GPGPU-based implementations was on the order of 10(-6) and was virtually indistinguishable. The task overlapping strategy was 5.84 and 1.16 times faster than the sequential method for the first and the subsequent digitally reconstruction radiographies, respectively. The authors have successfully built digital phantoms using anatomic CT images and NIST μ∕ρ tables for simulating realistic polyenergetic projection images and optimized the processing speed with parallel computing using GPGPU∕OpenCL-based implementation. The computation time was fast (0.3 s per projection image) enough for real-time IGRT (image-guided radiotherapy) applications.
ION ACCELERATION AT THE QUASI-PARALLEL BOW SHOCK: DECODING THE SIGNATURE OF INJECTION
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sundberg, Torbjörn; Haynes, Christopher T.; Burgess, D.
Collisionless shocks are efficient particle accelerators. At Earth, ions with energies exceeding 100 keV are seen upstream of the bow shock when the magnetic geometry is quasi-parallel, and large-scale supernova remnant shocks can accelerate ions into cosmic-ray energies. This energization is attributed to diffusive shock acceleration; however, for this process to become active, the ions must first be sufficiently energized. How and where this initial acceleration takes place has been one of the key unresolved issues in shock acceleration theory. Using Cluster spacecraft observations, we study the signatures of ion reflection events in the turbulent transition layer upstream of the terrestrial bowmore » shock, and with the support of a hybrid simulation of the shock, we show that these reflection signatures are characteristic of the first step in the ion injection process. These reflection events develop in particular in the region where the trailing edge of large-amplitude upstream waves intercept the local shock ramp and the upstream magnetic field changes from quasi-perpendicular to quasi-parallel. The dispersed ion velocity signature observed can be attributed to a rapid succession of ion reflections at this wave boundary. After the ions’ initial interaction with the shock, they flow upstream along the quasi-parallel magnetic field. Each subsequent wavefront in the upstream region will sweep the ions back toward the shock, where they gain energy with each transition between the upstream and the shock wave frames. Within three to five gyroperiods, some ions have gained enough parallel velocity to escape upstream, thus completing the injection process.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram
Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a functionmore » of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.« less
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators
Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew
2014-01-01
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
Convolution of large 3D images on GPU and its decomposition
NASA Astrophysics Data System (ADS)
Karas, Pavel; Svoboda, David
2011-12-01
In this article, we propose a method for computing convolution of large 3D images. The convolution is performed in a frequency domain using a convolution theorem. The algorithm is accelerated on a graphic card by means of the CUDA parallel computing model. Convolution is decomposed in a frequency domain using the decimation in frequency algorithm. We pay attention to keeping our approach efficient in terms of both time and memory consumption and also in terms of memory transfers between CPU and GPU which have a significant inuence on overall computational time. We also study the implementation on multiple GPUs and compare the results between the multi-GPU and multi-CPU implementations.
NASA Astrophysics Data System (ADS)
Wang, Huanyu; Lu, Quanming; Huang, Can; Wang, Shui
2017-05-01
Secondary magnetic islands may be generated in the vicinity of an X line during magnetic reconnection. In this paper, by performing two-dimensional (2-D) particle-in-cell simulations, we investigate the role of a secondary magnetic island in electron acceleration during magnetic reconnection with a guide field. The electron motions are found to be adiabatic, and we analyze the contributions of the parallel electric field and Fermi and betatron mechanisms to electron acceleration in the secondary island during the evolution of magnetic reconnection. When the secondary island is formed, electrons are accelerated by the parallel electric field due to the existence of the reconnection electric field in the electron current sheet. Electrons can be accelerated by both the parallel electric field and Fermi mechanism when the secondary island begins to merge with the primary magnetic island, which is formed simultaneously with the appearance of X lines. With the increase in the guide field, the contributions of the Fermi mechanism to electron acceleration become less and less important. When the guide field is sufficiently large, the contribution of the Fermi mechanism is almost negligible.
Particle Acceleration, Magnetic Field Generation in Relativistic Shocks
NASA Technical Reports Server (NTRS)
Nishikawa, Ken-Ichi; Hardee, P.; Hededal, C. B.; Richardson, G.; Sol, H.; Preece, R.; Fishman, G. J.
2005-01-01
Shock acceleration is an ubiquitous phenomenon in astrophysical plasmas. Plasma waves and their associated instabilities (e.g., the Buneman instability, two-streaming instability, and the Weibel instability) created in the shocks are responsible for particle (electron, positron, and ion) acceleration. Using a 3-D relativistic electromagnetic particle (REMP) code, we have investigated particle acceleration associated with a relativistic jet front propagating through an ambient plasma with and without initial magnetic fields. We find only small differences in the results between no ambient and weak ambient parallel magnetic fields. Simulations show that the Weibel instability created in the collisionless shock front accelerates particles perpendicular and parallel to the jet propagation direction. New simulations with an ambient perpendicular magnetic field show the strong interaction between the relativistic jet and the magnetic fields. The magnetic fields are piled up by the jet and the jet electrons are bent, which creates currents and displacement currents. At the nonlinear stage, the magnetic fields are reversed by the current and the reconnection may take place. Due to these dynamics the jet and ambient electron are strongly accelerated in both parallel and perpendicular directions.
Particle Acceleration, Magnetic Field Generation, and Emission in Relativistic Shocks
NASA Technical Reports Server (NTRS)
Nishikawa, Ken-IchiI.; Hededal, C.; Hardee, P.; Richardson, G.; Preece, R.; Sol, H.; Fishman, G.
2004-01-01
Shock acceleration is an ubiquitous phenomenon in astrophysical plasmas. Plasma waves and their associated instabilities (e.g., the Buneman instability, two-streaming instability, and the Weibel instability) created in the shocks are responsible for particle (electron, positron, and ion) acceleration. Using a 3-D relativistic electromagnetic particle (m) code, we have investigated particle acceleration associated with a relativistic jet front propagating through an ambient plasma with and without initial magnetic fields. We find only small differences in the results between no ambient and weak ambient parallel magnetic fields. Simulations show that the Weibel instability created in the collisionless shock front accelerates particles perpendicular and parallel to the jet propagation direction. New simulations with an ambient perpendicular magnetic field show the strong interaction between the relativistic jet and the magnetic fields. The magnetic fields are piled up by the jet and the jet electrons are bent, which creates currents and displacement currents. At the nonlinear stage, the magnetic fields are reversed by the current and the reconnection may take place. Due to these dynamics the jet and ambient electron are strongly accelerated in both parallel and perpendicular directions.
Experimental Investigation of the Induced Airflow of Corona Discharge
NASA Astrophysics Data System (ADS)
Huang, Yong; Zhang, Xin; Wang, Xun-Nian; Wang, Wan-Bo; Huang, Zong-Bo; Li, Hua-Xing
2013-09-01
In order to improve the acceleration effect of corona discharge acting on air, we present an experimental study on the induced airflow produced by corona discharge between two parallel electrodes. The parameters investigated are the type of electrodes, actuation voltage and the distance in the absence of free airflow. The induced flow velocity is measured directly in the accelerated region using the particle image velocimetry technology. The results show that if corona discharge is not developed into arc discharge, the induced airflow velocity increases nearly linearly with the applied voltage and the maximum induced airflow velocity near the needle electrode reaches 36 m/s. It is expected that in the future, the result can be referred to in the research about effect of active flow control to reach much higher induced airflow speed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Low, D; Mutic, S; Shvartsman, S
Purpose: To develop a method for isolating the MRI magnetic field from field-sensitive linear accelerator components at distances close to isocenter. Methods: A MRI-guided radiation therapy system has been designed that integrates a linear accelerator with simultaneous MR imaging. In order to accomplish this, the magnetron, port circulator, radiofrequency waveguide, gun driver, and linear accelerator needed to be placed in locations with low magnetic fields. The system was also required to be compact, so moving these components far from the main magnetic field and isocenter was not an option. The magnetic field sensitive components (exclusive of the waveguide) were placedmore » in coaxial steel sleeves that were electrically and mechanically isolated and whose thickness and placement were optimized using E&M modeling software. Six sets of sleeves were placed 60° apart, 85 cm from isocenter. The Faraday effect occurs when the direction of propagation is parallel to the magnetic RF field component, rotating the RF polarization, subsequently diminishing RF power. The Faraday effect was avoided by orienting the waveguides such that the magnetic field RF component was parallel to the magnetic field. Results: The magnetic field within the shields was measured to be less than 40 Gauss, significantly below the amount needed for the magnetron and port circulator. Additional mu-metal was employed to reduce the magnetic field at the linear accelerator to less than 1 Gauss. The orientation of the RF waveguides allowed the RT transport with minimal loss and reflection. Conclusion: One of the major challenges in designing a compact linear accelerator based MRI-guided radiation therapy system, that of creating low magnetic field environments for the magnetic-field sensitive components, has been solved. The measured magnetic fields are sufficiently small to enable system integration. This work supported by ViewRay, Inc.« less
Kinetic Simulations of Particle Acceleration at Shocks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Caprioli, Damiano; Guo, Fan
2015-07-16
Collisionless shocks are mediated by collective electromagnetic interactions and are sources of non-thermal particles and emission. The full particle-in-cell approach and a hybrid approach are sketched, simulations of collisionless shocks are shown using a multicolor presentation. Results for SN 1006, a case involving ion acceleration and B field amplification where the shock is parallel, are shown. Electron acceleration takes place in planetary bow shocks and galaxy clusters. It is concluded that acceleration at shocks can be efficient: >15%; CRs amplify B field via streaming instability; ion DSA is efficient at parallel, strong shocks; ions are injected via reflection and shockmore » drift acceleration; and electron DSA is efficient at oblique shocks.« less
Further optimization of SeDDaRA blind image deconvolution algorithm and its DSP implementation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Qiheng; Zhang, Jianlin
2011-11-01
Efficient algorithm for blind image deconvolution and its high-speed implementation is of great value in practice. Further optimization of SeDDaRA is developed, from algorithm structure to numerical calculation methods. The main optimization covers that, the structure's modularization for good implementation feasibility, reducing the data computation and dependency of 2D-FFT/IFFT, and acceleration of power operation by segmented look-up table. Then the Fast SeDDaRA is proposed and specialized for low complexity. As the final implementation, a hardware system of image restoration is conducted by using the multi-DSP parallel processing. Experimental results show that, the processing time and memory demand of Fast SeDDaRA decreases 50% at least; the data throughput of image restoration system is over 7.8Msps. The optimization is proved efficient and feasible, and the Fast SeDDaRA is able to support the real-time application.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S
2015-01-01
This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Parallel/Vector Integration Methods for Dynamical Astronomy
NASA Astrophysics Data System (ADS)
Fukushima, T.
Progress of parallel/vector computers has driven us to develop suitable numerical integrators utilizing their computational power to the full extent while being independent on the size of system to be integrated. Unfortunately, the parallel version of Runge-Kutta type integrators are known to be not so efficient. Recently we developed a parallel version of the extrapolation method (Ito and Fukushima 1997), which allows variable timesteps and still gives an acceleration factor of 3-4 for general problems. While the vector-mode usage of Picard-Chebyshev method (Fukushima 1997a, 1997b) will lead the acceleration factor of order of 1000 for smooth problems such as planetary/satellites orbit integration. The success of multiple-correction PECE mode of time-symmetric implicit Hermitian integrator (Kokubo 1998) seems to enlighten Milankar's so-called "pipelined predictor corrector method", which is expected to lead an acceleration factor of 3-4. We will review these directions and discuss future prospects.
Wen, Qiuting; Kodiweera, Chandana; Dale, Brian M; Shivraman, Giri; Wu, Yu-Chien
2018-01-01
To accelerate high-resolution diffusion imaging, rotating single-shot acquisition (RoSA) with composite reconstruction is proposed. Acceleration was achieved by acquiring only one rotating single-shot blade per diffusion direction, and high-resolution diffusion-weighted (DW) images were reconstructed by using similarities of neighboring DW images. A parallel imaging technique was implemented in RoSA to further improve the image quality and acquisition speed. RoSA performance was evaluated by simulation and human experiments. A brain tensor phantom was developed to determine an optimal blade size and rotation angle by considering similarity in DW images, off-resonance effects, and k-space coverage. With the optimal parameters, RoSA MR pulse sequence and reconstruction algorithm were developed to acquire human brain data. For comparison, multishot echo planar imaging (EPI) and conventional single-shot EPI sequences were performed with matched scan time, resolution, field of view, and diffusion directions. The simulation indicated an optimal blade size of 48 × 256 and a 30 ° rotation angle. For 1 × 1 mm 2 in-plane resolution, RoSA was 12 times faster than the multishot acquisition with comparable image quality. With the same acquisition time as SS-EPI, RoSA provided superior image quality and minimum geometric distortion. RoSA offers fast, high-quality, high-resolution diffusion images. The composite image reconstruction is model-free and compatible with various diffusion computation approaches including parametric and nonparametric analyses. Magn Reson Med 79:264-275, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Software for Use with Optoelectronic Measuring Tool
NASA Technical Reports Server (NTRS)
Ballard, Kim C.
2004-01-01
A computer program has been written to facilitate and accelerate the process of measurement by use of the apparatus described in "Optoelectronic Tool Adds Scale Marks to Photographic Images" (KSC-12201). The tool contains four laser diodes that generate parallel beams of light spaced apart at a known distance. The beams of light are used to project bright spots that serve as scale marks that become incorporated into photographic images (including film and electronic images). The sizes of objects depicted in the images can readily be measured by reference to the scale marks. The computer program is applicable to a scene that contains the laser spots and that has been imaged in a square pixel format that can be imported into a graphical user interface (GUI) generated by the program. It is assumed that the laser spots and the distance(s) to be measured all lie in the same plane and that the plane is perpendicular to the line of sight of the camera used to record the image
NASA Astrophysics Data System (ADS)
Bosch, Carl; Degirmenci, Soysal; Barlow, Jason; Mesika, Assaf; Politte, David G.; O'Sullivan, Joseph A.
2016-05-01
X-ray computed tomography reconstruction for medical, security and industrial applications has evolved through 40 years of experience with rotating gantry scanners using analytic reconstruction techniques such as filtered back projection (FBP). In parallel, research into statistical iterative reconstruction algorithms has evolved to apply to sparse view scanners in nuclear medicine, low data rate scanners in Positron Emission Tomography (PET) [5, 7, 10] and more recently to reduce exposure to ionizing radiation in conventional X-ray CT scanners. Multiple approaches to statistical iterative reconstruction have been developed based primarily on variations of expectation maximization (EM) algorithms. The primary benefit of EM algorithms is the guarantee of convergence that is maintained when iterative corrections are made within the limits of convergent algorithms. The primary disadvantage, however is that strict adherence to correction limits of convergent algorithms extends the number of iterations and ultimate timeline to complete a 3D volumetric reconstruction. Researchers have studied methods to accelerate convergence through more aggressive corrections [1], ordered subsets [1, 3, 4, 9] and spatially variant image updates. In this paper we describe the development of an AM reconstruction algorithm with accelerated convergence for use in a real-time explosive detection application for aviation security. By judiciously applying multiple acceleration techniques and advanced GPU processing architectures, we are able to perform 3D reconstruction of scanned passenger baggage at a rate of 75 slices per second. Analysis of the results on stream of commerce passenger bags demonstrates accelerated convergence by factors of 8 to 15, when comparing images from accelerated and strictly convergent algorithms.
Combined algorithmic and GPU acceleration for ultra-fast circular conebeam backprojection
NASA Astrophysics Data System (ADS)
Brokish, Jeffrey; Sack, Paul; Bresler, Yoram
2010-04-01
In this paper, we describe the first implementation and performance of a fast O(N3logN) hierarchical backprojection algorithm for cone beam CT with a circular trajectory1,developed on a modern Graphics Processing Unit (GPU). The resulting tomographic backprojection system for 3D cone beam geometry combines speedup through algorithmic improvements provided by the hierarchical backprojection algorithm with speedup from a massively parallel hardware accelerator. For data parameters typical in diagnostic CT and using a mid-range GPU card, we report reconstruction speeds of up to 360 frames per second, and relative speedup of almost 6x compared to conventional backprojection on the same hardware. The significance of these results is twofold. First, they demonstrate that the reduction in operation counts demonstrated previously for the FHBP algorithm can be translated to a comparable run-time improvement in a massively parallel hardware implementation, while preserving stringent diagnostic image quality. Second, the dramatic speedup and throughput numbers achieved indicate the feasibility of systems based on this technology, which achieve real-time 3D reconstruction for state-of-the art diagnostic CT scanners with small footprint, high-reliability, and affordable cost.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Jiajia; Wang, Yuming; McIntosh, Scott W.
We combine observations of the Coronal Multi-channel Polarimeter and the Atmospheric Imaging Assembly on board the Solar Dynamics Observatory to study the characteristic properties of (propagating) Alfvénic motions and quasi-periodic intensity disturbances in polar plumes. This unique combination of instruments highlights the physical richness of the processes taking place at the base of the (fast) solar wind. The (parallel) intensity perturbations with intensity enhancements around 1% have an apparent speed of 120 km s{sup −1} (in both the 171 and 193 Å passbands) and a periodicity of 15 minutes, while the (perpendicular) Alfvénic wave motions have a velocity amplitude ofmore » 0.5 km s{sup −1}, a phase speed of 830 km s{sup −1}, and a shorter period of 5 minutes on the same structures. These observations illustrate a scenario where the excited Alfvénic motions are propagating along an inhomogeneously loaded magnetic field structure such that the combination could be a potential progenitor of the magnetohydrodynamic turbulence required to accelerate the fast solar wind.« less
Zhang, Xiaodong; Tong, Frank; Li, Chun-Xia; Yan, Yumei; Nair, Govind; Nagaoka, Tsukasa; Tanaka, Yoji; Zola, Stuart; Howell, Leonard
2014-04-01
Many MRI parameters have been explored and demonstrated the capability or potential to evaluate acute stroke injury, providing anatomical, microstructural, functional, or neurochemical information for diagnostic purposes and therapeutic development. However, the application of multiparameter MRI approach is hindered in clinic due to the very limited time window after stroke insult. Parallel imaging technique can accelerate MRI data acquisition dramatically and has been incorporated in modern clinical scanners and increasingly applied for various diagnostic purposes. In the present study, a fast multiparameter MRI approach including structural T1-weighted imaging (T1W), T2-weighted imaging (T2W), diffusion tensor imaging (DTI), T2-mapping, proton magnetic resonance spectroscopy, cerebral blood flow (CBF), and magnetization transfer (MT) imaging, was implemented and optimized for assessing acute stroke injury on a 3T clinical scanner. A macaque model of transient ischemic stroke induced by a minimal interventional approach was utilized for evaluating the multiparameter MRI approach. The preliminary results indicate the surgical procedure successfully induced ischemic occlusion in the cortex and/or subcortex in adult macaque monkeys (n=4). Application of parallel imaging technique substantially reduced the scanning duration of most MRI data acquisitions, allowing for fast and repeated evaluation of acute stroke injury. Hence, the use of the multiparameter MRI approach with up to five quantitative measures can provide significant advantages in preclinical or clinical studies of stroke disease.
Comparative study of bowtie and patient scatter in diagnostic CT
NASA Astrophysics Data System (ADS)
Prakash, Prakhar; Boudry, John M.
2017-03-01
A fast, GPU accelerated Monte Carlo engine for simulating relevant photon interaction processes over the diagnostic energy range in third-generation CT systems was developed to study the relative contributions of bowtie and object scatter to the total scatter reaching an imaging detector. Primary and scattered projections for an elliptical water phantom (major axis set to 300mm) with muscle and fat inserts were simulated for a typical diagnostic CT system as a function of anti-scatter grid (ASG) configurations. The ASG design space explored grid orientation, i.e. septa either a) parallel or b) parallel and perpendicular to the axis of rotation, as well as septa height. The septa material was Tungsten. The resulting projections were reconstructed and the scatter induced image degradation was quantified using common CT image metrics (such as Hounsfield Unit (HU) inaccuracy and loss in contrast), along with a qualitative review of image artifacts. Results indicate object scatter dominates total scatter in the detector channels under the shadow of the imaged object with the bowtie scatter fraction progressively increasing towards the edges of the object projection. Object scatter was shown to be the driving factor behind HU inaccuracy and contrast reduction in the simulated images while shading artifacts and elevated loss in HU accuracy at the object boundary were largely attributed to bowtie scatter. Because the impact of bowtie scatter could not be sufficiently mitigated with a large grid ratio ASG, algorithmic correction may be necessary to further mitigate these artifacts.
A 2D MTF approach to evaluate and guide dynamic imaging developments.
Chao, Tzu-Cheng; Chung, Hsiao-Wen; Hoge, W Scott; Madore, Bruno
2010-02-01
As the number and complexity of partially sampled dynamic imaging methods continue to increase, reliable strategies to evaluate performance may prove most useful. In the present work, an analytical framework to evaluate given reconstruction methods is presented. A perturbation algorithm allows the proposed evaluation scheme to perform robustly without requiring knowledge about the inner workings of the method being evaluated. A main output of the evaluation process consists of a two-dimensional modulation transfer function, an easy-to-interpret visual rendering of a method's ability to capture all combinations of spatial and temporal frequencies. Approaches to evaluate noise properties and artifact content at all spatial and temporal frequencies are also proposed. One fully sampled phantom and three fully sampled cardiac cine datasets were subsampled (R = 4 and 8) and reconstructed with the different methods tested here. A hybrid method, which combines the main advantageous features observed in our assessments, was proposed and tested in a cardiac cine application, with acceleration factors of 3.5 and 6.3 (skip factors of 4 and 8, respectively). This approach combines features from methods such as k-t sensitivity encoding, unaliasing by Fourier encoding the overlaps in the temporal dimension-sensitivity encoding, generalized autocalibrating partially parallel acquisition, sensitivity profiles from an array of coils for encoding and reconstruction in parallel, self, hybrid referencing with unaliasing by Fourier encoding the overlaps in the temporal dimension and generalized autocalibrating partially parallel acquisition, and generalized autocalibrating partially parallel acquisition-enhanced sensitivity maps for sensitivity encoding reconstructions.
NASA Astrophysics Data System (ADS)
Zhang, Yi; Gabr, Refaat E.; Zhou, Jinyuan; Weiss, Robert G.; Bottomley, Paul A.
2013-12-01
Noninvasive magnetic resonance spectroscopy (MRS) with chemical shift imaging (CSI) provides valuable metabolic information for research and clinical studies, but is often limited by long scan times. Recently, spectroscopy with linear algebraic modeling (SLAM) was shown to provide compartment-averaged spectra resolved in one spatial dimension with many-fold reductions in scan-time. This was achieved using a small subset of the CSI phase-encoding steps from central image k-space that maximized the signal-to-noise ratio. Here, SLAM is extended to two- and three-dimensions (2D, 3D). In addition, SLAM is combined with sensitivity-encoded (SENSE) parallel imaging techniques, enabling the replacement of even more CSI phase-encoding steps to further accelerate scan-speed. A modified SLAM reconstruction algorithm is introduced that significantly reduces the effects of signal nonuniformity within compartments. Finally, main-field inhomogeneity corrections are provided, analogous to CSI. These methods are all tested on brain proton MRS data from a total of 24 patients with brain tumors, and in a human cardiac phosphorus 3D SLAM study at 3T. Acceleration factors of up to 120-fold versus CSI are demonstrated, including speed-up factors of 5-fold relative to already-accelerated SENSE CSI. Brain metabolites are quantified in SLAM and SENSE SLAM spectra and found to be indistinguishable from CSI measures from the same compartments. The modified reconstruction algorithm demonstrated immunity to maladjusted segmentation and errors from signal heterogeneity in brain data. In conclusion, SLAM demonstrates the potential to supplant CSI in studies requiring compartment-average spectra or large volume coverage, by dramatically reducing scan-time while providing essentially the same quantitative results.
Jiang, Yun; Ma, Dan; Bhat, Himanshu; Ye, Huihui; Cauley, Stephen F; Wald, Lawrence L; Setsompop, Kawin; Griswold, Mark A
2017-11-01
The purpose of this study is to accelerate an MR fingerprinting (MRF) acquisition by using a simultaneous multislice method. A multiband radiofrequency (RF) pulse was designed to excite two slices with different flip angles and phases. The signals of two slices were driven to be as orthogonal as possible. The mixed and undersampled MRF signal was matched to two dictionaries to retrieve T 1 and T 2 maps of each slice. Quantitative results from the proposed method were validated with the gold-standard spin echo methods in a phantom. T 1 and T 2 maps of in vivo human brain from two simultaneously acquired slices were also compared to the results of fast imaging with steady-state precession based MRF method (MRF-FISP) with a single-band RF excitation. The phantom results showed that the simultaneous multislice imaging MRF-FISP method quantified the relaxation properties accurately compared to the gold-standard spin echo methods. T 1 and T 2 values of in vivo brain from the proposed method also matched the results from the normal MRF-FISP acquisition. T 1 and T 2 values can be quantified at a multiband acceleration factor of two using our proposed acquisition even in a single-channel receive coil. Further acceleration could be achieved by combining this method with parallel imaging or iterative reconstruction. Magn Reson Med 78:1870-1876, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
The Experimental Study of Rayleigh-Taylor Instability using a Linear Induction Motor Accelerator
NASA Astrophysics Data System (ADS)
Yamashita, Nicholas; Jacobs, Jeffrey
2009-11-01
The experiments to be presented utilize an incompressible system of two stratified miscible liquids of different densities that are accelerated in order to produce the Rayleigh-Taylor instability. Three liquid combinations are used: isopropyl alcohol with water, a calcium nitrate solution or a lithium polytungstate solution, giving Atwood numbers of 0.11, 0.22 and 0.57, respectively. The acceleration required to drive the instability is produced by two high-speed linear induction motors mounted to an 8 m tall drop tower. The motors are mounted in parallel and have an effective acceleration length of 1.7 m and are each capable of producing 15 kN of thrust. The liquid system is contained within a square acrylic tank with inside dimensions 76 x76x184 mm. The tank is mounted to an aluminum plate, which is driven by the motors to create constant accelerations in the range of 1-20 g's, though the potential exists for higher accelerations. Also attached to the plate are a high-speed camera and an LED backlight to provide continuous video of the instability. In addition, an accelerometer is used to provide acceleration measurements during each experiment. Experimental image sequences will be presented which show the development of a random three-dimensional instability from an unforced initial perturbation. Measurements of the mixing zone width will be compared with traditional growth models.
Parallel Computer System for 3D Visualization Stereo on GPU
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Zori, Sergii A.
2018-03-01
This paper proposes the organization of a parallel computer system based on Graphic Processors Unit (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing rays intersections with scene objects. The system allows significant increase in the productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Archi-tecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.
NASA Astrophysics Data System (ADS)
Zaripov, D. I.; Renfu, Li
2018-05-01
The implementation of high-efficiency digital image correlation methods based on a zero-normalized cross-correlation (ZNCC) procedure for high-speed, time-resolved measurements using a high-resolution digital camera is associated with big data processing and is often time consuming. In order to speed-up ZNCC computation, a high-speed technique based on a parallel projection correlation procedure is proposed. The proposed technique involves the use of interrogation window projections instead of its two-dimensional field of luminous intensity. This simplification allows acceleration of ZNCC computation up to 28.8 times compared to ZNCC calculated directly, depending on the size of interrogation window and region of interest. The results of three synthetic test cases, such as a one-dimensional uniform flow, a linear shear flow and a turbulent boundary-layer flow, are discussed in terms of accuracy. In the latter case, the proposed technique is implemented together with an iterative window-deformation technique. On the basis of the results of the present work, the proposed technique is recommended to be used for initial velocity field calculation, with further correction using more accurate techniques.
A review of 3D first-pass, whole-heart, myocardial perfusion cardiovascular magnetic resonance.
Fair, Merlin J; Gatehouse, Peter D; DiBella, Edward V R; Firmin, David N
2015-08-01
A comprehensive review is undertaken of the methods available for 3D whole-heart first-pass perfusion (FPP) and their application to date, with particular focus on possible acceleration techniques. Following a summary of the parameters typically desired of 3D FPP methods, the review explains the mechanisms of key acceleration techniques and their potential use in FPP for attaining 3D acquisitions. The mechanisms include rapid sequences, non-Cartesian k-space trajectories, reduced k-space acquisitions, parallel imaging reconstructions and compressed sensing. An attempt is made to explain, rather than simply state, the varying methods with the hope that it will give an appreciation of the different components making up a 3D FPP protocol. Basic estimates demonstrating the required total acceleration factors in typical 3D FPP cases are included, providing context for the extent that each acceleration method can contribute to the required imaging speed, as well as potential limitations in present 3D FPP literature. Although many 3D FPP methods are too early in development for the type of clinical trials required to show any clear benefit over current 2D FPP methods, the review includes the small but growing quantity of clinical research work already using 3D FPP, alongside the more technical work. Broader challenges concerning FPP such as quantitative analysis are not covered, but challenges with particular impact on 3D FPP methods, particularly with regards to motion effects, are discussed along with anticipated future work in the field.
GPU-accelerated depth map generation for X-ray simulations of complex CAD geometries
NASA Astrophysics Data System (ADS)
Grandin, Robert J.; Young, Gavin; Holland, Stephen D.; Krishnamurthy, Adarsh
2018-04-01
Interactive x-ray simulations of complex computer-aided design (CAD) models can provide valuable insights for better interpretation of the defect signatures such as porosity from x-ray CT images. Generating the depth map along a particular direction for the given CAD geometry is the most compute-intensive step in x-ray simulations. We have developed a GPU-accelerated method for real-time generation of depth maps of complex CAD geometries. We preprocess complex components designed using commercial CAD systems using a custom CAD module and convert them into a fine user-defined surface tessellation. Our CAD module can be used by different simulators as well as handle complex geometries, including those that arise from complex castings and composite structures. We then make use of a parallel algorithm that runs on a graphics processing unit (GPU) to convert the finely-tessellated CAD model to a voxelized representation. The voxelized representation can enable heterogeneous modeling of the volume enclosed by the CAD model by assigning heterogeneous material properties in specific regions. The depth maps are generated from this voxelized representation with the help of a GPU-accelerated ray-casting algorithm. The GPU-accelerated ray-casting method enables interactive (> 60 frames-per-second) generation of the depth maps of complex CAD geometries. This enables arbitrarily rotation and slicing of the CAD model, leading to better interpretation of the x-ray images by the user. In addition, the depth maps can be used to aid directly in CT reconstruction algorithms.
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang
2017-10-01
The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration, for many years. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on Intel Xeon multi-core CPU (by using OpenMP and OpenCL) and NVIDIA Tesla many-core GPU (by using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested based on the Griewank benchmark function. Comparison results indicate the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results however, with the most complex source code. The parallel SCE-UA has bright prospects to be applied in real-world applications.
Parallel processing considerations for image recognition tasks
NASA Astrophysics Data System (ADS)
Simske, Steven J.
2011-01-01
Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows-as diverse as optical character recognition [OCR], document classification and barcode reading-to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations
NASA Astrophysics Data System (ADS)
Nguyen, Trung Dac
2017-03-01
The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-01
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4× speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration. PMID:28075343
Computer generated hologram from point cloud using graphics processor.
Chen, Rick H-Y; Wilkinson, Timothy D
2009-12-20
Computer generated holography is an extremely demanding and complex task when it comes to providing realistic reconstructions with full parallax, occlusion, and shadowing. We present an algorithm designed for data-parallel computing on modern graphics processing units to alleviate the computational burden. We apply Gaussian interpolation to create a continuous surface representation from discrete input object points. The algorithm maintains a potential occluder list for each individual hologram plane sample to keep the number of visibility tests to a minimum. We experimented with two approximations that simplify and accelerate occlusion computation. It is observed that letting several neighboring hologram plane samples share visibility information on object points leads to significantly faster computation without causing noticeable artifacts in the reconstructed images. Computing a reduced sample set via nonuniform sampling is also found to be an effective acceleration technique.
CUDA-Accelerated Geodesic Ray-Tracing for Fiber Tracking
van Aart, Evert; Sepasian, Neda; Jalba, Andrei; Vilanova, Anna
2011-01-01
Diffusion Tensor Imaging (DTI) allows to noninvasively measure the diffusion of water in fibrous tissue. By reconstructing the fibers from DTI data using a fiber-tracking algorithm, we can deduce the structure of the tissue. In this paper, we outline an approach to accelerating such a fiber-tracking algorithm using a Graphics Processing Unit (GPU). This algorithm, which is based on the calculation of geodesics, has shown promising results for both synthetic and real data, but is limited in its applicability by its high computational requirements. We present a solution which uses the parallelism offered by modern GPUs, in combination with the CUDA platform by NVIDIA, to significantly reduce the execution time of the fiber-tracking algorithm. Compared to a multithreaded CPU implementation of the same algorithm, our GPU mapping achieves a speedup factor of up to 40 times. PMID:21941525
A novel coil array for combined TMS/fMRI experiments at 3 T.
Navarro de Lara, Lucia I; Windischberger, Christian; Kuehne, Andre; Woletz, Michael; Sieg, Jürgen; Bestmann, Sven; Weiskopf, Nikolaus; Strasser, Bernhard; Moser, Ewald; Laistler, Elmar
2015-11-01
To overcome current limitations in combined transcranial magnetic stimulation (TMS) and functional magnetic resonance imaging (fMRI) studies by employing a dedicated coil array design for 3 Tesla. The state-of-the-art setup for concurrent TMS/fMRI is to use a large birdcage head coil, with the TMS between the subject's head and the MR coil. This setup has drawbacks in sensitivity, positioning, and available imaging techniques. In this study, an ultraslim 7-channel receive-only coil array for 3 T, which can be placed between the subject's head and the TMS, is presented. Interactions between the devices are investigated and the performance of the new setup is evaluated in comparison to the state-of-the-art setup. MR sensitivity obtained at the depth of the TMS stimulation is increased by a factor of five. Parallel imaging with an acceleration factor of two is feasible with low g-factors. Possible interactions between TMS and the novel hardware were investigated and were found negligible. The novel coil array is safe, strongly improves signal-to-noise ratio in concurrent TMS/fMRI experiments, enables parallel imaging, and allows for flexible positioning of the TMS on the head while ensuring efficient TMS stimulation due to its ultraslim design. © 2014 The Authors. Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine.
Magnetosheath Filamentary Structures Formed by Ion Acceleration at the Quasi-Parallel Bow Shock
NASA Technical Reports Server (NTRS)
Omidi, N.; Sibeck, D.; Gutynska, O.; Trattner, K. J.
2014-01-01
Results from 2.5-D electromagnetic hybrid simulations show the formation of field-aligned, filamentary plasma structures in the magnetosheath. They begin at the quasi-parallel bow shock and extend far into the magnetosheath. These structures exhibit anticorrelated, spatial oscillations in plasma density and ion temperature. Closer to the bow shock, magnetic field variations associated with density and temperature oscillations may also be present. Magnetosheath filamentary structures (MFS) form primarily in the quasi-parallel sheath; however, they may extend to the quasi-perpendicular magnetosheath. They occur over a wide range of solar wind Alfvénic Mach numbers and interplanetary magnetic field directions. At lower Mach numbers with lower levels of magnetosheath turbulence, MFS remain highly coherent over large distances. At higher Mach numbers, magnetosheath turbulence decreases the level of coherence. Magnetosheath filamentary structures result from localized ion acceleration at the quasi-parallel bow shock and the injection of energetic ions into the magnetosheath. The localized nature of ion acceleration is tied to the generation of fast magnetosonic waves at and upstream of the quasi-parallel shock. The increased pressure in flux tubes containing the shock accelerated ions results in the depletion of the thermal plasma in these flux tubes and the enhancement of density in flux tubes void of energetic ions. This results in the observed anticorrelation between ion temperature and plasma density.
GYROSURFING ACCELERATION OF IONS IN FRONT OF EARTH's QUASI-PARALLEL BOW SHOCK
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kis, Arpad; Lemperger, Istvan; Wesztergom, Viktor
2013-07-01
It is well known that shocks in space plasmas can accelerate particles to high energies. However, many details of the shock acceleration mechanism are still unknown. A critical element of shock acceleration is the injection problem; i.e., the presence of the so called seed particle population that is needed for the acceleration to work efficiently. In our case study, we present for the first time observational evidence of gyroresonant surfing acceleration in front of Earth's quasi-parallel bow shock resulting in the appearance of the long-suspected seed particle population. For our analysis, we use simultaneous multi-spacecraft measurements provided by the Clustermore » spacecraft ion (CIS), magnetic (FGM), and electric field and wave instrument (EFW) during a time period of large inter-spacecraft separation distance. The spacecraft were moving toward the bow shock and were situated in the foreshock region. The results show that the gyroresonance surfing acceleration takes place as a consequence of interaction between circularly polarized monochromatic (or quasi-monochromatic) transversal electromagnetic plasma waves and short large amplitude magnetic structures (SLAMSs). The magnetic mirror force of the SLAMS provides the resonant conditions for the ions trapped by the waves and results in the acceleration of ions. Since wave packets with circular polarization and different kinds of magnetic structures are very commonly observed in front of Earth's quasi-parallel bow shock, the gyroresonant surfing acceleration proves to be an important particle injection mechanism. We also show that seed ions are accelerated directly from the solar wind ion population.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Huanyu; Lu, Quanming; Huang, Can
2016-04-20
The interactions between magnetic islands are considered to play an important role in electron acceleration during magnetic reconnection. In this paper, two-dimensional particle-in-cell simulations are performed to study electron acceleration during multiple X line reconnection with a guide field. Because the electrons remain almost magnetized, we can analyze the contributions of the parallel electric field, Fermi, and betatron mechanisms to electron acceleration during the evolution of magnetic reconnection through comparison with a guide-center theory. The results show that with the magnetic reconnection proceeding, two magnetic islands are formed in the simulation domain. Next, the electrons are accelerated by both themore » parallel electric field in the vicinity of the X lines and the Fermi mechanism due to the contraction of the two magnetic islands. Then, the two magnetic islands begin to merge into one, and, in such a process, the electrons can be accelerated by both the parallel electric field and betatron mechanisms. During the betatron acceleration, the electrons are locally accelerated in the regions where the magnetic field is piled up by the high-speed flow from the X line. At last, when the coalescence of the two islands into one big island finishes, the electrons can be further accelerated by the Fermi mechanism because of the contraction of the big island. With the increase of the guide field, the contributions of the Fermi and betatron mechanisms to electron acceleration become less and less important. When the guide field is sufficiently large, the contributions of the Fermi and betatron mechanisms are almost negligible.« less
GPU-Accelerated Voxelwise Hepatic Perfusion Quantification
Wang, H; Cao, Y
2012-01-01
Voxelwise quantification of hepatic perfusion parameters from dynamic contrast enhanced (DCE) imaging greatly contributes to assessment of liver function in response to radiation therapy. However, the efficiency of the estimation of hepatic perfusion parameters voxel-by-voxel in the whole liver using a dual-input single-compartment model requires substantial improvement for routine clinical applications. In this paper, we utilize the parallel computation power of a graphics processing unit (GPU) to accelerate the computation, while maintaining the same accuracy as the conventional method. Using CUDA-GPU, the hepatic perfusion computations over multiple voxels are run across the GPU blocks concurrently but independently. At each voxel, non-linear least squares fitting the time series of the liver DCE data to the compartmental model is distributed to multiple threads in a block, and the computations of different time points are performed simultaneously and synchronically. An efficient fast Fourier transform in a block is also developed for the convolution computation in the model. The GPU computations of the voxel-by-voxel hepatic perfusion images are compared with ones by the CPU using the simulated DCE data and the experimental DCE MR images from patients. The computation speed is improved by 30 times using a NVIDIA Tesla C2050 GPU compared to a 2.67 GHz Intel Xeon CPU processor. To obtain liver perfusion maps with 626400 voxels in a patient’s liver, it takes 0.9 min with the GPU-accelerated voxelwise computation, compared to 110 min with the CPU, while both methods result in perfusion parameters differences less than 10−6. The method will be useful for generating liver perfusion images in clinical settings. PMID:22892645
Ojeda-May, Pedro; Nam, Kwangho
2017-08-08
The strategy and implementation of scalable and efficient semiempirical (SE) QM/MM methods in CHARMM are described. The serial version of the code was first profiled to identify routines that required parallelization. Afterward, the code was parallelized and accelerated with three approaches. The first approach was the parallelization of the entire QM/MM routines, including the Fock matrix diagonalization routines, using the CHARMM message passage interface (MPI) machinery. In the second approach, two different self-consistent field (SCF) energy convergence accelerators were implemented using density and Fock matrices as targets for their extrapolations in the SCF procedure. In the third approach, the entire QM/MM and MM energy routines were accelerated by implementing the hybrid MPI/open multiprocessing (OpenMP) model in which both the task- and loop-level parallelization strategies were adopted to balance loads between different OpenMP threads. The present implementation was tested on two solvated enzyme systems (including <100 QM atoms) and an S N 2 symmetric reaction in water. The MPI version exceeded existing SE QM methods in CHARMM, which include the SCC-DFTB and SQUANTUM methods, by at least 4-fold. The use of SCF convergence accelerators further accelerated the code by ∼12-35% depending on the size of the QM region and the number of CPU cores used. Although the MPI version displayed good scalability, the performance was diminished for large numbers of MPI processes due to the overhead associated with MPI communications between nodes. This issue was partially overcome by the hybrid MPI/OpenMP approach which displayed a better scalability for a larger number of CPU cores (up to 64 CPUs in the tested systems).
Garty, Guy; Chen, Youhua; Turner, Helen C; Zhang, Jian; Lyulko, Oleksandra V; Bertucci, Antonella; Xu, Yanping; Wang, Hongliang; Simaan, Nabil; Randers-Pehrson, Gerhard; Lawrence Yao, Y; Brenner, David J
2011-08-01
Over the past five years the Center for Minimally Invasive Radiation Biodosimetry at Columbia University has developed the Rapid Automated Biodosimetry Tool (RABiT), a completely automated, ultra-high throughput biodosimetry workstation. This paper describes recent upgrades and reliability testing of the RABiT. The RABiT analyses fingerstick-derived blood samples to estimate past radiation exposure or to identify individuals exposed above or below a cut-off dose. Through automated robotics, lymphocytes are extracted from fingerstick blood samples into filter-bottomed multi-well plates. Depending on the time since exposure, the RABiT scores either micronuclei or phosphorylation of the histone H2AX, in an automated robotic system, using filter-bottomed multi-well plates. Following lymphocyte culturing, fixation and staining, the filter bottoms are removed from the multi-well plates and sealed prior to automated high-speed imaging. Image analysis is performed online using dedicated image processing hardware. Both the sealed filters and the images are archived. We have developed a new robotic system for lymphocyte processing, making use of an upgraded laser power and parallel processing of four capillaries at once. This system has allowed acceleration of lymphocyte isolation, the main bottleneck of the RABiT operation, from 12 to 2 sec/sample. Reliability tests have been performed on all robotic subsystems. Parallel handling of multiple samples through the use of dedicated, purpose-built, robotics and high speed imaging allows analysis of up to 30,000 samples per day.
Garty, Guy; Chen, Youhua; Turner, Helen; Zhang, Jian; Lyulko, Oleksandra; Bertucci, Antonella; Xu, Yanping; Wang, Hongliang; Simaan, Nabil; Randers-Pehrson, Gerhard; Yao, Y. Lawrence; Brenner, David J.
2011-01-01
Purpose Over the past five years the Center for Minimally Invasive Radiation Biodosimetry at Columbia University has developed the Rapid Automated Biodosimetry Tool (RABiT), a completely automated, ultra-high throughput biodosimetry workstation. This paper describes recent upgrades and reliability testing of the RABiT. Materials and methods The RABiT analyzes fingerstick-derived blood samples to estimate past radiation exposure or to identify individuals exposed above or below a cutoff dose. Through automated robotics, lymphocytes are extracted from fingerstick blood samples into filter-bottomed multi-well plates. Depending on the time since exposure, the RABiT scores either micronuclei or phosphorylation of the histone H2AX, in an automated robotic system, using filter-bottomed multi-well plates. Following lymphocyte culturing, fixation and staining, the filter bottoms are removed from the multi-well plates and sealed prior to automated high-speed imaging. Image analysis is performed online using dedicated image processing hardware. Both the sealed filters and the images are archived. Results We have developed a new robotic system for lymphocyte processing, making use of an upgraded laser power and parallel processing of four capillaries at once. This system has allowed acceleration of lymphocyte isolation, the main bottleneck of the RABiT operation, from 12 to 2 sec/sample. Reliability tests have been performed on all robotic subsystems. Conclusions Parallel handling of multiple samples through the use of dedicated, purpose-built, robotics and high speed imaging allows analysis of up to 30,000 samples per day. PMID:21557703
Language Classification using N-grams Accelerated by FPGA-based Bloom Filters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacob, A; Gokhale, M
N-Gram (n-character sequences in text documents) counting is a well-established technique used in classifying the language of text in a document. In this paper, n-gram processing is accelerated through the use of reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels, with parallel Bloom Filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (HAIL algorithm) that uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our implementation of end-to-end language classification runs at 85x comparable software and 1.45x the competing hardware design.
Accelerating statistical image reconstruction algorithms for fan-beam x-ray CT using cloud computing
NASA Astrophysics Data System (ADS)
Srivastava, Somesh; Rao, A. Ravishankar; Sheinin, Vadim
2011-03-01
Statistical image reconstruction algorithms potentially offer many advantages to x-ray computed tomography (CT), e.g. lower radiation dose. But, their adoption in practical CT scanners requires extra computation power, which is traditionally provided by incorporating additional computing hardware (e.g. CPU-clusters, GPUs, FPGAs etc.) into a scanner. An alternative solution is to access the required computation power over the internet from a cloud computing service, which is orders-of-magnitude more cost-effective. This is because users only pay a small pay-as-you-go fee for the computation resources used (i.e. CPU time, storage etc.), and completely avoid purchase, maintenance and upgrade costs. In this paper, we investigate the benefits and shortcomings of using cloud computing for statistical image reconstruction. We parallelized the most time-consuming parts of our application, the forward and back projectors, using MapReduce, the standard parallelization library on clouds. From preliminary investigations, we found that a large speedup is possible at a very low cost. But, communication overheads inside MapReduce can limit the maximum speedup, and a better MapReduce implementation might become necessary in the future. All the experiments for this paper, including development and testing, were completed on the Amazon Elastic Compute Cloud (EC2) for less than $20.
Infrared/microwave (IR/MW) micromirror array beam combiner design and analysis.
Tian, Yi; Lv, Lijun; Jiang, Liwei; Wang, Xin; Li, Yanhong; Yu, Haiming; Feng, Xiaochen; Li, Qi; Zhang, Li; Li, Zhuo
2013-08-01
We investigated the design method of an infrared (IR)/microwave (MW) micromirror array type of beam combiner. The size of micromirror is in microscopic levels and comparable to MW wavelengths, so that the MW will not react in these dimensions, whereas the much shorter optical wavelengths will be reflected by them. Hence, the MW multilayered substrate was simplified and designed using transmission line theory. The beam combiner used an IR wavefront-division imaging technique to reflect the IR radiation image to the unit under test (UUT)'s pupil in a parallel light path. In addition, the boresight error detected by phase monopulse radar was analyzed using a moment-of method (MoM) and multilevel fast multipole method (MLFMM) acceleration technique. The boresight error introduced by the finite size of the beam combiner was less than 1°. Finally, in order to verify the wavefront-division imaging technique, a prototype of a micromirror array was fabricated, and IR images were tested. The IR images obtained by the thermal imager verified the correctness of the wavefront-division imaging technique.
Integrated optical 3D digital imaging based on DSP scheme
NASA Astrophysics Data System (ADS)
Wang, Xiaodong; Peng, Xiang; Gao, Bruce Z.
2008-03-01
We present a scheme of integrated optical 3-D digital imaging (IO3DI) based on digital signal processor (DSP), which can acquire range images independently without PC support. This scheme is based on a parallel hardware structure with aid of DSP and field programmable gate array (FPGA) to realize 3-D imaging. In this integrated scheme of 3-D imaging, the phase measurement profilometry is adopted. To realize the pipeline processing of the fringe projection, image acquisition and fringe pattern analysis, we present a multi-threads application program that is developed under the environment of DSP/BIOS RTOS (real-time operating system). Since RTOS provides a preemptive kernel and powerful configuration tool, with which we are able to achieve a real-time scheduling and synchronization. To accelerate automatic fringe analysis and phase unwrapping, we make use of the technique of software optimization. The proposed scheme can reach a performance of 39.5 f/s (frames per second), so it may well fit into real-time fringe-pattern analysis and can implement fast 3-D imaging. Experiment results are also presented to show the validity of proposed scheme.
NASA Astrophysics Data System (ADS)
Yang, Ying; Liu, Xiaobao; Wang, Jieci; Jing, Jiliang
2018-03-01
We study how to improve the precision of the quantum estimation of phase for an uniformly accelerated atom in fluctuating electromagnetic field by reflecting boundaries. We find that the precision decreases with increases of the acceleration without the boundary. With the presence of a reflecting boundary, the precision depends on the atomic polarization, position and acceleration, which can be effectively enhanced compared to the case without boundary if we choose the appropriate conditions. In particular, with the presence of two parallel reflecting boundaries, we obtain the optimal precision for atomic parallel polarization and the special distance between two boundaries, as if the atom were shielded from the fluctuation.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) have become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory,more » 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.« less
Evidence for Field-parallel Electron Acceleration in Solar Flares
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haerendel, G.
It is proposed that the coincidence of higher brightness and upward electric current observed by Janvier et al. during a flare indicates electron acceleration by field-parallel potential drops sustained by extremely strong field-aligned currents of the order of 10{sup 4} A m{sup −2}. A consequence of this is the concentration of the currents in sheets with widths of the order of 1 m. The high current density suggests that the field-parallel potential drops are maintained by current-driven anomalous resistivity. The origin of these currents remains a strong challenge for theorists.
Co-evolution of upstream waves and accelerated ions at parallel shocks
NASA Astrophysics Data System (ADS)
Fujimoto, M.; Sugiyama, T.
2016-12-01
Shock waves in space plasmas have been considered as the agents for various particle acceleration phenomena. The basic idea behind shock acceleration is that particles are accelerated as they move back-and-forth across a shock front. Detailed studies of ion acceleration at the terrestrial bow shock have been performed, however, the restricted maximum energies attained prevent a straight-forward application of obtained knowledge to more energetic astrophysical situations. Here we show by a large-scale self-consistent particle simulation that the co-evolution of magnetic turbulence and accelerated ion population is the foundation for continuous operation of shock acceleration to ever higher energies. Magnetic turbulence is created by ions reflected back upstream of a parallel shock front. The co-evolution arises because more energetic ions excite waves of longer wavelengths, and because longer wavelength modes are capable of scattering (in the upstream) and reflecting (at the shock front) more energetic ions. Via carefully designed numerical experiments, we show very clearly that this picture is true.
Non-Cartesian Parallel Imaging Reconstruction
Wright, Katherine L.; Hamilton, Jesse I.; Griswold, Mark A.; Gulani, Vikas; Seiberlich, Nicole
2014-01-01
Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be employed to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the non-homogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian GRAPPA, and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. PMID:24408499
Xu, Zheng; Wang, Sheng; Li, Yeqing; Zhu, Feiyun; Huang, Junzhou
2018-02-08
The most recent history of parallel Magnetic Resonance Imaging (pMRI) has in large part been devoted to finding ways to reduce acquisition time. While joint total variation (JTV) regularized model has been demonstrated as a powerful tool in increasing sampling speed for pMRI, however, the major bottleneck is the inefficiency of the optimization method. While all present state-of-the-art optimizations for the JTV model could only reach a sublinear convergence rate, in this paper, we squeeze the performance by proposing a linear-convergent optimization method for the JTV model. The proposed method is based on the Iterative Reweighted Least Squares algorithm. Due to the complexity of the tangled JTV objective, we design a novel preconditioner to further accelerate the proposed method. Extensive experiments demonstrate the superior performance of the proposed algorithm for pMRI regarding both accuracy and efficiency compared with state-of-the-art methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haerendel, G.
It is proposed that the coincidence of higher brightness and upward electric current observed by Janvier et al. during a flare indicates electron acceleration by field-parallel potential drops sustained by extremely strong field-aligned currents of order 10{sup 4} A m{sup −2}. A few consequences are discussed here.
GPU accelerated cell-based adaptive mesh refinement on unstructured quadrilateral grid
NASA Astrophysics Data System (ADS)
Luo, Xisheng; Wang, Luying; Ran, Wei; Qin, Fenghua
2016-10-01
A GPU accelerated inviscid flow solver is developed on an unstructured quadrilateral grid in the present work. For the first time, the cell-based adaptive mesh refinement (AMR) is fully implemented on GPU for the unstructured quadrilateral grid, which greatly reduces the frequency of data exchange between GPU and CPU. Specifically, the AMR is processed with atomic operations to parallelize list operations, and null memory recycling is realized to improve the efficiency of memory utilization. It is found that results obtained by GPUs agree very well with the exact or experimental results in literature. An acceleration ratio of 4 is obtained between the parallel code running on the old GPU GT9800 and the serial code running on E3-1230 V2. With the optimization of configuring a larger L1 cache and adopting Shared Memory based atomic operations on the newer GPU C2050, an acceleration ratio of 20 is achieved. The parallelized cell-based AMR processes have achieved 2x speedup on GT9800 and 18x on Tesla C2050, which demonstrates that parallel running of the cell-based AMR method on GPU is feasible and efficient. Our results also indicate that the new development of GPU architecture benefits the fluid dynamics computing significantly.
Soyer, Philippe; Lagadec, Matthieu; Sirol, Marc; Dray, Xavier; Duchat, Florent; Vignaud, Alexandre; Fargeaudou, Yann; Placé, Vinciane; Gault, Valérie; Hamzi, Lounis; Pocard, Marc; Boudiaf, Mourad
2010-02-11
Our objective was to determine the diagnostic accuracy of a free-breathing diffusion-weighted single-shot echo-planar magnetic resonance imaging (FBDW-SSEPI) technique with parallel imaging and high diffusion factor value (b = 1000 s/mm2) in the detection of primary rectal adenocarcinomas. Thirty-one patients (14M and 17F; mean age 67 years) with histopathologically proven primary rectal adenocarcinomas and 31 patients without rectal malignancies (14M and 17F; mean age 63.6 years) were examined with FBDW-SSEPI (repetition time (TR/echo time (TE) 3900/91 ms, gradient strength 45 mT/m, acquisition time 2 min) at 1.5 T using generalized autocalibrating partially parallel acquisitions (GRAPPA, acceleration factor 2) and a b value of 1000 s/mm2. Apparent diffusion coefficients (ADCs) of rectal adenocarcinomas and normal rectal wall were measured. FBDW-SSEPI images were evaluated for tumour detection by 2 readers. Sensitivity, specificity, accuracy and Youden score for rectal adenocarcinoma detection were calculated with their 95% confidence intervals (CI) for ADC value measurement and visual image analysis. Rectal adenocarcinomas had significantly lower ADCs (mean 1.036 x 10(-3)+/- 0.107 x 10(-3) mm2/s; median 1.015 x 10(-3) mm2/s; range (0.827-1.239) x 10(-3) mm2/s) compared with the rectal wall of control subjects (mean 1.387 x 10(-3)+/- 0.106 x 10(-3) mm2/s; median 1.385 x 10(-3) mm2/s; range (1.176-1.612) x 10(-3) mm2/s) (p < 0.0001). Using a threshold value < or = 1.240 x 10(-3) mm2/s, all rectal adenocarcinomas were correctly categorized and 100% sensitivity (31/31; 95% CI 95-100%), 94% specificity (31/33; 95% CI 88-100%), 97% accuracy (60/62; 95% CI 92-100%) and Youden index 0.94 were obtained for the diagnosis of rectal adenocarcinoma. FBDW-SSEPI image analysis allowed depiction of all rectal adenocarcinomas but resulted in 2 false-positive findings, yielding 100% sensitivity (31/31; 95% CI 95-100%), 94% specificity (31/33; 95% CI 88-100%), 97% accuracy (60/62; 95% CI 92-100%) and Youden index 0.94 for the diagnosis of primary rectal adenocarcinoma. We can conclude that FBDW-SSEPI using parallel imaging and high b value may be helpful in the detection of primary rectal adenocarcinomas.
Wavelet-space correlation imaging for high-speed MRI without motion monitoring or data segmentation.
Li, Yu; Wang, Hui; Tkach, Jean; Roach, David; Woods, Jason; Dumoulin, Charles
2015-12-01
This study aims to (i) develop a new high-speed MRI approach by implementing correlation imaging in wavelet-space, and (ii) demonstrate the ability of wavelet-space correlation imaging to image human anatomy with involuntary or physiological motion. Correlation imaging is a high-speed MRI framework in which image reconstruction relies on quantification of data correlation. The presented work integrates correlation imaging with a wavelet transform technique developed originally in the field of signal and image processing. This provides a new high-speed MRI approach to motion-free data collection without motion monitoring or data segmentation. The new approach, called "wavelet-space correlation imaging", is investigated in brain imaging with involuntary motion and chest imaging with free-breathing. Wavelet-space correlation imaging can exceed the speed limit of conventional parallel imaging methods. Using this approach with high acceleration factors (6 for brain MRI, 16 for cardiac MRI, and 8 for lung MRI), motion-free images can be generated in static brain MRI with involuntary motion and nonsegmented dynamic cardiac/lung MRI with free-breathing. Wavelet-space correlation imaging enables high-speed MRI in the presence of involuntary motion or physiological dynamics without motion monitoring or data segmentation. © 2014 Wiley Periodicals, Inc.
Gpufit: An open-source toolkit for GPU-accelerated curve fitting.
Przybylski, Adrian; Thiel, Björn; Keller-Findeisen, Jan; Stock, Bernd; Bates, Mark
2017-11-16
We present a general purpose, open-source software library for estimation of non-linear parameters by the Levenberg-Marquardt algorithm. The software, Gpufit, runs on a Graphics Processing Unit (GPU) and executes computations in parallel, resulting in a significant gain in performance. We measured a speed increase of up to 42 times when comparing Gpufit with an identical CPU-based algorithm, with no loss of precision or accuracy. Gpufit is designed such that it is easily incorporated into existing applications or adapted for new ones. Multiple software interfaces, including to C, Python, and Matlab, ensure that Gpufit is accessible from most programming environments. The full source code is published as an open source software repository, making its function transparent to the user and facilitating future improvements and extensions. As a demonstration, we used Gpufit to accelerate an existing scientific image analysis package, yielding significantly improved processing times for super-resolution fluorescence microscopy datasets.
Stankovic, Zoran; Fink, Jury; Collins, Jeremy D; Semaan, Edouard; Russe, Maximilian F; Carr, James C; Markl, Michael; Langer, Mathias; Jung, Bernd
2015-04-01
We sought to evaluate the feasibility of k-t parallel imaging for accelerated 4D flow MRI in the hepatic vascular system by investigating the impact of different acceleration factors. k-t GRAPPA accelerated 4D flow MRI of the liver vasculature was evaluated in 16 healthy volunteers at 3T with acceleration factors R = 3, R = 5, and R = 8 (2.0 × 2.5 × 2.4 mm(3), TR = 82 ms), and R = 5 (TR = 41 ms); GRAPPA R = 2 was used as the reference standard. Qualitative flow analysis included grading of 3D streamlines and time-resolved particle traces. Quantitative evaluation assessed velocities, net flow, and wall shear stress (WSS). Significant scan time savings were realized for all acceleration factors compared to standard GRAPPA R = 2 (21-71 %) (p < 0.001). Quantification of velocities and net flow offered similar results between k-t GRAPPA R = 3 and R = 5 compared to standard GRAPPA R = 2. Significantly increased leakage artifacts and noise were seen between standard GRAPPA R = 2 and k-t GRAPPA R = 8 (p < 0.001) with significant underestimation of peak velocities and WSS of up to 31 % in the hepatic arterial system (p <0.05). WSS was significantly underestimated up to 13 % in all vessels of the portal venous system for k-t GRAPPA R = 5, while significantly higher values were observed for the same acceleration with higher temporal resolution in two veins (p < 0.05). k-t acceleration of 4D flow MRI is feasible for liver hemodynamic assessment with acceleration factors R = 3 and R = 5 resulting in a scan time reduction of at least 40 % with similar quantitation of liver hemodynamics compared with GRAPPA R = 2.
Voltage regulation in linear induction accelerators
Parsons, William M.
1992-01-01
Improvement in voltage regulation in a Linear Induction Accelerator wherein a varistor, such as a metal oxide varistor, is placed in parallel with the beam accelerating cavity and the magnetic core. The non-linear properties of the varistor result in a more stable voltage across the beam accelerating cavity than with a conventional compensating resistance.
Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.
Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal
2015-06-01
Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
Wavelet-space Correlation Imaging for High-speed MRI without Motion Monitoring or Data Segmentation
Li, Yu; Wang, Hui; Tkach, Jean; Roach, David; Woods, Jason; Dumoulin, Charles
2014-01-01
Purpose This study aims to 1) develop a new high-speed MRI approach by implementing correlation imaging in wavelet-space, and 2) demonstrate the ability of wavelet-space correlation imaging to image human anatomy with involuntary or physiological motion. Methods Correlation imaging is a high-speed MRI framework in which image reconstruction relies on quantification of data correlation. The presented work integrates correlation imaging with a wavelet transform technique developed originally in the field of signal and image processing. This provides a new high-speed MRI approach to motion-free data collection without motion monitoring or data segmentation. The new approach, called “wavelet-space correlation imaging”, is investigated in brain imaging with involuntary motion and chest imaging with free-breathing. Results Wavelet-space correlation imaging can exceed the speed limit of conventional parallel imaging methods. Using this approach with high acceleration factors (6 for brain MRI, 16 for cardiac MRI and 8 for lung MRI), motion-free images can be generated in static brain MRI with involuntary motion and nonsegmented dynamic cardiac/lung MRI with free-breathing. Conclusion Wavelet-space correlation imaging enables high-speed MRI in the presence of involuntary motion or physiological dynamics without motion monitoring or data segmentation. PMID:25470230
The "Alfvén" proposal for the European Space Agency M5 Mission Call
NASA Astrophysics Data System (ADS)
Berthomier, M.; Fazakerley, A. N.
2017-12-01
The Alfvén mission objective is to elucidate the particle acceleration processes and their consequences for electromagnetic radiation and energy transport in strongly magnetised plasmas. The Earth's Auroral Acceleration Region is a unique laboratory for investigating these processes. The only way to distinguish between the models describing acceleration processes at the heart of Magnetosphere-Ionosphere Coupling is to combine high-time resolution in situ measurements (as pioneered by FAST), multi-point measurements (as pioneered by CLUSTER), and auroral arc imaging in one mission. Charged particle acceleration in strongly magnetized plasmas requires the conversion of electromagnetic energy into magnetic-field-aligned particle kinetic energy. Alfvén will measure for the first time the occurrence and distribution of small scale parallel electric fields in space and time. In order to determine the relative efficiency of the different conversion mechanisms, Alfvén will also measure the corresponding particle energy fluxes locally and into the aurora. Alfvén will discover how electromagnetic radiation is generated in the acceleration region and how it escapes. Alfvén will make key measurements of Auroral Kilometric Radiation needed to test competing models of wave generation, mode conversion and escape from their source region. These will reveal the mode conversion processes and which information is ultimately carried by the polarization of radio waves reaching free space. Alfvén will discover the global impact of particle acceleration on the dynamic coupling between a magnetized object and its plasma environment. Dual spacecraft measurements offer the unique opportunity to unambiguously determine which part of the energy flowing into the ionosphere is eventually dissipated in this collisional plasma and which part is transmitted to outflowing ions of ionospheric origin. The Alfvén mission design involves use of two simple identical spacecraft, a comprehensive suite of inter-calibrated particles and fields instruments, cutting edge auroral imaging, easily accessible orbits that frequently visit the region of scientific interest and straightforward operations.
The Goddard Space Flight Center Program to develop parallel image processing systems
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1972-01-01
Parallel image processing which is defined as image processing where all points of an image are operated upon simultaneously is discussed. Coherent optical, noncoherent optical, and electronic methods are considered parallel image processing techniques.
Blob dynamics in TORPEX poloidal null configurations
NASA Astrophysics Data System (ADS)
Shanahan, B. W.; Dudson, B. D.
2016-12-01
3D blob dynamics are simulated in X-point magnetic configurations in the TORPEX device via a non-field-aligned coordinate system, using an isothermal model which evolves density, vorticity, parallel velocity and parallel current density. By modifying the parallel gradient operator to include perpendicular perturbations from poloidal field coils, numerical singularities associated with field aligned coordinates are avoided. A comparison with a previously developed analytical model (Avino 2016 Phys. Rev. Lett. 116 105001) is performed and an agreement is found with minimal modification. Experimental comparison determines that the null region can cause an acceleration of filaments due to increasing connection length, but this acceleration is small relative to other effects, which we quantify. Experimental measurements (Avino 2016 Phys. Rev. Lett. 116 105001) are reproduced, and the dominant acceleration mechanism is identified as that of a developing dipole in a moving background. Contributions from increasing connection length close to the null point are a small correction.
A novel coil array for combined TMS/fMRI experiments at 3 T
Navarro de Lara, Lucia I.; Windischberger, Christian; Kuehne, Andre; Woletz, Michael; Sieg, Jürgen; Bestmann, Sven; Weiskopf, Nikolaus; Strasser, Bernhard; Moser, Ewald
2014-01-01
Purpose To overcome current limitations in combined transcranial magnetic stimulation (TMS) and functional magnetic resonance imaging (fMRI) studies by employing a dedicated coil array design for 3 Tesla. Methods The state‐of‐the‐art setup for concurrent TMS/fMRI is to use a large birdcage head coil, with the TMS between the subject's head and the MR coil. This setup has drawbacks in sensitivity, positioning, and available imaging techniques. In this study, an ultraslim 7‐channel receive‐only coil array for 3 T, which can be placed between the subject's head and the TMS, is presented. Interactions between the devices are investigated and the performance of the new setup is evaluated in comparison to the state‐of‐the‐art setup. Results MR sensitivity obtained at the depth of the TMS stimulation is increased by a factor of five. Parallel imaging with an acceleration factor of two is feasible with low g‐factors. Possible interactions between TMS and the novel hardware were investigated and were found negligible. Conclusion The novel coil array is safe, strongly improves signal‐to‐noise ratio in concurrent TMS/fMRI experiments, enables parallel imaging, and allows for flexible positioning of the TMS on the head while ensuring efficient TMS stimulation due to its ultraslim design. Magn Reson Med 74:1492–1501, 2015. © 2014 The Authors. Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine. PMID:25421603
Ting, Samuel T; Ahmad, Rizwan; Jin, Ning; Craft, Jason; Serafim da Silveira, Juliana; Xue, Hui; Simonetti, Orlando P
2017-04-01
Sparsity-promoting regularizers can enable stable recovery of highly undersampled magnetic resonance imaging (MRI), promising to improve the clinical utility of challenging applications. However, lengthy computation time limits the clinical use of these methods, especially for dynamic MRI with its large corpus of spatiotemporal data. Here, we present a holistic framework that utilizes the balanced sparse model for compressive sensing and parallel computing to reduce the computation time of cardiac MRI recovery methods. We propose a fast, iterative soft-thresholding method to solve the resulting ℓ1-regularized least squares problem. In addition, our approach utilizes a parallel computing environment that is fully integrated with the MRI acquisition software. The methodology is applied to two formulations of the multichannel MRI problem: image-based recovery and k-space-based recovery. Using measured MRI data, we show that, for a 224 × 144 image series with 48 frames, the proposed k-space-based approach achieves a mean reconstruction time of 2.35 min, a 24-fold improvement compared a reconstruction time of 55.5 min for the nonlinear conjugate gradient method, and the proposed image-based approach achieves a mean reconstruction time of 13.8 s. Our approach can be utilized to achieve fast reconstruction of large MRI datasets, thereby increasing the clinical utility of reconstruction techniques based on compressed sensing. Magn Reson Med 77:1505-1515, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
The effect of cosmic-ray acceleration on supernova blast wave dynamics
NASA Astrophysics Data System (ADS)
Pais, M.; Pfrommer, C.; Ehlert, K.; Pakmor, R.
2018-05-01
Non-relativistic shocks accelerate ions to highly relativistic energies provided that the orientation of the magnetic field is closely aligned with the shock normal (quasi-parallel shock configuration). In contrast, quasi-perpendicular shocks do not efficiently accelerate ions. We model this obliquity-dependent acceleration process in a spherically expanding blast wave setup with the moving-mesh code AREPO for different magnetic field morphologies, ranging from homogeneous to turbulent configurations. A Sedov-Taylor explosion in a homogeneous magnetic field generates an oblate ellipsoidal shock surface due to the slower propagating blast wave in the direction of the magnetic field. This is because of the efficient cosmic ray (CR) production in the quasi-parallel polar cap regions, which softens the equation of state and increases the compressibility of the post-shock gas. We find that the solution remains self-similar because the ellipticity of the propagating blast wave stays constant in time. This enables us to derive an effective ratio of specific heats for a composite of thermal gas and CRs as a function of the maximum acceleration efficiency. We finally discuss the behavior of supernova remnants expanding into a turbulent magnetic field with varying coherence lengths. For a maximum CR acceleration efficiency of about 15 per cent at quasi-parallel shocks (as suggested by kinetic plasma simulations), we find an average efficiency of about 5 per cent, independent of the assumed magnetic coherence length.
Wakefield Computations for the CLIC PETS using the Parallel Finite Element Time-Domain Code T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the high-performance parallel 3D electromagnetic time-domain code, T3P, for simulations of wakefields and transients in complex accelerator structures. T3P is based on advanced higher-order Finite Element methods on unstructured grids with quadratic surface approximation. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with unprecedented accuracy, aiding the design of the next generation of accelerator facilities. Applications to the Compact Linear Collider (CLIC) Power Extraction and Transfer Structure (PETS) are presented.
INVITED TOPICAL REVIEW: Parallel magnetic resonance imaging
NASA Astrophysics Data System (ADS)
Larkman, David J.; Nunes, Rita G.
2007-04-01
Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed.
Linear induction accelerators made from pulse-line cavities with external pulse injection.
Smith, I
1979-06-01
Two types of linear induction accelerator have been reported previously. In one, unidirectional voltage pulses are generated outside the accelerator and injected into the accelerator cavity modules, which contain ferromagnetic material to reduce energy losses in the form of currents induced, in parallel with the beam, in the cavity structure. In the other type, the accelerator cavity modules are themselves pulse-forming lines with energy storage and switches; parallel current losses are made zero by the use of circuits that generate bidirectional acceleration waveforms with a zero voltage-time integral. In a third type of design described here, the cavities are externally driven, and 100% efficient coupling of energy to the beam is obtained by designing the external pulse generators to produce bidirectional voltage waveforms with zero voltage-time integral. A design for such a pulse generator is described that is itself one hundred percent efficient and which is well suited to existing pulse power techniques. Two accelerator cavity designs are described that can couple the pulse from such a generator to the beam; one of these designs provides voltage doubling. Comparison is made between the accelerating gradients that can be obtained with this and the preceding types of induction accelerator.
Means and method for the focusing and acceleration of parallel beams of charged particles
Maschke, Alfred W.
1983-07-05
A novel apparatus and method for focussing beams of charged particles comprising planar arrays of electrostatic quadrupoles. The quadrupole arrays may comprise electrodes which are shared by two or more quadrupoles. Such quadrupole arrays are particularly adapted to providing strong focussing forces for high current, high brightness, beams of charged particles, said beams further comprising a plurality of parallel beams, or beamlets, each such beamlet being focussed by one quadrupole of the array. Such arrays may be incorporated in various devices wherein beams of charged particles are accelerated or transported, such as linear accelerators, klystron tubes, beam transport lines, etc.
Voltage regulation in linear induction accelerators
Parsons, W.M.
1992-12-29
Improvement in voltage regulation in a linear induction accelerator wherein a varistor, such as a metal oxide varistor, is placed in parallel with the beam accelerating cavity and the magnetic core is disclosed. The non-linear properties of the varistor result in a more stable voltage across the beam accelerating cavity than with a conventional compensating resistance. 4 figs.
A GPU-Accelerated Approach for Feature Tracking in Time-Varying Imagery Datasets.
Peng, Chao; Sahani, Sandip; Rushing, John
2017-10-01
We propose a novel parallel connected component labeling (CCL) algorithm along with efficient out-of-core data management to detect and track feature regions of large time-varying imagery datasets. Our approach contributes to the big data field with parallel algorithms tailored for GPU architectures. We remove the data dependency between frames and achieve pixel-level parallelism. Due to the large size, the entire dataset cannot fit into cached memory. Frames have to be streamed through the memory hierarchy (disk to CPU main memory and then to GPU memory), partitioned, and processed as batches, where each batch is small enough to fit into the GPU. To reconnect the feature regions that are separated due to data partitioning, we present a novel batch merging algorithm to extract the region connection information across multiple batches in a parallel fashion. The information is organized in a memory-efficient structure and supports fast indexing on the GPU. Our experiment uses a commodity workstation equipped with a single GPU. The results show that our approach can efficiently process a weather dataset composed of terabytes of time-varying radar images. The advantages of our approach are demonstrated by comparing to the performance of an efficient CPU cluster implementation which is being used by the weather scientists.
GPU-accelerated regularized iterative reconstruction for few-view cone beam CT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matenine, Dmitri, E-mail: dmitri.matenine.1@ulaval.ca; Goussard, Yves, E-mail: yves.goussard@polymtl.ca; Després, Philippe, E-mail: philippe.despres@phy.ulaval.ca
2015-04-15
Purpose: The present work proposes an iterative reconstruction technique designed for x-ray transmission computed tomography (CT). The main objective is to provide a model-based solution to the cone-beam CT reconstruction problem, yielding accurate low-dose images via few-views acquisitions in clinically acceptable time frames. Methods: The proposed technique combines a modified ordered subsets convex (OSC) algorithm and the total variation minimization (TV) regularization technique and is called OSC-TV. The number of subsets of each OSC iteration follows a reduction pattern in order to ensure the best performance of the regularization method. Considering the high computational cost of the algorithm, it ismore » implemented on a graphics processing unit, using parallelization to accelerate computations. Results: The reconstructions were performed on computer-simulated as well as human pelvic cone-beam CT projection data and image quality was assessed. In terms of convergence and image quality, OSC-TV performs well in reconstruction of low-dose cone-beam CT data obtained via a few-view acquisition protocol. It compares favorably to the few-view TV-regularized projections onto convex sets (POCS-TV) algorithm. It also appears to be a viable alternative to full-dataset filtered backprojection. Execution times are of 1–2 min and are compatible with the typical clinical workflow for nonreal-time applications. Conclusions: Considering the image quality and execution times, this method may be useful for reconstruction of low-dose clinical acquisitions. It may be of particular benefit to patients who undergo multiple acquisitions by reducing the overall imaging radiation dose and associated risks.« less
Megavolt parallel potentials arising from double-layer streams in the Earth's outer radiation belt.
Mozer, F S; Bale, S D; Bonnell, J W; Chaston, C C; Roth, I; Wygant, J
2013-12-06
Huge numbers of double layers carrying electric fields parallel to the local magnetic field line have been observed on the Van Allen probes in connection with in situ relativistic electron acceleration in the Earth's outer radiation belt. For one case with adequate high time resolution data, 7000 double layers were observed in an interval of 1 min to produce a 230,000 V net parallel potential drop crossing the spacecraft. Lower resolution data show that this event lasted for 6 min and that more than 1,000,000 volts of net parallel potential crossed the spacecraft during this time. A double layer traverses the length of a magnetic field line in about 15 s and the orbital motion of the spacecraft perpendicular to the magnetic field was about 700 km during this 6 min interval. Thus, the instantaneous parallel potential along a single magnetic field line was the order of tens of kilovolts. Electrons on the field line might experience many such potential steps in their lifetimes to accelerate them to energies where they serve as the seed population for relativistic acceleration by coherent, large amplitude whistler mode waves. Because the double-layer speed of 3100 km/s is the order of the electron acoustic speed (and not the ion acoustic speed) of a 25 eV plasma, the double layers may result from a new electron acoustic mode. Acceleration mechanisms involving double layers may also be important in planetary radiation belts such as Jupiter, Saturn, Uranus, and Neptune, in the solar corona during flares, and in astrophysical objects.
A Semi-flexible 64-channel Receive-only Phased Array for Pediatric Body MRI at 3T
Zhang, Tao; Grafendorfer, Thomas; Cheng, Joseph Y.; Ning, Peigang; Rainey, Bob; Giancola, Mark; Ortman, Sarah; Robb, Fraser J.; Calderon, Paul D.; Hargreaves, Brian A.; Lustig, Michael; Scott, Greig C.; Pauly, John M.; Vasanawala, Shreyas S.
2015-01-01
Purpose To design, construct, and validate a semi-flexible 64-channel receive-only phased array for pediatric body MRI at 3T. Methods A 64-channel receive-only phased array was developed and constructed. The designed flexible coil can easily conform to different patient sizes with non-overlapping coil elements in the transverse plane. It can cover a field of view of up to 44 × 28 cm2 and removes the need for coil repositioning for body MRI patients with multiple clinical concerns. The 64-channel coil was compared with a 32-channel standard coil for signal-to-noise ratio (SNR) and parallel imaging performances on different phantoms. With IRB approval and informed consent/assent, the designed coil was validated on 21 consecutive pediatric patients. Results The pediatric coil provided higher SNR than the standard coil on different phantoms, with the averaged SNR gain at least 23% over a depth of 7 cm along the cross-section of phantoms. It also achieved better parallel imaging performance under moderate acceleration factors. Good image quality (average score 4.6 out of 5) was achieved using the developed pediatric coil in the clinical studies. Conclusion A 64-channel semi-flexible receive-only phased array has been developed and validated to facilitate high quality pediatric body MRI at 3T. PMID:26418283
Accelerating semantic graph databases on commodity clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Castellana, Vito G.; Haglin, David J.
We are developing a full software system for accelerating semantic graph databases on commodity cluster that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL to C++ compiler, a library of parallel graph methods and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over a commodity clusters. We present preliminary results for the compiler and for the runtime.
Electron heating in a Monte Carlo model of a high Mach number, supercritical, collisionless shock
NASA Technical Reports Server (NTRS)
Ellison, Donald C.; Jones, Frank C.
1987-01-01
Preliminary work in the investigation of electron injection and acceleration at parallel shocks is presented. A simple model of electron heating that is derived from a unified shock model which includes the effects of an electrostatic potential jump is described. The unified shock model provides a kinetic description of the injection and acceleration of ions and a fluid description of electron heating at high Mach number, supercritical, and parallel shocks.
Phased array ghost elimination.
Kellman, Peter; McVeigh, Elliot R
2006-05-01
Parallel imaging may be applied to cancel ghosts caused by a variety of distortion mechanisms, including distortions such as off-resonance or local flow, which are space variant. Phased array combining coefficients may be calculated that null ghost artifacts at known locations based on a constrained optimization, which optimizes SNR subject to the nulling constraint. The resultant phased array ghost elimination (PAGE) technique is similar to the method known as sensitivity encoding (SENSE) used for accelerated imaging; however, in this formulation is applied to full field-of-view (FOV) images. The phased array method for ghost elimination may result in greater flexibility in designing acquisition strategies. For example, in multi-shot EPI applications ghosts are typically mitigated by the use of an interleaved phase encode acquisition order. An alternative strategy is to use a sequential, non-interleaved phase encode order and cancel the resultant ghosts using PAGE parallel imaging. Cancellation of ghosts by means of phased array processing makes sequential, non-interleaved phase encode acquisition order practical, and permits a reduction in repetition time, TR, by eliminating the need for echo-shifting. Sequential, non-interleaved phase encode order has benefits of reduced distortion due to off-resonance, in-plane flow and EPI delay misalignment. Furthermore, the use of EPI with PAGE has inherent fat-water separation and has been used to provide off-resonance correction using a technique referred to as lipid elimination with an echo-shifting N/2-ghost acquisition (LEENA), and may further generalized using the multi-point Dixon method. Other applications of PAGE include cancelling ghosts which arise due to amplitude or phase variation during the approach to steady state. Parallel imaging requires estimates of the complex coil sensitivities. In vivo estimates may be derived by temporally varying the phase encode ordering to obtain a full k-space dataset in a scheme similar to the autocalibrating TSENSE method. This scheme is a generalization of the UNFOLD method used for removing aliasing in undersampled acquisitions. The more general scheme may be used to modulate each EPI ghost image to a separate temporal frequency as described in this paper. Copyright (c) 2006 John Wiley & Sons, Ltd.
Phased array ghost elimination
Kellman, Peter; McVeigh, Elliot R.
2007-01-01
Parallel imaging may be applied to cancel ghosts caused by a variety of distortion mechanisms, including distortions such as off-resonance or local flow, which are space variant. Phased array combining coefficients may be calculated that null ghost artifacts at known locations based on a constrained optimization, which optimizes SNR subject to the nulling constraint. The resultant phased array ghost elimination (PAGE) technique is similar to the method known as sensitivity encoding (SENSE) used for accelerated imaging; however, in this formulation is applied to full field-of-view (FOV) images. The phased array method for ghost elimination may result in greater flexibility in designing acquisition strategies. For example, in multi-shot EPI applications ghosts are typically mitigated by the use of an interleaved phase encode acquisition order. An alternative strategy is to use a sequential, non-interleaved phase encode order and cancel the resultant ghosts using PAGE parallel imaging. Cancellation of ghosts by means of phased array processing makes sequential, non-interleaved phase encode acquisition order practical, and permits a reduction in repetition time, TR, by eliminating the need for echo-shifting. Sequential, non-interleaved phase encode order has benefits of reduced distortion due to off-resonance, in-plane flow and EPI delay misalignment. Furthermore, the use of EPI with PAGE has inherent fat-water separation and has been used to provide off-resonance correction using a technique referred to as lipid elimination with an echo-shifting N/2-ghost acquisition (LEENA), and may further generalized using the multi-point Dixon method. Other applications of PAGE include cancelling ghosts which arise due to amplitude or phase variation during the approach to steady state. Parallel imaging requires estimates of the complex coil sensitivities. In vivo estimates may be derived by temporally varying the phase encode ordering to obtain a full k-space dataset in a scheme similar to the autocalibrating TSENSE method. This scheme is a generalization of the UNFOLD method used for removing aliasing in undersampled acquisitions. The more general scheme may be used to modulate each EPI ghost image to a separate temporal frequency as described in this paper. PMID:16705636
Parallel/Vector Integration Methods for Dynamical Astronomy
NASA Astrophysics Data System (ADS)
Fukushima, Toshio
1999-01-01
This paper reviews three recent works on the numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit(PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of ODE in the form of Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by the fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead an acceleration factor of around 50 in parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. Also the trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
NASA Astrophysics Data System (ADS)
Song, Y.; Lysak, R. L.
2017-12-01
Parallel electrostatic electric fields provide a powerful mechanism to accelerate auroral particles to high energy in the auroral acceleration region (AAR), creating both quasi-static and Alfvenic discrete aurorae. The total field-aligned current can be written as J||total=J||+J||D, where the displacement current is denoted as J||D=(1/4π)(∂E||/∂t), which describes the E||-generation (Song and Lysak, 2006). The generation of the total field-aligned current is related to spatial gradients of the parallel vorticity caused by the axial torque acting on field-aligned flux tubes in M-I coupling system. It should be noticed that parallel electric fields are not produced by the field-aligned current. In fact, the E||-generation is caused by Alfvenic interaction in the M-I coupling system, and is favored by a low plasma density and the enhanced localized azimuthal magnetic flux. We suggest that the nonlinear interaction of incident and reflected Alfven wave packets in the AAR can create reactive stress concentration, and therefore can generate the parallel electrostatic electric fields together with a seed low density cavity. The generated electric fields will quickly deepen the seed low density cavity, which can effectively create even stronger electrostatic electric fields. The electrostatic electric fields nested in a low density cavity and surrounded by enhanced azimuthal magnetic flux constitute Alfvenic electromagnetic plasma structures, such as Alfvenic Double Layers (DLs). The Poynting flux carried by Alfven waves can continuously supply energy from the generator region to the auroral acceleration region, supporting and sustaining Alfvenic DLs with long-lasting electrostatic electric fields which accelerate auroral particles to high energy. The generation of parallel electric fields and the formation of auroral arcs can redistribute perpendicular mechanical and magnetic stresses in auroral flux tubes, decoupling the magnetosphere from ionosphere drag locally. This may enhance the magnetotail earthward shear flows and rapidly buildup stronger parallel electric fields in the auroral acceleration region, leading to a sudden and violent tail energy release, if there is accumulated free magnetic energy in the tail.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.
1997-12-01
Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
Accelerated 4D self-gated MRI of tibiofemoral kinematics.
Mazzoli, Valentina; Schoormans, Jasper; Froeling, Martijn; Sprengers, Andre M; Coolen, Bram F; Verdonschot, Nico; Strijkers, Gustav J; Nederveen, Aart J
2017-11-01
Anatomical (static) magnetic resonance imaging (MRI) is the most useful imaging technique for the evaluation and assessment of internal derangement of the knee, but does not provide dynamic information and does not allow the study of the interaction of the different tissues during motion. As knee pain is often only experienced during dynamic tasks, the ability to obtain four-dimensional (4D) images of the knee during motion could improve the diagnosis and provide a deeper understanding of the knee joint. In this work, we present a novel approach for dynamic, high-resolution, 4D imaging of the freely moving knee without the need for external triggering. The dominant knee of five healthy volunteers was scanned during a flexion/extension task. To evaluate the effects of non-uniform motion and poor coordination skills on the quality of the reconstructed images, we performed a comparison between fully free movement and movement instructed by a visual cue. The trigger signal for self-gating was extracted using principal component analysis (PCA), and the images were reconstructed using a parallel imaging and compressed sensing reconstruction pipeline. The reconstructed 4D movies were scored for image quality and used to derive bone kinematics through image registration. Using our method, we were able to obtain 4D high-resolution movies of the knee without the need for external triggering hardware. The movies obtained with and without instruction did not differ significantly in terms of image scoring and quantitative values for tibiofemoral kinematics. Our method showed to be robust for the extraction of the self-gating signal even for uninstructed motion. This can make the technique suitable for patients who, as a result of pain, may find it difficult to comply exactly with instructions. Furthermore, bone kinematics can be derived from accelerated MRI without the need for additional hardware for triggering. Copyright © 2017 John Wiley & Sons, Ltd.
16-channel bow tie antenna transceiver array for cardiac MR at 7.0 tesla.
Oezerdem, Celal; Winter, Lukas; Graessl, Andreas; Paul, Katharina; Els, Antje; Weinberger, Oliver; Rieger, Jan; Kuehne, Andre; Dieringer, Matthias; Hezel, Fabian; Voit, Dirk; Frahm, Jens; Niendorf, Thoralf
2016-06-01
To design, evaluate, and apply a bow tie antenna transceiver radiofrequency (RF) coil array tailored for cardiac MRI at 7.0 Tesla (T). The radiofrequency (RF) coil array comprises 16 building blocks each containing a bow tie shaped λ/2-dipole antenna. Numerical simulations were used for transmission field homogenization and RF safety validation. RF characteristics were examined in a phantom study. The array's suitability for high spatial resolution two-dimensional (2D) CINE imaging and for real time imaging of the heart was examined in a volunteer study. The arrays transmission fields and RF characteristics are suitable for cardiac MRI at 7.0T. The coil performance afforded a spatial resolution as good as (0.8 × 0.8 × 2.5) mm(3) for segmented 2D CINE MRI at 7.0T which is by a factor of 12 superior versus standardized protocols used in clinical practice at 1.5T. The proposed transceiver array supports 1D acceleration factors of up to R = 6 without impairing image quality significantly. The 16-channel bow tie antenna transceiver array supports accelerated and high spatial resolution cardiac MRI. The array is compatible with multichannel transmission and provides a technological basis for future clinical assessment of parallel transmission techniques at 7.0 Tesla. Magn Reson Med 75:2553-2565, 2016. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
Yun, Seong Dae
2017-01-01
The relatively high imaging speed of EPI has led to its widespread use in dynamic MRI studies such as functional MRI. An approach to improve the performance of EPI, EPI with Keyhole (EPIK), has been previously presented and its use in fMRI was verified at 1.5T as well as 3T. The method has been proven to achieve a higher temporal resolution and smaller image distortions when compared to single-shot EPI. Furthermore, the performance of EPIK in the detection of functional signals was shown to be comparable to that of EPI. For these reasons, we were motivated to employ EPIK here for high-resolution imaging. The method was optimised to offer the highest possible in-plane resolution and slice coverage under the given imaging constraints: fixed TR/TE, FOV and acceleration factors for parallel imaging and partial Fourier techniques. The performance of EPIK was evaluated in direct comparison to the optimised protocol obtained from EPI. The two imaging methods were applied to visual fMRI experiments involving sixteen subjects. The results showed that enhanced spatial resolution with a whole-brain coverage was achieved by EPIK (1.00 mm × 1.00 mm; 32 slices) when compared to EPI (1.25 mm × 1.25 mm; 28 slices). As a consequence, enhanced characterisation of functional areas has been demonstrated in EPIK particularly for relatively small brain regions such as the lateral geniculate nucleus (LGN) and superior colliculus (SC); overall, a significantly increased t-value and activation area were observed from EPIK data. Lastly, the use of EPIK for fMRI was validated with the simulation of different types of data reconstruction methods. PMID:28945780
Thread concept for automatic task parallelization in image analysis
NASA Astrophysics Data System (ADS)
Lueckenhaus, Maximilian; Eckstein, Wolfgang
1998-09-01
Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derivated from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.
Adaptive and accelerated tracking-learning-detection
NASA Astrophysics Data System (ADS)
Guo, Pengyu; Li, Xin; Ding, Shaowen; Tian, Zunhua; Zhang, Xiaohu
2013-08-01
An improved online long-term visual tracking algorithm, named adaptive and accelerated TLD (AA-TLD) based on Tracking-Learning-Detection (TLD) which is a novel tracking framework has been introduced in this paper. The improvement focuses on two aspects, one is adaption, which makes the algorithm not dependent on the pre-defined scanning grids by online generating scale space, and the other is efficiency, which uses not only algorithm-level acceleration like scale prediction that employs auto-regression and moving average (ARMA) model to learn the object motion to lessen the detector's searching range and the fixed number of positive and negative samples that ensures a constant retrieving time, but also CPU and GPU parallel technology to achieve hardware acceleration. In addition, in order to obtain a better effect, some TLD's details are redesigned, which uses a weight including both normalized correlation coefficient and scale size to integrate results, and adjusts distance metric thresholds online. A contrastive experiment on success rate, center location error and execution time, is carried out to show a performance and efficiency upgrade over state-of-the-art TLD with partial TLD datasets and Shenzhou IX return capsule image sequences. The algorithm can be used in the field of video surveillance to meet the need of real-time video tracking.
Multi-GPU Jacobian accelerated computing for soft-field tomography.
Borsic, A; Attardo, E A; Halter, R J
2012-10-01
Image reconstruction in soft-field tomography is based on an inverse problem formulation, where a forward model is fitted to the data. In medical applications, where the anatomy presents complex shapes, it is common to use finite element models (FEMs) to represent the volume of interest and solve a partial differential equation that models the physics of the system. Over the last decade, there has been a shifting interest from 2D modeling to 3D modeling, as the underlying physics of most problems are 3D. Although the increased computational power of modern computers allows working with much larger FEM models, the computational time required to reconstruct 3D images on a fine 3D FEM model can be significant, on the order of hours. For example, in electrical impedance tomography (EIT) applications using a dense 3D FEM mesh with half a million elements, a single reconstruction iteration takes approximately 15-20 min with optimized routines running on a modern multi-core PC. It is desirable to accelerate image reconstruction to enable researchers to more easily and rapidly explore data and reconstruction parameters. Furthermore, providing high-speed reconstructions is essential for some promising clinical application of EIT. For 3D problems, 70% of the computing time is spent building the Jacobian matrix, and 25% of the time in forward solving. In this work, we focus on accelerating the Jacobian computation by using single and multiple GPUs. First, we discuss an optimized implementation on a modern multi-core PC architecture and show how computing time is bounded by the CPU-to-memory bandwidth; this factor limits the rate at which data can be fetched by the CPU. Gains associated with the use of multiple CPU cores are minimal, since data operands cannot be fetched fast enough to saturate the processing power of even a single CPU core. GPUs have much faster memory bandwidths compared to CPUs and better parallelism. We are able to obtain acceleration factors of 20 times on a single NVIDIA S1070 GPU, and of 50 times on four GPUs, bringing the Jacobian computing time for a fine 3D mesh from 12 min to 14 s. We regard this as an important step toward gaining interactive reconstruction times in 3D imaging, particularly when coupled in the future with acceleration of the forward problem. While we demonstrate results for EIT, these results apply to any soft-field imaging modality where the Jacobian matrix is computed with the adjoint method.
Multi-GPU Jacobian Accelerated Computing for Soft Field Tomography
Borsic, A.; Attardo, E. A.; Halter, R. J.
2012-01-01
Image reconstruction in soft-field tomography is based on an inverse problem formulation, where a forward model is fitted to the data. In medical applications, where the anatomy presents complex shapes, it is common to use Finite Element Models to represent the volume of interest and to solve a partial differential equation that models the physics of the system. Over the last decade, there has been a shifting interest from 2D modeling to 3D modeling, as the underlying physics of most problems are three-dimensional. Though the increased computational power of modern computers allows working with much larger FEM models, the computational time required to reconstruct 3D images on a fine 3D FEM model can be significant, on the order of hours. For example, in Electrical Impedance Tomography applications using a dense 3D FEM mesh with half a million elements, a single reconstruction iteration takes approximately 15 to 20 minutes with optimized routines running on a modern multi-core PC. It is desirable to accelerate image reconstruction to enable researchers to more easily and rapidly explore data and reconstruction parameters. Further, providing high-speed reconstructions are essential for some promising clinical application of EIT. For 3D problems 70% of the computing time is spent building the Jacobian matrix, and 25% of the time in forward solving. In the present work, we focus on accelerating the Jacobian computation by using single and multiple GPUs. First, we discuss an optimized implementation on a modern multi-core PC architecture and show how computing time is bounded by the CPU-to-memory bandwidth; this factor limits the rate at which data can be fetched by the CPU. Gains associated with use of multiple CPU cores are minimal, since data operands cannot be fetched fast enough to saturate the processing power of even a single CPU core. GPUs have a much faster memory bandwidths compared to CPUs and better parallelism. We are able to obtain acceleration factors of 20 times on a single NVIDIA S1070 GPU, and of 50 times on 4 GPUs, bringing the Jacobian computing time for a fine 3D mesh from 12 minutes to 14 seconds. We regard this as an important step towards gaining interactive reconstruction times in 3D imaging, particularly when coupled in the future with acceleration of the forward problem. While we demonstrate results for Electrical Impedance Tomography, these results apply to any soft-field imaging modality where the Jacobian matrix is computed with the Adjoint Method. PMID:23010857
Plasma and energetic particle structure of a collisionless quasi-parallel shock
NASA Technical Reports Server (NTRS)
Kennel, C. F.; Scarf, F. L.; Coroniti, F. V.; Russell, C. T.; Smith, E. J.; Wenzel, K. P.; Reinhard, R.; Sanderson, T. R.; Feldman, W. C.; Parks, G. K.
1983-01-01
The quasi-parallel interplanetary shock of November 11-12, 1978 from both the collisionless shock and energetic particle points of view were studied using measurements of the interplanetary magnetic and electric fields, solar wind electrons, plasma and MHD waves, and intermediate and high energy ions obtained on ISEE-1, -2, and -3. The interplanetary environment through which the shock was propagating when it encountered the three spacecraft was characterized; the observations of this shock are documented and current theories of quasi-parallel shock structure and particle acceleration are tested. These observations tend to confirm present self consistent theories of first order Fermi acceleration by shocks and of collisionless shock dissipation involving firehouse instability.
Electron acceleration by surface plasma waves in double metal surface structure
NASA Astrophysics Data System (ADS)
Liu, C. S.; Kumar, Gagan; Singh, D. B.; Tripathi, V. K.
2007-12-01
Two parallel metal sheets, separated by a vacuum region, support a surface plasma wave whose amplitude is maximum on the two parallel interfaces and minimum in the middle. This mode can be excited by a laser using a glass prism. An electron beam launched into the middle region experiences a longitudinal ponderomotive force due to the surface plasma wave and gets accelerated to velocities of the order of phase velocity of the surface wave. The scheme is viable to achieve beams of tens of keV energy. In the case of a surface plasma wave excited on a single metal-vacuum interface, the field gradient normal to the interface pushes the electrons away from the high field region, limiting the acceleration process. The acceleration energy thus achieved is in agreement with the experimental observations.
Synergia: an accelerator modeling tool with 3-D space charge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amundson, James F.; Spentzouris, P.; /Fermilab
2004-07-01
High precision modeling of space-charge effects, together with accurate treatment of single-particle dynamics, is essential for designing future accelerators as well as optimizing the performance of existing machines. We describe Synergia, a high-fidelity parallel beam dynamics simulation package with fully three dimensional space-charge capabilities and a higher order optics implementation. We describe the computational techniques, the advanced human interface, and the parallel performance obtained using large numbers of macroparticles. We also perform code benchmarks comparing to semi-analytic results and other codes. Finally, we present initial results on particle tune spread, beam halo creation, and emittance growth in the Fermilab boostermore » accelerator.« less
Work stealing for GPU-accelerated parallel programs in a global address space framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram
Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a functionmore » of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain« less
NASA Astrophysics Data System (ADS)
Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon
2017-01-01
With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
Wiens, Curtis N.; Artz, Nathan S.; Jang, Hyungseok; McMillan, Alan B.; Reeder, Scott B.
2017-01-01
Purpose To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. Theory and Methods A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Results Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. Conclusion A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. PMID:27403613
Yue, Chao; Li, Wen; Reeves, Geoffrey D.; ...
2016-07-01
Interactions between interplanetary (IP) shocks and the Earth's magnetosphere manifest many important space physics phenomena including low-energy ion flux enhancements and particle acceleration. In order to investigate the mechanisms driving shock-induced enhancement of low-energy ion flux, we have examined two IP shock events that occurred when the Van Allen Probes were located near the equator while ionospheric and ground observations were available around the spacecraft footprints. We have found that, associated with the shock arrival, electromagnetic fields intensified, and low-energy ion fluxes, including H +, He +, and O +, were enhanced dramatically in both the parallel and perpendicular directions.more » During the 2 October 2013 shock event, both parallel and perpendicular flux enhancements lasted more than 20 min with larger fluxes observed in the perpendicular direction. In contrast, for the 15 March 2013 shock event, the low-energy perpendicular ion fluxes increased only in the first 5 min during an impulse of electric field, while the parallel flux enhancement lasted more than 30 min. In addition, ionospheric outflows were observed after shock arrivals. From a simple particle motion calculation, we found that the rapid response of low-energy ions is due to drifts of plasmaspheric population by the enhanced electric field. Furthermore, the fast acceleration in the perpendicular direction cannot solely be explained by E × B drift but betatron acceleration also plays a role. Adiabatic acceleration may also explain the fast response of the enhanced parallel ion fluxes, while ion outflows may contribute to the enhanced parallel fluxes that last longer than the perpendicular fluxes.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yue, Chao; Li, Wen; Reeves, Geoffrey D.
Interactions between interplanetary (IP) shocks and the Earth's magnetosphere manifest many important space physics phenomena including low-energy ion flux enhancements and particle acceleration. In order to investigate the mechanisms driving shock-induced enhancement of low-energy ion flux, we have examined two IP shock events that occurred when the Van Allen Probes were located near the equator while ionospheric and ground observations were available around the spacecraft footprints. We have found that, associated with the shock arrival, electromagnetic fields intensified, and low-energy ion fluxes, including H +, He +, and O +, were enhanced dramatically in both the parallel and perpendicular directions.more » During the 2 October 2013 shock event, both parallel and perpendicular flux enhancements lasted more than 20 min with larger fluxes observed in the perpendicular direction. In contrast, for the 15 March 2013 shock event, the low-energy perpendicular ion fluxes increased only in the first 5 min during an impulse of electric field, while the parallel flux enhancement lasted more than 30 min. In addition, ionospheric outflows were observed after shock arrivals. From a simple particle motion calculation, we found that the rapid response of low-energy ions is due to drifts of plasmaspheric population by the enhanced electric field. Furthermore, the fast acceleration in the perpendicular direction cannot solely be explained by E × B drift but betatron acceleration also plays a role. Adiabatic acceleration may also explain the fast response of the enhanced parallel ion fluxes, while ion outflows may contribute to the enhanced parallel fluxes that last longer than the perpendicular fluxes.« less
GPU-based acceleration of computations in nonlinear finite element deformation analysis.
Mafi, Ramin; Sirouspour, Shahin
2014-03-01
The physics of deformation for biological soft-tissue is best described by nonlinear continuum mechanics-based models, which then can be discretized by the FEM for a numerical solution. However, computational complexity of such models have limited their use in applications requiring real-time or fast response. In this work, we propose a graphic processing unit-based implementation of the FEM using implicit time integration for dynamic nonlinear deformation analysis. This is the most general formulation of the deformation analysis. It is valid for large deformations and strains and can account for material nonlinearities. The data-parallel nature and the intense arithmetic computations of nonlinear FEM equations make it particularly suitable for implementation on a parallel computing platform such as graphic processing unit. In this work, we present and compare two different designs based on the matrix-free and conventional preconditioned conjugate gradients algorithms for solving the FEM equations arising in deformation analysis. The speedup achieved with the proposed parallel implementations of the algorithms will be instrumental in the development of advanced surgical simulators and medical image registration methods involving soft-tissue deformation. Copyright © 2013 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Enomoto, Ayano; Hirata, Hiroshi
2014-02-01
This article describes a feasibility study of parallel image-acquisition using a two-channel surface coil array in continuous-wave electron paramagnetic resonance (CW-EPR) imaging. Parallel EPR imaging was performed by multiplexing of EPR detection in the frequency domain. The parallel acquisition system consists of two surface coil resonators and radiofrequency (RF) bridges for EPR detection. To demonstrate the feasibility of this method of parallel image-acquisition with a surface coil array, three-dimensional EPR imaging was carried out using a tube phantom. Technical issues in the multiplexing method of EPR detection were also clarified. We found that degradation in the signal-to-noise ratio due to the interference of RF carriers is a key problem to be solved.
Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.
Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J
2015-07-05
We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
Parallel MR imaging: a user's guide.
Glockner, James F; Hu, Houchun H; Stanley, David W; Angelos, Lisa; King, Kevin
2005-01-01
Parallel imaging is a recently developed family of techniques that take advantage of the spatial information inherent in phased-array radiofrequency coils to reduce acquisition times in magnetic resonance imaging. In parallel imaging, the number of sampled k-space lines is reduced, often by a factor of two or greater, thereby significantly shortening the acquisition time. Parallel imaging techniques have only recently become commercially available, and the wide range of clinical applications is just beginning to be explored. The potential clinical applications primarily involve reduction in acquisition time, improved spatial resolution, or a combination of the two. Improvements in image quality can be achieved by reducing the echo train lengths of fast spin-echo and single-shot fast spin-echo sequences. Parallel imaging is particularly attractive for cardiac and vascular applications and will likely prove valuable as 3-T body and cardiovascular imaging becomes part of standard clinical practice. Limitations of parallel imaging include reduced signal-to-noise ratio and reconstruction artifacts. It is important to consider these limitations when deciding when to use these techniques. (c) RSNA, 2005.
Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.
Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neigh boring inner loops may exhibit different concurrency patternsmore » (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.« less
Chang, Gregory; Wiggins, Graham C.; Xia, Ding; Lattanzi, Riccardo; Madelin, Guillaume; Raya, Jose G.; Finnerty, Matthew; Fujita, Hiroyuki; Recht, Michael P.; Regatte, Ravinder R.
2011-01-01
Purpose To compare a new birdcage-transmit, 28 channel-receive array (28 Ch) coil and a quadrature volume coil for 7 Tesla morphologic MRI and T2 mapping of knee cartilage. Methods The right knees of ten healthy subjects were imaged on a 7 Tesla whole body MR scanner using both coils. 3-dimensional fast low-angle shot (3D-FLASH) and multi-echo spin-echo (MESE) sequences were implemented. Cartilage signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), thickness, and T2 values were assessed. Results SNR/CNR was 17–400% greater for the 28 Ch compared to the quadrature coil (p≤0.005). Bland-Altman plots show mean differences between measurements of tibial/femoral cartilage thickness and T2 values obtained with each coil to be small (−0.002±0.009 cm/0.003±0.011 cm) and large (−6.8±6.7 ms/−8.2±9.7 ms), respectively. For the 28 Ch coil, when parallel imaging with acceleration factors (AF) 2, 3, and 4 was performed, SNR retained was: 62–69%, 51–55%, and 39–45%. Conclusion A 28 Ch knee coil provides increased SNR/CNR for 7T cartilage morphologic imaging and T2 mapping. Coils should be switched with caution during clinical studies because T2 values may differ. The greater SNR of the 28 Ch coil could be used to perform parallel imaging with AF2 and obtain similar SNR as the quadrature coil. PMID:22095723
Durham extremely large telescope adaptive optics simulation platform.
Basden, Alastair; Butterley, Timothy; Myers, Richard; Wilson, Richard
2007-03-01
Adaptive optics systems are essential on all large telescopes for which image quality is important. These are complex systems with many design parameters requiring optimization before good performance can be achieved. The simulation of adaptive optics systems is therefore necessary to categorize the expected performance. We describe an adaptive optics simulation platform, developed at Durham University, which can be used to simulate adaptive optics systems on the largest proposed future extremely large telescopes as well as on current systems. This platform is modular, object oriented, and has the benefit of hardware application acceleration that can be used to improve the simulation performance, essential for ensuring that the run time of a given simulation is acceptable. The simulation platform described here can be highly parallelized using parallelization techniques suited for adaptive optics simulation, while still offering the user complete control while the simulation is running. The results from the simulation of a ground layer adaptive optics system are provided as an example to demonstrate the flexibility of this simulation platform.
Thalhammer, Christof; Renz, Wolfgang; Winter, Lukas; Hezel, Fabian; Rieger, Jan; Pfeiffer, Harald; Graessl, Andreas; Seifert, Frank; Hoffmann, Werner; von Knobelsdorff-Brenkenhoff, Florian; Tkachenko, Valeriy; Schulz-Menger, Jeanette; Kellman, Peter; Niendorf, Thoralf
2012-01-01
Purpose To design, evaluate and apply a two-dimensional 16 channel transmit/receive coil array tailored for cardiac MRI at 7.0 Tesla. Material and Methods The cardiac coil array consists of 2 sections each using 8 elements arranged in a 2 × 4 array. RF safety was validated by SAR simulations. Cardiac imaging was performed using 2D CINE FLASH imaging, T2* mapping and fat-water separation imaging. The characteristics of the coil array were analyzed including parallel imaging performance, left ventricular chamber quantification and overall image quality. Results RF characteristics were found to be appropriate for all subjects included in the study. The SAR values derived from the simulations fall well in the limits of legal guidelines. The baseline SNR advantage at 7.0 T was put to use to acquire 2D CINE images of the heart with a very high spatial resolution of (1 × 1 × 4) mm3. The proposed coil array supports 1D acceleration factors of up to R=4 without impairing image quality significantly. Conclusions The 16 channel TX/RX coil has the capability to acquire high contrast and high spatial resolution images of the heart at 7.0 Tesla. PMID:22706727
NASA Astrophysics Data System (ADS)
Zheng, F. L.; Wu, S. Z.; Wu, H. C.; Zhou, C. T.; Cai, H. B.; Yu, M. Y.; Tajima, T.; Yan, X. Q.; He, X. T.
2013-01-01
Proton acceleration by ultra-intense laser pulse irradiating a target with cross-section smaller than the laser spot size and connected to a parabolic density channel is investigated. The target splits the laser into two parallel propagating parts, which snowplow the back-side plasma electrons along their paths, creating two adjacent parallel wakes and an intense return current in the gap between them. The radiation-pressure pre-accelerated target protons trapped in the wake fields now undergo acceleration as well as collimation by the quasistatic wake electrostatic and magnetic fields. Particle-in-cell simulations show that stable long-distance acceleration can be realized, and a 30 fs monoenergetic ion beam of >10 GeV peak energy and <2° divergence can be produced by a circularly polarized laser pulse at an intensity of about 1022 W/cm2.
State of the art in electromagnetic modeling for the Compact Linear Collider
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, Arno; Kabel, Andreas; Lee, Lie-Quan
SLAC's Advanced Computations Department (ACD) has developed the parallel 3D electromagnetic time-domain code T3P for simulations of wakefields and transients in complex accelerator structures. T3P is based on state-of-the-art Finite Element methods on unstructured grids and features unconditional stability, quadratic surface approximation and up to 6th-order vector basis functions for unprecedented simulation accuracy. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with fast turn-around times, aiding the design of the next generation of accelerator facilities. Applications include simulations of the proposed two-beam accelerator structures for the Compact Linear Collider (CLIC) - wakefieldmore » damping in the Power Extraction and Transfer Structure (PETS) and power transfer to the main beam accelerating structures are investigated.« less
Electron acceleration to high energies at quasi-parallel shock waves in the solar corona
NASA Technical Reports Server (NTRS)
Mann, G.; Classen, H.-T.
1995-01-01
In the solar corona shock waves are generated by flares and/or coronal mass ejections. They manifest themselves in solar type 2 radio bursts appearing as emission stripes with a slow drift from high to low frequencies in dynamic radio spectra. Their nonthermal radio emission indicates that electrons are accelerated to suprathermal and/or relativistic velocities at these shocks. As well known by extraterrestrial in-situ measurements supercritical, quasi-parallel, collisionless shocks are accompanied by so-called SLAMS (short large amplitude magnetic field structures). These SLAMS can act as strong magnetic mirrors, at which charged particles can be reflected and accelerated. Thus, thermal electrons gain energy due to multiple reflections between two SLAMS and reach suprathermal and relativistic velocities. This mechanism of accelerating electrons is discussed for circumstances in the solar corona and may be responsible for the so-called 'herringbones' observed in solar type 2 radio bursts.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v3 No. of lines in distributed program, including test data, etc.: 59 168 No. of bytes in distributed program, including test data, etc.: 453 409 Distribution format: tar.gz Programming language: C, CUDA Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs. RAM: Tested on Problems requiring up to 4 GB per compute node. Classification: 12 External routines: MPI, CUDA, IBM Cell SDK Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: Sub-program numdiff is used for the test run.
Image Processing Using a Parallel Architecture.
1987-12-01
ENG/87D-25 Abstract This study developed a set o± low level image processing tools on a parallel computer that allows concurrent processing of images...environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations. vI IMAGE...step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer. More important than
Wiens, Curtis N; Artz, Nathan S; Jang, Hyungseok; McMillan, Alan B; Reeder, Scott B
2017-06-01
To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. Magn Reson Med 77:2303-2309, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
The Particle Accelerator Simulation Code PyORBIT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorlov, Timofey V; Holmes, Jeffrey A; Cousineau, Sarah M
2015-01-01
The particle accelerator simulation code PyORBIT is presented. The structure, implementation, history, parallel and simulation capabilities, and future development of the code are discussed. The PyORBIT code is a new implementation and extension of algorithms of the original ORBIT code that was developed for the Spallation Neutron Source accelerator at the Oak Ridge National Laboratory. The PyORBIT code has a two level structure. The upper level uses the Python programming language to control the flow of intensive calculations performed by the lower level code implemented in the C++ language. The parallel capabilities are based on MPI communications. The PyORBIT ismore » an open source code accessible to the public through the Google Open Source Projects Hosting service.« less
NASA Astrophysics Data System (ADS)
Chytyk-Praznik, Krista Joy
Radiation therapy is continuously increasing in complexity due to technological innovation in delivery techniques, necessitating thorough dosimetric verification. Comparing accurately predicted portal dose images to measured images obtained during patient treatment can determine if a particular treatment was delivered correctly. The goal of this thesis was to create a method to predict portal dose images that was versatile and accurate enough to use in a clinical setting. All measured images in this work were obtained with an amorphous silicon electronic portal imaging device (a-Si EPID), but the technique is applicable to any planar imager. A detailed, physics-motivated fluence model was developed to characterize fluence exiting the linear accelerator head. The model was further refined using results from Monte Carlo simulations and schematics of the linear accelerator. The fluence incident on the EPID was converted to a portal dose image through a superposition of Monte Carlo-generated, monoenergetic dose kernels specific to the a-Si EPID. Predictions of clinical IMRT fields with no patient present agreed with measured portal dose images within 3% and 3 mm. The dose kernels were applied ignoring the geometrically divergent nature of incident fluence on the EPID. A computational investigation into this parallel dose kernel assumption determined its validity under clinically relevant situations. Introducing a patient or phantom into the beam required the portal image prediction algorithm to account for patient scatter and attenuation. Primary fluence was calculated by attenuating raylines cast through the patient CT dataset, while scatter fluence was determined through the superposition of pre-calculated scatter fluence kernels. Total dose in the EPID was calculated by convolving the total predicted incident fluence with the EPID-specific dose kernels. The algorithm was tested on water slabs with square fields, agreeing with measurement within 3% and 3 mm. The method was then applied to five prostate and six head-and-neck IMRT treatment courses (˜1900 clinical images). Deviations between the predicted and measured images were quantified. The portal dose image prediction model developed in this thesis work has been shown to be accurate, and it was demonstrated to be able to verify patients' delivered radiation treatments.
Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian
2017-03-28
Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computation complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as Processing System with Programmable Logic for high-performance digital signal processing through parallelism and pipeline techniques. The algorithm has been coded in C language with pragma directives to optimize the architecture of the system. We have used the very novel Software Develop System-on-Chip (SDSoC) evelopment tool that simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results shows that hardware acceleration significantly outperformed the software based implementation.
Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian
2017-01-01
Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computation complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as Processing System with Programmable Logic for high-performance digital signal processing through parallelism and pipeline techniques. The algorithm has been coded in C language with pragma directives to optimize the architecture of the system. We have used the very novel Software Develop System-on-Chip (SDSoC) evelopment tool that simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results shows that hardware acceleration significantly outperformed the software based implementation. PMID:28350358
The HyperV 8000 μg, 50 km/s Plasma Railgun for PLX
NASA Astrophysics Data System (ADS)
Brockington, Samuel; Case, Andrew; Messer, Sarah; Wu, Linchun; Witherspoon, F. Douglas
2012-10-01
HyperV has developed a gas fed, pulsed, plasma railgun which accelerates 8000 μg of argon to 50 km/s meeting the performance requirements originally specified for the Plasma Liner Experiment (PLX). The present 2.5 cm square-bore plasma railgun forms plasma armatures from high density neutral gas, pre-ionizes it electro-thermally, and accelerates the armature with 30 cm long parallel-plate railgun electrodes driven by a pulse forming network (PFN). A high voltage, high current linear array spark-gap switch and flexible, low-inductance transmission line were designed and constructed to handle the increased current load. We will describe these systems and present initial performance data from high current operation of the plasma rail gun from spectroscopy, interferometry, and imaging systems. Measurements of momentum, pressure, magnetic field, and other optical diagnostics will also be discussed as well as plans for upcoming experimentation to increase performance even further. Work supported by USDOE under DE-FG02-05ER54810 and DE-FG02-08ER85114.
Ha, S; Matej, S; Ispiryan, M; Mueller, K
2013-02-01
We describe a GPU-accelerated framework that efficiently models spatially (shift) variant system response kernels and performs forward- and back-projection operations with these kernels for the DIRECT (Direct Image Reconstruction for TOF) iterative reconstruction approach. Inherent challenges arise from the poor memory cache performance at non-axis aligned TOF directions. Focusing on the GPU memory access patterns, we utilize different kinds of GPU memory according to these patterns in order to maximize the memory cache performance. We also exploit the GPU instruction-level parallelism to efficiently hide long latencies from the memory operations. Our experiments indicate that our GPU implementation of the projection operators has slightly faster or approximately comparable time performance than FFT-based approaches using state-of-the-art FFTW routines. However, most importantly, our GPU framework can also efficiently handle any generic system response kernels, such as spatially symmetric and shift-variant as well as spatially asymmetric and shift-variant, both of which an FFT-based approach cannot cope with.
NASA Astrophysics Data System (ADS)
Ha, S.; Matej, S.; Ispiryan, M.; Mueller, K.
2013-02-01
We describe a GPU-accelerated framework that efficiently models spatially (shift) variant system response kernels and performs forward- and back-projection operations with these kernels for the DIRECT (Direct Image Reconstruction for TOF) iterative reconstruction approach. Inherent challenges arise from the poor memory cache performance at non-axis aligned TOF directions. Focusing on the GPU memory access patterns, we utilize different kinds of GPU memory according to these patterns in order to maximize the memory cache performance. We also exploit the GPU instruction-level parallelism to efficiently hide long latencies from the memory operations. Our experiments indicate that our GPU implementation of the projection operators has slightly faster or approximately comparable time performance than FFT-based approaches using state-of-the-art FFTW routines. However, most importantly, our GPU framework can also efficiently handle any generic system response kernels, such as spatially symmetric and shift-variant as well as spatially asymmetric and shift-variant, both of which an FFT-based approach cannot cope with.
Effects of Shock and Turbulence Properties on Electron Acceleration
NASA Astrophysics Data System (ADS)
Qin, G.; Kong, F.-J.; Zhang, L.-H.
2018-06-01
Using test particle simulations, we study electron acceleration at collisionless shocks with a two-component model turbulent magnetic field with slab component including dissipation range. We investigate the importance of the shock-normal angle θ Bn, magnetic turbulence level {(b/{B}0)}2, and shock thickness on the acceleration efficiency of electrons. It is shown that at perpendicular shocks the electron acceleration efficiency is enhanced with the decrease of {(b/{B}0)}2, and at {(b/{B}0)}2=0.01 the acceleration becomes significant due to a strong drift electric field with long time particles staying near the shock front for shock drift acceleration (SDA). In addition, at parallel shocks the electron acceleration efficiency is increasing with the increase of {(b/{B}0)}2, and at {(b/{B}0)}2=10.0 the acceleration is very strong due to sufficient pitch-angle scattering for first-order Fermi acceleration, as well as due to the large local component of the magnetic field perpendicular to the shock-normal angle for SDA. On the other hand, the high perpendicular shock acceleration with {(b/{B}0)}2=0.01 is stronger than the high parallel shock acceleration with {(b/{B}0)}2=10.0, the reason might be the assumption that SDA is more efficient than first-order Fermi acceleration. Furthermore, for oblique shocks, the acceleration efficiency is small no matter whether the turbulence level is low or high. Moreover, for the effect of shock thickness on electron acceleration at perpendicular shocks, we show that there exists the bendover thickness, L diff,b. The acceleration efficiency does not noticeably change if the shock thickness is much smaller than L diff,b. However, if the shock thickness is much larger than L diff,b, the acceleration efficiency starts to drop abruptly.
Two-stage Electron Acceleration by 3D Collisionless Guide-field Magnetic Reconnection
NASA Astrophysics Data System (ADS)
Buechner, J.; Munoz, P.
2017-12-01
We discuss a two-stage process of electron acceleration near X-lines of 3D collisionless guide-field magnetic reconnection. Non-relativistic electrons are first pre-accelerated by magnetic-field-aligned (parallel) electric fields. At the nonlinear stage of 3D guide-field magnetic reconnection electric and magnetic fields become filamentary structured due to streaming instabilities. This causes an additional curvature-driven electron acceleration in the guide-field direction. The resulting spectrum of the accelerated electrons follows a power law.
A self-calibrated angularly continuous 2D GRAPPA kernel for propeller trajectories
Skare, Stefan; Newbould, Rexford D; Nordell, Anders; Holdsworth, Samantha J; Bammer, Roland
2008-01-01
The k-space readout of propeller-type sequences may be accelerated by the use of parallel imaging (PI). For PROPELLER, the main benefits are reduced blurring due to T2 decay and SAR reduction, while for EPI-based propeller acquisitions such as Turbo-PROP and SAP-EPI, the faster k-space traversal alleviates geometric distortions. In this work, the feasibility of calculating a 2D GRAPPA kernel on only the undersampled propeller blades themselves is explored, using the matching orthogonal undersampled blade. It is shown that the GRAPPA kernel varies slowly across blades, therefore an angularly continuous 2D GRAPPA kernel is proposed, in which the angular variation of the weights is parameterized. This new angularly continuous kernel formulation greatly increases the numerical stability of the GRAPPA weight estimation, allowing the generation of fully sampled diagnostic quality images using only the undersampled propeller data. PMID:19025911
Real-time digital holographic microscopy using the graphic processing unit.
Shimobaba, Tomoyoshi; Sato, Yoshikuni; Miura, Junya; Takenouchi, Mai; Ito, Tomoyoshi
2008-08-04
Digital holographic microscopy (DHM) is a well-known powerful method allowing both the amplitude and phase of a specimen to be simultaneously observed. In order to obtain a reconstructed image from a hologram, numerous calculations for the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the FFT (Fast Fourier Transform) algorithm. However, real-time reconstruction from a hologram is difficult even if we use a recent central processing unit (CPU) to calculate the Fresnel diffraction by the FFT algorithm. In this paper, we describe a real-time DHM system using a graphic processing unit (GPU) with many stream processors, which allows use as a highly parallel processor. The computational speed of the Fresnel diffraction using the GPU is faster than that of recent CPUs. The real-time DHM system can obtain reconstructed images from holograms whose size is 512 x 512 grids in 24 frames per second.
Hydrodynamics of laser-driven double-foil collisions studied by orthogonal x-ray imaging
NASA Astrophysics Data System (ADS)
Aglitskiy, Y.; Metzler, N.; Karasik, M.; Serlin, V.; Obenschain, S. P.; Schmitt, A. J.; Velikovich, A. L.; Gardner, J. H.; Weaver, J.; Oh, J.
2006-10-01
With this experiment we start the study of the physics of hydrodynamic instability seeding and growth during the deceleration and stagnation phases. Our first targets consisted of two separated parallel plastic foils -- flat and rippled. The flat foil was irradiated by the 4 ns Nike KrF laser pulses at 50 TW/cm^2 and accelerated towards the rippled one. Orthogonal imaging, i. e., a simultaneous side-on and face-on radiography of the targets has been used in these experiments. Side-on x-ray radiography and VISAR data yield shock and target velocities before and after the collision. Face-on streaks revealed well-pronounced oscillatory behavior of the single-mode mass perturbations. Both sets of synchronized data were compared with 1D and 2D simulations. Observed velocities, timing and the peak value of areal mass variation are in good agreement with the simulated ones.
Parallel Algorithms for Image Analysis.
1982-06-01
8217 _ _ _ _ _ _ _ 4. TITLE (aid Subtitle) S. TYPE OF REPORT & PERIOD COVERED PARALLEL ALGORITHMS FOR IMAGE ANALYSIS TECHNICAL 6. PERFORMING O4G. REPORT NUMBER TR-1180...Continue on reverse side it neceesary aid Identlfy by block number) Image processing; image analysis ; parallel processing; cellular computers. 20... IMAGE ANALYSIS TECHNICAL 6. PERFORMING ONG. REPORT NUMBER TR-1180 - 7. AUTHOR(&) S. CONTRACT OR GRANT NUMBER(s) Azriel Rosenfeld AFOSR-77-3271 9
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce
NASA Astrophysics Data System (ADS)
Chen, Xi; Zhou, Liqing
2015-12-01
With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.
Jiang, Hanyu; Ganesan, Narayan
2016-02-27
HMMER software suite is widely used for analysis of homologous protein and nucleotide sequences with high sensitivity. The latest version of hmmsearch in HMMER 3.x, utilizes heuristic-pipeline which consists of MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, P7Viterbi stage and the Forward scoring stage to accelerate homology detection. Since the latest version is highly optimized for performance on modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report speedup. However, the most compute intensive tasks within the pipeline (viz., MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors. A Multi-Tiered Parallel Framework (CUDAMPF) implemented on CUDA-enabled GPUs presented here, offers a finer-grained parallelism for MSV/SSV and Viterbi algorithms. We couple SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instructions Multiple Data) video instructions with warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware optimal allocation scheme of scarce resources like on-chip memory and caches in order to boost performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC available with CUDA 7.0 is incorporated into the presented framework that not only helps unroll innermost loop to yield upto 2 to 3-fold speedup than static compilation but also enables dynamic loading and switching of kernels depending on the query model size, in order to achieve optimal performance. CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on single GPU and takes full advantage of limited resources based on their own performance features. In addition to exceeding performance of other acceleration attempts, comprehensive evaluations against high-end CPUs (Intel i5, i7 and Xeon) shows that CUDAMPF yields upto 440 GCUPS for SSV, 277 GCUPS for MSV and 14.3 GCUPS for P7Viterbi all with 100 % accuracy, which translates to a maximum speedup of 37.5, 23.1 and 11.6-fold for MSV, SSV and P7Viterbi respectively. The source code is available at https://github.com/Super-Hippo/CUDAMPF.
MRXCAT: Realistic numerical phantoms for cardiovascular magnetic resonance
2014-01-01
Background Computer simulations are important for validating novel image acquisition and reconstruction strategies. In cardiovascular magnetic resonance (CMR), numerical simulations need to combine anatomical information and the effects of cardiac and/or respiratory motion. To this end, a framework for realistic CMR simulations is proposed and its use for image reconstruction from undersampled data is demonstrated. Methods The extended Cardiac-Torso (XCAT) anatomical phantom framework with various motion options was used as a basis for the numerical phantoms. Different tissue, dynamic contrast and signal models, multiple receiver coils and noise are simulated. Arbitrary trajectories and undersampled acquisition can be selected. The utility of the framework is demonstrated for accelerated cine and first-pass myocardial perfusion imaging using k-t PCA and k-t SPARSE. Results MRXCAT phantoms allow for realistic simulation of CMR including optional cardiac and respiratory motion. Example reconstructions from simulated undersampled k-t parallel imaging demonstrate the feasibility of simulated acquisition and reconstruction using the presented framework. Myocardial blood flow assessment from simulated myocardial perfusion images highlights the suitability of MRXCAT for quantitative post-processing simulation. Conclusion The proposed MRXCAT phantom framework enables versatile and realistic simulations of CMR including breathhold and free-breathing acquisitions. PMID:25204441
Chen, Nan-kuei; Guidon, Arnaud; Chang, Hing-Chiu; Song, Allen W.
2013-01-01
Diffusion weighted magnetic resonance imaging (DWI) data have been mostly acquired with single-shot echo-planar imaging (EPI) to minimize motion induced artifacts. The spatial resolution, however, is inherently limited in single-shot EPI, even when the parallel imaging (usually at an acceleration factor of 2) is incorporated. Multi-shot acquisition strategies could potentially achieve higher spatial resolution and fidelity, but they are generally susceptible to motion-induced phase errors among excitations that are exacerbated by diffusion sensitizing gradients, rendering the reconstructed images unusable. It has been shown that shot-to-shot phase variations may be corrected using navigator echoes, but at the cost of imaging throughput. To address these challenges, a novel and robust multi-shot DWI technique, termed multiplexed sensitivity-encoding (MUSE), is developed here to reliably and inherently correct nonlinear shot-to-shot phase variations without the use of navigator echoes. The performance of the MUSE technique is confirmed experimentally in healthy adult volunteers on 3 Tesla MRI systems. This newly developed technique should prove highly valuable for mapping brain structures and connectivities at high spatial resolution for neuroscience studies. PMID:23370063
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-12-01
Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT∕CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldcamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT∕CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-01-01
Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldcamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10−7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
Magnetic Field Would Reduce Electron Backstreaming in Ion Thrusters
NASA Technical Reports Server (NTRS)
Foster, John E.
2003-01-01
The imposition of a magnetic field has been proposed as a means of reducing the electron backstreaming problem in ion thrusters. Electron backstreaming refers to the backflow of electrons into the ion thruster. Backstreaming electrons are accelerated by the large potential difference that exists between the ion-thruster acceleration electrodes, which otherwise accelerates positive ions out of the engine to develop thrust. The energetic beam formed by the backstreaming electrons can damage the discharge cathode, as well as other discharge surfaces upstream of the acceleration electrodes. The electron-backstreaming condition occurs when the center potential of the ion accelerator grid is no longer sufficiently negative to prevent electron diffusion back into the ion thruster. This typically occurs over extended periods of operation as accelerator-grid apertures enlarge due to erosion. As a result, ion thrusters are required to operate at increasingly negative accelerator-grid voltages in order to prevent electron backstreaming. These larger negative voltages give rise to higher accelerator grid erosion rates, which in turn accelerates aperture enlargement. Electron backstreaming due to accelerator-gridhole enlargement has been identified as a failure mechanism that will limit ionthruster service lifetime. The proposed method would make it possible to not only reduce the electron backstreaming current at and below the backstreaming voltage limit, but also reduce the backstreaming voltage limit itself. This reduction in the voltage at which electron backstreaming occurs provides operating margin and thereby reduces the magnitude of negative voltage that must be placed on the accelerator grid. Such a reduction reduces accelerator- grid erosion rates. The basic idea behind the proposed method is to impose a spatially uniform magnetic field downstream of the accelerator electrode that is oriented transverse to the thruster axis. The magnetic field must be sufficiently strong to impede backstreaming electrons, but not so strong as to significantly perturb ion trajectories. An electromagnet or permanent magnetic circuit can be used to impose the transverse magnetic field downstream of the accelerator-grid electrode. For example, in the case of an accelerator grid containing straight, parallel rows of apertures, one can apply nearly uniform magnetic fields across all the apertures by the use of permanent magnets of alternating polarity connected to pole pieces laid out parallel to the rows, as shown in the left part of the figure. For low-temperature operation, the pole pieces can be replaced with bar magnets of alternating polarity. Alternatively, for the same accelerator grid, one could use an electromagnet in the form of current-carrying rods laid out parallel to the rows.
Hinton-Yates, Denise P; Cury, Ricardo C; Wald, Lawrence L; Wiggins, Graham C; Keil, Boris; Seethmaraju, Ravi; Gangadharamurthy, Dakshinamurthy; Ogilvy, Christopher S; Dai, Guangping; Houser, Stuart L; Stone, James R; Furie, Karen L
2007-10-01
The aim of this article is to evaluate 3.0 T magnetic resonance imaging for characterization of vessel morphology and plaque composition. Emphasis is placed on early and moderate stages of carotid atherosclerosis, where increases in signal-to-noise (SNR) and contrast-to-noise (CNR) ratios compared with 1.5 T are sought. Comparison of in vivo 3.0 T imaging to histopathology is performed for validation. Parallel acceleration methods applied with an 8-channel carotid array are investigated as well as higher field ex vivo imaging to explore even further gains. The overall endeavor is to improve prospective assessment of atherosclerosis stage and stability for reduction of atherothrombotic event risk. A total of 10 male and female subjects ranging in age from 22 to 72 years (5 healthy and 5 with cardiovascular disease) participated. Custom-built array coils were used with endogenous and exogenous multicontrast bright and black-blood protocols for 3.0 T carotid imaging. Comparisons were performed to 1.5 T, and ex vivo plaque was stained with hematoxylin and eosin for histology. Imaging (9.4 T) was also performed on intact specimens. The factor of 2 gain in signal-to-noise SNR is realized compared with 1.5 T along with improved wall-lumen and plaque component CNR. Post-contrast black-blood imaging within 5-10 minutes of gadolinium injection is optimal for detection of the necrotic lipid component. In a preliminary 18-month follow-up study, this method provided measurement of a 50% reduction in lipid content with minimal change in plaque size in a subject receiving aggressive statin therapy. Parallel imaging applied with signal averaging further improves 3.0 T black-blood vessel wall imaging. The use of 3.0 T for carotid plaque imaging has demonstrated increases in SNR and CNR compared with 1.5 T. Quantitative prospective studies of moderate and early plaques are feasible at 3.0 T. Continued improvements in coil arrays, 3-dimensional pulse sequences, and the use of novel molecular imaging agents implemented at high field will further improve magnetic resonance plaque characterization.
Xia, Fei; Jin, Guoqing
2014-06-01
PKNOTS is a most famous benchmark program and has been widely used to predict RNA secondary structure including pseudoknots. It adopts the standard four-dimensional (4D) dynamic programming (DP) method and is the basis of many variants and improved algorithms. Unfortunately, the O(N(6)) computing requirements and complicated data dependency greatly limits the usefulness of PKNOTS package with the explosion in gene database size. In this paper, we present a fine-grained parallel PKNOTS package and prototype system for accelerating RNA folding application based on FPGA chip. We adopted a series of storage optimization strategies to resolve the "Memory Wall" problem. We aggressively exploit parallel computing strategies to improve computational efficiency. We also propose several methods that collectively reduce the storage requirements for FPGA on-chip memory. To the best of our knowledge, our design is the first FPGA implementation for accelerating 4D DP problem for RNA folding application including pseudoknots. The experimental results show a factor of more than 50x average speedup over the PKNOTS-1.08 software running on a PC platform with Intel Core2 Q9400 Quad CPU for input RNA sequences. However, the power consumption of our FPGA accelerator is only about 50% of the general-purpose micro-processors.
NASA Astrophysics Data System (ADS)
Akhavan-Tafti, M.; Slavin, J. A.; Eastwood, J. P.; Cassak, P.; Gershman, D. J.; Zhao, C.
2017-12-01
Flux Transfer Events (FTEs) are transient signatures of magnetic reconnection at the dayside magnetopause and play significant roles in determining the rate of reconnection and accelerating particles. This study investigates the magnetohydrodynamic forces inside and outside FTEs to infer the process through which these structures become force-free and uses electron dynamics to study the mechanisms for particle acceleration within the FTE. Akhavan-Tafti et al. [2017] demonstrated that ion-scale FTEs contain regions of elevated plasma density which greatly contribute to plasma pressure forces inside FTEs. It is shown that as FTEs evolve, the plasma is evacuated as the core magnetic field strengthens, hence becoming more force-free. The neighboring ion-scale FTEs formed at the subsolar magnetopause due to multiple X-line reconnection are forced to interact, and likely coalesce. Entropy is invoked to motivate the discussion on the essential role of coalescence in reconfiguring magnetic fields and current density distributions inside FTEs to allow for the adiabatic growth of these structures. Here, we present observational evidence which shows that, in the absence of coalescence, FTEs can become less force free. Local electron kinematics is studied to compare the contributions of parallel electric field, Fermi acceleration, and betatron acceleration mechanisms to particle heating. Acceleration due to parallel electric fields are shown to be dominant in the vicinity of the reconnection site while betatron acceleration controls perpendicular heating inside the FTE in the presence of magnetic pressure gradients. In the downstream of the reconnection site, the `freshly' reconnected field lines start to straighten due to the magnetic curvature force. Straightening field lines accelerate trapped electrons parallel to the local magnetic field (i.e., first-order Fermi acceleration). These acceleration mechanisms are shown to explain the observed anisotropic pitch angle distributions at the core and at the edges of FTEs. Finally, the forces inside non-flux rope-type FTEs (due to coalescence, expansion, contraction, or division) are shown to contribute to selective plasma heating, hence giving rise to anisotropic plasma temperatures and the subsequent wave activities (e.g. propagation of whistler waves).
Acceleration of charged particles by crossed cyclotron waves, Resonant Moments Method
NASA Astrophysics Data System (ADS)
Ponomarjov, M.; Carati, D.
A mechanism for enhanced acceleration of charged particles in crossing radio frequency or micro waves propagating at different angles with respect to an external magnetic field is investigated. This mechanism consists in introducing low amplitude secondary waves in order to improve the parallel momentum transfer from the high amplitude primary wave to charged particles. The use of two parallel counter-propagating waves has recently been considered (Gell and Nakach, 1999) and numerical tests (Louies et al, 2001) have shown that the two-wave scheme may lead to higher averaged parallel velocity. On the other hand, it has been concluded that it may be more effective to accelerate electrons when the waves propagate obliquely to the external magnetic field (Karimabadi and Angelopoulos 1989, Cohen et al 1991). The idea considered here is similar although no constraint is imposed on the refraction indices of the primary and the secondary waves. The theoretical analysis of the acceleration mechanism is based on the Resonance Moments Method (RMM) in which moments of the velocity distribution are computed by using an averages over the resonant layers (RL)i only instead of a complete phase-space average. The quantities obtained using this approach, referred to as Resonant Moments (RM), suggest the existence of optimal angles of propagation for the primary and secondary waves as long as the maximization of the parallel flux of charged particles is considered. The fraction of charged particles that are close to the resonance conditions, that correspond to the RL, becomes then as important as the time these particles remain resonant. The secondary wave tends to maintain a pseudo-equilibrium velocity distribution by continuously re-filling the RL. Our suggestions are confirmed by direct numerical simulations for a populations of 105 relativistic electrons. The secondary wave yields a clear increase (up to one order of magnitude) of the average parallel velocity of the particles. It is a quite promising result since the amplitude of the secondary wave is ten times lower the one of the first wave. Qualitative results give one of the enhanced acceleration mechanisms of the charged particles (including relativistic electrons in planetary magnetospheres) by the crossed cyclotron waves in ambient magnetic field.
A compact linear accelerator based on a scalable microelectromechanical-system RF-structure
Persaud, A.; Ji, Q.; Feinberg, E.; ...
2017-06-08
Here, a new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using a printed circuit board (PCB), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number ofmore » parallel beamlets. Using Microelectromechanical systems scalable fabrication approaches, we are working on further red ucing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in the PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using the PCB. In this paper, we present proof of concept results of the principal components using the PCB: RF acceleration and ESQ focusing. Finally, ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.« less
A compact linear accelerator based on a scalable microelectromechanical-system RF-structure
NASA Astrophysics Data System (ADS)
Persaud, A.; Ji, Q.; Feinberg, E.; Seidl, P. A.; Waldron, W. L.; Schenkel, T.; Lal, A.; Vinayakumar, K. B.; Ardanuc, S.; Hammer, D. A.
2017-06-01
A new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using a printed circuit board (PCB), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number of parallel beamlets. Using Microelectromechanical systems scalable fabrication approaches, we are working on further reducing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in the PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using the PCB. In this paper, we present proof of concept results of the principal components using the PCB: RF acceleration and ESQ focusing. Ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.
A compact linear accelerator based on a scalable microelectromechanical-system RF-structure.
Persaud, A; Ji, Q; Feinberg, E; Seidl, P A; Waldron, W L; Schenkel, T; Lal, A; Vinayakumar, K B; Ardanuc, S; Hammer, D A
2017-06-01
A new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using a printed circuit board (PCB), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number of parallel beamlets. Using Microelectromechanical systems scalable fabrication approaches, we are working on further reducing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in the PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using the PCB. In this paper, we present proof of concept results of the principal components using the PCB: RF acceleration and ESQ focusing. Ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.
Filipovic, Nenad D.
2017-01-01
Image segmentation is one of the most common procedures in medical imaging applications. It is also a very important task in breast cancer detection. Breast cancer detection procedure based on mammography can be divided into several stages. The first stage is the extraction of the region of interest from a breast image, followed by the identification of suspicious mass regions, their classification, and comparison with the existing image database. It is often the case that already existing image databases have large sets of data whose processing requires a lot of time, and thus the acceleration of each of the processing stages in breast cancer detection is a very important issue. In this paper, the implementation of the already existing algorithm for region-of-interest based image segmentation for mammogram images on High-Performance Reconfigurable Dataflow Computers (HPRDCs) is proposed. As a dataflow engine (DFE) of such HPRDC, Maxeler's acceleration card is used. The experiments for examining the acceleration of that algorithm on the Reconfigurable Dataflow Computers (RDCs) are performed with two types of mammogram images with different resolutions. There were, also, several DFE configurations and each of them gave a different acceleration value of algorithm execution. Those acceleration values are presented and experimental results showed good acceleration. PMID:28611851
Milankovic, Ivan L; Mijailovic, Nikola V; Filipovic, Nenad D; Peulic, Aleksandar S
2017-01-01
Image segmentation is one of the most common procedures in medical imaging applications. It is also a very important task in breast cancer detection. Breast cancer detection procedure based on mammography can be divided into several stages. The first stage is the extraction of the region of interest from a breast image, followed by the identification of suspicious mass regions, their classification, and comparison with the existing image database. It is often the case that already existing image databases have large sets of data whose processing requires a lot of time, and thus the acceleration of each of the processing stages in breast cancer detection is a very important issue. In this paper, the implementation of the already existing algorithm for region-of-interest based image segmentation for mammogram images on High-Performance Reconfigurable Dataflow Computers (HPRDCs) is proposed. As a dataflow engine (DFE) of such HPRDC, Maxeler's acceleration card is used. The experiments for examining the acceleration of that algorithm on the Reconfigurable Dataflow Computers (RDCs) are performed with two types of mammogram images with different resolutions. There were, also, several DFE configurations and each of them gave a different acceleration value of algorithm execution. Those acceleration values are presented and experimental results showed good acceleration.
Adaptive Acceleration of Visually Evoked Smooth Eye Movements in Mice
2016-01-01
The optokinetic response (OKR) consists of smooth eye movements following global motion of the visual surround, which suppress image slip on the retina for visual acuity. The effective performance of the OKR is limited to rather slow and low-frequency visual stimuli, although it can be adaptably improved by cerebellum-dependent mechanisms. To better understand circuit mechanisms constraining OKR performance, we monitored how distinct kinematic features of the OKR change over the course of OKR adaptation, and found that eye acceleration at stimulus onset primarily limited OKR performance but could be dramatically potentiated by visual experience. Eye acceleration in the temporal-to-nasal direction depended more on the ipsilateral floccular complex of the cerebellum than did that in the nasal-to-temporal direction. Gaze-holding following the OKR was also modified in parallel with eye-acceleration potentiation. Optogenetic manipulation revealed that synchronous excitation and inhibition of floccular complex Purkinje cells could effectively accelerate eye movements in the nasotemporal and temporonasal directions, respectively. These results collectively delineate multiple motor pathways subserving distinct aspects of the OKR in mice and constrain hypotheses regarding cellular mechanisms of the cerebellum-dependent tuning of movement acceleration. SIGNIFICANCE STATEMENT Although visually evoked smooth eye movements, known as the optokinetic response (OKR), have been studied in various species for decades, circuit mechanisms of oculomotor control and adaptation remain elusive. In the present study, we assessed kinematics of the mouse OKR through the course of adaptation training. Our analyses revealed that eye acceleration at visual-stimulus onset primarily limited working velocity and frequency range of the OKR, yet could be dramatically potentiated during OKR adaptation. Potentiation of eye acceleration exhibited different properties between the nasotemporal and temporonasal OKRs, indicating distinct visuomotor circuits underlying the two. Lesions and optogenetic manipulation of the cerebellum provide constraints on neural circuits mediating visually driven eye acceleration and its adaptation. PMID:27335412
NASA Astrophysics Data System (ADS)
Helm, Anton; Vieira, Jorge; Silva, Luis; Fonseca, Ricardo
2016-10-01
Laser-driven accelerators gained an increased attention over the past decades. Typical modeling techniques for laser wakefield acceleration (LWFA) are based on particle-in-cell (PIC) simulations. PIC simulations, however, are very computationally expensive due to the disparity of the relevant scales ranging from the laser wavelength, in the micrometer range, to the acceleration length, currently beyond the ten centimeter range. To minimize the gap between these despair scales the ponderomotive guiding center (PGC) algorithm is a promising approach. By describing the evolution of the laser pulse envelope separately, only the scales larger than the plasma wavelength are required to be resolved in the PGC algorithm, leading to speedups in several orders of magnitude. Previous work was limited to two dimensions. Here we present the implementation of the 3D version of a PGC solver into the massively parallel, fully relativistic PIC code OSIRIS. We extended the solver to include periodic boundary conditions and parallelization in all spatial dimensions. We present benchmarks for distributed and shared memory parallelization. We also discuss the stability of the PGC solver.
ION INJECTION AT QUASI-PARALLEL SHOCKS SEEN BY THE CLUSTER SPACECRAFT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johlander, A.; Vaivads, A.; Khotyaintsev, Yu. V.
2016-01-20
Collisionless shocks in space plasma are known to be capable of accelerating ions to very high energies through diffusive shock acceleration (DSA). This process requires an injection of suprathermal ions, but the mechanisms producing such a suprathermal ion seed population are still not fully understood. We study acceleration of solar wind ions resulting from reflection off short large-amplitude magnetic structures (SLAMSs) in the quasi-parallel bow shock of Earth using in situ data from the four Cluster spacecraft. Nearly specularly reflected solar wind ions are observed just upstream of a SLAMS. The reflected ions are undergoing shock drift acceleration (SDA) andmore » obtain energies higher than the solar wind energy upstream of the SLAMS. Our test particle simulations show that solar wind ions with lower energy are more likely to be reflected off the SLAMS, while high-energy ions pass through the SLAMS, which is consistent with the observations. The process of SDA at SLAMSs can provide an effective way of accelerating solar wind ions to suprathermal energies. Therefore, this could be a mechanism of ion injection into DSA in astrophysical plasmas.« less
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Abstract. Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P
2014-07-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
Fast Image Subtraction Using Multi-cores and GPUs
NASA Astrophysics Data System (ADS)
Hartung, Steven; Shukla, H.
2013-01-01
Many important image processing techniques in astronomy require a massive number of computations per pixel. Among them is an image differencing technique known as Optimal Image Subtraction (OIS), which is very useful for detecting and characterizing transient phenomena. Like many image processing routines, OIS computations increase proportionally with the number of pixels being processed, and the number of pixels in need of processing is increasing rapidly. Utilizing many-core graphical processing unit (GPU) technology in a hybrid conjunction with multi-core CPU and computer clustering technologies, this work presents a new astronomy image processing pipeline architecture. The chosen OIS implementation focuses on the 2nd order spatially-varying kernel with the Dirac delta function basis, a powerful image differencing method that has seen limited deployment in part because of the heavy computational burden. This tool can process standard image calibration and OIS differencing in a fashion that is scalable with the increasing data volume. It employs several parallel processing technologies in a hierarchical fashion in order to best utilize each of their strengths. The Linux/Unix based application can operate on a single computer, or on an MPI configured cluster, with or without GPU hardware. With GPU hardware available, even low-cost commercial video cards, the OIS convolution and subtraction times for large images can be accelerated by up to three orders of magnitude.
Dregely, Isabel; Ruset, Iulian C.; Wiggins, Graham; Mareyam, Azma; Mugler, John P.; Altes, Talissa A.; Meyer, Craig; Ruppert, Kai; Wald, Lawrence L.; Hersman, F. William
2012-01-01
Hyperpolarized xenon-129 (HP Xe) has the potential to become a non-invasive contrast agent for lung MRI. In addition to its utility for imaging of ventilated airspaces, the property of xenon to dissolve in lung tissue and blood upon inhalation provides the opportunity to study gas exchange. Implementations of imaging protocols for obtaining regional parameters that exploit the dissolved phase are limited by the available signal-to-noise ratio (SNR), excitation homogeneity, and length of acquisition times. To address these challenges, a 32-channel receive-array coil complemented by an asymmetric birdcage transmit coil tuned to the HP Xe resonance at 3T was developed. First results of spin-density imaging in healthy subjects and subjects with obstructive lung disease demonstrated the improvements in image quality by high resolution ventilation images with high SNR. Parallel imaging performance of the phased-array coil was demonstrated by acceleration factors up to three in 2D acquisitions and up to six in 3D acquisitions. Transmit-field maps showed a regional variation of only 8% across the whole lung. The newly developed phased-array receive coil with the birdcage transmit coil will lead to an improvement in existing imaging protocols, but moreover enable the development of new, functional lung imaging protocols based on the improvements in excitation homogeneity, SNR, and acquisition speed. PMID:23132336
NASA Technical Reports Server (NTRS)
Mattson, D. L.
1975-01-01
The effect of prolonged angular acceleration on choice reaction time to an accelerating visual stimulus was investigated, with 10 commercial airline pilots serving as subjects. The pattern of reaction times during and following acceleration was compared with the pattern of velocity estimates reported during identical trials. Both reaction times and velocity estimates increased at the onset of acceleration, declined prior to the termination of acceleration, and showed an aftereffect. These results are inconsistent with the torsion-pendulum theory of semicircular canal function and suggest that the vestibular adaptation is of central origin.
Generation of auroral kilometric radiation and the structure of auroral acceleration region
NASA Technical Reports Server (NTRS)
Lee, L. C.; Kan, J. R.; Wu, C. S.
1980-01-01
Generation of auroral kilometric radiation (AKR) in the auroral acceleration region is studied. It is shown that auroral kilometric radiation can be generated by backscattered electrons trapped in the acceleration region via a cyclotron maser process. The parallel electric field in the acceleration region is required to be distributed over 1-2 earth radii. The observed AKR frequency spectrum can be used to estimate the altitude range of the auroral acceleration region. The altitudes of the lower and upper boundaries of the acceleration region determined from the AKR data are respectively approximately 2000 and 9000 km.
NASA Astrophysics Data System (ADS)
Zerr, Robert Joseph
2011-12-01
The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)
Acceleration of low-energy ions at parallel shocks with a focused transport model
Zuo, Pingbing; Zhang, Ming; Rassoul, Hamid K.
2013-04-10
Here, we present a test particle simulation on the injection and acceleration of low-energy suprathermal particles by parallel shocks with a focused transport model. The focused transport equation contains all necessary physics of shock acceleration, but avoids the limitation of diffusive shock acceleration (DSA) that requires a small pitch angle anisotropy. This simulation verifies that the particles with speeds of a fraction of to a few times the shock speed can indeed be directly injected and accelerated into the DSA regime by parallel shocks. At higher energies starting from a few times the shock speed, the energy spectrum of acceleratedmore » particles is a power law with the same spectral index as the solution of standard DSA theory, although the particles are highly anisotropic in the upstream region. The intensity, however, is different from that predicted by DSA theory, indicating a different level of injection efficiency. It is found that the shock strength, the injection speed, and the intensity of an electric cross-shock potential (CSP) jump can affect the injection efficiency of the low-energy particles. A stronger shock has a higher injection efficiency. In addition, if the speed of injected particles is above a few times the shock speed, the produced power-law spectrum is consistent with the prediction of standard DSA theory in both its intensity and spectrum index with an injection efficiency of 1. CSP can increase the injection efficiency through direct particle reflection back upstream, but it has little effect on the energetic particle acceleration once the speed of injected particles is beyond a few times the shock speed. This test particle simulation proves that the focused transport theory is an extension of DSA theory with the capability of predicting the efficiency of particle injection.« less
Solution of a tridiagonal system of equations on the finite element machine
NASA Technical Reports Server (NTRS)
Bostic, S. W.
1984-01-01
Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.
a method of gravity and seismic sequential inversion and its GPU implementation
NASA Astrophysics Data System (ADS)
Liu, G.; Meng, X.
2011-12-01
In this abstract, we introduce a gravity and seismic sequential inversion method to invert for density and velocity together. For the gravity inversion, we use an iterative method based on correlation imaging algorithm; for the seismic inversion, we use the full waveform inversion. The link between the density and velocity is an empirical formula called Gardner equation, for large volumes of data, we use the GPU to accelerate the computation. For the gravity inversion method , we introduce a method based on correlation imaging algorithm,it is also a interative method, first we calculate the correlation imaging of the observed gravity anomaly, it is some value between -1 and +1, then we multiply this value with a little density ,this value become the initial density model. We get a forward reuslt with this initial model and also calculate the correaltion imaging of the misfit of observed data and the forward data, also multiply the correaltion imaging result a little density and add it to the initial model, then do the same procedure above , at last ,we can get a inversion density model. For the seismic inveron method ,we use a mothod base on the linearity of acoustic wave equation written in the frequency domain,with a intial velociy model, we can get a good velocity result. In the sequential inversion of gravity and seismic , we need a link formula to convert between density and velocity ,in our method , we use the Gardner equation. Driven by the insatiable market demand for real time, high-definition 3D images, the programmable NVIDIA Graphic Processing Unit (GPU) as co-processor of CPU has been developed for high performance computing. Compute Unified Device Architecture (CUDA) is a parallel programming model and software environment provided by NVIDIA designed to overcome the challenge of using traditional general purpose GPU while maintaining a low learn curve for programmers familiar with standard programming languages such as C. In our inversion processing, we use the GPU to accelerate our gravity and seismic inversion. Taking the gravity inversion as an example, its kernels are gravity forward simulation and correlation imaging, after the parallelization in GPU, in 3D case,the inversion module, the original five CPU loops are reduced to three,the forward module the original five CPU loops are reduced to two. Acknowledgments We acknowledge the financial support of Sinoprobe project (201011039 and 201011049-03), the Fundamental Research Funds for the Central Universities (2010ZY26 and 2011PY0183), the National Natural Science Foundation of China (41074095) and the Open Project of State Key Laboratory of Geological Processes and Mineral Resources (GPMR0945).
2011-06-01
induction accelerator with a voltage output of 18MeV at a current of 3kA. The electron beam is focused onto a tantalum target to produce X-rays. The... capacitors in each bank, half of which are charged in parallel positively, and the other half are negatively charged in parallel. The charge voltage can...be varied from ±30kV to ±40kV. The Marx capacitors are fired in series into the Blumleins with up to 400kV 2µS output. Figure 1 FXR Pulsed Power
GPU-Based Simulation of Ultrasound Imaging Artifacts for Cryosurgery Training.
Keelan, Robert; Shimada, Kenji; Rabin, Yoed
2017-02-01
This study presents an efficient computational technique for the simulation of ultrasound imaging artifacts associated with cryosurgery based on nonlinear ray tracing. This study is part of an ongoing effort to develop computerized training tools for cryosurgery, with prostate cryosurgery as a development model. The capability of performing virtual cryosurgical procedures on a variety of test cases is essential for effective surgical training. Simulated ultrasound imaging artifacts include reverberation and reflection of the cryoprobes in the unfrozen tissue, reflections caused by the freezing front, shadowing caused by the frozen region, and tissue property changes in repeated freeze-thaw cycles procedures. The simulated artifacts appear to preserve the key features observed in a clinical setting. This study displays an example of how training may benefit from toggling between the undisturbed ultrasound image, the simulated temperature field, the simulated imaging artifacts, and an augmented hybrid presentation of the temperature field superimposed on the ultrasound image. The proposed method is demonstrated on a graphic processing unit at 100 frames per second, on a mid-range personal workstation, at two orders of magnitude faster than a typical cryoprocedure. This performance is based on computation with C++ accelerated massive parallelism and its interoperability with the DirectX-rendering application programming interface.
GPU-Based Simulation of Ultrasound Imaging Artifacts for Cryosurgery Training
Keelan, Robert; Shimada, Kenji
2016-01-01
This study presents an efficient computational technique for the simulation of ultrasound imaging artifacts associated with cryosurgery based on nonlinear ray tracing. This study is part of an ongoing effort to develop computerized training tools for cryosurgery, with prostate cryosurgery as a development model. The capability of performing virtual cryosurgical procedures on a variety of test cases is essential for effective surgical training. Simulated ultrasound imaging artifacts include reverberation and reflection of the cryoprobes in the unfrozen tissue, reflections caused by the freezing front, shadowing caused by the frozen region, and tissue property changes in repeated freeze–thaw cycles procedures. The simulated artifacts appear to preserve the key features observed in a clinical setting. This study displays an example of how training may benefit from toggling between the undisturbed ultrasound image, the simulated temperature field, the simulated imaging artifacts, and an augmented hybrid presentation of the temperature field superimposed on the ultrasound image. The proposed method is demonstrated on a graphic processing unit at 100 frames per second, on a mid-range personal workstation, at two orders of magnitude faster than a typical cryoprocedure. This performance is based on computation with C++ accelerated massive parallelism and its interoperability with the DirectX-rendering application programming interface. PMID:26818026
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Ferrandi, Fabrizio
Emerging applications such as data mining, bioinformatics, knowledge discovery, social network analysis are irregular. They use data structures based on pointers or linked lists, such as graphs, unbalanced trees or unstructures grids, which generates unpredictable memory accesses. These data structures usually are large, but difficult to partition. These applications mostly are memory bandwidth bounded and have high synchronization intensity. However, they also have large amounts of inherent dynamic parallelism, because they potentially perform a task for each one of the element they are exploring. Several efforts are looking at accelerating these applications on hybrid architectures, which integrate general purpose processorsmore » with reconfigurable devices. Some solutions, which demonstrated significant speedups, include custom-hand tuned accelerators or even full processor architectures on the reconfigurable logic. In this paper we present an approach for the automatic synthesis of accelerators from C, targeted at irregular applications. In contrast to typical High Level Synthesis paradigms, which construct a centralized Finite State Machine, our approach generates dynamically scheduled hardware components. While parallelism exploitation in typical HLS-generated accelerators is usually bound within a single execution flow, our solution allows concurrently running multiple execution flow, thus also exploiting the coarser grain task parallelism of irregular applications. Our approach supports multiple, multi-ported and distributed memories, and atomic memory operations. Its main objective is parallelizing as many memory operations as possible, independently from their execution time, to maximize the memory bandwidth utilization. This significantly differs from current HLS flows, which usually consider a single memory port and require precise scheduling of memory operations. A key innovation of our approach is the generation of a memory interface controller, which dynamically maps concurrent memory accesses to multiple ports. We present a case study on a typical irregular kernel, Graph Breadth First search (BFS), exploring different tradeoffs in terms of parallelism and number of memories.« less
Design and Development of a Megavoltage CT Scanner for Radiation Therapy.
NASA Astrophysics Data System (ADS)
Chen, Ching-Tai
A Varian 4 MeV isocentric therapy accelerator has been modified to perform also as a CT scanner. The goal is to provide low cost computed tomography capability for use in radiotherapy. The system will have three principal uses. These are (i) to provide 2- and 3-dimensional maps of electron density distribution for CT assisted therapy planning, (ii) to aid in patient set up by providing sectional views of the treatment volume and high contrast scout-mode verification images and (iii) to provide a means for periodically checking the patients anatomical conformation against what was used to generate the original therapy plan. The treatment machine was modified by mounting an array of detectors on a frame bolted to the counter weight end of the gantry in such a manner as to define a 'third generation' CT Scanner geometry. The data gathering is controlled by a Z-80 based microcomputer system which transfers the x-ray transmission data to a general purpose PDP 11/34 for processing. There a series of calibration processes and a logarithmic conversion are performed to get projection data. After reordering the projection data to an equivalent parallel beam sinogram format a convolution algorithm is employed to construct the image from the equivalent parallel projection data. Results of phantom studies have shown a spatial resolution of 2.6 mm and an electron density discrimination of less than 1% which are sufficiently good for accurate therapy planning. Results also show that the system is linear to within the precision of our measurement ((DBLTURN).75%) over a wide range of electron densities corresponding to those found in body tissues. Animal and human images are also presented to demonstrate that the system's imaging capability is sufficient to allow the necessary visualization of anatomy.
Demi, Libertario; Viti, Jacopo; Kusters, Lieneke; Guidi, Francesco; Tortoli, Piero; Mischi, Massimo
2013-11-01
The speed of sound in the human body limits the achievable data acquisition rate of pulsed ultrasound scanners. To overcome this limitation, parallel beamforming techniques are used in ultrasound 2-D and 3-D imaging systems. Different parallel beamforming approaches have been proposed. They may be grouped into two major categories: parallel beamforming in reception and parallel beamforming in transmission. The first category is not optimal for harmonic imaging; the second category may be more easily applied to harmonic imaging. However, inter-beam interference represents an issue. To overcome these shortcomings and exploit the benefit of combining harmonic imaging and high data acquisition rate, a new approach has been recently presented which relies on orthogonal frequency division multiplexing (OFDM) to perform parallel beamforming in transmission. In this paper, parallel transmit beamforming using OFDM is implemented for the first time on an ultrasound scanner. An advanced open platform for ultrasound research is used to investigate the axial resolution and interbeam interference achievable with parallel transmit beamforming using OFDM. Both fundamental and second-harmonic imaging modalities have been considered. Results show that, for fundamental imaging, axial resolution in the order of 2 mm can be achieved in combination with interbeam interference in the order of -30 dB. For second-harmonic imaging, axial resolution in the order of 1 mm can be achieved in combination with interbeam interference in the order of -35 dB.
NASA Astrophysics Data System (ADS)
Knight, Silvin P.; Browne, Jacinta E.; Meaney, James F.; Smith, David S.; Fagan, Andrew J.
2016-10-01
A novel anthropomorphic flow phantom device has been developed, which can be used for quantitatively assessing the ability of magnetic resonance imaging (MRI) scanners to accurately measure signal/concentration time-intensity curves (CTCs) associated with dynamic contrast-enhanced (DCE) MRI. Modelling of the complex pharmacokinetics of contrast agents as they perfuse through the tumour capillary network has shown great promise for cancer diagnosis and therapy monitoring. However, clinical adoption has been hindered by methodological problems, resulting in a lack of consensus regarding the most appropriate acquisition and modelling methodology to use and a consequent wide discrepancy in published data. A heretofore overlooked source of such discrepancy may arise from measurement errors of tumour CTCs deriving from the imaging pulse sequence itself, while the effects on the fidelity of CTC measurement of using rapidly-accelerated sequences such as parallel imaging and compressed sensing remain unknown. The present work aimed to investigate these features by developing a test device in which ‘ground truth’ CTCs were generated and presented to the MRI scanner for measurement, thereby allowing for an assessment of the DCE-MRI protocol to accurately measure this curve shape. The device comprised a four-pump flow system wherein CTCs derived from prior patient prostate data were produced in measurement chambers placed within the imaged volume. The ground truth was determined as the mean of repeat measurements using an MRI-independent, custom-built optical imaging system. In DCE-MRI experiments, significant discrepancies between the ground truth and measured CTCs were found for both tumorous and healthy tissue-mimicking curve shapes. Pharmacokinetic modelling revealed errors in measured K trans, v e and k ep values of up to 42%, 31%, and 50% respectively, following a simple variation of the parallel imaging factor and number of signal averages in the acquisition protocol. The device allows for the quantitative assessment and standardisation of DCE-MRI protocols (both existing and emerging).
Studies in optical parallel processing. [All optical and electro-optic approaches
NASA Technical Reports Server (NTRS)
Lee, S. H.
1978-01-01
Threshold and A/D devices for converting a gray scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic devices (OPA) were studied as an approach to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out out, on each image element of these rows, in the IOC substrate and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.
Acceleration of Particles Near Earth's Bow Shock
NASA Astrophysics Data System (ADS)
Sandroos, A.
2012-12-01
Collisionless shock waves, for example, near planetary bodies or driven by coronal mass ejections, are a key source of energetic particles in the heliosphere. When the solar wind hits Earth's bow shock, some of the incident particles get reflected back towards the Sun and are accelerated in the process. Reflected ions are responsible for the creation of a turbulent foreshock in quasi-parallel regions of Earth's bow shock. We present first results of foreshock macroscopic structure and of particle distributions upstream of Earth's bow shock, obtained with a new 2.5-dimensional self-consistent diffusive shock acceleration model. In the model particles' pitch angle scattering rates are calculated from Alfvén wave power spectra using quasilinear theory. Wave power spectra in turn are modified by particles' energy changes due to the scatterings. The new model has been implemented on massively parallel simulation platform Corsair. We have used an earlier version of the model to study ion acceleration in a shock-shock interaction event (Hietala, Sandroos, and Vainio, 2012).
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures
NASA Technical Reports Server (NTRS)
Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.
2012-01-01
In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and the next generation machines, codes will need to employ some type of GPU sharing model, as presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time.
Testud, Frederik; Gallichan, Daniel; Layton, Kelvin J; Barmet, Christoph; Welz, Anna M; Dewdney, Andrew; Cocosco, Chris A; Pruessmann, Klaas P; Hennig, Jürgen; Zaitsev, Maxim
2015-03-01
PatLoc (Parallel Imaging Technique using Localized Gradients) accelerates imaging and introduces a resolution variation across the field-of-view. Higher-dimensional encoding employs more spatial encoding magnetic fields (SEMs) than the corresponding image dimensionality requires, e.g. by applying two quadratic and two linear spatial encoding magnetic fields to reconstruct a 2D image. Images acquired with higher-dimensional single-shot trajectories can exhibit strong artifacts and geometric distortions. In this work, the source of these artifacts is analyzed and a reliable correction strategy is derived. A dynamic field camera was built for encoding field calibration. Concomitant fields of linear and nonlinear spatial encoding magnetic fields were analyzed. A combined basis consisting of spherical harmonics and concomitant terms was proposed and used for encoding field calibration and image reconstruction. A good agreement between the analytical solution for the concomitant fields and the magnetic field simulations of the custom-built PatLoc SEM coil was observed. Substantial image quality improvements were obtained using a dynamic field camera for encoding field calibration combined with the proposed combined basis. The importance of trajectory calibration for single-shot higher-dimensional encoding is demonstrated using the combined basis including spherical harmonics and concomitant terms, which treats the concomitant fields as an integral part of the encoding. © 2014 Wiley Periodicals, Inc.
Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying
2013-12-01
Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using a graphic-processing-unit parallel frame named the "compute unified device architecture." A series of simulation experiments is carried out to test the accuracy and accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced by a factor of 38.9 with a GTX 580 graphics card using the improved method.
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids
NASA Astrophysics Data System (ADS)
Ma, Xinrong; Duan, Zhijian
2018-04-01
High-order resolution Discontinuous Galerkin finite element methods (DGFEM) has been known as a good method for solving Euler equations and Navier-Stokes equations on unstructured grid, but it costs too much computational resources. An efficient parallel algorithm was presented for solving the compressible Euler equations. Moreover, the multigrid strategy based on three-stage three-order TVD Runge-Kutta scheme was used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of unsteady compressible Euler equations. In order to make each processor maintain load balancing, the domain decomposition method was employed. Numerical experiment performed for the inviscid transonic flow fluid problems around NACA0012 airfoil and M6 wing. The results indicated that our parallel algorithm can improve acceleration and efficiency significantly, which is suitable for calculating the complex flow fluid.
MMS observations and hybrid simulations of rippled and reforming quasi-parallel shocks
NASA Astrophysics Data System (ADS)
Gingell, I.; Schwartz, S. J.; Burgess, D.; Johlander, A.; Russell, C. T.; Burch, J. L.; Ergun, R.; Fuselier, S. A.; Gershman, D. J.; Giles, B. L.; Goodrich, K.; Khotyaintsev, Y. V.; Lavraud, B.; Lindqvist, P. A.; Strangeway, R. J.; Trattner, K. J.; Torbert, R. B.; Wilder, F. D.
2017-12-01
Surface ripples, i.e. deviations in the nominal local shock orientation, are expected to propagate in the ramp and overshoot of collisionless shocks. These ripples have typically been associated with observations and simulations of quasi-perpendicular shocks. We present observations of a crossing of Earth's marginally quasi-parallel (θBn ˜ 45°) bow shock by the MMS spacecraft on 2015-11-27 06:01:44 UTC, for which we identify signatures consistent with a propagating surface ripple. In order to demonstrate the differences between ripples at quasi-perpendicular and quasi-parallel shocks, we also present two-dimensional hybrid simulations over a range of shock normal angles θBn under the observed solar wind conditions. We show that in the quasi-parallel cases surface ripples are transient phenomena modulated by the cyclic reformation of the shock front. These ripples develop faster than an ion gyroperiod and only during the period of the reformation cycle when a newly developed shock ramp is unaffected by turbulence in the foot. We conclude that the change of properties of the surface ripple observed by MMS while crossing Earth's quasi-parallel bow shock are consistent with the influence of cyclic reformation on shock structure. Given that both surface ripples and cyclic reformation are expected to affect the acceleration of electrons within the shock, the interaction of these phenomena and any other sources of shock non-stationary are important for models of particle acceleration. We therefore discuss signatures of electron heating and acceleration in several rippled shocks observed by MMS.
The Origin Of Most Cosmic Rays: The Acceleration By E(parallel)
NASA Astrophysics Data System (ADS)
Colgate, Stirling A.; Li, H.
2008-03-01
We suggest a universal view of the origin of almost all cosmic rays. We propose that nearly every accelerated CRs was initially part of the parallel current that maintains most all force-free, twisted magnetic fields. We point out the greatest fraction of the free energy of magnetic fields in the universe likely resides in force-free fields as opposed to force-bounded ones, because the velocity of twisting, the ponder motive force, is small compared to local Alven speed. We suggest that these helical fields and the particles that they accelerate are distributed nearly uniformly and consequently are near space-filling with some notable exceptions. Charged particles are accelerated by the E( parallel to the magnetic field B) produced by the dissipation of the free energy of these fields by the progressive diffusive loss of "run-away" accelerated current-carrying charged particles from the "core" of the helical fields. Such diffusive loss is first identified as reconnection, but instead potentiates a much larger irreversible loss of highly accelerated anisotropic run-away current carrier particles. We suggest, as in fusion confinement experiments, that there exists a universal, highly robust, diffusion coefficient, D, resulting in D 1% of Bohm diffusion, as has been found in all confinement experiments, possibly driven by drift waves and, or collision-less, tearing modes. The consequential current carrier loss along the resulting tangled field lines is sufficient to account for the energy, number and spectrum of nearly all CR acceleration, both galactic as well as extra galactic. The spectrum is determined by a loss fraction dn/n -dE/E where dn D E-3/2 resulting in dn/dE = E/E0-2.5 up to 1022 ev. Only mass accretion onto SMBHs can supply the energy necessary, 1060 ergs, to fill the IGM with a CR spectrum of Γ 2.6. (Supported by the DOE)
A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.
Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming
2017-06-16
Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limit computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially with the utility of Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.
Volumetric visualization algorithm development for an FPGA-based custom computing machine
NASA Astrophysics Data System (ADS)
Sallinen, Sami J.; Alakuijala, Jyrki; Helminen, Hannu; Laitinen, Joakim
1998-05-01
Rendering volumetric medical images is a burdensome computational task for contemporary computers due to the large size of the data sets. Custom designed reconfigurable hardware could considerably speed up volume visualization if an algorithm suitable for the platform is used. We present an algorithm and speedup techniques for visualizing volumetric medical CT and MR images with a custom-computing machine based on a Field Programmable Gate Array (FPGA). We also present simulated performance results of the proposed algorithm calculated with a software implementation running on a desktop PC. Our algorithm is capable of generating perspective projection renderings of single and multiple isosurfaces with transparency, simulated X-ray images, and Maximum Intensity Projections (MIP). Although more speedup techniques exist for parallel projection than for perspective projection, we have constrained ourselves to perspective viewing, because of its importance in the field of radiotherapy. The algorithm we have developed is based on ray casting, and the rendering is sped up by three different methods: shading speedup by gradient precalculation, a new generalized version of Ray-Acceleration by Distance Coding (RADC), and background ray elimination by speculative ray selection.
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations
Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W.
2016-01-01
Within the recent years clock rates of modern processors stagnated while the demand for computing power continued to grow. This applied particularly for the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding for a certain techniques for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures. PMID:26904094
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations.
Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W
2016-01-01
Within the recent years clock rates of modern processors stagnated while the demand for computing power continued to grow. This applied particularly for the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding for a certain techniques for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.
X-PROP: a fast and robust diffusion-weighted propeller technique.
Li, Zhiqiang; Pipe, James G; Lee, Chu-Yu; Debbins, Josef P; Karis, John P; Huo, Donglai
2011-08-01
Diffusion-weighted imaging (DWI) has shown great benefits in clinical MR exams. However, current DWI techniques have shortcomings of sensitivity to distortion or long scan times or combinations of the two. Diffusion-weighted echo-planar imaging (EPI) is fast but suffers from severe geometric distortion. Periodically rotated overlapping parallel lines with enhanced reconstruction diffusion-weighted imaging (PROPELLER DWI) is free of geometric distortion, but the scan time is usually long and imposes high Specific Absorption Rate (SAR) especially at high fields. TurboPROP was proposed to accelerate the scan by combining signal from gradient echoes, but the off-resonance artifacts from gradient echoes can still degrade the image quality. In this study, a new method called X-PROP is presented. Similar to TurboPROP, it uses gradient echoes to reduce the scan time. By separating the gradient and spin echoes into individual blades and removing the off-resonance phase, the off-resonance artifacts in X-PROP are minimized. Special reconstruction processes are applied on these blades to correct for the motion artifacts. In vivo results show its advantages over EPI, PROPELLER DWI, and TurboPROP techniques. Copyright © 2011 Wiley-Liss, Inc.
Ferrand, Guillaume; Luong, Michel; Cloos, Martijn A; Amadon, Alexis; Wackernagel, Hans
2014-08-01
Transmit arrays have been developed to mitigate the RF field inhomogeneity commonly observed in high field magnetic resonance imaging (MRI), typically above 3T. To this end, the knowledge of the RF complex-valued B1 transmit-sensitivities of each independent radiating element has become essential. This paper details a method to speed up a currently available B1-calibration method. The principle relies on slice undersampling, slice and channel interleaving and kriging, an interpolation method developed in geostatistics and applicable in many domains. It has been demonstrated that, under certain conditions, kriging gives the best estimator of a field in a region of interest. The resulting accelerated sequence allows mapping a complete set of eight volumetric field maps of the human head in about 1 min. For validation, the accuracy of kriging is first evaluated against a well-known interpolation technique based on Fourier transform as well as to a B1-maps interpolation method presented in the literature. This analysis is carried out on simulated and decimated experimental B1 maps. Finally, the accelerated sequence is compared to the standard sequence on a phantom and a volunteer. The new sequence provides B1 maps three times faster with a loss of accuracy limited potentially to about 5%.
Richter, P R; Schuster, M; Meyer, I; Lebert, M; Häder, D-P
2006-12-01
The effects of the calcium sequester EGTA on gravitactic orientation and membrane potential changes in the unicellular flagellate Euglena gracilis were investigated during a recent parabolic-flight experiment aboard of an Airbus A300. In the course of a flight parabola, an acceleration profile is achieved which yields subsequently about 20 s of hypergravity (1.8 g(n)), about 20 s of microgravity, and another 20 s of hypergravity phases. The movement behavior of the cells was investigated with real-time, computer-based image analysis. Membrane potential changes were detected with a newly developed photometer which measures absorption changes of the membrane potential-sensitive probe oxonol VI. To test whether the data obtained by the oxonol device were reliable, the signal of non-oxonol-labelled cells was recorded. In these samples, no absorption shift was detected. Changes of the oxonol VI signals indicate that the cells depolarize during acceleration (very obvious in the step from microgravity to hypergravity) and slightly hyperpolarize in microgravity, which can possibly be explained with the action of Ca-ATPases. These signals (mainly the depolarization) were significantly suppressed in the presence of EGTA (5 mM). Gravitaxis in parallel was also inhibited after addition of EGTA. Initially, negative gravitaxis was inverted into a positive one. Later, gravitaxis was almost undetectable.
High Current Systems for HyperV and PLX Plasma Railguns
NASA Astrophysics Data System (ADS)
Brockington, S.; Case, A.; Messer, S.; Elton, R.; Witherspoon, F. D.
2011-10-01
HyperV is developing gas fed, pulsed, plasma railgun accelerators for PLX and other high momentum plasma applications. The present 2.5 cm square-bore plasma railgun forms plasma armatures from high density neutral gas (argon), preionizes it electrothermally, and accelerates the armature with 30 cm long parallel-plate railgun electrodes driven by a pulse forming network (PFN). Recent experiments have successfully formed and accelerated plasma armatures of ~4 mg at 40 km/s, with PFN currents of ~400 kA. In order to further increase railgun performance to the PLX design goal of 8 mg at 50 km/s, the PFN was upgraded to support currents of up to ~750 kA. A high voltage, high current linear array spark-gap switch and flexible, low-inductance transmission line were designed and constructed to handle the increased current load. We will describe these systems and present initial performance data from high current operation of the plasma rail gun from spectroscopy, interferometry, and imaging systems as well as pressure, magnetic field, and optical diagnostics. High current performance of railgun bore materials for electrodes and insulators will also be discussed as well as plans for upcoming experimentation with advanced materials. Supported by the U.S. DOE Joint Program in HEDLP.
Komarov, Ivan; D'Souza, Roshan M
2012-01-01
The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
Field aligned flows driven by neutral puffing at MAST
NASA Astrophysics Data System (ADS)
Waters, I.; Frerichs, H.; Silburn, S.; Feng, Y.; Harrison, J.; Kirk, A.; Schmitz, O.
2018-06-01
Neutral deuterium gas puffing at the high field side of the mega ampere spherical tokamak (MAST) is shown to drive carbon impurity flows that are aligned with the trajectory of the magnetic field lines in the plasma scrape-off-layer. These impurity flows were directly imaged with emissions from C2+ ions at MAST by coherence imaging spectroscopy and were qualitatively reproduced in deuterium plasmas by modeling with the EMC3-EIRENE plasma edge fluid and kinetic neutral transport code. A reduced one-dimensional momentum and particle balance shows that a localized increase in the static plasma pressure in front of the neutral gas puff yields an acceleration of the plasma due to local ionization. Perpendicular particle transport yields a decay from which a parallel length scale can be determined. Parameter scans in EMC3-EIRENE were carried out to determine the sensitivity of the deuterium plasma flow phenomena to local fueling and diffusion parameters and it is found that these flows robustly form across a wide variety of plasma conditions. Finally, efforts to couple this behavior in the background plasma directly to the impurity flows observed experimentally in MAST using a trace impurity model are discussed. These results provide insight into the fueling and exhaust features at this pivotal point of the radial and parallel particle flux balance, which is a major part of the plasma fueling and exhaust characteristics in a magnetically confined fusion device.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poukey, J.W.; Coleman, P.D.; Sanford, T.W.L.
1985-10-01
MABE is a multistage linear electron accelerator which accelerates up to nine beams in parallel. Nominal parameters per beam are 25 kA, final energy 7 MeV, and guide field 20 kG. We report recent progress via theory and simulation in understanding the beam dynamics in such a system. In particular, we emphasize our results on the radial oscillations and emittance growth for a beam passing through a series of accelerating gaps.
Real-time text extraction based on the page layout analysis system
NASA Astrophysics Data System (ADS)
Soua, M.; Benchekroun, A.; Kachouri, R.; Akil, M.
2017-05-01
Several approaches were proposed in order to extract text from scanned documents. However, text extraction in heterogeneous documents stills a real challenge. Indeed, text extraction in this context is a difficult task because of the variation of the text due to the differences of sizes, styles and orientations, as well as to the complexity of the document region background. Recently, we have proposed the improved hybrid binarization based on Kmeans method (I-HBK)5 to extract suitably the text from heterogeneous documents. In this method, the Page Layout Analysis (PLA), part of the Tesseract OCR engine, is used to identify text and image regions. Afterwards our hybrid binarization is applied separately on each kind of regions. In one side, gamma correction is employed before to process image regions. In the other side, binarization is performed directly on text regions. Then, a foreground and background color study is performed to correct inverted region colors. Finally, characters are located from the binarized regions based on the PLA algorithm. In this work, we extend the integration of the PLA algorithm within the I-HBK method. In addition, to speed up the separation of text and image step, we employ an efficient GPU acceleration. Through the performed experiments, we demonstrate the high F-measure accuracy of the PLA algorithm reaching 95% on the LRDE dataset. In addition, we illustrate the sequential and the parallel compared PLA versions. The obtained results give a speedup of 3.7x when comparing the parallel PLA implementation on GPU GTX 660 to the CPU version.
An improved parallel fuzzy connected image segmentation method based on CUDA.
Wang, Liansheng; Li, Dong; Huang, Shaohui
2016-05-12
Fuzzy connectedness method (FC) is an effective method for extracting fuzzy objects from medical images. However, when FC is applied to large medical image datasets, its running time will be greatly expensive. Therefore, a parallel CUDA version of FC (CUDA-kFOE) was proposed by Ying et al. to accelerate the original FC. Unfortunately, CUDA-kFOE does not consider the edges between GPU blocks, which causes miscalculation of edge points. In this paper, an improved algorithm is proposed by adding a correction step on the edge points. The improved algorithm can greatly enhance the calculation accuracy. In the improved method, an iterative manner is applied. In the first iteration, the affinity computation strategy is changed and a look up table is employed for memory reduction. In the second iteration, the error voxels because of asynchronism are updated again. Three different CT sequences of hepatic vascular with different sizes were used in the experiments with three different seeds. NVIDIA Tesla C2075 is used to evaluate our improved method over these three data sets. Experimental results show that the improved algorithm can achieve a faster segmentation compared to the CPU version and higher accuracy than CUDA-kFOE. The calculation results were consistent with the CPU version, which demonstrates that it corrects the edge point calculation error of the original CUDA-kFOE. The proposed method has a comparable time cost and has less errors compared to the original CUDA-kFOE as demonstrated in the experimental results. In the future, we will focus on automatic acquisition method and automatic processing.
3D Resistive MHD Simulations of Formation, Compression, and Acceleration of Compact Tori
NASA Astrophysics Data System (ADS)
Woodruff, Simon; Meyer, Thomas; Stuber, James; Romero-Talamas, Carlos; Brown, Michael; Kaur, Manjit; Schaffner, David
2017-10-01
We present results from extended resistive 3D MHD simulations (NIMROD) pertaining to a new formation method for toroidal plasmas using a reconnection region that forms in a radial implosion, and results from the acceleration of CTs along a drift tube that are accelerated by a coil and are allowed to go tilt unstable and form a helical minimum energy state. The new formation method results from a reconnection region that is generated between two magnetic compression coils that are ramped to 320kV in 2 μs. When the compressing field is aligned anti-parallel to a pre-existing CT, a current sheet and reconnection region forms that accelerates plasma radially inwards up to 500km/s which stagnates and directed energy converts to thermal, raising temperatures to 500eV. When field is aligned parallel to the pre-existing CT, the configuration can be accelerated along a drift tube. For certain ratios of magnetic field to density, the CT goes tilt-unstable forming a twisted flux rope, which can also be accelerated and stagnated on an end wall, where temperature and field increases as the plasma compresses. We compare simulation results with adiabatic scaling relations. Work supported by ARPA-E ALPHA program and DARPA.
Rapid brain MRI acquisition techniques at ultra-high fields
Setsompop, Kawin; Feinberg, David A.; Polimeni, Jonathan R.
2017-01-01
Ultra-high-field MRI provides large increases in signal-to-noise ratio as well as enhancement of several contrast mechanisms in both structural and functional imaging. Combined, these gains result in a substantial boost in contrast-to-noise ratio that can be exploited for higher spatial resolution imaging to extract finer-scale information about the brain. With increased spatial resolution, however, is a concurrent increased image encoding burden that can cause unacceptably long scan times for structural imaging and slow temporal sampling of the hemodynamic response in functional MRI—particularly when whole-brain imaging is desired. To address this issue, new directions of imaging technology development—such as the move from conventional 2D slice-by-slice imaging to more efficient Simultaneous MultiSlice (SMS) or MultiBand imaging (which can be viewed as “pseudo-3D” encoding) as well as full 3D imaging—have provided dramatic improvements in acquisition speed. Such imaging paradigms provide higher SNR efficiency as well as improved encoding efficiency. Moreover, SMS and 3D imaging can make better use of coil sensitivity information in multi-channel receiver arrays used for parallel imaging acquisitions through controlled aliasing in multiple spatial directions. This has enabled unprecedented acceleration factors of an order of magnitude or higher in these imaging acquisition schemes, with low image artifact levels and high SNR. Here we review the latest developments of SMS and 3D imaging methods and related technologies at ultra-high field for rapid high-resolution functional and structural imaging of the brain. PMID:26835884
Can, Mehmet Mustafa; Kaymaz, Cihangir
2010-08-01
Pulmonary arterial hypertension (PAH) is a rare, fatal and progressive disease. There is an acceleration in the advent of new therapies in parallel to the development of the knowledge about etiogenesis and pathogenesis of PAH. Therefore, to optimize the goals of PAH-specific treatment and to determine the time to shift from monotherapy to combination therapy, simple, objective and reproducible end-points, which may predict the disease severity, progression rate and life expectancy are needed. The adventure of end points in PAH has started with six minute walk distance and functional capacity, and continues with new parameters (biochemical marker, time to clinical worsening, echocardiography and magnetic resonance imaging etc.), which can better reflect the clinical outcome.
NASA Technical Reports Server (NTRS)
Tilton, James C.
1988-01-01
Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
NASA Astrophysics Data System (ADS)
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification.
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-01-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value. PMID:27905520
NASA Astrophysics Data System (ADS)
Ying, Jia-ju; Chen, Yu-dan; Liu, Jie; Wu, Dong-sheng; Lu, Jun
2016-10-01
The maladjustment of photoelectric instrument binocular optical axis parallelism will affect the observe effect directly. A binocular optical axis parallelism digital calibration system is designed. On the basis of the principle of optical axis binocular photoelectric instrument calibration, the scheme of system is designed, and the binocular optical axis parallelism digital calibration system is realized, which include four modules: multiband parallel light tube, optical axis translation, image acquisition system and software system. According to the different characteristics of thermal infrared imager and low-light-level night viewer, different algorithms is used to localize the center of the cross reticle. And the binocular optical axis parallelism calibration is realized for calibrating low-light-level night viewer and thermal infrared imager.
Flexible, 31 channel breast coil for enhanced parallel imaging performance at 3T
Hancu, Ileana; Fiveland, Eric; Park, Keith; Giaquinto, Randy O.; Rohling, Kenneth; Wiesinger, Florian
2015-01-01
Purpose To design, build and characterize the performance of a novel 3T, 31 channel breast coil. Methods A flexible breast coil, accommodating all breast sizes while preserving close to unity filling factors in all configurations, was designed and built. Its performance was compared to the performance of the current state-of-the-art, 16 channel breast coil (Sentinelle coil, Hologic, Bedford, MA, USA), in phantoms and in vivo. Results Better axilla coverage and lower inter-coil coupling (12% vs. 26%, as characterized by the average off-diagonal elements of the noise correlation matrix) was exhibited by our 31 channel coil compared to the 16 channel coil. Breast area SNR increases of 68% (phantom) and 28 ± 31% (in vivo) were demonstrated in the 3 volunteers studied when the 31 channel coil was used. For the 31 channel/16 channel arrays, respectively, two dimensional acceleration factors of L/R × S/I = 4.3 × 2.4 resulted in average g-factors of 1.10/1.68 (in vitro) and 1.28/2.75 (in vivo); acceleration factors of L/R × A/P = 3.0 × 2.8 resulted in average g-factors of 1.06/1.54 (in vitro) and 1.05/1.12 (in vivo). Conclusion A high performance breast coil was built; its capabilities were demonstrated in phantom and normal volunteer imaging experiments. PMID:25772214
Stability analysis applied to the early stages of viscous drop breakup by a high-speed gas stream
NASA Astrophysics Data System (ADS)
Padrino, Juan C.; Longmire, Ellen K.
2013-11-01
The instability of a liquid drop suddenly exposed to a high-speed gas stream behind a shock wave is studied by considering the gas-liquid motion at the drop interface. The discontinuous velocity profile given by the uniform, parallel flow of an inviscid, compressible gas over a viscous liquid is considered, and drop acceleration is included. Our analysis considers compressibility effects not only in the base flow, but also in the equations of motion for the perturbations. Recently published high-resolution images of the process of drop breakup by a passing shock have provided experimental evidence supporting the idea that a critical gas dynamic pressure can be found above which drop piercing by the growth of acceleration-driven instabilities gives way to drop breakup by liquid entrainment resulting from the gas shearing action. For a set of experimental runs from the literature, results show that, for shock Mach numbers >= 2, a band of rapidly growing waves forms in the region well upstream of the drop's equator at the location where the base flow passes from subsonic to supersonic, in agreement with experimental images. Also, the maximum growth rate can be used to predict the transition of the breakup mode from Rayleigh-Taylor piercing to shear-induced entrainment. The authors acknowledge support of the NSF (DMS-0908561).
Vectorization for Molecular Dynamics on Intel Xeon Phi Corpocessors
NASA Astrophysics Data System (ADS)
Yi, Hongsuk
2014-03-01
Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512 bit vector registers for the high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Terfoff potentials for covalent bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism computing. We combine thread-level parallelism using a tightly coupled thread-level and task-level parallelism with 512-bit vector register. The simulation results show that the parallel performance of SIMD implementations on Xeon Phi is apparently superior to their x86 CPU architecture.
Chiew, Mark; Graedel, Nadine N; Miller, Karla L
2018-07-01
Recent developments in highly accelerated fMRI data acquisition have employed low-rank and/or sparsity constraints for image reconstruction, as an alternative to conventional, time-independent parallel imaging. When under-sampling factors are high or the signals of interest are low-variance, however, functional data recovery can be poor or incomplete. We introduce a method for improving reconstruction fidelity using external constraints, like an experimental design matrix, to partially orient the estimated fMRI temporal subspace. Combining these external constraints with low-rank constraints introduces a new image reconstruction model that is analogous to using a mixture of subspace-decomposition (PCA/ICA) and regression (GLM) models in fMRI analysis. We show that this approach improves fMRI reconstruction quality in simulations and experimental data, focusing on the model problem of detecting subtle 1-s latency shifts between brain regions in a block-design task-fMRI experiment. Successful latency discrimination is shown at acceleration factors up to R = 16 in a radial-Cartesian acquisition. We show that this approach works with approximate, or not perfectly informative constraints, where the derived benefit is commensurate with the information content contained in the constraints. The proposed method extends low-rank approximation methods for under-sampled fMRI data acquisition by leveraging knowledge of expected task-based variance in the data, enabling improvements in the speed and efficiency of fMRI data acquisition without the loss of subtle features. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
NASA Astrophysics Data System (ADS)
Kononenko, Oleksiy; Adolphsen, Chris; Li, Zenghai; Ng, Cho-Kuen; Rivetta, Claudio
2017-10-01
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we present the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. The simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.
Discrete event performance prediction of speculatively parallel temperature-accelerated dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zamora, Richard James; Voter, Arthur F.; Perez, Danny
Due to its unrivaled ability to predict the dynamical evolution of interacting atoms, molecular dynamics (MD) is a widely used computational method in theoretical chemistry, physics, biology, and engineering. Despite its success, MD is only capable of modeling time scales within several orders of magnitude of thermal vibrations, leaving out many important phenomena that occur at slower rates. The Temperature Accelerated Dynamics (TAD) method overcomes this limitation by thermally accelerating the state-to-state evolution captured by MD. Due to the algorithmically complex nature of the serial TAD procedure, implementations have yet to improve performance by parallelizing the concurrent exploration of multiplemore » states. Here we utilize a discrete event-based application simulator to introduce and explore a new Speculatively Parallel TAD (SpecTAD) method. We investigate the SpecTAD algorithm, without a full-scale implementation, by constructing an application simulator proxy (SpecTADSim). Finally, following this method, we discover that a nontrivial relationship exists between the optimal SpecTAD parameter set and the number of CPU cores available at run-time. Furthermore, we find that a majority of the available SpecTAD boost can be achieved within an existing TAD application using relatively simple algorithm modifications.« less
Discrete event performance prediction of speculatively parallel temperature-accelerated dynamics
Zamora, Richard James; Voter, Arthur F.; Perez, Danny; ...
2016-12-01
Due to its unrivaled ability to predict the dynamical evolution of interacting atoms, molecular dynamics (MD) is a widely used computational method in theoretical chemistry, physics, biology, and engineering. Despite its success, MD is only capable of modeling time scales within several orders of magnitude of thermal vibrations, leaving out many important phenomena that occur at slower rates. The Temperature Accelerated Dynamics (TAD) method overcomes this limitation by thermally accelerating the state-to-state evolution captured by MD. Due to the algorithmically complex nature of the serial TAD procedure, implementations have yet to improve performance by parallelizing the concurrent exploration of multiplemore » states. Here we utilize a discrete event-based application simulator to introduce and explore a new Speculatively Parallel TAD (SpecTAD) method. We investigate the SpecTAD algorithm, without a full-scale implementation, by constructing an application simulator proxy (SpecTADSim). Finally, following this method, we discover that a nontrivial relationship exists between the optimal SpecTAD parameter set and the number of CPU cores available at run-time. Furthermore, we find that a majority of the available SpecTAD boost can be achieved within an existing TAD application using relatively simple algorithm modifications.« less
Accelerated speckle imaging with the ATST visible broadband imager
NASA Astrophysics Data System (ADS)
Wöger, Friedrich; Ferayorni, Andrew
2012-09-01
The Advanced Technology Solar Telescope (ATST), a 4 meter class telescope for observations of the solar atmosphere currently in construction phase, will generate data at rates of the order of 10 TB/day with its state of the art instrumentation. The high-priority ATST Visible Broadband Imager (VBI) instrument alone will create two data streams with a bandwidth of 960 MB/s each. Because of the related data handling issues, these data will be post-processed with speckle interferometry algorithms in near-real time at the telescope using the cost-effective Graphics Processing Unit (GPU) technology that is supported by the ATST Data Handling System. In this contribution, we lay out the VBI-specific approach to its image processing pipeline, put this into the context of the underlying ATST Data Handling System infrastructure, and finally describe the details of how the algorithms were redesigned to exploit data parallelism in the speckle image reconstruction algorithms. An algorithm re-design is often required to efficiently speed up an application using GPU technology; we have chosen NVIDIA's CUDA language as basis for our implementation. We present our preliminary results of the algorithm performance using our test facilities, and base a conservative estimate on the requirements of a full system that could achieve near real-time performance at ATST on these results.
GPU-completeness: theory and implications
NASA Astrophysics Data System (ADS)
Lin, I.-Jong
2011-01-01
This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate his algorithmic space and imaging algorithms space. This concept is based upon our experience in the print production area where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HPdelivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation inform the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrate the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a welldefined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.
Watanabe, Noriya; Sakagami, Masamichi; Haruno, Masahiko
2013-03-06
Learning does not only depend on rationality, because real-life learning cannot be isolated from emotion or social factors. Therefore, it is intriguing to determine how emotion changes learning, and to identify which neural substrates underlie this interaction. Here, we show that the task-independent presentation of an emotional face before a reward-predicting cue increases the speed of cue-reward association learning in human subjects compared with trials in which a neutral face is presented. This phenomenon was attributable to an increase in the learning rate, which regulates reward prediction errors. Parallel to these behavioral findings, functional magnetic resonance imaging demonstrated that presentation of an emotional face enhanced reward prediction error (RPE) signal in the ventral striatum. In addition, we also found a functional link between this enhanced RPE signal and increased activity in the amygdala following presentation of an emotional face. Thus, this study revealed an acceleration of cue-reward association learning by emotion, and underscored a role of striatum-amygdala interactions in the modulation of the reward prediction errors by emotion.
Badal, Andreu; Badano, Aldo
2009-11-01
It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDATM programming model (NVIDIA Corporation, Santa Clara, CA). An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
GPU-accelerated FDTD modeling of radio-frequency field-tissue interactions in high-field MRI.
Chi, Jieru; Liu, Feng; Weber, Ewald; Li, Yu; Crozier, Stuart
2011-06-01
The analysis of high-field RF field-tissue interactions requires high-performance finite-difference time-domain (FDTD) computing. Conventional CPU-based FDTD calculations offer limited computing performance in a PC environment. This study presents a graphics processing unit (GPU)-based parallel-computing framework, producing substantially boosted computing efficiency (with a two-order speedup factor) at a PC-level cost. Specific details of implementing the FDTD method on a GPU architecture have been presented and the new computational strategy has been successfully applied to the design of a novel 8-element transceive RF coil system at 9.4 T. Facilitated by the powerful GPU-FDTD computing, the new RF coil array offers optimized fields (averaging 25% improvement in sensitivity, and 20% reduction in loop coupling compared with conventional array structures of the same size) for small animal imaging with a robust RF configuration. The GPU-enabled acceleration paves the way for FDTD to be applied for both detailed forward modeling and inverse design of MRI coils, which were previously impractical.
Peker, Musa; Şen, Baha; Gürüler, Hüseyin
2015-02-01
The effect of anesthesia on the patient is referred to as depth of anesthesia. Rapid classification of appropriate depth level of anesthesia is a matter of great importance in surgical operations. Similarly, accelerating classification algorithms is important for the rapid solution of problems in the field of biomedical signal processing. However numerous, time-consuming mathematical operations are required when training and testing stages of the classification algorithms, especially in neural networks. In this study, to accelerate the process, parallel programming and computing platform (Nvidia CUDA) facilitates dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU) was utilized. The system was employed to detect anesthetic depth level on related electroencephalogram (EEG) data set. This dataset is rather complex and large. Moreover, the achieving more anesthetic levels with rapid response is critical in anesthesia. The proposed parallelization method yielded high accurate classification results in a faster time.
Accelerating Families of Fuzzy K-Means Algorithms for Vector Quantization Codebook Design
Mata, Edson; Bandeira, Silvio; de Mattos Neto, Paulo; Lopes, Waslon; Madeiro, Francisco
2016-01-01
The performance of signal processing systems based on vector quantization depends on codebook design. In the image compression scenario, the quality of the reconstructed images depends on the codebooks used. In this paper, alternatives are proposed for accelerating families of fuzzy K-means algorithms for codebook design. The acceleration is obtained by reducing the number of iterations of the algorithms and applying efficient nearest neighbor search techniques. Simulation results concerning image vector quantization have shown that the acceleration obtained so far does not decrease the quality of the reconstructed images. Codebook design time savings up to about 40% are obtained by the accelerated versions with respect to the original versions of the algorithms. PMID:27886061
Accelerating Families of Fuzzy K-Means Algorithms for Vector Quantization Codebook Design.
Mata, Edson; Bandeira, Silvio; de Mattos Neto, Paulo; Lopes, Waslon; Madeiro, Francisco
2016-11-23
The performance of signal processing systems based on vector quantization depends on codebook design. In the image compression scenario, the quality of the reconstructed images depends on the codebooks used. In this paper, alternatives are proposed for accelerating families of fuzzy K-means algorithms for codebook design. The acceleration is obtained by reducing the number of iterations of the algorithms and applying efficient nearest neighbor search techniques. Simulation results concerning image vector quantization have shown that the acceleration obtained so far does not decrease the quality of the reconstructed images. Codebook design time savings up to about 40% are obtained by the accelerated versions with respect to the original versions of the algorithms.
The modern temperature-accelerated dynamics approach
Zamora, Richard J.; Uberuaga, Blas P.; Perez, Danny; ...
2016-06-01
Accelerated molecular dynamics (AMD) is a class of MD-based methods used to simulate atomistic systems in which the metastable state-to-state evolution is slow compared with thermal vibrations. Temperature-accelerated dynamics (TAD) is a particularly efficient AMD procedure in which the predicted evolution is hastened by elevating the temperature of the system and then recovering the correct state-to-state dynamics at the temperature of interest. TAD has been used to study various materials applications, often revealing surprising behavior beyond the reach of direct MD. This success has inspired several algorithmic performance enhancements, as well as the analysis of its mathematical framework. Recently, thesemore » enhancements have leveraged parallel programming techniques to enhance both the spatial and temporal scaling of the traditional approach. Here, we review the ongoing evolution of the modern TAD method and introduce the latest development: speculatively parallel TAD.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poukey, J.W.; Coleman, P.D.; Sanford, T.W.L.
1985-01-01
MABE is a multistage linear electron accelerator which accelerates up to nine beams in parallel. Nominal parameters per beam are 25 kA, final energy 7 MeV, and guide field 20 kG. We report recent progress via theory and simulation in understanding the beam dynamics in such a system. In particular, we emphasize our results on the radial oscillations and emittance growth for a beam passing through a series of accelerating gaps. 12 refs., 8 figs.
NASA Astrophysics Data System (ADS)
Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.
2016-05-01
This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
Shear Elasticity and Shear Viscosity Imaging in Soft Tissue
NASA Astrophysics Data System (ADS)
Yang, Yiqun
In this thesis, a new approach is introduced that provides estimates of shear elasticity and shear viscosity using time-domain measurements of shear waves in viscoelastic media. Simulations of shear wave particle displacements induced by an acoustic radiation force are accelerated significantly by a GPU. The acoustic radiation force is first calculated using the fast near field method (FNM) and the angular spectrum approach (ASA). The shear waves induced by the acoustic radiation force are then simulated in elastic and viscoelastic media using Green's functions. A parallel algorithm is developed to perform these calculations on a GPU, where the shear wave particle displacements at different observation points are calculated in parallel. The resulting speed increase enables rapid evaluation of shear waves at discrete points, in 2D planes, and for push beams with different spatial samplings and for different values of the f-number (f/#). The results of these simulations show that push beams with smaller f/# require a higher spatial sampling rate. The significant amount of acceleration achieved by this approach suggests that shear wave simulations with the Green's function approach are ideally suited for high-performance GPUs. Shear wave elasticity imaging determines the mechanical parameters of soft tissue by analyzing measured shear waves induced by an acoustic radiation force. To estimate the shear elasticity value, the widely used time-of-flight method calculates the correlation between shear wave particle velocities at adjacent lateral observation points. Although this method provides accurate estimates of the shear elasticity in purely elastic media, our experience suggests that the time-of-flight (TOF) method consistently overestimates the shear elasticity values in viscoelastic media because the combined effects of diffraction, attenuation, and dispersion are not considered. To address this problem, we have developed an approach that directly accounts for all of these effects when estimating the shear elasticity. This new approach simulates shear wave particle velocities using a Green's function-based approach for the Voigt model, where the shear elasticity and viscosity values are estimated using an optimization-based approach that compares measured shear wave particle velocities with simulated shear wave particle velocities in the time-domain. The results are evaluated on a point-by-point basis to generate images. There is good agreement between the simulated and measured shear wave particle velocities, where the new approach yields much better images of the shear elasticity and shear viscosity than the TOF method. The new estimation approach is accelerated with an approximate viscoelastic Green's function model that is evaluated with shear wave data obtained from in vivo human livers. Instead of calculating shear waves with combinations of different shear elasticities and shear viscosities, shear waves are calculated with different shear elasticities on the GPU and then convolved with a viscous loss model, which accelerates the calculation dramatically. The shear elasticity and shear viscosity values are then estimated using an optimization-based approach by minimizing the difference between measured and simulated shear wave particle velocities. Shear elasticity and shear viscosity images are generated at every spatial point in a two-dimensional (2D) field-of-view (FOV). The new approach is applied to measured shear wave data obtained from in vivo human livers, and the results show that this new approach successfully generates shear elasticity and shear viscosity images from this data. The results also indicate that the shear elasticity values estimated with this approach are significantly smaller than the values estimated with the conventional TOF method and that the new approach demonstrates more consistent values for these estimates compared with the TOF method. This experience suggests that the new method is an effective approach for estimating the shear elasticity and the shear viscosity in liver and in other soft tissue.
Sharp, G C; Kandasamy, N; Singh, H; Folkert, M
2007-10-07
This paper shows how to significantly accelerate cone-beam CT reconstruction and 3D deformable image registration using the stream-processing model. We describe data-parallel designs for the Feldkamp, Davis and Kress (FDK) reconstruction algorithm, and the demons deformable registration algorithm, suitable for use on a commodity graphics processing unit. The streaming versions of these algorithms are implemented using the Brook programming environment and executed on an NVidia 8800 GPU. Performance results using CT data of a preserved swine lung indicate that the GPU-based implementations of the FDK and demons algorithms achieve a substantial speedup--up to 80 times for FDK and 70 times for demons when compared to an optimized reference implementation on a 2.8 GHz Intel processor. In addition, the accuracy of the GPU-based implementations was found to be excellent. Compared with CPU-based implementations, the RMS differences were less than 0.1 Hounsfield unit for reconstruction and less than 0.1 mm for deformable registration.
Accelerating Wright–Fisher Forward Simulations on the Graphics Processing Unit
Lawrie, David S.
2017-01-01
Forward Wright–Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processor Unit (CPU), thus limiting their usefulness. However, the single-locus Wright–Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called “embarrassingly parallel,” consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright–Fisher simulation, or “GO Fish” for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/. PMID:28768689
Skinner, Jack T; Robison, Ryan K; Elder, Christopher P; Newton, Allen T; Damon, Bruce M; Quarles, C Chad
2014-12-01
Perfusion-based changes in MR signal intensity can occur in response to the introduction of exogenous contrast agents and endogenous tissue properties (e.g. blood oxygenation). MR measurements aimed at capturing these changes often implement single-shot echo planar imaging (ssEPI). In recent years ssEPI readouts have been combined with parallel imaging (PI) to allow fast dynamic multi-slice imaging as well as the incorporation of multiple echoes. A multiple spin- and gradient-echo (SAGE) EPI acquisition has recently been developed to allow measurement of transverse relaxation rate (R2 and R2(*)) changes in dynamic susceptibility contrast (DSC)-MRI experiments in the brain. With SAGE EPI, the use of PI can influence image quality, temporal resolution, and achievable echo times. The effect of PI on dynamic SAGE measurements, however, has not been evaluated. In this work, a SAGE EPI acquisition utilizing SENSE PI and partial Fourier (PF) acceleration was developed and evaluated. Voxel-wise measures of R2 and R2(*) in healthy brain were compared using SAGE EPI and conventional non-EPI multiple echo acquisitions with varying SENSE and PF acceleration. A conservative SENSE factor of 2 with PF factor of 0.73 was found to provide accurate measures of R2 and R2(*) in white (WM) (rR2=[0.55-0.79], rR2*=[0.47-0.71]) and gray (GM) matter (rR2=[0.26-0.59], rR2*=[0.39-0.74]) across subjects. The combined use of SENSE and PF allowed the first dynamic SAGE EPI measurements in muscle, with a SENSE factor of 3 and PF factor of 0.6 providing reliable relaxation rate estimates when compared to multi-echo methods. Application of the optimized SAGE protocol in DSC-MRI of high-grade glioma patients provided T1 leakage-corrected estimates of CBV and CBF as well as mean vessel diameter (mVD) and simultaneous measures of DCE-MRI parameters K(trans) and ve. Likewise, application of SAGE in a muscle reperfusion model allowed dynamic measures of R2', a parameter that has been shown to correlate with muscle oxy-hemoglobin saturation. Copyright © 2014 Elsevier Inc. All rights reserved.
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.
Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F
2018-03-01
Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high-dynamic range or floating point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute efficiently the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance both on simulated and actual 2D images and 3D volumes. Execution times are about better than the fastest sequential algorithm and speed-up goes up to on 64 threads.
Limited angle tomographic breast imaging: A comparison of parallel beam and pinhole collimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wessell, D.E.; Kadrmas, D.J.; Frey, E.C.
1996-12-31
Results from clinical trials have suggested no improvement in lesion detection with parallel hole SPECT scintimammography (SM) with Tc-99m over parallel hole planar SM. In this initial investigation, we have elucidated some of the unique requirements of SPECT SM. With these requirements in mind, we have begun to develop practical data acquisition and reconstruction strategies that can reduce image artifacts and improve image quality. In this paper we investigate limited angle orbits for both parallel hole and pinhole SPECT SM. Singular Value Decomposition (SVD) is used to analyze the artifacts associated with the limited angle orbits. Maximum likelihood expectation maximizationmore » (MLEM) reconstructions are then used to examine the effects of attenuation compensation on the quality of the reconstructed image. All simulations are performed using the 3D-MCAT breast phantom. The results of these simulation studies demonstrate that limited angle SPECT SM is feasible, that attenuation correction is needed for accurate reconstructions, and that pinhole SPECT SM may have an advantage over parallel hole SPECT SM in terms of improved image quality and reduced image artifacts.« less
BarraCUDA - a fast short read sequence aligner using graphics processing units
2012-01-01
Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
NASA Technical Reports Server (NTRS)
Turner, D. L.; Omidi, N.; Sibeck, D. G.; Angelopoulos, V.
2011-01-01
Earth?s foreshock, which is the quasi-parallel region upstream of the bow shock, is a unique plasma region capable of generating several kinds of large-scale phenomena, each of which can impact the magnetosphere resulting in global effects. Interestingly, such phenomena have also been observed at planetary foreshocks throughout our solar system. Recently, a new type of foreshock phenomena has been predicted: foreshock bubbles, which are large-scale disruptions of both the foreshock and incident solar wind plasmas that can result in global magnetospheric disturbances. Here we present unprecedented, multi-point observations of foreshock bubbles at Earth using a combination of spacecraft and ground observations primarily from the Time History of Events and Macroscale Interactions during Substorms (THEMIS) mission, and we include detailed analysis of the events? global effects on the magnetosphere and the energetic ions and electrons accelerated by them, potentially by a combination of first and second order Fermi and shock drift acceleration processes. This new phenomena should play a role in energetic particle acceleration at collisionless, quasi-parallel shocks throughout the Universe.
Parallel algorithm of real-time infrared image restoration based on total variation theory
NASA Astrophysics Data System (ADS)
Zhu, Ran; Li, Miao; Long, Yunli; Zeng, Yaoyuan; An, Wei
2015-10-01
Image restoration is a necessary preprocessing step for infrared remote sensing applications. Traditional methods allow us to remove the noise but penalize too much the gradients corresponding to edges. Image restoration techniques based on variational approaches can solve this over-smoothing problem for the merits of their well-defined mathematical modeling of the restore procedure. The total variation (TV) of infrared image is introduced as a L1 regularization term added to the objective energy functional. It converts the restoration process to an optimization problem of functional involving a fidelity term to the image data plus a regularization term. Infrared image restoration technology with TV-L1 model exploits the remote sensing data obtained sufficiently and preserves information at edges caused by clouds. Numerical implementation algorithm is presented in detail. Analysis indicates that the structure of this algorithm can be easily implemented in parallelization. Therefore a parallel implementation of the TV-L1 filter based on multicore architecture with shared memory is proposed for infrared real-time remote sensing systems. Massive computation of image data is performed in parallel by cooperating threads running simultaneously on multiple cores. Several groups of synthetic infrared image data are used to validate the feasibility and effectiveness of the proposed parallel algorithm. Quantitative analysis of measuring the restored image quality compared to input image is presented. Experiment results show that the TV-L1 filter can restore the varying background image reasonably, and that its performance can achieve the requirement of real-time image processing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fujimoto, Keizo, E-mail: keizo.fujimoto@nao.ac.jp; Takamoto, Makoto
2016-01-15
We have investigated the ion and electron dynamics generating the Hall current in the reconnection exhaust far downstream of the x-line where the exhaust width is much larger than the ion gyro-radius. A large-scale particle-in-cell simulation shows that most ions are accelerated through the Speiser-type motion in the current sheet formed at the center of the exhaust. The transition layers formed at the exhaust boundary are not identified as slow mode shocks. (The layers satisfy mostly the Rankine-Hugoniot conditions for a slow mode shock, but the energy conversion hardly occurs there.) We find that the ion drift velocity is modifiedmore » around the layer due to a finite Larmor radius effect. As a result, the ions are accumulated in the downstream side of the layer, so that collimated ion jets are generated. The electrons experience two steps of acceleration in the exhaust. The first is a parallel acceleration due to the out-of-plane electric field E{sub y} which has a parallel component in most area of the exhaust. The second is a perpendicular acceleration due to E{sub y} at the center of the current sheet and the motion is converted to the parallel direction. Because of the second acceleration, the electron outflow velocity becomes almost uniform over the exhaust. The difference in the outflow profile between the ions and electrons results in the Hall current in large area of the exhaust. The present study demonstrates the importance of the kinetic treatments for collisionless magnetic reconnection even far downstream from the x-line.« less
Energetic particles in laboratory, space and astrophysical plasmas
NASA Astrophysics Data System (ADS)
McClements, K. G.; Turnyanskiy, M. R.
2017-01-01
Some recent studies of energetic particles in laboratory, space and astrophysical plasmas are discussed, and a number of common themes identified. Such comparative studies can elucidate the underlying physical processes. For example microwave bursts observed during edge localised modes (ELMs) in the mega amp spherical tokamak (MAST) can be attributed to energetic electrons accelerated by parallel electric fields associated with the ELMs. The very large numbers of electrons known to be accelerated in solar flares must also arise from parallel electric fields, and the demonstration of energetic electron production during ELMs suggests close links at the kinetic level between ELMs and flares. Energetic particle studies in solar flares have focussed largely on electrons rather than ions, since bremsstrahlung from deka-keV electrons provides the best available explanation of flare hard x-ray emission. However ion acceleration (but not electron acceleration) has been observed during merging startup of plasmas in MAST with dimensionless parameters similar to those of the solar corona during flares. Recent measurements in the Earth’s radiation belts demonstrate clearly a direct link between ion cyclotron emission (ICE) and fast particle population inversion, supporting the hypothesis that ICE in tokamaks is driven by fast particle distributions of this type. Shear Alfvén waves in plasmas with beta less than the electron to ion mass ratio have a parallel electric field that, in the solar corona, could accelerate electrons to hard x-ray-emitting energies; an extension of this calculation to plasmas with Alfvén speed arbitrarily close to the speed of light suggests that the mechanism could play a role in the production of cosmic ray electrons.
NASA Astrophysics Data System (ADS)
Krawczynski, H.
2007-04-01
In this paper we discuss models of the X-ray and TeV γ-ray emission from BL Lac objects based on parallel electron-positron or electron-proton beams that form close to the central black hole, due to the strong electric fields generated by the accretion disk and possibly also by the black hole itself. Fitting the energy spectrum of the BL Lac object Mrk 501, we obtain tight constraints on the beam properties. Launching a sufficiently energetic beam requires rather strong magnetic fields close to the black hole (~100-1000 G). However, the model fits imply that the magnetic field in the emission region is only ~0.02 G. Thus, the particles are accelerated close to the black hole and propagate a considerable distance before instabilities trigger the dissipation of energy through synchrotron and self-Compton emission. We discuss various approaches to generate enough power to drive the jet and, at the same time, to accelerate particles to ~20 TeV energies. Although the parallel beam model has its own problems, it explains some of the long-standing problems that plague models based on Fermi-type particle acceleration, such as the presence of a very high minimum Lorentz factor of accelerated particles. We conclude with a brief discussion of the implications of the model for the difference between the processes of jet formation in BL Lac-type objects and those in quasars.
NASA Astrophysics Data System (ADS)
Krawczynski, Henric
2007-04-01
In this contribution we discuss models of the X-rays and TeV gamma-ray emission from BL Lac objects based on parallel electron-positron or electron-proton beams that form close to the central black hole owing to the strong electric fields generated by the accretion disk and possibly also by the black hole itself. Fitting the energy spectrum of the BL Lac object Mrk 501, we obtain tight constrains on the beam properties. Launching a sufficiently energetic beam requires rather strong magnetic fields close to the black hole 100-1000 G. However, the model fits imply that the magnetic field in the emission region is only 0.02 G. Thus, the particles are accelerated close to the black hole and propagate a considerable distance before instabilities trigger the dissipation of energy through synchrotron and self-Compton emission. We discuss various approaches to generate enough power to drive the jet and, at the same time, to accelerate particles to 20 TeV energies. Although the parallel beam model has its own problems, it explains some of the long-standing problems that plague models based on Fermi type particle acceleration, like the presence of a very high minimum Lorentz factor of accelerated particles. We conclude with a brief discussion of the implications of the model for the difference between the processes of jet formation in BL Lac type objects and in quasars.
Chang, Herng-Hua; Chang, Yu-Ning
2017-04-01
Bilateral filters have been substantially exploited in numerous magnetic resonance (MR) image restoration applications for decades. Due to the deficiency of theoretical basis on the filter parameter setting, empirical manipulation with fixed values and noise variance-related adjustments has generally been employed. The outcome of these strategies is usually sensitive to the variation of the brain structures and not all the three parameter values are optimal. This article is in an attempt to investigate the optimal setting of the bilateral filter, from which an accelerated and automated restoration framework is developed. To reduce the computational burden of the bilateral filter, parallel computing with the graphics processing unit (GPU) architecture is first introduced. The NVIDIA Tesla K40c GPU with the compute unified device architecture (CUDA) functionality is specifically utilized to emphasize thread usages and memory resources. To correlate the filter parameters with image characteristics for automation, optimal image texture features are subsequently acquired based on the sequential forward floating selection (SFFS) scheme. Subsequently, the selected features are introduced into the back propagation network (BPN) model for filter parameter estimation. Finally, the k-fold cross validation method is adopted to evaluate the accuracy of the proposed filter parameter prediction framework. A wide variety of T1-weighted brain MR images with various scenarios of noise levels and anatomic structures were utilized to train and validate this new parameter decision system with CUDA-based bilateral filtering. For a common brain MR image volume of 256 × 256 × 256 pixels, the speed-up gain reached 284. Six optimal texture features were acquired and associated with the BPN to establish a "high accuracy" parameter prediction system, which achieved a mean absolute percentage error (MAPE) of 5.6%. Automatic restoration results on 2460 brain MR images received an average relative error in terms of peak signal-to-noise ratio (PSNR) less than 0.1%. In comparison with many state-of-the-art filters, the proposed automation framework with CUDA-based bilateral filtering provided more favorable results both quantitatively and qualitatively. Possessing unique characteristics and demonstrating exceptional performances, the proposed CUDA-based bilateral filter adequately removed random noise in multifarious brain MR images for further study in neurosciences and radiological sciences. It requires no prior knowledge of the noise variance and automatically restores MR images while preserving fine details. The strategy of exploiting the CUDA to accelerate the computation and incorporating texture features into the BPN to completely automate the bilateral filtering process is achievable and validated, from which the best performance is reached. © 2017 American Association of Physicists in Medicine.
Golden-ratio rotated stack-of-stars acquisition for improved volumetric MRI.
Zhou, Ziwu; Han, Fei; Yan, Lirong; Wang, Danny J J; Hu, Peng
2017-12-01
To develop and evaluate an improved stack-of-stars radial sampling strategy for reducing streaking artifacts. The conventional stack-of-stars sampling strategy collects the same radial angle for every partition (slice) encoding. In an undersampled acquisition, such an aligned acquisition generates coherent aliasing patterns and introduces strong streaking artifacts. We show that by rotating the radial spokes in a golden-angle manner along the partition-encoding direction, the aliasing pattern is modified, resulting in improved image quality for gridding and more advanced reconstruction methods. Computer simulations were performed and phantom as well as in vivo images for three different applications were acquired. Simulation, phantom, and in vivo experiments confirmed that the proposed method was able to generate images with less streaking artifact and sharper structures based on undersampled acquisitions in comparison with the conventional aligned approach at the same acceleration factors. By combining parallel imaging and compressed sensing in the reconstruction, streaking artifacts were mostly removed with improved delineation of fine structures using the proposed strategy. We present a simple method to reduce streaking artifacts and improve image quality in 3D stack-of-stars acquisitions by re-arranging the radial spoke angles in the 3D partition direction, which can be used for rapid volumetric imaging. Magn Reson Med 78:2290-2298, 2017. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Alfvén Waves and the Aurora (Hannes Alfvén Medal Lecture)
NASA Astrophysics Data System (ADS)
Lysak, Robert
2015-04-01
The most compelling visual evidence of plasma processes in the magnetosphere of Earth as well as the other magnetized planets is the aurora. Over 40 years of research have indicated that the aurora is a consequence of the acceleration of charged particles toward the neutral atmosphere, where the excitation of neutral atoms and their subsequent relaxation to the ground state produces the auroral light. Much of this acceleration can be described by acceleration in a quasi-static electric field parallel to the geomagnetic field, producing nearly monoenergetic beams of electrons. While a variety of quasi-static models to describe such parallel electric fields have been developed, the dynamics of how these fields evolve is still an open question. Satellite measurements have indicated that a primary source of energy to support these fields is the Poynting flux associated with shear Alfvén waves propagating along auroral field lines. These Alfvén waves are generated in the magnetosphere and reflect from the ionosphere. On closed field lines, Alfvén waves bouncing between conjugate ionospheres produce field line resonances that have be observed both in space and by ground magnetometers. However, some auroral emissions do not follow this scenario. In these cases, the accelerated electrons are observed to have a broad energy spectrum, rather than a monoenergetic peak. Such a spectrum is suggestive of a time-dependent acceleration process that operates on a time scale of a few seconds, comparable to the electron transit time across the acceleration region. While field line resonances have a time scale on the order of minutes, waves with periods of a few seconds can be produced by partial reflections in the Ionospheric Alfvén Resonator, a resonant cavity formed by the rapid decrease of the plasma density and increase of the Alfvén speed above the ionosphere. In order to develop a parallel electric field that can accelerate auroral particles, these Alfvén waves must develop small spatial scales, where MHD theory breaks down. In this regime, the waves are called kinetic Alfvén waves. These small scales can be produced most simply be phase mixing, although ionospheric feedback and nonlinear effects may also be important. Since kinetic Alfvén waves require perpendicular wavelengths the order of a few kilometers, this model also provides a natural explanation of the narrow scales of discrete auroral arcs. These interactions between magnetosphere and ionosphere and the development of parallel electric fields have been described by means of numerical simulations that serve to illustrate these complex processes.
Parallel Guessing: A Strategy for High-Speed Computation
1984-09-19
for using additional hardware to obtain higher processing speed). In this paper we argue that parallel guessing for image analysis is a useful...from a true solution, or the correctness of a guess, can be readily checked. We review image - analysis algorithms having a parallel guessing or
Wavelet Transforms in Parallel Image Processing
1994-01-27
NUMBER OF PAGES Object Segmentation, Texture Segmentation, Image Compression, Image 137 Halftoning , Neural Network, Parallel Algorithms, 2D and 3D...Vector Quantization of Wavelet Transform Coefficients ........ ............................. 57 B.1.f Adaptive Image Halftoning based on Wavelet...application has been directed to the adaptive image halftoning . The gray information at a pixel, including its gray value and gradient, is represented by
Radio frequency quadrupole resonator for linear accelerator
Moretti, Alfred
1985-01-01
An RFQ resonator for a linear accelerator having a reduced level of interfering modes and producing a quadrupole mode for focusing, bunching and accelerating beams of heavy charged particles, with the construction being characterized by four elongated resonating rods within a cylinder with the rods being alternately shorted and open electrically to the shell at common ends of the rods to provide an LC parallel resonant circuit when activated by a magnetic field transverse to the longitudinal axis.
Radio-frequency quadrupole resonator for linear accelerator
Moretti, A.
1982-10-19
An RFQ resonator for a linear accelerator having a reduced level of interfering modes and producing a quadrupole mode for focusing, bunching and accelerating beams of heavy charged particles, with the construction being characterized by four elongated resonating rods within a cylinder with the rods being alternately shorted and open electrically to the shell at common ends of the rods to provide an LC parallel resonant circuit when activated by a magnetic field transverse to the longitudinal axis.
An embedded multi-core parallel model for real-time stereo imaging
NASA Astrophysics Data System (ADS)
He, Wenjing; Hu, Jian; Niu, Jingyu; Li, Chuanrong; Liu, Guangyu
2018-04-01
The real-time processing based on embedded system will enhance the application capability of stereo imaging for LiDAR and hyperspectral sensor. The task partitioning and scheduling strategies for embedded multiprocessor system starts relatively late, compared with that for PC computer. In this paper, aimed at embedded multi-core processing platform, a parallel model for stereo imaging is studied and verified. After analyzing the computing amount, throughout capacity and buffering requirements, a two-stage pipeline parallel model based on message transmission is established. This model can be applied to fast stereo imaging for airborne sensors with various characteristics. To demonstrate the feasibility and effectiveness of the parallel model, a parallel software was designed using test flight data, based on the 8-core DSP processor TMS320C6678. The results indicate that the design performed well in workload distribution and had a speed-up ratio up to 6.4.
NASA Astrophysics Data System (ADS)
Timchenko, Leonid; Yarovyi, Andrii; Kokriatskaya, Nataliya; Nakonechna, Svitlana; Abramenko, Ludmila; Ławicki, Tomasz; Popiel, Piotr; Yesmakhanova, Laura
2016-09-01
The paper presents a method of parallel-hierarchical transformations for rapid recognition of dynamic images using GPU technology. Direct parallel-hierarchical transformations based on cluster CPU-and GPU-oriented hardware platform. Mathematic models of training of the parallel hierarchical (PH) network for the transformation are developed, as well as a training method of the PH network for recognition of dynamic images. This research is most topical for problems on organizing high-performance computations of super large arrays of information designed to implement multi-stage sensing and processing as well as compaction and recognition of data in the informational structures and computer devices. This method has such advantages as high performance through the use of recent advances in parallelization, possibility to work with images of ultra dimension, ease of scaling in case of changing the number of nodes in the cluster, auto scan of local network to detect compute nodes.
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing especially in the field of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top- down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometers CMOS technology, and its hardware is about 200 K gates.
Massively parallel information processing systems for space applications
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1979-01-01
NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
Parallel and Portable Monte Carlo Particle Transport
NASA Astrophysics Data System (ADS)
Lee, S. R.; Cummings, J. C.; Nolen, S. D.; Keen, N. D.
1997-08-01
We have developed a multi-group, Monte Carlo neutron transport code in C++ using object-oriented methods and the Parallel Object-Oriented Methods and Applications (POOMA) class library. This transport code, called MC++, currently computes k and α eigenvalues of the neutron transport equation on a rectilinear computational mesh. It is portable to and runs in parallel on a wide variety of platforms, including MPPs, clustered SMPs, and individual workstations. It contains appropriate classes and abstractions for particle transport and, through the use of POOMA, for portable parallelism. Current capabilities are discussed, along with physics and performance results for several test problems on a variety of hardware, including all three Accelerated Strategic Computing Initiative (ASCI) platforms. Current parallel performance indicates the ability to compute α-eigenvalues in seconds or minutes rather than days or weeks. Current and future work on the implementation of a general transport physics framework (TPF) is also described. This TPF employs modern C++ programming techniques to provide simplified user interfaces, generic STL-style programming, and compile-time performance optimization. Physics capabilities of the TPF will be extended to include continuous energy treatments, implicit Monte Carlo algorithms, and a variety of convergence acceleration techniques such as importance combing.
Fast iterative image reconstruction using sparse matrix factorization with GPU acceleration
NASA Astrophysics Data System (ADS)
Zhou, Jian; Qi, Jinyi
2011-03-01
Statistically based iterative approaches for image reconstruction have gained much attention in medical imaging. An accurate system matrix that defines the mapping from the image space to the data space is the key to high-resolution image reconstruction. However, an accurate system matrix is often associated with high computational cost and huge storage requirement. Here we present a method to address this problem by using sparse matrix factorization and parallel computing on a graphic processing unit (GPU).We factor the accurate system matrix into three sparse matrices: a sinogram blurring matrix, a geometric projection matrix, and an image blurring matrix. The sinogram blurring matrix models the detector response. The geometric projection matrix is based on a simple line integral model. The image blurring matrix is to compensate for the line-of-response (LOR) degradation due to the simplified geometric projection matrix. The geometric projection matrix is precomputed, while the sinogram and image blurring matrices are estimated by minimizing the difference between the factored system matrix and the original system matrix. The resulting factored system matrix has much less number of nonzero elements than the original system matrix and thus substantially reduces the storage and computation cost. The smaller size also allows an efficient implement of the forward and back projectors on GPUs, which have limited amount of memory. Our simulation studies show that the proposed method can dramatically reduce the computation cost of high-resolution iterative image reconstruction. The proposed technique is applicable to image reconstruction for different imaging modalities, including x-ray CT, PET, and SPECT.
NASA Technical Reports Server (NTRS)
Gorospe, George E., Jr.; Daigle, Matthew J.; Sankararaman, Shankar; Kulkarni, Chetan S.; Ng, Eley
2017-01-01
Prognostic methods enable operators and maintainers to predict the future performance for critical systems. However, these methods can be computationally expensive and may need to be performed each time new information about the system becomes available. In light of these computational requirements, we have investigated the application of graphics processing units (GPUs) as a computational platform for real-time prognostics. Recent advances in GPU technology have reduced cost and increased the computational capability of these highly parallel processing units, making them more attractive for the deployment of prognostic software. We present a survey of model-based prognostic algorithms with considerations for leveraging the parallel architecture of the GPU and a case study of GPU-accelerated battery prognostics with computational performance results.
A Parallel Trade Study Architecture for Design Optimization of Complex Systems
NASA Technical Reports Server (NTRS)
Kim, Hongman; Mullins, James; Ragon, Scott; Soremekun, Grant; Sobieszczanski-Sobieski, Jaroslaw
2005-01-01
Design of a successful product requires evaluating many design alternatives in a limited design cycle time. This can be achieved through leveraging design space exploration tools and available computing resources on the network. This paper presents a parallel trade study architecture to integrate trade study clients and computing resources on a network using Web services. The parallel trade study solution is demonstrated to accelerate design of experiments, genetic algorithm optimization, and a cost as an independent variable (CAIV) study for a space system application.
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Particle acceleration by quasi-parallel shocks in the solar wind
NASA Astrophysics Data System (ADS)
Galinsky, V. L.; Shevchenko, V. I.
2008-11-01
The theoretical study of proton acceleration at a quasi-parallel shock due to interaction with Alfven waves self-consistently excited in both upstream and downstream regions was conducted using a scale-separation model [1]. The model uses conservation laws and resonance conditions to find where waves will be generated or dumped and hence particles will be pitch--angle scattered as well as the change of the wave energy due to instability or damping. It includes in consideration the total distribution function (the bulk plasma and high energy tail), so no any assumptions (e.g. seed populations, or some ad-hoc escape rate of accelerated particles) are required. The dynamics of ion acceleration by the November 11-12, 1978 interplanetary traveling shock was investigated and compared with the observations [2] as well as with solution obtained using the so-called convection-diffusion equation for distribution function of accelerated particles [3]. [1] Galinsky, V.L., and V.I. Shevchenko, Astrophys. J., 669, L109, 2007. [2] Kennel, C.F., F.W. Coroniti, F.L. Scarf, W.A. Livesey, C.T. Russell, E.J. Smith, K.P. Wenzel, and M. Scholer, J. Geophys. Res. 91, 11,917, 1986. [3] Gordon B.E., M.A. Lee, E. Mobius, and K.J. Trattner, J. Geophys. Res., 104, 28,263, 1990.
High Performance Computing Based Parallel HIearchical Modal Association Clustering (HPAR HMAC)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A.
For many applications, clustering is a crucial step in order to gain insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of 2 particular clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity. HMAC is not practical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, themore » estimated runtime skyrockets to over a decade! To improve the execution time of HMAC, it is reasonable to consider an multi-core implementation that utilizes available system resources. An existing imple-mentation (Ray and Cheng 2014) divides the dataset into N partitions - one for each thread prior to executing the HMAC algorithm. This implementation benefits from 2 types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program is able to accelerate computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, O(N2). Once the maximum number of cores on a system is exhausted, the program exhibits slower behavior. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit divide-and-conquer benefits seen by the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into 2 sub-partitions until a threshold size is reached. When the partition can no longer be divided without falling below threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasti, D.E.; Ramirez, J.J.; Coleman, P.D.
1985-01-01
The Megamp Accelerator and Beam Experiment (MABE) was the technology development testbed for the multiple beam, linear induction accelerator approach for Hermes III, a new 20 MeV, 0.8 MA, 40 ns accelerator being developed at Sandia for gamma-ray simulation. Experimental studies of a high-current, single-beam accelerator (8 MeV, 80 kA), and a nine-beam injector (1.4 MeV, 25 kA/beam) have been completed, and experiments on a nine-beam linear induction accelerator are in progress. A two-beam linear induction accelerator is designed and will be built as a gamma-ray simulator to be used in parallel with Hermes III. The MABE pulsed power systemmore » and accelerator for the multiple beam experiments is described. Results from these experiments and the two-beam design are discussed. 11 refs., 6 figs.« less
Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun
2018-01-01
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach system speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance.
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
Kononenko, Oleksiy; Adolphsen, Chris; Li, Zenghai; ...
2017-10-10
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we presentmore » the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. Furthermore, the simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.« less
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kononenko, Oleksiy; Adolphsen, Chris; Li, Zenghai
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we presentmore » the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. Furthermore, the simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.« less
Shock Acceleration of Solar Energetic Protons: The First 10 Minutes
NASA Technical Reports Server (NTRS)
Ng, Chee K.; Reames, Donald V.
2008-01-01
Proton acceleration at a parallel coronal shock is modeled with self-consistent Alfven wave excitation and shock transmission. 18 - 50 keV seed protons at 0.1% of plasma proton density are accelerated in 10 minutes to a power-law intensity spectrum rolling over at 300 MeV by a 2500km s-1 shock traveling outward from 3.5 solar radius, for typical coronal conditions and low ambient wave intensities. Interaction of high-energy protons of large pitch-angles with Alfven waves amplified by low-energy protons of small pitch angles is key to rapid acceleration. Shock acceleration is not significantly retarded by sunward streaming protons interacting with downstream waves. There is no significant second-order Fermi acceleration.
Anatomy of a Venusian hot spot - Geology, gravity, and mantle dynamics of Eistla Regio
NASA Technical Reports Server (NTRS)
Grimm, Robert E.; Phillips, Roger J.
1992-01-01
Results of a study of the western and central portions of the Venusian hot spot Eistla Regio are presented. Magellan radar images were mapped to elucidate the general geologic history of the region. Radial fracture systems both on the rises and volcanoes indicate that uplift and associated faulting accompanied volcanic construction. Prominent fracture zones strike WNW to NW, parallel to the long axis of the highlands. The largest of these, Guor Linea, exhibits a progressive deformation history that may include minor clockwise rotation in addition to bulk NNE-SSW extension. Pioneer Venus line-of-sight accelerations were inverted for vertical gravity which, when combined with topography, were used to solve for mass anomalies on the crust-mantle boundary and in the upper levels of the mantle convective system.
Real-time computation of parameter fitting and image reconstruction using graphical processing units
NASA Astrophysics Data System (ADS)
Locans, Uldis; Adelmann, Andreas; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Günther; Wang, Qiulin
2017-06-01
In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow applications to use a GPU, to speed up the previously identified parts. Benchmarking tests were performed in order to measure the achieved speedup. During this work, we focused on single GPU systems to show that real time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the currently used application for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU. The speedup may vary depending on the size and complexity of the problem. For PET image analysis, the obtained speedups of the GPU version were more than × 40 larger compared to a single core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.
Nana, Roger; Hu, Xiaoping
2010-01-01
k-space-based reconstruction in parallel imaging depends on the reconstruction kernel setting, including its support. An optimal choice of the kernel depends on the calibration data, coil geometry and signal-to-noise ratio, as well as the criterion used. In this work, data consistency, imposed by the shift invariance requirement of the kernel, is introduced as a goodness measure of k-space-based reconstruction in parallel imaging and demonstrated. Data consistency error (DCE) is calculated as the sum of squared difference between the acquired signals and their estimates obtained based on the interpolation of the estimated missing data. A resemblance between DCE and the mean square error in the reconstructed image was found, demonstrating DCE's potential as a metric for comparing or choosing reconstructions. When used for selecting the kernel support for generalized autocalibrating partially parallel acquisition (GRAPPA) reconstruction and the set of frames for calibration as well as the kernel support in temporal GRAPPA reconstruction, DCE led to improved images over existing methods. Data consistency error is efficient to evaluate, robust for selecting reconstruction parameters and suitable for characterizing and optimizing k-space-based reconstruction in parallel imaging.
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Zhu, Xiang; Zhang, Dianwen
2013-01-01
We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
Filli, Lukas; Piccirelli, Marco; Kenkel, David; Boss, Andreas; Manoliu, Andrei; Andreisek, Gustav; Bhat, Himanshu; Runge, Val M; Guggenberger, Roman
2016-06-01
To investigate the feasibility of MR diffusion tensor imaging (DTI) of the median nerve using simultaneous multi-slice echo planar imaging (EPI) with blipped CAIPIRINHA. After federal ethics board approval, MR imaging of the median nerves of eight healthy volunteers (mean age, 29.4 years; range, 25-32) was performed at 3 T using a 16-channel hand/wrist coil. An EPI sequence (b-value, 1,000 s/mm(2); 20 gradient directions) was acquired without acceleration as well as with twofold and threefold slice acceleration. Fractional anisotropy (FA), mean diffusivity (MD) and quality of nerve tractography (number of tracks, average track length, track homogeneity, anatomical accuracy) were compared between the acquisitions using multivariate ANOVA and the Kruskal-Wallis test. Acquisition time was 6:08 min for standard DTI, 3:38 min for twofold and 2:31 min for threefold acceleration. No differences were found regarding FA (standard DTI: 0.620 ± 0.058; twofold acceleration: 0.642 ± 0.058; threefold acceleration: 0.644 ± 0.061; p ≥ 0.217) and MD (standard DTI: 1.076 ± 0.080 mm(2)/s; twofold acceleration: 1.016 ± 0.123 mm(2)/s; threefold acceleration: 0.979 ± 0.153 mm(2)/s; p ≥ 0.074). Twofold acceleration yielded similar tractography quality compared to standard DTI (p > 0.05). With threefold acceleration, however, average track length and track homogeneity decreased (p = 0.004-0.021). Accelerated DTI of the median nerve is feasible. Twofold acceleration yields similar results to standard DTI. • Standard DTI of the median nerve is limited by its long acquisition time. • Simultaneous multi-slice acquisition is a new technique for accelerated DTI. • Accelerated DTI of the median nerve yields similar results to standard DTI.
Miniature plasma accelerating detonator and method of detonating insensitive materials
Bickes, R.W. Jr.; Kopczewski, M.R.; Schwarz, A.C.
1985-01-04
The invention is a detonator for use with high explosives. The detonator comprises a pair of parallel rail electrodes connected to a power supply. By shorting the electrodes at one end, a plasma is generated and accelerated toward the other end to impact against explosives. A projectile can be arranged between the rails to be accelerated by the plasma. An alternative arrangement is to a coaxial electrode construction. The invention also relates to a method of detonating explosives. 3 figs.
Miniature plasma accelerating detonator and method of detonating insensitive materials
Bickes, Jr., Robert W.; Kopczewski, Michael R.; Schwarz, Alfred C.
1986-01-01
The invention is a detonator for use with high explosives. The detonator comprises a pair of parallel rail electrodes connected to a power supply. By shorting the electrodes at one end, a plasma is generated and accelerated toward the other end to impact against explosives. A projectile can be arranged between the rails to be accelerated by the plasma. An alternative arrangement is to a coaxial electrode construction. The invention also relates to a method of detonating explosives.
NASA Technical Reports Server (NTRS)
Rowland, H. L.; Palmadesso, P. J.
1983-01-01
Large amplitude ion cyclotron waves have been observed on auroral field lines. In the presence of an electric field parallel to the ambient magnetic field these waves prevent the acceleration of the bulk of the plasma electrons leading to the formation of a runaway tail. It is shown that low-frequency turbulence can also limit the acceleration of high-velocity runaway electrons via pitch angle scattering at the anomalous Doppler resonance.
On the modulation of the Jovian decametric radiation by Io. I - Acceleration of charged particles
NASA Technical Reports Server (NTRS)
Smith, R. A.; Goertz, C. K.
1978-01-01
A steady-state analysis of the current circuit between Io and the Jovian ionosphere is performed, assuming that the current is carried by electrons accelerated through potential double layers in the Io flux tube. The circuit analysis indicates that electrons may be accelerated up to energies of several hundred keV. Several problems associated with the formation of double layers are also discussed. The parallel potential drops decouple the flux tube from the satellite's orbital motion.
Online gamma-camera imaging of 103Pd seeds (OGIPS) for permanent breast seed implantation
NASA Astrophysics Data System (ADS)
Ravi, Ananth; Caldwell, Curtis B.; Keller, Brian M.; Reznik, Alla; Pignol, Jean-Philippe
2007-09-01
Permanent brachytherapy seed implantation is being investigated as a mode of accelerated partial breast irradiation for early stage breast cancer patients. Currently, the seeds are poorly visualized during the procedure making it difficult to perform a real-time correction of the implantation if required. The objective was to determine if a customized gamma-camera can accurately localize the seeds during implantation. Monte Carlo simulations of a CZT based gamma-camera were used to assess whether images of suitable quality could be derived by detecting the 21 keV photons emitted from 74 MBq 103Pd brachytherapy seeds. A hexagonal parallel hole collimator with a hole length of 38 mm, hole diameter of 1.2 mm and 0.2 mm septa, was modeled. The design of the gamma-camera was evaluated on a realistic model of the breast and three layers of the seed distribution (55 seeds) based on a pre-implantation CT treatment plan. The Monte Carlo simulations showed that the gamma-camera was able to localize the seeds with a maximum error of 2.0 mm, using only two views and 20 s of imaging. A gamma-camera can potentially be used as an intra-procedural image guidance system for quality assurance for permanent breast seed implantation.
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
A review of snapshot multidimensional optical imaging: measuring photon tags in parallel
Gao, Liang; Wang, Lihong V.
2015-01-01
Multidimensional optical imaging has seen remarkable growth in the past decade. Rather than measuring only the two-dimensional spatial distribution of light, as in conventional photography, multidimensional optical imaging captures light in up to nine dimensions, providing unprecedented information about incident photons’ spatial coordinates, emittance angles, wavelength, time, and polarization. Multidimensional optical imaging can be accomplished either by scanning or parallel acquisition. Compared with scanning-based imagers, parallel acquisition—also dubbed snapshot imaging—has a prominent advantage in maximizing optical throughput, particularly when measuring a datacube of high dimensions. Here, we first categorize snapshot multidimensional imagers based on their acquisition and image reconstruction strategies, then highlight the snapshot advantage in the context of optical throughput, and finally we discuss their state-of-the-art implementations and applications. PMID:27134340
NASA Astrophysics Data System (ADS)
Vnukov, A. A.; Shershnev, M. B.
2018-01-01
The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of algorithms and to study the relationship between system performance, algorithm execution time and the degree of parallelization of computations. Three methods of interpolation were studied, formalized and adapted to scale images. The result of the work is a program for scaling images by different methods. Comparison of the quality of scaling by different methods is given.
GPU COMPUTING FOR PARTICLE TRACKING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishimura, Hiroshi; Song, Kai; Muriki, Krishna
2011-03-25
This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performances, issues, and challenges from introducing GPU are also discussed. General purpose Computation on Graphics Processing Units (GPGPU) bring massive parallel computing capabilities to numerical calculation. However, the unique architecture of GPU requires a comprehensive understanding of the hardware and programming model to be able to well optimize existing applications. In the field of accelerator physics, the dynamic aperture calculationmore » of a storage ring, which is often the most time consuming part of the accelerator modeling and simulation, can benefit from GPU due to its embarrassingly parallel feature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU which consists of 14 multi-processois (MP) with 32 cores on each MP, therefore a total of 448 cores, to host thousands ot threads dynamically. Thread is a logical execution unit of the program on GPU. In the GPU programming model, threads are grouped into a collection of blocks Within each block, multiple threads share the same code, and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using GPU to speed up the dynamic aperture calculation by having each thread track a particle.« less
Wang, Zihao; Chen, Yu; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Liu, Zhiyong; Sun, Fei; Zhang, Fa
2018-03-01
Electron tomography (ET) is an important technique for studying the three-dimensional structures of the biological ultrastructure. Recently, ET has reached sub-nanometer resolution for investigating the native and conformational dynamics of macromolecular complexes by combining with the sub-tomogram averaging approach. Due to the limited sampling angles, ET reconstruction typically suffers from the "missing wedge" problem. Using a validation procedure, iterative compressed-sensing optimized nonuniform fast Fourier transform (NUFFT) reconstruction (ICON) demonstrates its power in restoring validated missing information for a low-signal-to-noise ratio biological ET dataset. However, the huge computational demand has become a bottleneck for the application of ICON. In this work, we implemented a parallel acceleration technology ICON-many integrated core (MIC) on Xeon Phi cards to address the huge computational demand of ICON. During this step, we parallelize the element-wise matrix operations and use the efficient summation of a matrix to reduce the cost of matrix computation. We also developed parallel versions of NUFFT on MIC to achieve a high acceleration of ICON by using more efficient fast Fourier transform (FFT) calculation. We then proposed a hybrid task allocation strategy (two-level load balancing) to improve the overall performance of ICON-MIC by making full use of the idle resources on Tianhe-2 supercomputer. Experimental results using two different datasets show that ICON-MIC has high accuracy in biological specimens under different noise levels and a significant acceleration, up to 13.3 × , compared with the CPU version. Further, ICON-MIC has good scalability efficiency and overall performance on Tianhe-2 supercomputer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bai, T; UT Southwestern Medical Center, Dallas, TX; Yan, H
2014-06-15
Purpose: To develop a 3D dictionary learning based statistical reconstruction algorithm on graphic processing units (GPU), to improve the quality of low-dose cone beam CT (CBCT) imaging with high efficiency. Methods: A 3D dictionary containing 256 small volumes (atoms) of 3x3x3 voxels was trained from a high quality volume image. During reconstruction, we utilized a Cholesky decomposition based orthogonal matching pursuit algorithm to find a sparse representation on this dictionary basis of each patch in the reconstructed image, in order to regularize the image quality. To accelerate the time-consuming sparse coding in the 3D case, we implemented our algorithm inmore » a parallel fashion by taking advantage of the tremendous computational power of GPU. Evaluations are performed based on a head-neck patient case. FDK reconstruction with full dataset of 364 projections is used as the reference. We compared the proposed 3D dictionary learning based method with a tight frame (TF) based one using a subset data of 121 projections. The image qualities under different resolutions in z-direction, with or without statistical weighting are also studied. Results: Compared to the TF-based CBCT reconstruction, our experiments indicated that 3D dictionary learning based CBCT reconstruction is able to recover finer structures, to remove more streaking artifacts, and is less susceptible to blocky artifacts. It is also observed that statistical reconstruction approach is sensitive to inconsistency between the forward and backward projection operations in parallel computing. Using high a spatial resolution along z direction helps improving the algorithm robustness. Conclusion: 3D dictionary learning based CBCT reconstruction algorithm is able to sense the structural information while suppressing noise, and hence to achieve high quality reconstruction. The GPU realization of the whole algorithm offers a significant efficiency enhancement, making this algorithm more feasible for potential clinical application. A high zresolution is preferred to stabilize statistical iterative reconstruction. This work was supported in part by NIH(1R01CA154747-01), NSFC((No. 61172163), Research Fund for the Doctoral Program of Higher Education of China (No. 20110201110011), China Scholarship Council.« less
NASA Astrophysics Data System (ADS)
Wang, Ting; Plecháč, Petr
2017-12-01
Stochastic reaction networks that exhibit bistable behavior are common in systems biology, materials science, and catalysis. Sampling of stationary distributions is crucial for understanding and characterizing the long-time dynamics of bistable stochastic dynamical systems. However, simulations are often hindered by the insufficient sampling of rare transitions between the two metastable regions. In this paper, we apply the parallel replica method for a continuous time Markov chain in order to improve sampling of the stationary distribution in bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions. Furthermore, it can be combined with the path-space information bounds for parametric sensitivity analysis. With the proposed methodology, we study three bistable biological networks: the Schlögl model, the genetic switch network, and the enzymatic futile cycle network. We demonstrate the algorithmic speedup achieved in these numerical benchmarks. More significant acceleration is expected when multi-core or graphics processing unit computer architectures and programming tools such as CUDA are employed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael
2013-05-23
Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configurations, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of new approach wasmore » evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.« less
Precision Parameter Estimation and Machine Learning
NASA Astrophysics Data System (ADS)
Wandelt, Benjamin D.
2008-12-01
I discuss the strategy of ``Acceleration by Parallel Precomputation and Learning'' (AP-PLe) that can vastly accelerate parameter estimation in high-dimensional parameter spaces and costly likelihood functions, using trivially parallel computing to speed up sequential exploration of parameter space. This strategy combines the power of distributed computing with machine learning and Markov-Chain Monte Carlo techniques efficiently to explore a likelihood function, posterior distribution or χ2-surface. This strategy is particularly successful in cases where computing the likelihood is costly and the number of parameters is moderate or large. We apply this technique to two central problems in cosmology: the solution of the cosmological parameter estimation problem with sufficient accuracy for the Planck data using PICo; and the detailed calculation of cosmological helium and hydrogen recombination with RICO. Since the APPLe approach is designed to be able to use massively parallel resources to speed up problems that are inherently serial, we can bring the power of distributed computing to bear on parameter estimation problems. We have demonstrated this with the CosmologyatHome project.
Acceleration of discrete stochastic biochemical simulation using GPGPU.
Sumiyoshi, Kei; Hirata, Kazuki; Hiroi, Noriko; Funahashi, Akira
2015-01-01
For systems made up of a small number of molecules, such as a biochemical network in a single cell, a simulation requires a stochastic approach, instead of a deterministic approach. The stochastic simulation algorithm (SSA) simulates the stochastic behavior of a spatially homogeneous system. Since stochastic approaches produce different results each time they are used, multiple runs are required in order to obtain statistical results; this results in a large computational cost. We have implemented a parallel method for using SSA to simulate a stochastic model; the method uses a graphics processing unit (GPU), which enables multiple realizations at the same time, and thus reduces the computational time and cost. During the simulation, for the purpose of analysis, each time course is recorded at each time step. A straightforward implementation of this method on a GPU is about 16 times faster than a sequential simulation on a CPU with hybrid parallelization; each of the multiple simulations is run simultaneously, and the computational tasks within each simulation are parallelized. We also implemented an improvement to the memory access and reduced the memory footprint, in order to optimize the computations on the GPU. We also implemented an asynchronous data transfer scheme to accelerate the time course recording function. To analyze the acceleration of our implementation on various sizes of model, we performed SSA simulations on different model sizes and compared these computation times to those for sequential simulations with a CPU. When used with the improved time course recording function, our method was shown to accelerate the SSA simulation by a factor of up to 130.
Acceleration of discrete stochastic biochemical simulation using GPGPU
Sumiyoshi, Kei; Hirata, Kazuki; Hiroi, Noriko; Funahashi, Akira
2015-01-01
For systems made up of a small number of molecules, such as a biochemical network in a single cell, a simulation requires a stochastic approach, instead of a deterministic approach. The stochastic simulation algorithm (SSA) simulates the stochastic behavior of a spatially homogeneous system. Since stochastic approaches produce different results each time they are used, multiple runs are required in order to obtain statistical results; this results in a large computational cost. We have implemented a parallel method for using SSA to simulate a stochastic model; the method uses a graphics processing unit (GPU), which enables multiple realizations at the same time, and thus reduces the computational time and cost. During the simulation, for the purpose of analysis, each time course is recorded at each time step. A straightforward implementation of this method on a GPU is about 16 times faster than a sequential simulation on a CPU with hybrid parallelization; each of the multiple simulations is run simultaneously, and the computational tasks within each simulation are parallelized. We also implemented an improvement to the memory access and reduced the memory footprint, in order to optimize the computations on the GPU. We also implemented an asynchronous data transfer scheme to accelerate the time course recording function. To analyze the acceleration of our implementation on various sizes of model, we performed SSA simulations on different model sizes and compared these computation times to those for sequential simulations with a CPU. When used with the improved time course recording function, our method was shown to accelerate the SSA simulation by a factor of up to 130. PMID:25762936
Deep learning for medical image segmentation - using the IBM TrueNorth neurosynaptic system
NASA Astrophysics Data System (ADS)
Moran, Steven; Gaonkar, Bilwaj; Whitehead, William; Wolk, Aidan; Macyszyn, Luke; Iyer, Subramanian S.
2018-03-01
Deep convolutional neural networks have found success in semantic image segmentation tasks in computer vision and medical imaging. These algorithms are executed on conventional von Neumann processor architectures or GPUs. This is suboptimal. Neuromorphic processors that replicate the structure of the brain are better-suited to train and execute deep learning models for image segmentation by relying on massively-parallel processing. However, given that they closely emulate the human brain, on-chip hardware and digital memory limitations also constrain them. Adapting deep learning models to execute image segmentation tasks on such chips, requires specialized training and validation. In this work, we demonstrate for the first-time, spinal image segmentation performed using a deep learning network implemented on neuromorphic hardware of the IBM TrueNorth Neurosynaptic System and validate the performance of our network by comparing it to human-generated segmentations of spinal vertebrae and disks. To achieve this on neuromorphic hardware, the training model constrains the coefficients of individual neurons to {-1,0,1} using the Energy Efficient Deep Neuromorphic (EEDN)1 networks training algorithm. Given the 1 million neurons and 256 million synapses, the scale and size of the neural network implemented by the IBM TrueNorth allows us to execute the requisite mapping between segmented images and non-uniform intensity MR images >20 times faster than on a GPU-accelerated network and using <0.1 W. This speed and efficiency implies that a trained neuromorphic chip can be deployed in intra-operative environments where real-time medical image segmentation is necessary.
Heinrich, Andreas; Teichgräber, Ulf K; Güttler, Felix V
2015-12-01
The standard ASTM F2119 describes a test method for measuring the size of a susceptibility artifact based on the example of a passive implant. A pixel in an image is considered to be a part of an image artifact if the intensity is changed by at least 30% in the presence of a test object, compared to a reference image in which the test object is absent (reference value). The aim of this paper is to simplify and accelerate the test method using a histogram-based reference value. Four test objects were scanned parallel and perpendicular to the main magnetic field, and the largest susceptibility artifacts were measured using two methods of reference value determination (reference image-based and histogram-based reference value). The results between both methods were compared using the Mann-Whitney U-test. The difference between both reference values was 42.35 ± 23.66. The difference of artifact size was 0.64 ± 0.69 mm. The artifact sizes of both methods did not show significant differences; the p-value of the Mann-Whitney U-test was between 0.710 and 0.521. A standard-conform method for a rapid, objective, and reproducible evaluation of susceptibility artifacts could be implemented. The result of the histogram-based method does not significantly differ from the ASTM-conform method.
Cooled particle accelerator target
Degtiarenko, Pavel V.
2005-06-14
A novel particle beam target comprising: a rotating target disc mounted on a retainer and thermally coupled to a first array of spaced-apart parallel plate fins that extend radially inwardly from the retainer and mesh without physical contact with a second array of spaced-apart parallel plate fins that extend radially outwardly from and are thermally coupled to a cooling mechanism capable of removing heat from said second array of spaced-apart fins and located within the first array of spaced-apart parallel fins. Radiant thermal exchange between the two arrays of parallel plate fins provides removal of heat from the rotating disc. A method of cooling the rotating target is also described.
Parallel and Serial Grouping of Image Elements in Visual Perception
ERIC Educational Resources Information Center
Houtkamp, Roos; Roelfsema, Pieter R.
2010-01-01
The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some…
Non-Thermal Spectra from Pulsar Magnetospheres in the Full Electromagnetic Cascade Scenario
NASA Astrophysics Data System (ADS)
Peng, Qi-Yong; Zhang, Li
2008-08-01
We simulated non-thermal emission from a pulsar magnetosphere within the framework of a full polar-cap cascade scenario by taking the acceleration gap into account, using the Monte Carlo method. For a given electric field parallel to open field lines located at some height above the surface of a neutron star, primary electrons were accelerated by parallel electric fields and lost their energies by curvature radiation; these photons were converted to electron-positron pairs, which emitted photons through subsequent quantum synchrotron radiation and inverse Compton scattering, leading to a cascade. In our calculations, the acceleration gap was assumed to be high above the stellar surface (about several stellar radii); the primary and secondary particles and photons emitted during the journey of those particles in the magnetosphere were traced using the Monte Carlo method. In such a scenario, we calculated the non-thermal photon spectra for different pulsar parameters and compared the model results for two normal pulsars and one millisecond pulsar with the observed data.
GPU accelerated particle visualization with Splotch
NASA Astrophysics Data System (ADS)
Rivi, M.; Gheller, C.; Dykes, T.; Krokos, M.; Dolag, K.
2014-07-01
Splotch is a rendering algorithm for exploration and visual discovery in particle-based datasets coming from astronomical observations or numerical simulations. The strengths of the approach are production of high quality imagery and support for very large-scale datasets through an effective mix of the OpenMP and MPI parallel programming paradigms. This article reports our experiences in re-designing Splotch for exploiting emerging HPC architectures nowadays increasingly populated with GPUs. A performance model is introduced to guide our re-factoring of Splotch. A number of parallelization issues are discussed, in particular relating to race conditions and workload balancing, towards achieving optimal performances. Our implementation was accomplished by using the CUDA programming paradigm. Our strategy is founded on novel schemes achieving optimized data organization and classification of particles. We deploy a reference cosmological simulation to present performance results on acceleration gains and scalability. We finally outline our vision for future work developments including possibilities for further optimizations and exploitation of hybrid systems and emerging accelerators.
In plane oscillation of a bifilar pendulum
NASA Astrophysics Data System (ADS)
Hinrichsen, Peter F.
2016-11-01
The line tensions, the horizontal and vertical accelerations as well as the period of large angle oscillations parallel to the plane of a bifilar suspension are presented and have been experimentally investigated using strain gauges and a smart phone. This system has a number of advantages over the simple pendulum for studying large angle oscillations, and for measuring the acceleration due to gravity.
Introduction of Parallel GPGPU Acceleration Algorithms for the Solution of Radiative Transfer
NASA Technical Reports Server (NTRS)
Godoy, William F.; Liu, Xu
2011-01-01
General-purpose computing on graphics processing units (GPGPU) is a recent technique that allows the parallel graphics processing unit (GPU) to accelerate calculations performed sequentially by the central processing unit (CPU). To introduce GPGPU to radiative transfer, the Gauss-Seidel solution of the well-known expressions for 1-D and 3-D homogeneous, isotropic media is selected as a test case. Different algorithms are introduced to balance memory and GPU-CPU communication, critical aspects of GPGPU. Results show that speed-ups of one to two orders of magnitude are obtained when compared to sequential solutions. The underlying value of GPGPU is its potential extension in radiative solvers (e.g., Monte Carlo, discrete ordinates) at a minimal learning curve.
Accelerating Climate and Weather Simulations through Hybrid Computing
NASA Technical Reports Server (NTRS)
Zhou, Shujia; Cruz, Carlos; Duffy, Daniel; Tucker, Robert; Purcell, Mark
2011-01-01
Unconventional multi- and many-core processors (e.g. IBM (R) Cell B.E.(TM) and NVIDIA (R) GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g. Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization (TM) (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand(R) (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead.
Acceleration of barium ions near 8000 km above an aurora
NASA Technical Reports Server (NTRS)
Stenbaek-Nielsen, H. C.; Hallinan, T. J.; Wescott, E. M.; Foeppl, H.
1984-01-01
A barium shaped charge, named Limerick, was released from a rocket launched from Poker Flat Research Range, Alaska, on March 30, 1982, at 1033 UT. The release took place in a small auroral breakup. The jet of ionized barium reached an altitude of 8100 km 14.5 min after release, indicating that there were no parallel electric fields below this altitude. At 8100 km the jet appeared to stop. Analysis shows that the barium at this altitude was effectively removed from the tip. It is concluded that the barium was actually accelerated upward, resulting in a large decrease in the line-of-sight density and hence the optical intensity. The parallel electric potential in the acceleration region must have been greater than 1 kV over an altitude interval of less than 200 km. The acceleration region, although presumably auroral in origin, did not seem to be related to individual auroral structures, but appeared to be a large-scale horizontal structure. The perpendicular electric field below, as deduced from the drift of the barium, was temporally and spatially very uniform and showed no variation related to individual auroral structures passing through.
NASA Astrophysics Data System (ADS)
Wright, Andrew N.; Allan, W.; Ruderman, Michael S.; Elphic, R. C.
2002-07-01
The acceleration of current carriers in an Alfvén wave current system is considered. The model incorporates a dipole magnetic field geometry, and we present an analytical solution of the two-fluid equations by successive approximations. The leading solution corresponds to the familiar single-fluid toroidal oscillations. The next order describes the nonlinear dynamics of electrons responsible for carrying a few μAm-2 field aligned current into the ionosphere. The solution shows how most of the electron acceleration in the magnetosphere occurs within 1 RE of the ionosphere, and that a parallel electric field of the order of 1 mVm-1 is responsible for energising the electrons to 1 keV. The limitations of the electron fluid approximation are considered, and a qualitative solution including electron beams and a modified E∥ is developed in accord with observations. We find that the electron acceleration can be nonlinear, (ve∥∇∥)ve∥ > ωve∥, as a result of our nonuniform equilibrium field geometry even when ve∥ is less than the Alfvén speed. Our calculation also elucidates the processes through which E∥ is generated and supported.
Accelerating sino-atrium computer simulations with graphic processing units.
Zhang, Hong; Xiao, Zheng; Lin, Shien-fong
2015-01-01
Sino-atrial node cells (SANCs) play a significant role in rhythmic firing. To investigate their role in arrhythmia and interactions with the atrium, computer simulations based on cellular dynamic mathematical models are generally used. However, the large-scale computation usually makes research difficult, given the limited computational power of Central Processing Units (CPUs). In this paper, an accelerating approach with Graphic Processing Units (GPUs) is proposed in a simulation consisting of the SAN tissue and the adjoining atrium. By using the operator splitting method, the computational task was made parallel. Three parallelization strategies were then put forward. The strategy with the shortest running time was further optimized by considering block size, data transfer and partition. The results showed that for a simulation with 500 SANCs and 30 atrial cells, the execution time taken by the non-optimized program decreased 62% with respect to a serial program running on CPU. The execution time decreased by 80% after the program was optimized. The larger the tissue was, the more significant the acceleration became. The results demonstrated the effectiveness of the proposed GPU-accelerating methods and their promising applications in more complicated biological simulations.
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].
Furuta, Takuya; Sato, Tatsuhiko
2015-01-01
Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.
Traveling wave linear accelerator with RF power flow outside of accelerating cavities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolgashev, Valery A.
A high power RF traveling wave accelerator structure includes a symmetric RF feed, an input matching cell coupled to the symmetric RF feed, a sequence of regular accelerating cavities coupled to the input matching cell at an input beam pipe end of the sequence, one or more waveguides parallel to and coupled to the sequence of regular accelerating cavities, an output matching cell coupled to the sequence of regular accelerating cavities at an output beam pipe end of the sequence, and output waveguide circuit or RF loads coupled to the output matching cell. Each of the regular accelerating cavities hasmore » a nose cone that cuts off field propagating into the beam pipe and therefore all power flows in a traveling wave along the structure in the waveguide.« less
NASA Astrophysics Data System (ADS)
van Marle, Allard Jan; Casse, Fabien; Marcowith, Alexandre
2018-01-01
We present simulations of magnetized astrophysical shocks taking into account the interplay between the thermal plasma of the shock and suprathermal particles. Such interaction is depicted by combining a grid-based magnetohydrodynamics description of the thermal fluid with particle in cell techniques devoted to the dynamics of suprathermal particles. This approach, which incorporates the use of adaptive mesh refinement features, is potentially a key to simulate astrophysical systems on spatial scales that are beyond the reach of pure particle-in-cell simulations. We consider in this study non-relativistic shocks with various Alfvénic Mach numbers and magnetic field obliquity. We recover all the features of both magnetic field amplification and particle acceleration from previous studies when the magnetic field is parallel to the normal to the shock. In contrast with previous particle-in-cell-hybrid simulations, we find that particle acceleration and magnetic field amplification also occur when the magnetic field is oblique to the normal to the shock but on larger time-scales than in the parallel case. We show that in our simulations, the suprathermal particles are experiencing acceleration thanks to a pre-heating process of the particle similar to a shock drift acceleration leading to the corrugation of the shock front. Such oscillations of the shock front and the magnetic field locally help the particles to enter the upstream region and to initiate a non-resonant streaming instability and finally to induce diffuse particle acceleration.
Effects of mechanostimulation on gravitropism and signal persistence in flax roots.
John, Susan P; Hasenstein, Karl H
2011-09-01
Gravitropism describes curvature of plants in response to gravity or differential acceleration and clinorotation is commonly used to compensate unilateral effect of gravity. We report on experiments that examine the persistence of the gravity signal and separate mechanostimulation from gravistimulation. Flax roots were reoriented (placed horizontally for 5, 10 or 15 min) and clinorotated at a rate of 0.5 to 5 rpm either vertically (parallel to the gravity vector and root axis) or horizontally (perpendicular to the gravity vector and parallel to the root axis). Image sequences showed that horizontal clinorotation did not affect root growth rate (0.81 ± 0.03 mm h-1) but vertical clinorotation reduced root growth by about 7%. The angular velocity (speed of clinorotation) did not affect growth for either direction. However, maximal curvature for vertical clinorotation decreased with increasing rate of rotation and produced straight roots at 5 rpm. In contrast, horizontal clinorotation increased curvature with increasing angular velocity. The point of maximal curvature was used to determine the longevity (memory) of the gravity signal, which lasted about 120 min. The data indicate that mechanostimulation modifies the magnitude of the graviresponse but does not affect memory persistence.
Wang, Min; Tian, Yun
2018-01-01
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach system speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance. PMID:29861711
Efficient Geometric Sound Propagation Using Visibility Culling
NASA Astrophysics Data System (ADS)
Chandak, Anish
2011-07-01
Simulating propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques which are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with simultaneously moving source and moving receiver (MS-MR) which incurs less than 25% overhead compared to static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenario.
NASA Astrophysics Data System (ADS)
Yu, Zhongzhi; Liu, Shaocong; Sun, Shiyi; Kuang, Cuifang; Liu, Xu
2018-06-01
Parallel detection, which can use the additional information of a pinhole plane image taken at every excitation scan position, could be an efficient method to enhance the resolution of a confocal laser scanning microscope. In this paper, we discuss images obtained under different conditions and using different image restoration methods with parallel detection to quantitatively compare the imaging quality. The conditions include different noise levels and different detector array settings. The image restoration methods include linear deconvolution and pixel reassignment with Richard-Lucy deconvolution and with maximum-likelihood estimation deconvolution. The results show that the linear deconvolution share properties such as high-efficiency and the best performance under all different conditions, and is therefore expected to be of use for future biomedical routine research.
Parallel image logical operations using cross correlation
NASA Technical Reports Server (NTRS)
Strong, J. P., III
1972-01-01
Methods are presented for counting areas in an image in a parallel manner using noncoherent optical techniques. The techniques presented include the Levialdi algorithm for counting, optical techniques for binary operations, and cross-correlation.
GPU accelerated generation of digitally reconstructed radiographs for 2-D/3-D image registration.
Dorgham, Osama M; Laycock, Stephen D; Fisher, Mark H
2012-09-01
Recent advances in programming languages for graphics processing units (GPUs) provide developers with a convenient way of implementing applications which can be executed on the CPU and GPU interchangeably. GPUs are becoming relatively cheap, powerful, and widely available hardware components, which can be used to perform intensive calculations. The last decade of hardware performance developments shows that GPU-based computation is progressing significantly faster than CPU-based computation, particularly if one considers the execution of highly parallelisable algorithms. Future predictions illustrate that this trend is likely to continue. In this paper, we introduce a way of accelerating 2-D/3-D image registration by developing a hybrid system which executes on the CPU and utilizes the GPU for parallelizing the generation of digitally reconstructed radiographs (DRRs). Based on the advancements of the GPU over the CPU, it is timely to exploit the benefits of many-core GPU technology by developing algorithms for DRR generation. Although some previous work has investigated the rendering of DRRs using the GPU, this paper investigates approximations which reduce the computational overhead while still maintaining a quality consistent with that needed for 2-D/3-D registration with sufficient accuracy to be clinically acceptable in certain applications of radiation oncology. Furthermore, by comparing implementations of 2-D/3-D registration on the CPU and GPU, we investigate current performance and propose an optimal framework for PC implementations addressing the rigid registration problem. Using this framework, we are able to render DRR images from a 256×256×133 CT volume in ~24 ms using an NVidia GeForce 8800 GTX and in ~2 ms using NVidia GeForce GTX 580. In addition to applications requiring fast automatic patient setup, these levels of performance suggest image-guided radiation therapy at video frame rates is technically feasible using relatively low cost PC architecture.
GPU-accelerated adjoint algorithmic differentiation
NASA Astrophysics Data System (ADS)
Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe
2016-03-01
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the ;tape;. Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.