Parallel processing considerations for image recognition tasks
NASA Astrophysics Data System (ADS)
Simske, Steven J.
2011-01-01
Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images; from this standpoint, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing considered in this paper: parallel processing (1) by task, (2) by image region, and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows, as diverse as optical character recognition (OCR), document classification, and barcode reading, to parallel pipelines. This can substantially decrease the time to completion for document tasks; for this approach, each parallel pipeline generally performs a different task. Parallel processing by image region allows a larger imaging task to be subdivided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach; examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously, which may result in improved accuracy.
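The first, most obvious scheme (one image per processor) can be sketched with Python's standard multiprocessing module; the image list and the per-image task below are illustrative stand-ins, not from the paper:

```python
from multiprocessing import Pool

def analyze(image):
    # Placeholder per-image task (e.g. OCR or barcode reading);
    # here we just sum pixel values as a stand-in workload.
    return sum(sum(row) for row in image)

def analyze_all(images, workers=4):
    # Parallel processing by image: each worker takes whole images.
    with Pool(workers) as pool:
        return pool.map(analyze, images)

if __name__ == "__main__":
    images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(analyze_all(images, workers=2))  # [10, 26]
```

Parallel processing by region would instead split each image into tiles and map the same function over the tiles.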
Parallel Algorithms for Image Analysis.
1982-06-01
Technical report TR-1180 by Azriel Rosenfeld, under grant AFOSR-77-3271. Keywords: image processing; image analysis; parallel processing; cellular computers.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.
1997-12-01
Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to another, and performance often falls short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool, which enables application programmers to specify at a high level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables the efficient combination of parallel storage-access routines with sequential image processing operations. This paper shows how processing- and I/O-intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP-specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP-specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
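CAP itself generates parallel code from a high-level specification; the core idea it exploits, overlapping data access with processing in a pipeline, can be sketched in plain Python with a bounded queue between a reader stage and a compute stage (all names are illustrative):

```python
import queue
import threading

def pipeline(blocks, process):
    """Overlap 'I/O' (producing blocks) with processing, pipeline-style."""
    q = queue.Queue(maxsize=2)   # bounded buffer between the two stages
    results = []

    def reader():
        for b in blocks:         # stands in for parallel storage access
            q.put(b)
        q.put(None)              # end-of-stream marker

    def worker():
        while (b := q.get()) is not None:
            results.append(process(b))

    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=worker)
    t1.start(); t2.start(); t1.join(); t2.join()
    return results
```

The bounded queue is what lets the next block be fetched while the current one is still being processed.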
NASA Astrophysics Data System (ADS)
Akil, Mohamed
2017-05-01
Real-time processing is becoming more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis, and consequently many different approaches to it have been proposed. The watershed transform is a well-known image segmentation tool, and a very data-intensive task. To accelerate watershed algorithms toward real-time processing, parallel architectures and programming models for multicore computing have been developed. This paper surveys approaches for parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we compare various parallelizations of sequential watershed algorithms on shared-memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on its performance. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models; in particular, we compare OpenMP (an application programming interface for multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
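The shared-memory data-decomposition strategy such surveys compare (whether expressed in OpenMP or Pthreads) can be illustrated with a Python thread pool over row bands; the per-band work below is a trivial stand-in for the actual watershed computation:

```python
from concurrent.futures import ThreadPoolExecutor

def process_band(band):
    # Stand-in for the per-band watershed work: clamp values to 8 bits.
    return [min(p, 255) for p in band]

def process_image(rows, workers=4):
    # Shared-memory data decomposition: each thread gets a band of rows.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(process_band, rows))
```

In a real C implementation the bands would share one image buffer, and the overheads discussed in the survey (scheduling, synchronization at band borders) are exactly what this sketch hides.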
Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun
2018-01-01
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve both, we propose a parallel design and implementation of an Otsu-optimized Canny operator using the MapReduce parallel programming model running on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve edge detection performance, while the MapReduce model provides parallel processing for the Canny operator to address the processing-speed and communication-cost problems that arise when Canny edge detection is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster of 5 nodes with a dataset of 60,000 images and, overall, speeds up the system by approximately 3.4 times when processing large-scale datasets. The proposed algorithm thus demonstrates both better edge detection performance and improved time performance. PMID:29861711
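The Otsu step used to pick the Canny dual thresholds maximizes between-class variance over the grayscale histogram; a minimal NumPy version (independent of Hadoop and of the paper's implementation) might look like:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold (maximizing between-class variance)
    for an 8-bit grayscale image given as a NumPy array."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # class-0 probability
    mu = np.cumsum(p * np.arange(256))        # class-0 cumulative mean
    mu_t = mu[-1]                             # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)          # undefined at the extremes
    return int(np.argmax(sigma_b))
```

In an Otsu-Canny scheme this threshold (or a pair derived from it) replaces the hand-tuned dual thresholds of the classical Canny operator.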
Image Processing Using a Parallel Architecture.
1987-12-01
ENG/87D-25. Abstract: This study developed a set of low-level image processing tools on a parallel computer that allows concurrent processing of images. In this environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations. As a step toward developing such systems, a structured set of image processing tools was implemented using a parallel computer.
Zhu, Xiang; Zhang, Dianwen
2013-01-01
We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, implemented on a graphics processing unit for high-performance, scalable parallel model fitting. GPU-LMFit can provide a dramatic speed-up in massive model-fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for applications in super-resolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
NASA Astrophysics Data System (ADS)
Yarovyi, Andrii A.; Timchenko, Leonid I.; Kozhemiako, Volodymyr P.; Kokriatskaia, Nataliya I.; Hamdi, Rami R.; Savchuk, Tamara O.; Kulyk, Oleksandr O.; Surtel, Wojciech; Amirgaliyev, Yedilkhan; Kashaganova, Gulzhan
2017-08-01
The paper deals with the insufficient productivity of existing computing means for large-image processing, which do not meet the modern requirements posed by resource-intensive computing tasks of laser beam profiling. The research concentrated on one of the profiling problems, namely real-time processing of spot images of the laser beam profile. Development of a theory of parallel-hierarchical transformation allowed us to produce models for high-performance parallel-hierarchical processes, as well as algorithms and software for their implementation based on a GPU-oriented architecture using GPGPU technologies. The analyzed performance of the suggested computerized tools for processing and classification of laser beam profile images shows that they allow real-time processing of dynamic images of various sizes.
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
NASA Astrophysics Data System (ADS)
Enomoto, Ayano; Hirata, Hiroshi
2014-02-01
This article describes a feasibility study of parallel image-acquisition using a two-channel surface coil array in continuous-wave electron paramagnetic resonance (CW-EPR) imaging. Parallel EPR imaging was performed by multiplexing of EPR detection in the frequency domain. The parallel acquisition system consists of two surface coil resonators and radiofrequency (RF) bridges for EPR detection. To demonstrate the feasibility of this method of parallel image-acquisition with a surface coil array, three-dimensional EPR imaging was carried out using a tube phantom. Technical issues in the multiplexing method of EPR detection were also clarified. We found that degradation in the signal-to-noise ratio due to the interference of RF carriers is a key problem to be solved.
NASA Astrophysics Data System (ADS)
Timchenko, Leonid; Yarovyi, Andrii; Kokriatskaya, Nataliya; Nakonechna, Svitlana; Abramenko, Ludmila; Ławicki, Tomasz; Popiel, Piotr; Yesmakhanova, Laura
2016-09-01
The paper presents a method of parallel-hierarchical transformations for rapid recognition of dynamic images using GPU technology. The direct parallel-hierarchical transformations are based on a cluster CPU- and GPU-oriented hardware platform. Mathematical models of training the parallel-hierarchical (PH) network for the transformation are developed, as well as a training method of the PH network for recognition of dynamic images. This research is most relevant to organizing high-performance computation over very large arrays of information, designed to implement multi-stage sensing and processing as well as compaction and recognition of data in informational structures and computer devices. The method's advantages include high performance through the use of recent advances in parallelization, the ability to work with very large images, ease of scaling when the number of nodes in the cluster changes, and automatic scanning of the local network to detect compute nodes.
Li, Ye; Pang, Yong; Vigneron, Daniel; Glenn, Orit; Xu, Duan; Zhang, Xiaoliang
2011-01-01
Fetal MRI on 1.5T clinical scanners has increasingly become a powerful imaging tool for studying fetal brain abnormalities in vivo. Due to the limited availability of dedicated fetal phased arrays, commercial torso or cardiac phased arrays are routinely used for fetal scans; these are unable to provide optimized SNR and parallel imaging performance with a small number of coil elements, and offer insufficient coverage and filling factor. This poses a demand for the investigation and development of dedicated and efficient radiofrequency (RF) hardware to improve fetal imaging. In this work, an investigational approach to simulating the performance of multichannel flexible phased arrays is proposed to find a better solution for fetal MR imaging. A 32-channel fetal array is presented to increase coil sensitivity, coverage and parallel imaging performance. The electromagnetic field distribution of each element of the fetal array is numerically simulated using the finite-difference time-domain (FDTD) method. The array performance, including B1 coverage, parallel reconstructed images and artifact power, is then theoretically calculated and compared with the torso array. Study results show that the proposed array is capable of increasing B1 field strength as well as sensitivity homogeneity in the entire area of the uterus. This would ensure high-quality imaging regardless of the location of the fetus in the uterus. In addition, the parallel imaging performance of the proposed fetal array is validated by comparing artifact power with the torso array. These results demonstrate the feasibility of the 32-channel flexible array for fetal MR imaging at 1.5T. PMID:22408747
Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing
Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin
2016-01-01
With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high-performance computing (HPC) methods have been proposed to accelerate SAR imaging, especially GPU-based methods. In the classical GPU-based imaging algorithm, the GPU is employed to accelerate image processing by massively parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO); the computing capability of the CPU is thus ignored and underused. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multi-CPU/GPU computing. On the CPU side, advanced vector extensions (AVX) are introduced into the multi-core CPU parallel method for higher efficiency. On the GPU side, the bottlenecks of memory limitation and frequent data transfer are removed, and several optimization strategies are applied, such as streaming and parallel pipelining. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method improves the efficiency of SAR imaging by 270 times over a single-core CPU and realizes real-time imaging, in that the imaging rate outperforms the raw-data generation rate. PMID:27070606
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.
Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F
2018-03-01
Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high dynamic range or floating-point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that current parallel algorithms already perform poorly with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute the final max-tree efficiently and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance on both simulated and actual 2D images and 3D volumes. Execution times improve on the fastest sequential algorithm, and speed-up continues to grow up to 64 threads.
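The pilot-tree step depends on an order-preserving quantization of the high-dynamic-range image down to a small number of levels; a minimal sketch of such a quantization (not the paper's exact scheme) is:

```python
import numpy as np

def quantize_pilot(img, levels=256):
    """Order-preserving quantization of an arbitrary-range image onto
    `levels` integer bins, as needed to build a pilot max-tree."""
    uniq = np.unique(img)                   # sorted distinct values
    ranks = np.searchsorted(uniq, img)      # rank of each pixel's value
    return (ranks * levels) // len(uniq)    # spread ranks over the bins
```

Because the mapping is monotone, the component hierarchy of the quantized image is a coarsening of the true max-tree, which is what lets it guide the later leaf-to-root refinement.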
An L1-norm phase constraint for half-Fourier compressed sensing in 3D MR imaging.
Li, Guobin; Hennig, Jürgen; Raithel, Esther; Büchert, Martin; Paul, Dominik; Korvink, Jan G; Zaitsev, Maxim
2015-10-01
In most half-Fourier imaging methods, explicit phase replacement is used; in combination with parallel imaging or compressed sensing, half-Fourier reconstruction is usually performed in a separate step. The purpose of this paper is to report that integrating half-Fourier reconstruction into the iterative reconstruction minimizes reconstruction errors. The L1-norm phase constraint for half-Fourier imaging proposed in this work is compared with the L2-norm variant of the same algorithm and with several typical half-Fourier reconstruction methods. Half-Fourier imaging with the proposed phase constraint can be seamlessly combined with parallel imaging and compressed sensing to achieve high acceleration factors. In simulations and in in-vivo experiments, half-Fourier imaging with the proposed L1-norm phase constraint shows superior performance both in the reconstruction of image details and in robustness against phase-estimation errors. Its seamless combination with parallel imaging and compressed sensing enables the use of greater acceleration in 3D MR imaging.
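The starting point of any half-Fourier method is the conjugate symmetry of k-space for a real-valued, zero-phase image; the toy sketch below fills the missing half by symmetry in 1-D (the paper's contribution, handling nonzero phase via an L1 constraint, is precisely what this zeroth-order version omits):

```python
import numpy as np

def half_fourier_fill(kspace_half, n):
    """Fill missing k-space samples by conjugate symmetry, the
    zero-phase assumption underlying half-Fourier acquisition."""
    k = np.zeros(n, dtype=complex)
    m = len(kspace_half)
    k[:m] = kspace_half
    # For a real image, F[-k] = conj(F[k]); indices wrap modulo n.
    for i in range(m, n):
        k[i] = np.conj(k[(-i) % n])
    return np.fft.ifft(k)
```

Any actual image phase violates the symmetry assumption, which is why practical methods estimate and constrain the phase instead of assuming it away.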
Parallel algorithm of real-time infrared image restoration based on total variation theory
NASA Astrophysics Data System (ADS)
Zhu, Ran; Li, Miao; Long, Yunli; Zeng, Yaoyuan; An, Wei
2015-10-01
Image restoration is a necessary preprocessing step for infrared remote sensing applications. Traditional methods remove noise but penalize too heavily the gradients corresponding to edges. Restoration techniques based on variational approaches can solve this over-smoothing problem thanks to their well-defined mathematical modeling of the restoration procedure. The total variation (TV) of the infrared image is introduced as an L1 regularization term added to the objective energy functional, converting restoration into an optimization problem over a functional comprising a fidelity term to the image data plus a regularization term. Infrared image restoration with the TV-L1 model makes full use of the acquired remote sensing data and preserves edge information, such as edges caused by clouds. The numerical implementation algorithm is presented in detail, and analysis indicates that its structure is easily parallelized. Therefore, a parallel implementation of the TV-L1 filter based on a multicore shared-memory architecture is proposed for real-time infrared remote sensing systems. Massive image-data computation is performed in parallel by cooperating threads running simultaneously on multiple cores. Several groups of synthetic infrared image data are used to validate the feasibility and effectiveness of the proposed parallel algorithm, with quantitative measurements of restored image quality against the input image. Experimental results show that the TV-L1 filter restores varying-background images reasonably and that its performance meets the requirements of real-time image processing.
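A minimal serial sketch of TV-regularized restoration by gradient descent on a smoothed objective (with an L2 fidelity term as a stand-in; this is not the paper's TV-L1 solver or its parallel implementation):

```python
import numpy as np

def tv_denoise(img, lam=0.1, step=0.1, iters=100, eps=1e-6):
    """Gradient descent on min_u 0.5*||u - img||^2 + lam*TV(u),
    with TV smoothed by eps to make it differentiable."""
    u = img.astype(float).copy()
    for _ in range(iters):
        gx = np.diff(u, axis=1, append=u[:, -1:])   # forward differences
        gy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(gx**2 + gy**2 + eps)
        # divergence of the normalized gradient field (backward diffs)
        div = (gx / mag - np.roll(gx / mag, 1, axis=1)
               + gy / mag - np.roll(gy / mag, 1, axis=0))
        u -= step * ((u - img) - lam * div)
    return u
```

Each iteration touches every pixel independently given its neighbors, which is why the structure parallelizes so readily over image bands or tiles.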
NASA Technical Reports Server (NTRS)
Tilton, James C.
1988-01-01
Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.
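The order-independence comes from always performing the globally best merge first; a one-dimensional toy version of that rule (merge costs and data are illustrative) is:

```python
def best_merge_segmentation(values, n_regions):
    """Iteratively merge the globally most similar adjacent pair first,
    so the result does not depend on scan order. 1-D sketch of the idea."""
    regions = [[v] for v in values]
    while len(regions) > n_regions:
        # cost of merging each adjacent pair = difference of region means
        costs = [(abs(sum(a) / len(a) - sum(b) / len(b)), i)
                 for i, (a, b) in enumerate(zip(regions, regions[1:]))]
        _, i = min(costs)                 # globally best merge first
        regions[i:i + 2] = [regions[i] + regions[i + 1]]
    return regions
```

Finding the global minimum cost at every step is exactly the part the MPP implementation parallelizes.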
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-06-01
We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
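Each iteration applies soft-thresholding, the proximal operator of the l1 norm; a minimal serial ISTA sketch (without SPIRiT's self-consistency term, the wavelet transform, or any parallelization) is:

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding: prox operator of lam*||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista(A, b, lam=0.1, step=None, iters=200):
    """Minimal ISTA sketch for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        # gradient step on the data term, then shrinkage
        x = soft_threshold(x - step * A.T @ (A @ x - b), step * lam)
    return x
```

In the reconstruction setting, A stands in for the undersampled Fourier/sensitivity operator and the thresholding is applied to wavelet coefficients jointly across channels; the per-coefficient shrinkage is embarrassingly parallel, which is what the GPU and multicore implementations exploit.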
Wiens, Curtis N.; Artz, Nathan S.; Jang, Hyungseok; McMillan, Alan B.; Reeder, Scott B.
2017-01-01
Purpose: To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. Theory and Methods: A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Results: Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. Conclusion: A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. PMID:27403613
Comparing an FPGA to a Cell for an Image Processing Application
NASA Astrophysics Data System (ADS)
Rakvic, Ryan N.; Ngo, Hau; Broussard, Randy P.; Ives, Robert W.
2010-12-01
Modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs), have provided an exciting opportunity to exploit the parallel nature of modern image processing algorithms. On the other hand, PlayStation 3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to execute complex image processing algorithms with high performance. In this research project, our aim is to study the differences in performance of a modern image processing algorithm on these two hardware platforms. In particular, iris recognition systems have recently become an attractive identification method because of their extremely high accuracy. Iris matching, a repeatedly executed portion of a modern iris recognition algorithm, is parallelized on an FPGA system and a Cell processor. We demonstrate a 2.5-times speedup of the parallelized algorithm on the FPGA system when compared to a Cell processor-based version.
GPU-based parallel algorithm for blind image restoration using midfrequency-based methods
NASA Astrophysics Data System (ADS)
Xie, Lang; Luo, Yi-han; Bao, Qi-liang
2013-08-01
GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specifically designed for the GPU hardware architecture is of great significance. To address the high computational complexity and poor real-time performance of blind image restoration, the midfrequency-based algorithm for blind image restoration was analyzed and improved in this paper. A midfrequency-based filtering method is used to restore the image with hardly any recursion or iteration. Combining the algorithm's data intensiveness and data-parallel character with the GPU's single-instruction, multiple-thread execution model, a new parallel midfrequency-based algorithm for blind image restoration is proposed, suitable for GPU stream computing. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and the midfrequency-based filtering. For better management of GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in the frequency domain, after optimization of data access and of communication between host and device. The kernel parallelism structure is determined by the decomposition of the filtering data so that the transmission rate works around the memory-bandwidth limitation. The results show that, with the new algorithm, operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.
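The filtering stage works in the frequency domain; a toy NumPy analogue that keeps only mid-frequency content (the band limits are illustrative, and the real algorithm runs on the GPU) is:

```python
import numpy as np

def bandpass_filter(img, lo=0.05, hi=0.5):
    """Keep only mid-frequency content of an image via the FFT;
    lo/hi are normalized frequency thresholds, chosen arbitrarily."""
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    r = np.sqrt(fx**2 + fy**2)          # radial frequency of each bin
    mask = (r >= lo) & (r <= hi)        # pass band
    return np.real(np.fft.ifft2(F * mask))
```

The per-bin mask multiplication is the part that maps naturally onto one GPU thread per frequency bin.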
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has quickly introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. In several applications, however, the desired information must be calculated quickly enough for practical use. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques, covering four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in the selection of parallel hyperspectral algorithms for specific applications.
A high-performance spatial database based approach for pathology imaging algorithm evaluation
Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.
2013-01-01
Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. 
The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared-nothing parallel database architecture, which distributes data homogeneously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high-performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we developed are open source and available for download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we developed an efficient data model and parallel database approach to model, normalize, manage, and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provides a full pipeline to normalize, load, manage, and query analytical results for algorithm evaluation. PMID:23599905
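As a toy illustration of the result-comparison step described above (assuming rasterized binary masks rather than the PAIS polygon model, and without the paper's SQL machinery), the overlap between an algorithm result and a human annotation can be scored with a Jaccard index:

```python
# Hypothetical sketch: compare a segmentation result against a human
# annotation via the Jaccard index (intersection over union) of two
# same-sized binary masks, stored as lists of rows of 0/1 values.
def jaccard(mask_a, mask_b):
    """Jaccard similarity of two binary masks; 1.0 when both are empty."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a and b   # pixel in both masks
            union += a or b    # pixel in either mask
    return inter / union if union else 1.0
```

A score near 1.0 indicates close agreement between the algorithm and the annotator; in practice such comparisons would run as spatial join queries inside the database rather than in application code.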
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-01-01
We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
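A minimal sketch of the soft-thresholding operator at the core of such iterative reconstructions (a scalar version for illustration only; the paper's operator acts on wavelet coefficients jointly across channels):

```python
# Soft-thresholding: the proximal operator of lam * |x|, used as the
# shrinkage step in iterative soft-thresholding (ISTA-style) methods.
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0
```

Each ℓ1-SPIRiT iteration applies an operator of this shape to the transform coefficients, which is what promotes joint sparsity in the reconstruction.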
Parallel imaging of knee cartilage at 3 Tesla.
Zuo, Jin; Li, Xiaojuan; Banerjee, Suchandrima; Han, Eric; Majumdar, Sharmila
2007-10-01
To evaluate the feasibility and reproducibility of quantitative cartilage imaging with parallel imaging at 3T and to determine the impact of the acceleration factor (AF) on morphological and relaxation measurements. An eight-channel phased-array knee coil was employed for conventional and parallel imaging on a 3T scanner. The imaging protocol consisted of a T2-weighted fast spin echo (FSE), a 3D-spoiled gradient echo (SPGR), a custom 3D-SPGR T1rho, and a 3D-SPGR T2 sequence. Parallel imaging was performed with an array spatial sensitivity technique (ASSET). The left knees of six healthy volunteers were scanned with both conventional and parallel imaging (AF = 2). Morphological parameters and relaxation maps from the parallel imaging method (AF = 2) showed results comparable with the conventional method. The intraclass correlation coefficients (ICCs) of the two methods for cartilage volume, mean cartilage thickness, T1rho, and T2 were 0.999, 0.977, 0.964, and 0.969, respectively, while demonstrating excellent reproducibility. No significant measurement differences were found when AF reached 3, despite the low signal-to-noise ratio (SNR). The study demonstrated that parallel imaging can be applied to current knee cartilage quantification at AF = 2 with good reproducibility and without degrading measurement accuracy, while effectively reducing scan time. Shorter imaging times can be achieved with higher AF at the cost of SNR. (c) 2007 Wiley-Liss, Inc.
Wiens, Curtis N; Artz, Nathan S; Jang, Hyungseok; McMillan, Alan B; Reeder, Scott B
2017-06-01
To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. Magn Reson Med 77:2303-2309, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
NASA Astrophysics Data System (ADS)
Vnukov, A. A.; Shershnev, M. B.
2018-01-01
The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of the algorithms and to study the relationship between system performance, algorithm execution time, and the degree of parallelization of computations. Three methods of interpolation were studied, formalized, and adapted to scale images. The result of this work is a program that scales images by the different methods, together with a comparison of the scaling quality of each method.
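A hedged sketch of the kind of parallel scaling studied here, using nearest-neighbor interpolation with one task per output row (the paper's actual methods, GUI, and degree-of-parallelization study are not reproduced):

```python
# Nearest-neighbor image scaling with row-level parallelism.
# The image is a list of rows of pixel values; each output row is an
# independent task, so rows can be computed concurrently.
from concurrent.futures import ThreadPoolExecutor

def scale_nearest(image, new_w, new_h):
    """Scale image to new_w x new_h by nearest-neighbor interpolation."""
    old_h, old_w = len(image), len(image[0])

    def make_row(y):
        src_y = min(y * old_h // new_h, old_h - 1)   # nearest source row
        return [image[src_y][min(x * old_w // new_w, old_w - 1)]
                for x in range(new_w)]

    with ThreadPoolExecutor() as pool:               # one task per row
        return list(pool.map(make_row, range(new_h)))
```

Bilinear or bicubic interpolation would replace only the per-pixel lookup inside `make_row`; the row-parallel structure stays the same.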
A novel parallel architecture for local histogram equalization
NASA Astrophysics Data System (ADS)
Ohannessian, Mesrob I.; Choueiter, Ghinwa F.; Diab, Hassan
2005-07-01
Local histogram equalization is an image enhancement algorithm that has found wide application in the pre-processing stage of areas such as computer vision, pattern recognition, and medical imaging. The computationally intensive nature of the procedure, however, is a main limitation for real-time interactive applications. This work explores the possibility of performing parallel local histogram equalization, using an array of special-purpose elementary processors, through an HDL implementation that targets FPGA or ASIC platforms. A novel parallelization scheme is presented and the corresponding architecture is derived. The algorithm is reduced to pixel-level operations. Processing elements are assigned image blocks, to maintain a reasonable performance-cost ratio. To further simplify both the processor and memory organizations, a bit-serial access scheme is used. A brief performance assessment is provided to illustrate and quantify the merit of the approach.
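The per-block computation can be sketched in software as follows (equalizing one image block of 8-bit pixels; the paper's contribution is the hardware architecture that runs many such blocks in parallel, which is not shown):

```python
# Histogram equalization of one block: build the histogram, accumulate
# it into a CDF, and remap each pixel through the resulting lookup table.
def equalize_block(block):
    """Return a histogram-equalized copy of one block of 8-bit pixels."""
    flat = [p for row in block for p in row]
    hist = [0] * 256
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:               # cumulative distribution function
        total += h
        cdf.append(total)
    n = len(flat)
    lut = [round(255 * c / n) for c in cdf]   # per-block lookup table
    return [[lut[p] for p in row] for row in block]
```

In a local (block-wise) scheme, each processing element would run this computation on its own block, which is why the blocks are independent and parallelizable.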
Bae, Jun Woo; Kim, Hee Reyoung
2018-01-01
Anti-scattering grids have been used to improve image quality. However, a commonly used linear or parallel grid causes image distortion, while a focusing grid requires a precise, and therefore expensive, fabrication technology. This work investigates whether a PMMA anti-scattering grid fabricated by CO2 laser micromachining can improve grid performance at a lower cost; improved grid performance would in turn improve image quality. The cross-sectional shape of CO2 laser-machined PMMA resembles the letter 'V'. Performance was characterized by the contrast improvement factor (CIF) and the Bucky factor. Four types of grid were tested: thin parallel, thick parallel, 'V'-type, and 'inverse V'-type. For a Bucky factor of 2.1, the CIF of both the 'V' and inverse 'V' grids was 1.53, while the thin and thick parallel types had values of 1.43 and 1.65, respectively. The 'V'-shaped grid manufactured by CO2 laser micromachining showed a higher CIF than the parallel grid with the same shielding-material channel width. The 'V'-shaped grid could therefore replace the conventional parallel grid when a high-aspect-ratio grid is hard to fabricate.
Demi, Libertario; Ramalli, Alessandro; Giannini, Gabriele; Mischi, Massimo
2015-01-01
In classic pulse-echo ultrasound imaging, the data acquisition rate is limited by the speed of sound. To overcome this, parallel beamforming techniques in transmit (PBT) and in receive (PBR) mode have been proposed. In particular, PBT techniques, based on the transmission of focused beams, are more suitable for harmonic imaging because they are capable of generating stronger harmonics. Recently, orthogonal frequency division multiplexing (OFDM) has been investigated as a means to obtain parallel beamformed tissue harmonic images. To date, only numerical studies and experiments in water have been performed, hence neglecting the effect of frequency-dependent absorption. Here we present the first in vitro and in vivo tissue harmonic images obtained with PBT by means of OFDM, and we compare the results with classic B-mode tissue harmonic imaging. The resulting contrast-to-noise ratio, here used as a performance metric, is comparable; a 2-dB reduction is observed when three parallel lines are reconstructed. In conclusion, the applicability of this technique to ultrasonography as a means to improve the data acquisition rate is confirmed.
Parallel traveling-wave MRI: a feasibility study.
Pang, Yong; Vigneron, Daniel B; Zhang, Xiaoliang
2012-04-01
Traveling-wave magnetic resonance imaging utilizes the far fields of a single-piece patch antenna in the magnet bore to generate radio frequency fields for imaging large-size samples, such as the human body. In this work, the feasibility of applying the "traveling-wave" technique to parallel imaging is studied using microstrip patch antenna arrays with both numerical analysis and experimental tests. A specific patch array model is built in which each array element is a microstrip patch antenna. Bench tests show that decoupling between two adjacent elements is better than -26 dB while the matching of each element reaches -36 dB, demonstrating excellent isolation performance and impedance match capability. The sensitivity patterns are simulated and g-factors are calculated for both unloaded and loaded cases. The results on B1 sensitivity patterns and g-factors demonstrate the feasibility of traveling-wave parallel imaging. Simulations also suggest that different array configurations, such as patch shape, position, and orientation, lead to different sensitivity patterns and g-factor maps, which provides a way to manipulate B1 fields and improve the parallel imaging performance. The proposed method is also validated by 7T MR imaging experiments. Copyright © 2011 Wiley-Liss, Inc.
Tao, Shengzhen; Trzasko, Joshua D; Shu, Yunhong; Weavers, Paul T; Huston, John; Gray, Erin M; Bernstein, Matt A
2016-06-01
To describe how integrated gradient nonlinearity (GNL) correction can be used within noniterative partial Fourier (homodyne) and parallel (SENSE and GRAPPA) MR image reconstruction strategies, and demonstrate that performing GNL correction during, rather than after, these routines mitigates the image blurring and resolution loss caused by postreconstruction image domain based GNL correction. Starting from partial Fourier and parallel magnetic resonance imaging signal models that explicitly account for GNL, noniterative image reconstruction strategies for each accelerated acquisition technique are derived under the same core mathematical assumptions as their standard counterparts. A series of phantom and in vivo experiments on retrospectively undersampled data were performed to investigate the spatial resolution benefit of integrated GNL correction over conventional postreconstruction correction. Phantom and in vivo results demonstrate that the integrated GNL correction reduces the image blurring introduced by the conventional GNL correction, while still correcting GNL-induced coarse-scale geometrical distortion. Images generated from undersampled data using the proposed integrated GNL strategies offer superior depiction of fine image detail, for example, phantom resolution inserts and anatomical tissue boundaries. Noniterative partial Fourier and parallel imaging reconstruction methods with integrated GNL correction reduce the resolution loss that occurs during conventional postreconstruction GNL correction while preserving the computational efficiency of standard reconstruction techniques. Magn Reson Med 75:2534-2544, 2016. © 2015 Wiley Periodicals, Inc.
Performance enhancement of various real-time image processing techniques via speculative execution
NASA Astrophysics Data System (ADS)
Younis, Mohamed F.; Sinha, Purnendu; Marlowe, Thomas J.; Stoyenko, Alexander D.
1996-03-01
In real-time image processing, an application must satisfy a set of timing constraints while ensuring the semantic correctness of the system. Because of the natural structure of digital data, pure data and task parallelism have been used extensively in real-time image processing to reduce the handling time of image data. These types of parallelism are based on splitting the execution load of a single processor across multiple nodes; however, execution of all parallel threads is mandatory for the correctness of the algorithm. Speculative execution, on the other hand, is an optimistic execution of part(s) of the program based on assumptions about program control flow or variable values. Rollback may be required if the assumptions turn out to be invalid. Speculative execution can enhance average, and sometimes worst-case, execution time. In this paper, we target various image processing techniques to investigate the applicability of speculative execution. We identify opportunities for safe and profitable speculative execution in image compression, edge detection, morphological filters, and blob recognition.
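A minimal software sketch of the speculate-validate-rollback pattern described above (the function arguments are hypothetical placeholders for a fast assumed-case path and a safe general path):

```python
# Speculative execution: optimistically start the fast path that relies
# on an assumption about the input, validate the assumption concurrently,
# and roll back to the safe slow path if the assumption fails.
from concurrent.futures import ThreadPoolExecutor

def run_speculative(data, fast, slow, assumption_holds):
    """Return fast(data) if the speculation validates, else slow(data)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fast_future = pool.submit(fast, data)   # optimistic start
        if assumption_holds(data):              # validate while fast runs
            return fast_future.result()
        fast_future.cancel()                    # rollback: discard work
        return slow(data)
```

When the assumption usually holds, the average-case time approaches that of the fast path; the validation check bounds the cost of a wrong guess.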
An embedded multi-core parallel model for real-time stereo imaging
NASA Astrophysics Data System (ADS)
He, Wenjing; Hu, Jian; Niu, Jingyu; Li, Chuanrong; Liu, Guangyu
2018-04-01
Real-time processing based on embedded systems will enhance the application capability of stereo imaging for LiDAR and hyperspectral sensors. Research on task partitioning and scheduling strategies for embedded multiprocessor systems started relatively late compared with that for PC computers. In this paper, aimed at an embedded multi-core processing platform, a parallel model for stereo imaging is studied and verified. After analyzing the computing load, throughput capacity, and buffering requirements, a two-stage pipeline parallel model based on message transmission is established. This model can be applied to fast stereo imaging for airborne sensors with various characteristics. To demonstrate the feasibility and effectiveness of the parallel model, parallel software was designed using test flight data, based on the 8-core DSP processor TMS320C6678. The results indicate that the design performed well in workload distribution and achieved a speed-up ratio of up to 6.4.
Comparison of multihardware parallel implementations for a phase unwrapping algorithm
NASA Astrophysics Data System (ADS)
Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo
2018-04-01
Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, in particular, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and more accurate phase unwrapping algorithms. We propose a parallel multigrid version of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that minimizes a cost function by means of a serial Gauss-Seidel-type algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, it is a parallel Jacobi-class algorithm with alternated minimizations. This strategy is known as the chessboard type: red pixels can be updated in parallel in the same iteration since they are independent, and black pixels can likewise be updated in parallel in the alternating iteration. We present parallel implementations of our algorithm for different multicore architectures: CPU multicore, the Xeon Phi coprocessor, and Nvidia graphics processing units. In all cases, we obtain superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance analysis of the developed parallel versions.
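The chessboard scheme can be sketched with a toy Jacobi-style smoothing sweep (illustrative only; the paper minimizes a phase-unwrapping cost function, not this Laplacian average):

```python
# Red-black ("chessboard") sweep: all red pixels (i+j even) are mutually
# independent and could be updated in parallel, then all black pixels
# (i+j odd) in the alternating sweep. Boundary values are held fixed.
def red_black_sweep(grid):
    h, w = len(grid), len(grid[0])

    def relax(color):
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                if (i + j) % 2 == color:
                    grid[i][j] = (grid[i - 1][j] + grid[i + 1][j] +
                                  grid[i][j - 1] + grid[i][j + 1]) / 4.0

    relax(0)   # red sweep  (parallelizable across pixels)
    relax(1)   # black sweep (parallelizable across pixels)
    return grid
```

Because pixels of one color depend only on pixels of the other color, each half-sweep maps directly onto GPU threads or multicore work items with no data races.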
Implementing An Image Understanding System Architecture Using Pipe
NASA Astrophysics Data System (ADS)
Luck, Randall L.
1988-03-01
This paper describes PIPE and how it can be used to implement an image understanding system. Image understanding is the process of developing a description of an image in order to make decisions about its contents. The tasks of image understanding are generally split into low-level vision and high-level vision. Low-level vision is performed by PIPE, a high-performance parallel processor with an architecture specifically designed for processing video images at up to 60 fields per second. High-level vision is performed by one of several types of serial or parallel computers, depending on the application. An additional processor called ISMAP performs the conversion from iconic image space to symbolic feature space. ISMAP plugs into one of PIPE's slots and is memory-mapped into the high-level processor. Thus it forms the high-speed link between the low- and high-level vision processors. The mechanisms for bottom-up, data-driven processing and top-down, model-driven processing are discussed.
Demi, Libertario; Viti, Jacopo; Kusters, Lieneke; Guidi, Francesco; Tortoli, Piero; Mischi, Massimo
2013-11-01
The speed of sound in the human body limits the achievable data acquisition rate of pulsed ultrasound scanners. To overcome this limitation, parallel beamforming techniques are used in ultrasound 2-D and 3-D imaging systems. Different parallel beamforming approaches have been proposed. They may be grouped into two major categories: parallel beamforming in reception and parallel beamforming in transmission. The first category is not optimal for harmonic imaging; the second category may be more easily applied to harmonic imaging, although inter-beam interference represents an issue. To overcome these shortcomings and exploit the benefit of combining harmonic imaging and a high data acquisition rate, a new approach has recently been presented which relies on orthogonal frequency division multiplexing (OFDM) to perform parallel beamforming in transmission. In this paper, parallel transmit beamforming using OFDM is implemented for the first time on an ultrasound scanner. An advanced open platform for ultrasound research is used to investigate the axial resolution and inter-beam interference achievable with parallel transmit beamforming using OFDM. Both fundamental and second-harmonic imaging modalities have been considered. Results show that, for fundamental imaging, an axial resolution on the order of 2 mm can be achieved in combination with inter-beam interference on the order of -30 dB. For second-harmonic imaging, an axial resolution on the order of 1 mm can be achieved in combination with inter-beam interference on the order of -35 dB.
Limited angle tomographic breast imaging: A comparison of parallel beam and pinhole collimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wessell, D.E.; Kadrmas, D.J.; Frey, E.C.
1996-12-31
Results from clinical trials have suggested no improvement in lesion detection with parallel hole SPECT scintimammography (SM) with Tc-99m over parallel hole planar SM. In this initial investigation, we have elucidated some of the unique requirements of SPECT SM. With these requirements in mind, we have begun to develop practical data acquisition and reconstruction strategies that can reduce image artifacts and improve image quality. In this paper we investigate limited angle orbits for both parallel hole and pinhole SPECT SM. Singular Value Decomposition (SVD) is used to analyze the artifacts associated with the limited angle orbits. Maximum likelihood expectation maximization (MLEM) reconstructions are then used to examine the effects of attenuation compensation on the quality of the reconstructed image. All simulations are performed using the 3D-MCAT breast phantom. The results of these simulation studies demonstrate that limited angle SPECT SM is feasible, that attenuation correction is needed for accurate reconstructions, and that pinhole SPECT SM may have an advantage over parallel hole SPECT SM in terms of improved image quality and reduced image artifacts.
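For illustration, one multiplicative MLEM update has the following shape (a toy dense-matrix version; real SPECT reconstructions use system models incorporating attenuation and far larger projection data):

```python
# One maximum-likelihood expectation maximization (MLEM) iteration:
#   x <- x * A^T(y / Ax) / A^T(1)
# where A maps the image x to projections and y is the measured data.
def mlem_step(A, y, x):
    m, n = len(A), len(A[0])
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    ratio = [y[i] / Ax[i] if Ax[i] else 0.0 for i in range(m)]     # y / Ax
    back = [sum(A[i][j] * ratio[i] for i in range(m)) for j in range(n)]
    norm = [sum(A[i][j] for i in range(m)) for j in range(n)]      # A^T(1)
    return [x[j] * back[j] / norm[j] if norm[j] else x[j] for j in range(n)]
```

Repeating this update converges toward the maximum-likelihood image under a Poisson noise model, which is why it is a standard choice for emission tomography.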
NASA Astrophysics Data System (ADS)
Zhao, Feng; Frietman, Edward E. E.; Han, Zhong; Chen, Ray T.
1999-04-01
A characteristic feature of a conventional von Neumann computer is that computing power is delivered by a single processing unit. Although increasing the clock frequency improves the performance of the computer, the switching speed of the semiconductor devices and the finite speed at which electrical signals propagate along the bus set the boundaries. Architectures containing large numbers of nodes can solve this performance dilemma, although the main obstacle in designing such systems is finding solutions that guarantee efficient communication among the nodes. Data exchange becomes a real bottleneck when all nodes are connected by a shared resource. Only optics, due to its inherent parallelism, could solve that bottleneck. Here, we explore a multi-faceted free-space image distributor to be used in optical interconnects for massively parallel processing. In this paper, physical and optical models of the image distributor are developed, from the diffraction theory of light waves to optical simulations. The general features and the performance of the image distributor are also described, together with the new structure of the image distributor and the simulations for it. From the digital simulation and experiment, it is found that the multi-faceted free-space image distributing technique is quite suitable for free-space optical interconnection in massively parallel processing, and that the new structure of the multi-faceted free-space image distributor would perform better.
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Bammer, Roland; Hope, Thomas A.; Aksoy, Murat; Alley, Marcus T.
2012-01-01
Exact knowledge of blood flow characteristics in the major cerebral vessels is of great relevance for diagnosing cerebrovascular abnormalities. This involves the assessment of hemodynamically critical areas as well as the derivation of biomechanical parameters such as wall shear stress and pressure gradients. A time-resolved, 3D phase-contrast (PC) MRI method using parallel imaging was implemented to measure blood flow in three dimensions at multiple instances over the cardiac cycle. The 4D velocity data obtained from 14 healthy volunteers were used to investigate dynamic blood flow with the use of multiplanar reformatting, 3D streamlines, and 4D particle tracing. In addition, the effects of magnetic field strength, parallel imaging, and temporal resolution on the data were investigated in a comparative evaluation at 1.5T and 3T using three different parallel imaging reduction factors and three different temporal resolutions in eight of the 14 subjects. Studies were consistently performed faster at 3T than at 1.5T because of better parallel imaging performance. A high temporal resolution (65 ms) was required to follow dynamic processes in the intracranial vessels. The 4D flow measurements provided a high degree of vascular conspicuity. Time-resolved streamline analysis provided features that have not been reported previously for the intracranial vasculature. PMID:17195166
The use of parallel imaging for MRI assessment of knees in children and adolescents.
Doria, Andrea S; Chaudry, Gulraiz A; Nasui, Cristina; Rayner, Tammy; Wang, Chenghua; Moineddin, Rahim; Babyn, Paul S; White, Larry M; Sussman, Marshall S
2010-03-01
Parallel imaging provides faster scanning at the cost of reduced signal-to-noise ratio (SNR) and increased artifacts. To compare the diagnostic performance of two parallel MRI protocols (PPs) for assessment of pathologic knees using an 8-channel knee coil (reference standard, conventional protocol [CP]) and to characterize the SNR losses associated with parallel imaging. Two radiologists blindly interpreted 1.5 Tesla knee MRI images in 21 children (mean age 13 years, range 9-18 years) with clinical indications for an MRI scan. Sagittal proton density, T2-W fat-saturated FSE, axial T2-W fat-saturated FSE, and coronal T1-W (NEX of 1,1,1) images were obtained with both CP and PP. Images were read for soft tissue and osteochondral findings. There was a 75% decrease in acquisition time using PP in comparison to CP. The CP and PP protocols fell within excellent or upper limits of substantial agreement: CP, kappa coefficient, 0.81 (95% CIs, 0.73-0.89); PP, 0.80-0.81 (0.73-0.89). The sensitivity of the two PPs was similar for assessment of soft (0.98-1.00) and osteochondral (0.89-0.94) tissues. Phantom data indicated SNR ratios of 1.67, 1.60, and 1.51 between CP and PP scans (axial, sagittal, and coronal planes, respectively). Parallel MRI provides a reliable assessment for pediatric knees in a significantly reduced scan time without affecting the diagnostic performance of MRI.
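The SNR penalty that this study characterizes is commonly modeled as SNR_reduced = SNR_full / (g·√R), where R is the acceleration (reduction) factor and g is the coil-geometry factor. A minimal sketch of that relationship (the numbers below are illustrative, not taken from the study):

```python
import math

def parallel_imaging_snr(snr_full, reduction_factor, g_factor=1.0):
    """SNR after parallel-imaging acceleration: SNR / (g * sqrt(R))."""
    return snr_full / (g_factor * math.sqrt(reduction_factor))

# Halving scan time (R = 2) with an ideal coil (g = 1) costs a factor sqrt(2):
print(parallel_imaging_snr(100.0, 2))        # ≈ 70.7
# A realistic g-factor makes the loss larger:
print(parallel_imaging_snr(100.0, 2, 1.2))   # ≈ 58.9
```

The measured CP/PP SNR ratios of roughly 1.5-1.7 are consistent with this square-root scaling for the time savings reported.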
NASA Astrophysics Data System (ADS)
Yu, Zhongzhi; Liu, Shaocong; Sun, Shiyi; Kuang, Cuifang; Liu, Xu
2018-06-01
Parallel detection, which can use the additional information of a pinhole-plane image taken at every excitation scan position, could be an efficient method to enhance the resolution of a confocal laser scanning microscope. In this paper, we discuss images obtained under different conditions and using different image restoration methods with parallel detection to quantitatively compare the imaging quality. The conditions include different noise levels and different detector array settings. The image restoration methods include linear deconvolution and pixel reassignment with Richardson-Lucy deconvolution and with maximum-likelihood estimation deconvolution. The results show that linear deconvolution offers both high efficiency and the best performance under all conditions, and is therefore expected to be of use for future routine biomedical research.
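The Richardson-Lucy deconvolution compared above is a multiplicative update: the estimate is repeatedly scaled by the back-projected ratio of the data to the re-blurred estimate. A minimal, noiseless 1-D sketch (not the authors' implementation; assumes a normalized, symmetric 3-tap PSF and ignores detector-array geometry):

```python
def convolve_same(x, k):
    """'Same'-size convolution with a short kernel, zero-padded edges."""
    h = len(k) // 2
    return [sum(k[j] * x[i + j - h] for j in range(len(k))
                if 0 <= i + j - h < len(x)) for i in range(len(x))]

def richardson_lucy(data, psf, iterations=50):
    """RL update: est *= conv(data / conv(est, psf), mirrored psf)."""
    est = [sum(data) / len(data)] * len(data)  # flat initial estimate
    for _ in range(iterations):
        blurred = convolve_same(est, psf)
        ratio = [d / b if b > 0 else 0.0 for d, b in zip(data, blurred)]
        corr = convolve_same(ratio, psf)  # symmetric PSF: mirror == itself
        est = [e * c for e, c in zip(est, corr)]
    return est

psf = [0.25, 0.5, 0.25]
truth = [0.0] * 11
truth[5] = 1.0
data = convolve_same(truth, psf)   # blurred point source
est = richardson_lucy(data, psf)   # re-concentrates energy at index 5
```

On noiseless data the iteration sharpens the blurred point source back toward its true location; with noise, early stopping or regularization is needed.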
An Approach Using Parallel Architecture to Storage DICOM Images in Distributed File System
NASA Astrophysics Data System (ADS)
Soares, Tiago S.; Prado, Thiago C.; Dantas, M. A. R.; de Macedo, Douglas D. J.; Bauer, Michael A.
2012-02-01
Telemedicine is a very important and daily expanding area of the medical field, motivated by many researchers interested in improving medical applications. In Brazil, a telemedicine effort started in 2005 in the State of Santa Catarina developed a server called CyclopsDCMServer, whose purpose is to adopt HDF for the manipulation of medical images (DICOM) over a distributed file system. Since then, several research efforts have been initiated in order to seek better performance. Our approach adds a parallel implementation of the I/O operations to this server, exploiting a feature of HDF version 5 that is essential for our work: its support for parallel I/O based upon the MPI paradigm. Early experiments using four parallel nodes show good performance when compared to the serial HDF implementation in the CyclopsDCMServer.
Self-calibrated correlation imaging with k-space variant correlation functions.
Li, Yu; Edalati, Masoud; Du, Xingfu; Wang, Hui; Cao, Jie J
2018-03-01
Correlation imaging is a previously developed high-speed MRI framework that converts parallel imaging reconstruction into the estimation of correlation functions. The presented work aims to demonstrate that this framework can provide a speed gain over parallel imaging by estimating k-space variant correlation functions. Because of Fourier encoding with gradients, outer k-space data contain higher spatial-frequency image components arising primarily from tissue boundaries. As a result of tissue-boundary sparsity in the human anatomy, neighboring k-space data correlation varies from the central to the outer k-space. By estimating k-space variant correlation functions with an iterative self-calibration method, correlation imaging can benefit from neighboring k-space data correlation associated with both coil sensitivity encoding and tissue-boundary sparsity, thereby providing a speed gain over parallel imaging that relies only on coil sensitivity encoding. This new approach is investigated in brain imaging and free-breathing neonatal cardiac imaging. Correlation imaging performs better than existing parallel imaging techniques in simulated brain imaging acceleration experiments. The higher speed enables real-time data acquisition for neonatal cardiac imaging, in which physiological motion is fast and non-periodic. With k-space variant correlation functions, correlation imaging gives a higher speed than parallel imaging and offers the potential to image physiological motion in real time. Magn Reson Med 79:1483-1494, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.
2014-01-01
The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
Parallel Reconstruction Using Null Operations (PRUNO)
Zhang, Jian; Liu, Chunlei; Moseley, Michael E.
2011-01-01
A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated as linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated using Singular Value Decomposition (SVD), and an iterative conjugate-gradient approach is proposed to efficiently solve for the missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, and stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially at high acceleration rates. With the aid of PRUNO reconstruction, highly accelerated parallel imaging can be performed with decent image quality. For example, we have performed successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
Parallel design of JPEG-LS encoder on graphics processing units
NASA Astrophysics Data System (ADS)
Duan, Hao; Fang, Yong; Huang, Bormin
2012-01-01
With recent technical advances in graphics processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve the compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels, and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on an NVIDIA GPU using the compute unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over the original CPU code.
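One of the CUDA techniques the abstract lists, the parallel prefix sum, can be sketched in a few lines. Each pass of the loop below corresponds to one data-parallel step in which all elements could be updated simultaneously (a Hillis-Steele scan; illustrative Python, not the authors' CUDA kernel):

```python
def inclusive_scan(values):
    """Hillis-Steele inclusive prefix sum: O(log n) data-parallel passes."""
    a = list(values)
    step = 1
    while step < len(a):
        # On a GPU, every element of this pass is updated in parallel.
        a = [a[i] + (a[i - step] if i >= step else 0) for i in range(len(a))]
        step *= 2
    return a

print(inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # → [3, 4, 11, 11, 15, 16, 22, 25]
```

In an encoder, such a scan lets every block compute its output offset without serializing over the whole image.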
Choi, Hyungsuk; Choi, Woohyuk; Quan, Tran Minh; Hildebrand, David G C; Pfister, Hanspeter; Jeong, Won-Ki
2014-12-01
As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and even experienced programmers often fail to fully leverage the computing power of new parallel architectures due to their steep learning curve and programming complexity. In this paper, we propose Vivaldi, a new domain-specific language for volume processing and visualization on distributed heterogeneous computing systems. Vivaldi's Python-like grammar and parallel processing abstractions provide flexible programming tools for non-experts to easily write high-performance parallel computing code. Vivaldi provides commonly used functions and numerical operators for customized visualization and high-throughput image processing applications. We demonstrate the performance and usability of Vivaldi on several examples ranging from volume rendering to image segmentation.
Yip, Hon Ming; Li, John C. S.; Cui, Xin; Gao, Qiannan; Leung, Chi Chiu
2014-01-01
As microfluidics has been applied extensively in many cell and biochemical applications, monitoring the related processes is an important requirement. In this work, we design and fabricate a high-throughput microfluidic device that contains 32 microchambers to perform automated parallel microfluidic operations and monitoring on the automated stage of a microscope. Images are captured at multiple spots on the device during the operations for monitoring samples in the microchambers in parallel; yet the device positions may vary at different time points throughout the operations as the device moves back and forth on a motorized microscopic stage. Here, we report an image-based positioning strategy to realign the chamber position before every recording of a microscopic image. We fabricate alignment marks at defined locations next to the chambers in the microfluidic device as reference positions. We also develop image processing algorithms to recognize the chamber positions in real-time, followed by realigning the chambers to their preset positions in the captured images. We perform experiments to validate and characterize the device functionality and the automated realignment operation. Together, this microfluidic realignment strategy can be a platform technology to achieve precise positioning of multiple chambers for general microfluidic applications requiring long-term parallel monitoring of cell and biochemical activities. PMID:25133248
QR-decomposition based SENSE reconstruction using parallel architecture.
Ullah, Irfan; Nisar, Habab; Raza, Haseeb; Qasim, Malik; Inam, Omair; Omer, Hammad
2018-04-01
Magnetic Resonance Imaging (MRI) is a powerful medical imaging technique that provides essential clinical information about the human body. One major limitation of MRI is its long scan time. Implementation of advanced MRI algorithms on a parallel architecture (to exploit inherent parallelism) has great potential to reduce the scan time. Sensitivity Encoding (SENSE) is a Parallel Magnetic Resonance Imaging (pMRI) algorithm that utilizes receiver coil sensitivities to reconstruct MR images from the acquired under-sampled k-space data. At the heart of SENSE lies the inversion of a rectangular encoding matrix. This work presents a novel implementation of a GPU-based SENSE algorithm, which employs QR decomposition for the inversion of the rectangular encoding matrix. For a fair comparison, the performance of the proposed GPU-based SENSE reconstruction is evaluated against single-core and multi-core CPU implementations using OpenMP. Several experiments at various acceleration factors (AFs) are performed using multichannel (8, 12 and 30) phantom and in-vivo human head and cardiac datasets. Experimental results show that the GPU significantly reduces the computation time of SENSE reconstruction as compared to the multi-core CPU (approximately 12x speedup) and single-core CPU (approximately 53x speedup) without any degradation in the quality of the reconstructed images. Copyright © 2018 Elsevier Ltd. All rights reserved.
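The core step the abstract describes, inverting the rectangular SENSE encoding matrix via QR decomposition, reduces the least-squares problem Ex = y to a triangular solve: factor E = QR, then solve Rx = Qᵀy by back substitution. A small pure-Python sketch using classical Gram-Schmidt on a toy matrix (illustrative only; the GPU version parallelizes these inner products across coils and pixels):

```python
def qr_decompose(A):
    """Classical Gram-Schmidt QR of an m x n matrix (m >= n, full rank)."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for k in range(j):
            R[k][j] = sum(Q[i][k] * A[i][j] for i in range(m))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(m)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R

def lstsq_qr(A, b):
    """Least-squares solve of A x ~= b via QR: R x = Q^T b."""
    Q, R = qr_decompose(A)
    n = len(R)
    y = [sum(Q[i][k] * b[i] for i in range(len(b))) for k in range(n)]
    x = [0.0] * n
    for j in range(n - 1, -1, -1):  # back substitution
        x[j] = (y[j] - sum(R[j][k] * x[k] for k in range(j + 1, n))) / R[j][j]
    return x

# Overdetermined 3x2 toy system standing in for the encoding matrix:
print(lstsq_qr([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0]))
```

QR avoids forming the normal equations EᵀE explicitly, which improves numerical stability at high acceleration factors.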
Robot-assisted ultrasound imaging: overview and development of a parallel telerobotic system.
Monfaredi, Reza; Wilson, Emmanuel; Azizi Koutenaei, Bamshad; Labrecque, Brendan; Leroy, Kristen; Goldie, James; Louis, Eric; Swerdlow, Daniel; Cleary, Kevin
2015-02-01
Ultrasound imaging is frequently used in medicine. The quality of ultrasound images is often dependent on the skill of the sonographer. Several researchers have proposed robotic systems to aid in ultrasound image acquisition. In this paper we first provide a short overview of robot-assisted ultrasound (US) imaging. We categorize robot-assisted US imaging systems into three approaches: autonomous US imaging, teleoperated US imaging, and human-robot cooperation. For each approach several systems are introduced and briefly discussed. We then describe a compact six-degree-of-freedom parallel-mechanism telerobotic system for ultrasound imaging developed by our research team. The long-term goal of this work is to enable remote ultrasound scanning through teleoperation. This parallel mechanism allows for both translation and rotation of an ultrasound probe mounted on the top plate, along with force control. Our experimental results confirmed good mechanical system performance, with a positioning error of < 1 mm. Phantom experiments by a radiologist showed promising results with good image quality.
NASA Astrophysics Data System (ADS)
Newman, Gregory A.
2014-01-01
Many geoscientific applications exploit electrostatic and electromagnetic fields to interrogate and map subsurface electrical resistivity—an important geophysical attribute for characterizing mineral, energy, and water resources. In complex three-dimensional geologies, where many of these resources remain to be found, resistivity mapping requires large-scale modeling and imaging capabilities, as well as the ability to treat significant data volumes, which can easily overwhelm single-core and modest multicore computing hardware. Treating such problems requires large-scale parallel computational resources, necessary for reducing the time to solution to a time frame acceptable to the exploration process. The recognition that significant parallel computing processes must be brought to bear on these problems gives rise to choices that must be made in parallel computing hardware and software. In this review, some of these choices are presented, along with the resulting trade-offs. We also discuss future trends in high-performance computing and the anticipated impact on electromagnetic (EM) geophysics. Topics discussed in this review article include a survey of parallel computing platforms, from graphics processing units to multicore CPUs with a fast interconnect, along with parallel solvers and associated solver libraries effective for inductive EM modeling and imaging.
The AIS-5000 parallel processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmitt, L.A.; Wilson, S.S.
1988-05-01
The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has cost/performance characteristics superior to those of two-dimensional mesh-connected systems. The design of the processing elements and their interconnections, as well as the software used to program the system, allows a wide variety of algorithms and applications to be implemented. In this paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways, and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance benchmarks are given for certain simple and complex functions.
Photonic content-addressable memory system that uses a parallel-readout optical disk
NASA Astrophysics Data System (ADS)
Krishnamoorthy, Ashok V.; Marchand, Philippe J.; Yayla, Gökçe; Esener, Sadik C.
1995-11-01
We describe a high-performance associative-memory system that can be implemented by means of an optical disk modified for parallel readout and a custom-designed silicon integrated circuit with parallel optical input. The system can achieve associative recall on 128 × 128 bit images and also on variable-size subimages. The system's behavior and performance are evaluated on the basis of experimental results on a motionless-head parallel-readout optical-disk system, logic simulations of the very-large-scale integrated chip, and a software emulation of the overall system.
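Associative recall of the kind described, retrieving the stored image closest to a probe pattern, can be modeled as a nearest-neighbor search in Hamming distance; in the optical system all stored patterns are compared simultaneously, whereas the toy sketch below (illustrative only, over bit-strings rather than 128 × 128 images) compares them sequentially:

```python
def hamming(a, b):
    """Number of bit positions in which two patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def associative_recall(memory, probe):
    """Content-addressable lookup: return the stored pattern nearest the
    probe. The optical hardware evaluates all comparisons in parallel."""
    return min(memory, key=lambda pattern: hamming(pattern, probe))

memory = ["00110011", "11001100", "10101010"]
print(associative_recall(memory, "11001101"))  # → "11001100"
```

Recall on variable-size subimages corresponds to restricting the comparison to a subset of bit positions.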
Performance evaluation of canny edge detection on a tiled multicore architecture
NASA Astrophysics Data System (ADS)
Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald
2011-01-01
In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP, and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that for the same number of threads, the programmer-implemented domain decomposition exhibits higher speed-ups than the compiler-managed loop-level parallelism implemented with OpenMP.
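The domain-decomposition strategy, splitting the image into strips that are processed independently while reading halo rows from neighbors, can be sketched as follows. A 3×3 box filter stands in for the Canny stages, and Python threads stand in for Tile64 cores (illustrative only; not the paper's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def box3(img, r0, r1):
    """3x3 box filter over rows r0..r1-1, reading halo rows from img."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(r0, r1):
        row = []
        for c in range(w):
            s, cnt = 0.0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        s += img[rr][cc]
                        cnt += 1
            row.append(s / cnt)
        out.append(row)
    return out

def box3_parallel(img, strips=4):
    """Domain decomposition: each strip is filtered independently."""
    h = len(img)
    bounds = [(i * h // strips, (i + 1) * h // strips) for i in range(strips)]
    with ThreadPoolExecutor() as ex:
        parts = ex.map(lambda b: box3(img, b[0], b[1]), bounds)
    return [row for part in parts for row in part]

img = [[float((r * 5 + c) % 7) for c in range(6)] for r in range(8)]
tiled = box3_parallel(img, strips=4)  # identical to the serial result
```

Because each strip only reads (never writes) its halo rows, no synchronization is needed beyond the final stitch, which is why this strategy scales well.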
Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John
2016-01-01
Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.
Chang, Gregory; Friedrich, Klaus M; Wang, Ligong; Vieira, Renata L R; Schweitzer, Mark E; Recht, Michael P; Wiggins, Graham C; Regatte, Ravinder R
2010-03-01
To determine the feasibility of performing MRI of the wrist at 7 Tesla (T) with parallel imaging and to evaluate how acceleration factors (AF) affect signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), and image quality. This study had institutional review board approval. A four-transmit, eight-receive channel array coil was constructed in-house. Nine healthy subjects were scanned on a 7T whole-body MR scanner. Coronal and axial images of cartilage and trabecular bone micro-architecture (3D Fast Low Angle Shot (FLASH) with and without fat suppression, repetition time/echo time = 20 ms/4.5 ms, flip angle = 10°, 0.169-0.195 × 0.169-0.195 mm, 0.5-1 mm slice thickness) were obtained with AF 1, 2, 3, and 4. T1-weighted fast spin-echo (FSE), proton density-weighted FSE, and multiple-echo data image combination (MEDIC) sequences were also performed. SNR and CNR were measured. Three musculoskeletal radiologists rated image quality. Linear correlation analysis and paired t-tests were performed. At higher AF, SNR and CNR decreased linearly for cartilage, muscle, and trabecular bone (r < -0.98). At AF 4, the reductions in SNR/CNR were: 52%/60% (cartilage), 72%/63% (muscle), 45%/50% (trabecular bone). Radiologists scored images with AF 1 and 2 as near-excellent, AF 3 as good-to-excellent (P = 0.075), and AF 4 as average-to-good (P = 0.11). It is feasible to perform high-resolution 7T MRI of the wrist with parallel imaging. SNR and CNR decrease with higher AF, but image quality remains above-average.
Modeling of Composite Scenes Using Wires, Plates and Dielectric Parallelized (WIPL-DP)
2006-06-01
…transmitter platform for use in image formation and solves the data communications problem. The ability to perform subsurface imaging to depths of 200' has already been demonstrated by Brown in [3] and presented in Figure 3 above. Furthermore, reference [3] …
Vasan, S N Swetadri; Ionita, Ciprian N; Titus, A H; Cartwright, A N; Bednarek, D R; Rudin, S
2012-02-23
We present the image processing upgrades implemented on a Graphics Processing Unit (GPU) in the Control, Acquisition, Processing, and Image Display System (CAPIDS) for the custom Micro-Angiographic Fluoroscope (MAF) detector. Most of the image processing currently implemented in the CAPIDS system is pixel-independent; that is, the operation on each pixel is the same and the operation on one does not depend upon the result from the operation on another, allowing the entire image to be processed in parallel. GPU hardware was developed for exactly this kind of massively parallel implementation. Thus, for an algorithm with a high degree of parallelism, a GPU implementation is much faster than a CPU implementation. The image processing algorithm upgrades implemented on the CAPIDS system include flat-field correction, temporal filtering, image subtraction, roadmap mask generation, and display windowing and leveling. A comparison between the previous and the upgraded versions of CAPIDS is presented to demonstrate how the improvement is achieved. By performing the image processing on a GPU, significant improvements in timing and frame rate have been achieved, including stable operation of the system at 30 fps during a fluoroscopy run, a DSA run, a roadmap procedure, and automatic image windowing and leveling during each frame.
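Flat-field correction, the first of the pixel-independent upgrades listed, computes each output pixel from the same three inputs at that location, which is exactly why it maps so well onto a GPU. A per-pixel sketch (illustrative; the CAPIDS implementation details are not given in the abstract):

```python
def flat_field_correct(raw, dark, flat):
    """corrected = (raw - dark) / (flat - dark), scaled by the mean gain.
    Every output pixel is independent, so all can be computed in parallel."""
    gains = [[f - d for f, d in zip(fr, dr)] for fr, dr in zip(flat, dark)]
    mean_gain = sum(sum(row) for row in gains) / sum(len(r) for r in gains)
    return [[(r - d) / g * mean_gain if g != 0 else 0.0
             for r, d, g in zip(rr, dr, gr)]
            for rr, dr, gr in zip(raw, dark, gains)]

raw  = [[10.0, 20.0], [12.0, 6.0]]
dark = [[ 2.0,  2.0], [ 2.0, 2.0]]
flat = [[10.0, 10.0], [10.0, 10.0]]
print(flat_field_correct(raw, dark, flat))  # → [[8.0, 18.0], [10.0, 4.0]]
```

Temporal filtering and image subtraction have the same per-pixel structure, which is why the whole chain sustains 30 fps on the GPU.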
Hardware implementation of hierarchical volume subdivision-based elastic registration.
Dandekar, Omkar; Walimbe, Vivek; Shekhar, Raj
2006-01-01
Real-time, elastic, and fully automated 3D image registration is critical to the efficiency and effectiveness of many image-guided diagnostic and treatment procedures relying on multimodality image fusion or serial image comparison. True real-time performance will make many 3D image registration-based techniques clinically viable. Hierarchical volume subdivision-based image registration techniques are inherently faster than most elastic registration techniques, e.g. free-form deformation (FFD)-based techniques, and are more amenable to achieving real-time performance through hardware acceleration. Our group has previously reported an FPGA-based architecture for accelerating FFD-based image registration. In this article we show how our existing architecture can be adapted to support hierarchical volume subdivision-based image registration. A proof-of-concept implementation of the architecture achieved a speedup of 100× for elastic registration against an optimized software implementation on a 3.2 GHz Pentium III Xeon workstation. Due to the inherently parallel nature of hierarchical volume subdivision-based image registration, further speedup can be achieved by using several computing modules in parallel.
Simultaneous fluoroscopic and nuclear imaging: impact of collimator choice on nuclear image quality.
van der Velden, Sandra; Beijst, Casper; Viergever, Max A; de Jong, Hugo W A M
2017-01-01
X-ray-guided oncological interventions could benefit from the availability of simultaneously acquired nuclear images during the procedure. To this end, a real-time, hybrid fluoroscopic and nuclear imaging device, consisting of an X-ray C-arm combined with gamma imaging capability, is currently being developed (Beijst C, Elschot M, Viergever MA, de Jong HW. Radiol. 2015;278:232-238). The setup comprises four gamma cameras placed adjacent to the X-ray tube. The four camera views are used to reconstruct an intermediate three-dimensional image, which is subsequently converted to a virtual nuclear projection image that overlaps with the X-ray image. The purpose of the present simulation study is to evaluate the impact of gamma camera collimator choice (parallel hole versus pinhole) on the quality of the virtual nuclear image. Simulation studies were performed with a digital image quality phantom including realistic noise and resolution effects, with a dynamic frame acquisition time of 1 s and a total activity of 150 MBq. Projections were simulated for 3, 5, and 7 mm pinholes and for three parallel hole collimators (low-energy all-purpose (LEAP), low-energy high-resolution (LEHR) and low-energy ultra-high-resolution (LEUHR)). Intermediate reconstruction was performed with maximum likelihood expectation-maximization (MLEM) with point spread function (PSF) modeling. In the virtual projection derived therefrom, contrast, noise level, and detectability were determined and compared with the ideal projection, that is, as if a gamma camera were located at the position of the X-ray detector. Furthermore, image deformations and spatial resolution were quantified. Additionally, simultaneous fluoroscopic and nuclear images of a sphere phantom were acquired with a physical prototype system and compared with the simulations. For small hot spots, contrast is comparable for all simulated collimators.
Noise levels are, however, 3 to 8 times higher in pinhole geometries than in parallel hole geometries. This results in higher contrast-to-noise ratios for parallel hole geometries. Smaller spheres can thus be detected with parallel hole collimators than with pinhole collimators (17 mm vs 28 mm). Pinhole geometries show larger image deformations than parallel hole geometries. Spatial resolution varied between 1.25 cm for the 3 mm pinhole and 4 cm for the LEAP collimator. The simulation method was successfully validated by the experiments with the physical prototype. A real-time hybrid fluoroscopic and nuclear imaging device is currently being developed. Image quality of nuclear images obtained with different collimators was compared in terms of contrast, noise, and detectability. Parallel hole collimators showed lower noise and better detectability than pinhole collimators. © 2016 American Association of Physicists in Medicine.
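The contrast-to-noise comparison used above can be made concrete: contrast is the hot-spot signal relative to background, and CNR divides the signal difference by the background noise. A minimal sketch of the two metrics (the numbers are illustrative, not the study's data):

```python
def contrast(hot_mean, bg_mean):
    """Relative contrast of a hot spot over the background."""
    return (hot_mean - bg_mean) / bg_mean

def cnr(hot_mean, bg_mean, bg_std):
    """Contrast-to-noise ratio: signal difference over background noise."""
    return (hot_mean - bg_mean) / bg_std

# Same contrast, but 3x higher noise (as with a pinhole vs a parallel hole
# collimator) cuts the CNR by the same factor of 3:
print(cnr(120.0, 40.0, 5.0))   # parallel hole → 16.0
print(cnr(120.0, 40.0, 15.0))  # pinhole       → ≈ 5.33
```

This is why equal-contrast collimators can still differ sharply in the smallest detectable sphere size.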
Ribot, Emeline J.; Wecker, Didier; Trotier, Aurélien J.; Dallaudière, Benjamin; Lefrançois, William; Thiaudière, Eric; Franconi, Jean-Michel; Miraux, Sylvain
2015-01-01
Introduction: The purpose of this paper is to develop an easy method to generate fat-signal-free and banding-artifact-free 3D balanced Steady State Free Precession (bSSFP) images at high magnetic field. Methods: In order to suppress the fat signal and the bSSFP banding artifacts, two or four images were acquired with the excitation frequency of the water-selective binomial radiofrequency pulse set on resonance or shifted by a maximum of 3/4TR. Mice and human volunteers were imaged at 7T and 3T, respectively, to perform whole-body and musculoskeletal imaging. Sum-of-squares reconstruction was performed, combined or not with parallel imaging. Results: The frequency selectivity of the 1-2-3-2-1 and 1-3-3-1 binomial pulses was preserved after (3/4TR) frequency shifting. Consequently, whole-body small-animal 3D imaging was performed at 7T and enabled visualization of small structures within adipose tissue, such as lymph nodes. In parallel, this method allowed 3D musculoskeletal imaging in humans with high spatial resolution at 3T. The combination with parallel imaging allowed the acquisition of knee images with ~500 μm resolution in less than 2 min. In addition, ankles, full head coverage, and legs of volunteers were imaged, demonstrating the applicability of the method to large FOVs as well. Conclusion: This robust method can be applied in small animals and humans at high magnetic fields. The high SNR and tissue contrast obtained in short acquisition times make the bSSFP sequence suitable for several preclinical and clinical applications. PMID:26426849
Target recognition of ladar range images using even-order Zernike moments.
Liu, Zheng-Jun; Li, Qi; Xia, Zhi-Wei; Wang, Qi
2012-11-01
Ladar range images have attracted considerable attention in automatic target recognition fields. In this paper, Zernike moments (ZMs) are applied to classify the target of a range image from an arbitrary azimuth angle. However, ZMs suffer from high computational costs. To improve the performance of target recognition based on small samples, even-order ZMs with serial-parallel backpropagation neural networks (BPNNs) are applied to recognize the target of the range image. It is found that both the rotation invariance and the classification performance of the even-order ZMs are better than those of odd-order moments and of moments compressed by principal component analysis. The experimental results demonstrate that combining the even-order ZMs with serial-parallel BPNNs can significantly improve the recognition rate for small samples.
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores
NASA Astrophysics Data System (ADS)
Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei
We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
Parallel computing in experimental mechanics and optical measurement: A review (II)
NASA Astrophysics Data System (ADS)
Wang, Tianyi; Kemao, Qian
2018-05-01
With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have been successfully applied to measure various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computational burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composed of hardware such as CPUs and GPUs have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high-speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications across a broad scope, including digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.
performance on a low cost, low size, weight, and power (SWAP) computer: a Raspberry Pi Model B. For a comparison of performance, a baseline implementation...improvement factor of 2-3 compared to filtered backprojection. Execution on a single Raspberry Pi is too slow for real-time imaging. However, factorized...backprojection is easily parallelized, and we include a discussion of parallel implementation across multiple Pis.
Parallel evolution of image processing tools for multispectral imagery
NASA Astrophysics Data System (ADS)
Harvey, Neal R.; Brumby, Steven P.; Perkins, Simon J.; Porter, Reid B.; Theiler, James P.; Young, Aaron C.; Szymanski, John J.; Bloch, Jeffrey J.
2000-11-01
We describe the implementation and performance of a parallel, hybrid evolutionary-algorithm-based system, which optimizes image processing tools for feature-finding tasks in multi-spectral imagery (MSI) data sets. Our system uses an integrated spatio-spectral approach and is capable of combining suitably-registered data from different sensors. We investigate the speed-up obtained by parallelization of the evolutionary process via multiple processors (a workstation cluster) and develop a model for prediction of run-times for different numbers of processors. We demonstrate our system on Landsat Thematic Mapper MSI, covering the recent Cerro Grande fire at Los Alamos, NM, USA.
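The run-time prediction model mentioned above is not reproduced in the abstract; a common first-order model for a master-worker evolutionary loop combines Amdahl's law with a communication term. A hedged sketch (the parameter names and the linear communication term are assumptions, not the authors' model):

```python
def predicted_runtime(t_serial, parallel_fraction, n_procs, t_comm_per_proc=0.0):
    """First-order runtime model: serial part + parallel part / N + comm overhead.

    t_serial: single-processor runtime; parallel_fraction: share of the work
    that parallelizes (e.g. fitness evaluation of the population);
    t_comm_per_proc: assumed per-processor communication cost.
    """
    return (t_serial * (1 - parallel_fraction)
            + t_serial * parallel_fraction / n_procs
            + t_comm_per_proc * n_procs)

def speedup(t_serial, parallel_fraction, n_procs, t_comm_per_proc=0.0):
    """Predicted speedup relative to the single-processor run."""
    return t_serial / predicted_runtime(t_serial, parallel_fraction,
                                        n_procs, t_comm_per_proc)
```

With a nonzero communication term this model, unlike pure Amdahl's law, predicts an optimal cluster size beyond which adding workstations slows the run down.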
Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns.
Pan, Shaoming; Li, Yongkai; Xu, Zhengquan; Chong, Yanwen
2015-01-01
Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files cannot be further split for distributed storage. In this paper, we propose a complete theoretical system for the distributed storage of small geospatial image data files based on mining the access patterns of geospatial image data using their historical access log information. First, an algorithm is developed to construct an access correlation matrix based on the analysis of the log information, which reveals the patterns of access to the geospatial image data. Then, a practical heuristic algorithm is developed to determine a reasonable solution based on the access correlation matrix. Finally, a number of comparative experiments are presented, demonstrating that our algorithm displays a higher total parallel access probability than those of other algorithms by approximately 10-15% and that the performance can be further improved by more than 20% by simultaneously applying a copy storage strategy. These experiments show that the algorithm can be applied in distributed environments to help realize parallel I/O and thereby improve system performance.
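The pipeline described above (access log, then access correlation matrix, then placement heuristic) can be illustrated with a toy sketch. The co-access counting below is one plausible reading of "access correlation", and the round-robin placement is deliberately simplistic, not the paper's heuristic:

```python
from collections import defaultdict
from itertools import combinations

def access_correlation(sessions):
    """Build a symmetric co-access matrix from an access log.

    sessions: iterable of lists of file ids requested together (e.g. the
    image tiles fetched for one map view). Files that co-occur often are
    correlated and, per the paper's idea, should sit on *different*
    storage nodes so they can be read in parallel.
    """
    corr = defaultdict(int)
    for files in sessions:
        for a, b in combinations(sorted(set(files)), 2):
            corr[(a, b)] += 1
    return dict(corr)

def place_round_robin(files_by_hotness, n_nodes):
    """Toy placement heuristic (not the paper's algorithm): spread files,
    ordered by access frequency, across nodes in round-robin fashion."""
    return {f: i % n_nodes for i, f in enumerate(files_by_hotness)}
```

A real declustering algorithm would maximize the probability that any co-accessed group spans all nodes, which is what the paper's heuristic optimizes from the correlation matrix.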
Parallel transformation of K-SVD solar image denoising algorithm
NASA Astrophysics Data System (ADS)
Liang, Youwen; Tian, Yu; Li, Mei
2017-02-01
Images of the sun obtained through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise, but training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP is used to transform the serial algorithm into a parallel version, following a data-parallelism model. The biggest change is that multiple atoms, rather than one, are updated simultaneously. The denoising effect and acceleration performance were tested after completion of the parallel algorithm: the speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easily portable to other multi-core platforms.
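The "multiple atoms updated simultaneously" change can be sketched as follows: each K-SVD atom update is a rank-1 SVD of a residual matrix, and the parallel variant computes all selected updates from one snapshot of the dictionary and codes, so they are independent. This is an illustrative NumPy sketch, not the authors' OpenMP code:

```python
import numpy as np

def update_atoms(D, X, Y, atom_indices):
    """Parallel-style K-SVD dictionary update: every atom in atom_indices is
    recomputed from the *same* snapshot (D0, X0), so the loop iterations are
    independent and could run on separate cores (serial K-SVD folds each
    atom's update in before moving to the next).
    D: dictionary (n x K), X: sparse codes (K x N), Y: training signals (n x N).
    """
    D0, X0 = D.copy(), X.copy()
    D_new, X_new = D.copy(), X.copy()
    for j in atom_indices:                      # independent iterations
        users = np.nonzero(X0[j])[0]            # signals that use atom j
        if users.size == 0:
            continue
        # residual with atom j's own contribution added back in
        E = Y[:, users] - D0 @ X0[:, users] + np.outer(D0[:, j], X0[j, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D_new[:, j] = U[:, 0]                   # best rank-1 fit: new atom...
        X_new[j, users] = s[0] * Vt[0]          # ...and its coefficients
    return D_new, X_new
```

Because the snapshot updates ignore each other within a sweep, the parallel variant typically needs a few more sweeps than serial K-SVD to reach the same residual, which is the usual trade for the per-sweep speedup.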
Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho
2014-01-01
The primary goal of this project is to implement an iterative statistical image reconstruction algorithm, in this case maximum-likelihood expectation maximization (MLEM) as used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytics system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge of computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting. PMID:27081299
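The MLEM iteration being distributed is itself compact; the heavy work is the pair of (sparse) matrix-vector products per iteration, which is what a Spark/GraphX port spreads across the cluster. A dense NumPy sketch for reference (not the paper's implementation):

```python
import numpy as np

def mlem(A, y, n_iter=50, eps=1e-12):
    """Maximum-likelihood expectation maximization for emission tomography,
    modeling y ~ Poisson(A x). Each iteration is one forward projection
    (A @ x) and one backprojection (A.T @ ratio); in a graph formulation
    these are message passes over the voxel-detector bipartite graph.
    """
    x = np.ones(A.shape[1])                     # flat initial image
    sens = A.T @ np.ones(A.shape[0])            # sensitivity image A^T 1
    for _ in range(n_iter):
        ratio = y / np.maximum(A @ x, eps)      # measured / predicted counts
        x *= (A.T @ ratio) / np.maximum(sens, eps)
    return x
```

The update is multiplicative, so a nonnegative start stays nonnegative, one of the reasons MLEM is preferred over filtered backprojection for low-count SPECT data.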
High Performance Input/Output for Parallel Computer Systems
NASA Technical Reports Server (NTRS)
Ligon, W. B.
1996-01-01
The goal of our project is to study the I/O characteristics of parallel applications used in Earth science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem, both under simulation and with direct experimentation on parallel systems. Our three-year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms including the Tiger Parallel Architecture Workbench (TPAW) simulator, the Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX-like environment. Access to key performance parameters facilitates experimentation. We have studied several key applications from levels 1, 2, and 3 of the typical RDC processing scenario, including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.
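The core idea behind a parallel file system like PVFS is file striping: successive blocks go round-robin across I/O nodes, so one large read becomes parallel I/O. A minimal sketch of the block-to-node mapping (the parameter names are illustrative, not PVFS's actual API):

```python
def block_locations(file_size, block_size, n_io_nodes, stripe_offset=0):
    """Map each block of a striped file to an I/O node, round-robin.

    Returns (block_index, node_index) pairs; stripe_offset lets different
    files start on different nodes to balance load across the cluster.
    """
    n_blocks = -(-file_size // block_size)   # ceiling division
    return [(b, (stripe_offset + b) % n_io_nodes) for b in range(n_blocks)]
```

A client reading the whole file can then issue one request per node and stream the stripes concurrently, which is where the "key performance parameters" (block size, stripe width) studied above come in.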
Parallel image reconstruction for 3D positron emission tomography from incomplete 2D projection data
NASA Astrophysics Data System (ADS)
Guerrero, Thomas M.; Ricci, Anthony R.; Dahlbom, Magnus; Cherry, Simon R.; Hoffman, Edward T.
1993-07-01
The problem of excessive computational time in 3D Positron Emission Tomography (3D PET) reconstruction is defined, and we present an approach for solving this problem through the construction of an inexpensive parallel processing system and the adoption of the FAVOR algorithm. Currently, the 3D reconstruction of the 610 images of a total body procedure would require 80 hours, and the 3D reconstruction of the 620 images of a dynamic study would require 110 hours. An inexpensive parallel processing system for 3D PET reconstruction is constructed by integrating board-level products from multiple vendors. The system achieves its computational performance through the use of 6U VME boards carrying four i860 processors each; processor boards from five manufacturers are discussed from our perspective. The new 3D PET reconstruction algorithm FAVOR (FAst VOlume Reconstructor), which promises a substantial speed improvement, is adopted. Preliminary results from parallelizing FAVOR are utilized in formulating architectural improvements for this problem. In summary, we are addressing the problem of excessive computational time in 3D PET image reconstruction through the construction of an inexpensive parallel processing system and the parallelization of a 3D reconstruction algorithm that uses the incomplete data set produced by current PET systems.
High speed parallel spectral-domain OCT using spectrally encoded line-field illumination
NASA Astrophysics Data System (ADS)
Lee, Kye-Sung; Hur, Hwan; Bae, Ji Yong; Kim, I. Jong; Kim, Dong Uk; Nam, Ki-Hwan; Kim, Geon-Hee; Chang, Ki Soo
2018-01-01
We report parallel spectral-domain optical coherence tomography (OCT) at 500 000 A-scan/s. This is the highest-speed spectral-domain (SD) OCT system using a single line camera. Spectrally encoded line-field scanning is proposed to increase the imaging speed in SD-OCT effectively, and the tradeoff between speed, depth range, and sensitivity is demonstrated. We show that three imaging modes of 125k, 250k, and 500k A-scan/s can be simply switched according to the sample to be imaged considering the depth range and sensitivity. To demonstrate the biological imaging performance of the high-speed imaging modes of the spectrally encoded line-field OCT system, human skin and a whole leaf were imaged at the speed of 250k and 500k A-scan/s, respectively. In addition, there is no sensitivity dependence in the B-scan direction, which is implicit in line-field parallel OCT using line focusing of a Gaussian beam with a cylindrical lens.
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment, involving digital band-pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared-memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speedup factor of 46.66 compared with the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through the local intranet (NIHnet). The experimental results demonstrate that parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real time.
Bowen, Jason D; Huang, Qiu; Ellin, Justin R; Lee, Tzu-Cheng; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho
2013-10-21
Single photon emission computed tomography (SPECT) myocardial perfusion imaging remains a critical tool in the diagnosis of coronary artery disease. However, after more than three decades of use, photon detection efficiency remains poor and unchanged. This is due to the continued reliance on parallel-hole collimators first introduced in 1964. These collimators possess poor geometric efficiency. Here we present the performance evaluation results of a newly designed multipinhole collimator with 20 pinhole apertures (PH20) for commercial SPECT systems. Computer simulations and numerical observer studies were used to assess the noise, bias and diagnostic imaging performance of a PH20 collimator in comparison with those of a low energy high resolution (LEHR) parallel-hole collimator. Ray-driven projector/backprojector pairs were used to model SPECT imaging acquisitions, including simulation of noiseless projection data and performing MLEM/OSEM image reconstructions. Poisson noise was added to noiseless projections for realistic projection data. Noise and bias performance were investigated for five mathematical cardiac and torso (MCAT) phantom anatomies imaged at two gantry orbit positions (19.5 and 25.0 cm). PH20 and LEHR images were reconstructed with 300 MLEM iterations and 30 OSEM iterations (ten subsets), respectively. Diagnostic imaging performance was assessed by a receiver operating characteristic (ROC) analysis performed on a single MCAT phantom; however, in this case PH20 images were reconstructed with 75 pixel-based OSEM iterations (four subsets). Four PH20 projection views from two positions of a dual-head camera acquisition and 60 LEHR projections were simulated for all studies. At uniformly-imposed resolution of 12.5 mm, significant improvements in SNR and diagnostic sensitivity (represented by the area under the ROC curve, or AUC) were realized when PH20 collimators are substituted for LEHR parallel-hole collimators. 
SNR improves by factors of 1.94-2.34 for the five patient anatomies and two orbital positions studied. For the ROC analysis the PH20 AUC is larger than the LEHR AUC with a p-value of 0.0067. Bias performance, however, decreases with the use of PH20 collimators. Systematic analyses showed PH20 collimators present improved diagnostic imaging performance over LEHR collimators, requiring only collimator exchange on existing SPECT cameras for their use.
Demi, Libertario; Verweij, Martin D; Van Dongen, Koen W A
2012-11-01
Real-time 2-D or 3-D ultrasound imaging systems are currently used for medical diagnosis. To achieve the required data acquisition rate, these systems rely on parallel beamforming, i.e., a single wide-angled beam is used for transmission and several narrow parallel beams are used for reception. When applied to harmonic imaging, the demand for high-amplitude pressure wave fields, necessary to generate the harmonic components, conflicts with the use of a wide-angled beam in transmission because this results in a large spatial decay of the acoustic pressure. To enhance the amplitude of the harmonics, it is preferable to do the reverse: transmit several narrow parallel beams and use a wide-angled beam in reception. Here, this concept is investigated to determine whether it can be used for harmonic imaging. The method proposed in this paper relies on orthogonal frequency division multiplexing (OFDM), which is used to create distinctive parallel beams in transmission. To test the proposed method, a numerical study has been performed, in which the transmit, receive, and combined beam profiles generated by a linear array have been simulated for the second-harmonic component. Compared with standard parallel beamforming, application of the proposed technique results in a gain of 12 dB for the main beam and in a reduction of the side lobes. Experimental verification in water has also been performed. Measurements obtained with a single-element emitting transducer and a hydrophone receiver confirm the possibility of exciting a practical ultrasound transducer with multiple Gaussian modulated pulses, each having a different center frequency, and the capability to generate distinguishable second-harmonic components.
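The excitation described above, multiple Gaussian-modulated pulses with distinct center frequencies summed to form distinguishable parallel transmit beams, can be sketched numerically. The bandwidth choice and the frequencies below are illustrative, not the paper's values:

```python
import numpy as np

def gaussian_pulse(t, f_c, bandwidth_frac=0.5):
    """Gaussian-modulated sinusoid centered at f_c. The envelope width is
    chosen so the -6 dB fractional bandwidth is roughly bandwidth_frac
    (an illustrative choice, not the paper's exact excitation)."""
    sigma_f = bandwidth_frac * f_c / (2 * np.sqrt(2 * np.log(2)))
    sigma_t = 1.0 / (2 * np.pi * sigma_f)
    return np.exp(-t**2 / (2 * sigma_t**2)) * np.cos(2 * np.pi * f_c * t)

# OFDM-style excitation: one pulse per parallel transmit beam, each at a
# different (hypothetical) center frequency, summed into one drive signal.
fs = 100e6                                  # sample rate, 100 MHz
t = np.arange(-2e-6, 2e-6, 1 / fs)
excitation = sum(gaussian_pulse(t, f) for f in (2e6, 2.5e6, 3e6))
```

On reception, band-pass filtering around each center frequency (or its second harmonic) separates the echoes of the individual narrow beams again, which is the OFDM idea the paper exploits.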
F3D Image Processing and Analysis for Many - and Multi-core Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
F3D is written in OpenCL, so it achieves platform-portable parallelism on modern multi-core CPUs and many-core GPUs. The interface and mechanisms to access the F3D core are written in Java as a plugin for Fiji/ImageJ, delivering several key image-processing algorithms necessary to remove artifacts from micro-tomography data. The algorithms consist of data-parallel-aware filters that can efficiently utilize resources, work on out-of-core datasets, and scale efficiently across multiple accelerators. Optimizing for data-parallel filters, streaming out-of-core datasets, and efficient resource, memory, and data management over complex execution sequences of filters greatly expedites any scientific workflow with image-processing requirements. F3D performs several different types of 3D image processing operations, such as non-linear filtering using bilateral filtering, median filtering, and/or morphological operators (MM). F3D gray-level MM operators are one-pass, constant-time methods that can perform morphological transformations with a line structuring element oriented in discrete directions. Additionally, MM operators can be applied to gray-scale images, and consist of two parts: (a) a reference shape, or structuring element, which is translated over the image, and (b) a mechanism, or operation, that defines the comparisons to be performed between the image and the structuring element. This tool provides a critical component within many complex pipelines, such as those for performing automated segmentation of image stacks. F3D is also a descendant of Quant-CT, another software package we developed in the past; the two modules are to be integrated in a future version. Further details were reported in: D.M. Ushizima, T. Perciano, H. Krishnan, B. Loring, H. Bale, D. Parkinson, and J. Sethian. Structure recognition from high-resolution images of ceramic composites. IEEE International Conference on Big Data, October 2014.
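One of the directional gray-level MM operators described above, erosion with a line structuring element, can be sketched naively as follows. Note that F3D's actual operators are one-pass, constant-time (e.g. van Herk/Gil-Werman style) and run in OpenCL; this reference version is O(N·L) and purely illustrative:

```python
import numpy as np

def erode_line(img, length):
    """Gray-level erosion of a 2D image with a horizontal line structuring
    element of the given length: each output pixel is the minimum over a
    length-wide window on its row (edge pixels replicated at the border).
    """
    pad = length // 2
    padded = np.pad(img, ((0, 0), (pad, length - 1 - pad)), mode='edge')
    out = np.empty_like(img)
    for j in range(img.shape[1]):
        out[:, j] = padded[:, j:j + length].min(axis=1)
    return out
```

Swapping `min` for `max` gives the dual dilation; composing the two yields the openings/closings used to suppress streak artifacts in micro-tomography slices.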
Otazo, Ricardo; Lin, Fa-Hsuan; Wiggins, Graham; Jordan, Ramiro; Sodickson, Daniel; Posse, Stefan
2009-01-01
Standard parallel magnetic resonance imaging (MRI) techniques suffer from residual aliasing artifacts when the coil sensitivities vary within the image voxel. In this work, a parallel MRI approach known as Superresolution SENSE (SURE-SENSE) is presented in which acceleration is performed by acquiring only the central region of k-space instead of increasing the sampling distance over the complete k-space matrix and reconstruction is explicitly based on intra-voxel coil sensitivity variation. In SURE-SENSE, parallel MRI reconstruction is formulated as a superresolution imaging problem where a collection of low resolution images acquired with multiple receiver coils are combined into a single image with higher spatial resolution using coil sensitivities acquired with high spatial resolution. The effective acceleration of conventional gradient encoding is given by the gain in spatial resolution, which is dictated by the degree of variation of the different coil sensitivity profiles within the low resolution image voxel. Since SURE-SENSE is an ill-posed inverse problem, Tikhonov regularization is employed to control noise amplification. Unlike standard SENSE, for which acceleration is constrained to the phase-encoding dimension/s, SURE-SENSE allows acceleration along all encoding directions — for example, two-dimensional acceleration of a 2D echo-planar acquisition. SURE-SENSE is particularly suitable for low spatial resolution imaging modalities such as spectroscopic imaging and functional imaging with high temporal resolution. Application to echo-planar functional and spectroscopic imaging in human brain is presented using two-dimensional acceleration with a 32-channel receiver coil. PMID:19341804
UWGSP4: an imaging and graphics superworkstation and its medical applications
NASA Astrophysics Data System (ADS)
Jong, Jing-Ming; Park, Hyun Wook; Eo, Kilsu; Kim, Min-Hwan; Zhang, Peng; Kim, Yongmin
1992-05-01
UWGSP4 is configured with a parallel architecture for image processing and a pipelined architecture for computer graphics. The system's peak performance is 1,280 MFLOPS for image processing and over 200,000 Gouraud shaded 3-D polygons per second for graphics. The simulated sustained performance is about 50% of the peak performance in general image processing. Most of the 2-D image processing functions are efficiently vectorized and parallelized in UWGSP4. A performance of 770 MFLOPS in convolution and 440 MFLOPS in FFT is achieved. The real-time cine display, up to 32 frames of 1280 X 1024 pixels per second, is supported. In 3-D imaging, the update rate for the surface rendering is 10 frames of 20,000 polygons per second; the update rate for the volume rendering is 6 frames of 128 X 128 X 128 voxels per second. The system provides 1280 X 1024 X 32-bit double frame buffers and one 1280 X 1024 X 8-bit overlay buffer for supporting realistic animation, 24-bit true color, and text annotation. A 1280 X 1024- pixel, 66-Hz noninterlaced display screen with 1:1 aspect ratio can be windowed into the frame buffer for the display of any portion of the processed image or graphics.
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing
NASA Astrophysics Data System (ADS)
Johl, John T.; Baker, Nick C.
1988-10-01
The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
GPU computing in medical physics: a review.
Pratx, Guillem; Xing, Lei
2011-05-01
The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.
Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease
Shamonin, Denis P.; Bron, Esther E.; Lelieveldt, Boudewijn P. F.; Smits, Marion; Klein, Stefan; Staring, Marius
2013-01-01
Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, e.g., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4–5x on an 8-core machine. Using OpenCL a speedup factor of 2 was realized for computation of the Gaussian pyramids, and 15–60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88 and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license.
PMID:24474917
Near Real-Time Image Reconstruction
NASA Astrophysics Data System (ADS)
Denker, C.; Yang, G.; Wang, H.
2001-08-01
In recent years, post-facto image-processing algorithms have been developed to achieve diffraction-limited observations of the solar surface. We present a combination of frame selection, speckle-masking imaging, and parallel computing which provides real-time, diffraction-limited, 256×256 pixel images at a 1-minute cadence. Our approach to achieving diffraction-limited observations is complementary to adaptive optics (AO). At the moment, AO is limited by the fact that it corrects wavefront aberrations only for a field of view comparable to the isoplanatic patch. This limitation does not apply to speckle-masking imaging. However, speckle-masking imaging relies on short-exposure images, which limits its spectroscopic applications. The parallel processing of the data is performed on a Beowulf-class computer which utilizes off-the-shelf, mass-market technologies to provide high computational performance for scientific calculations and applications at low cost. Beowulf computers have great potential, not only for image reconstruction, but for any kind of complex data reduction. Immediate access to high-level data products and direct visualization of dynamic processes on the Sun are two of the advantages to be gained.
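Frame selection, the first stage of the pipeline above, can be sketched as a simple sharpness ranking over the burst of short exposures. RMS contrast is a common selection metric in solar speckle imaging, but it is an assumption here, not necessarily the authors' criterion:

```python
import numpy as np

def select_frames(frames, keep):
    """Rank short-exposure frames by a sharpness proxy (RMS contrast:
    std/mean of the intensity) and return the indices of the best 'keep'
    frames, which would feed the speckle-masking reconstruction.
    """
    def contrast(img):
        return img.std() / max(img.mean(), 1e-12)
    order = sorted(range(len(frames)),
                   key=lambda i: contrast(frames[i]), reverse=True)
    return order[:keep]
```

Because each frame is scored independently, this stage parallelizes trivially across the nodes of a Beowulf cluster; the bispectrum accumulation of speckle masking parallelizes over frames in the same way.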
NASA Astrophysics Data System (ADS)
Yu, A. R.; Park, S.-J.; Choi, Y. Y.; Kim, K. M.; Kim, H.-J.
2015-09-01
Triumph X-SPECT is a newly released CZT-based preclinical small-animal SPECT system with interchangeable collimators. The purpose of this work was to evaluate and systematically compare the imaging performances of three different collimators in the CZT-based preclinical small-animal system: a single-pinhole collimator (SPH), a multi-pinhole collimator (MPH) and a parallel-hole collimator. We measured the spatial resolutions and sensitivities of the three collimators with 99mTc sources, considering three distinct energy window widths (5, 10, and 20%), and used the NEMA NU4-2008 Image Quality phantom to test the imaging performance of the three collimators in terms of uniformity and spill-over ratio (SOR) for each energy window. With a 10% energy window width at a radius of rotation (ROR) of 30 mm, the system resolution of the SPH, MPH and parallel-hole collimators was 0.715, 0.855 and 3.270 mm FWHM, respectively. For the same energy window, the sensitivity of the system with SPH, MPH and parallel-hole collimators was 32.860, 152.514 and 49.205 counts/sec/MBq at a 100 mm source-to-detector distance and 6.790, 33.376 and 49.038 counts/sec/MBq at a 130 mm source-to-detector distance, respectively. The image noise and SORair for the three collimators were 20.137, 12.278 and 11.232 (%STDunif) and 0.106, 0.140 and 0.161, respectively. Overall, the results show that the SPH had better spatial resolution than the other collimators. The MPH had the highest sensitivity at a 100 mm source-to-collimator distance, and the parallel-hole collimator had the highest sensitivity at a 130 mm source-to-detector distance. Therefore, the proper collimator for the Triumph X-SPECT system must be determined by the task. These results provide valuable reference data and insight into the imaging performance of various collimators in CZT-based preclinical small-animal SPECT.
A Parallel Point Matching Algorithm for Landmark Based Image Registration Using Multicore Platform
Yang, Lin; Gong, Leiguang; Zhang, Hong; Nosher, John L.; Foran, David J.
2013-01-01
Point matching is crucial for many computer vision applications. Establishing the correspondence between a large number of data points is a computationally intensive process. Some point matching related applications, such as medical image registration, require real time or near real time performance if applied to critical clinical applications like image assisted surgery. In this paper, we report a new multicore platform based parallel algorithm for fast point matching in the context of landmark based medical image registration. We introduce a non-regular data partition algorithm that uses K-means clustering to group the landmarks according to the number of available processing cores, which optimizes memory usage and data transfer. We have tested our method using the IBM Cell Broadband Engine (Cell/B.E.) platform. The results demonstrated a significant speed-up over the sequential implementation. The proposed data partition and parallelization algorithm, though tested only on one multicore platform, is generic by design, and can therefore be extended to other computing platforms as well as other point matching related applications. PMID:24308014
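The K-means-based landmark partition described above can be sketched in plain Python. This is illustrative only — the authors' Cell/B.E. implementation is not reproduced here, and all function names are ours — but it shows how landmarks would be grouped into one work unit per available core.

```python
import random

def kmeans_partition(points, k, iters=20, seed=0):
    """Group 2-D landmark points into k clusters, one work unit per core."""
    rng = random.Random(seed)
    centers = list(rng.sample(points, k))   # initial centers drawn from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each landmark to its nearest center.
        for i, (x, y) in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda c: (x - centers[c][0]) ** 2
                                        + (y - centers[c][1]) ** 2)
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return [[points[i] for i in range(len(points)) if assign[i] == c]
            for c in range(k)]
```

Because spatially nearby landmarks land in the same partition, each core works on a compact region, which is the memory-locality benefit the abstract alludes to.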
Digital image processing using parallel computing based on CUDA technology
NASA Astrophysics Data System (ADS)
Skirnevskiy, I. P.; Pustovit, A. V.; Abdrashitova, M. O.
2017-01-01
This article describes the expediency of using a graphics processing unit (GPU) for big data processing in the context of digital image processing. It provides a short description of parallel computing technology and its usage in different areas, a definition of image noise, and a brief overview of some noise removal algorithms. It also describes some basic requirements that a noise removal algorithm should meet when applied to computed tomography projections. It provides a comparison of performance with and without the GPU, as well as with different divisions of work between the CPU and GPU.
Parallel image registration with a thin client interface
NASA Astrophysics Data System (ADS)
Saiprasad, Ganesh; Lo, Yi-Jung; Plishker, William; Lei, Peng; Ahmad, Tabassum; Shekhar, Raj
2010-03-01
Despite its high significance, the clinical utilization of image registration remains limited because of its lengthy execution time and a lack of easy access. The focus of this work was twofold. First, we accelerated our coarse-to-fine, volume subdivision-based image registration algorithm by a novel parallel implementation that maintains the accuracy of our uniprocessor implementation. Second, we developed a thin-client computing model with a user-friendly interface to perform rigid and nonrigid image registration. Our novel parallel computing model uses the message passing interface model on a 32-core cluster. The results show that, compared with the uniprocessor implementation, the parallel implementation of our image registration algorithm is approximately 5 times faster for rigid image registration and approximately 9 times faster for nonrigid registration for the images used. To test the viability of such systems for clinical use, we developed a thin client in the form of a plug-in in OsiriX, a well-known open source PACS workstation and DICOM viewer, and used it for two applications. The first application registered the baseline and follow-up MR brain images, whose subtraction was used to track progression of multiple sclerosis. The second application registered pretreatment PET and intratreatment CT of radiofrequency ablation patients to demonstrate a new capability of multimodality imaging guidance. The registration acceleration coupled with the remote implementation using a thin client should ultimately increase accuracy, speed, and access of image registration-based interpretations in a number of diagnostic and interventional applications.
Interleaved EPI diffusion imaging using SPIRiT-based reconstruction with virtual coil compression.
Dong, Zijing; Wang, Fuyixue; Ma, Xiaodong; Zhang, Zhe; Dai, Erpeng; Yuan, Chun; Guo, Hua
2018-03-01
To develop a novel diffusion imaging reconstruction framework based on iterative self-consistent parallel imaging reconstruction (SPIRiT) for multishot interleaved echo planar imaging (iEPI), with computation accelerated by virtual coil compression. As a general approach for autocalibrating parallel imaging, SPIRiT improves on traditional generalized autocalibrating partially parallel acquisitions (GRAPPA) methods in that the formulation with self-consistency is better conditioned, suggesting that SPIRiT is a better candidate for k-space-based reconstruction. In this study, a general SPIRiT framework is adopted to incorporate both coil sensitivity and phase variation information as virtual coils and is then applied to 2D navigated iEPI diffusion imaging. To reduce the reconstruction time when using a large number of coils and shots, a novel shot-coil compression method is proposed for computation acceleration in Cartesian sampling. Simulations and in vivo experiments were conducted to evaluate the performance of the proposed method. Compared with conventional coil compression, shot-coil compression achieved higher compression rates with reduced errors. The simulation and in vivo experiments demonstrate that the SPIRiT-based reconstruction outperformed the existing method, realigned GRAPPA, and provided superior images with reduced artifacts. The SPIRiT-based reconstruction with virtual coil compression is a reliable method for high-resolution iEPI diffusion imaging. Magn Reson Med 79:1525-1531, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.
Halic, Tansel; Ahn, Woojin; De, Suvranu
2014-01-01
This work presents pWeb, a new language and compiler for parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. The low performance of the web browser, however, remains the bottleneck for computationally intensive applications — including visualization of complex scenes, real-time physical simulation, and image processing — compared with native applications. The proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
Algorithms and programming tools for image processing on the MPP:3
NASA Technical Reports Server (NTRS)
Reeves, Anthony P.
1987-01-01
This is the third and final report on the work done for NASA Grant 5-403 on Algorithms and Programming Tools for Image Processing on the MPP:3. All the work done for this grant is summarized in the introduction. Work done since August 1986 is reported in detail. Research for this grant falls under the following headings: (1) fundamental algorithms for the MPP; (2) programming utilities for the MPP; (3) the Parallel Pascal Development System; and (4) performance analysis. In this report, the results of two efforts are reported: region growing, and performance analysis of important characteristic algorithms. In each case, timing results from MPP implementations are included. A paper is included in which parallel algorithms for region growing on the MPP are discussed. These algorithms permit different sized regions to be merged in parallel. Details on the implementation and performance of several important MPP algorithms are given. These include a number of standard permutations, the FFT, convolution, arbitrary data mappings, image warping, and pyramid operations, all of which have been implemented on the MPP. The permutation and image warping functions have been included in the standard development system library.
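Region growing of the kind discussed for the MPP can be illustrated with a simple serial flood-fill sketch — hypothetical Python under our own naming, not the Parallel Pascal MPP code, which merged regions in parallel:

```python
from collections import deque

def grow_regions(image, tol=0):
    """Label 4-connected regions whose pixels are within tol of the region seed."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue                      # pixel already belongs to a region
            seed = image[sy][sx]
            q = deque([(sy, sx)])
            labels[sy][sx] = next_label
            while q:                          # BFS flood fill from the seed
                y, x = q.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1 \
                            and abs(image[ny][nx] - seed) <= tol:
                        labels[ny][nx] = next_label
                        q.append((ny, nx))
            next_label += 1
    return labels, next_label
```

A parallel formulation, as in the report, would instead grow many regions simultaneously and merge compatible neighbors, but the labeling it produces is equivalent to this serial pass.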
Hielscher, Andreas H; Bartel, Sebastian
2004-02-01
Optical tomography (OT) is a fast developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computational-intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.
High-Performance 3D Image Processing Architectures for Image-Guided Interventions
2008-01-01
NASA Technical Reports Server (NTRS)
Athale, R. A.; Lee, S. H.
1978-01-01
The paper describes the fabrication and operation of an optical parallel logic (OPAL) device which performs Boolean algebraic operations on binary images. Several logic operations on two input binary images were demonstrated using an 8 x 8 device with a CdS photoconductor and a twisted nematic liquid crystal. Two such OPAL devices can be interconnected to form a half-adder circuit which is one of the essential components of a CPU in a digital signal processor.
Two improved coherent optical feedback systems for optical information processing
NASA Technical Reports Server (NTRS)
Lee, S. H.; Bartholomew, B.; Cederquist, J.
1976-01-01
Coherent optical feedback systems are Fabry-Perot interferometers modified to perform optical information processing. Two new systems based on plane parallel and confocal Fabry-Perot interferometers are introduced. The plane parallel system can be used for contrast control, intensity level selection, and image thresholding. The confocal system can be used for image restoration and solving partial differential equations. These devices are simpler and less expensive than previous systems. Experimental results are presented to demonstrate their potential for optical information processing.
Distributed memory parallel Markov random fields using graph partitioning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heinemann, C.; Perciano, T.; Ushizima, D.
Markov random field (MRF) based algorithms have attracted a large amount of interest in image analysis due to their ability to exploit contextual information about data. Image data generated by experimental facilities, though, continues to grow larger and more complex, making it more difficult to analyze in a reasonable amount of time. Applying image processing algorithms to large datasets requires alternative approaches to circumvent performance problems. Aiming to provide scientists with a new tool to recover valuable information from such datasets, we developed a general-purpose distributed memory parallel MRF-based image analysis framework (MPI-PMRF). MPI-PMRF overcomes performance and memory limitations by distributing data and computations across processors. The proposed approach was successfully tested with synthetic and experimental datasets. Additionally, the performance of the MPI-PMRF framework is analyzed through a detailed scalability study. We show that a performance increase is obtained while maintaining an accuracy of the segmentation results higher than 98%. The contributions of this paper are: (a) development of a distributed memory MRF framework; (b) measurement of the performance increase of the proposed approach; (c) verification of segmentation accuracy in both synthetic and experimental, real-world datasets.
Parallel Processing of Images in Mobile Devices using BOINC
NASA Astrophysics Data System (ADS)
Curiel, Mariela; Calle, David F.; Santamaría, Alfredo S.; Suarez, David F.; Flórez, Leonardo
2018-04-01
Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images, and obtaining adequate performance compared to desktop computer grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) executing programs on mobile devices required modifying the code to insert calls to the BOINC API, and b) dividing the image among the mobile devices and merging the results required additional code in some BOINC components. This article presents answers to these four challenges.
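The division of an image among devices and the merging of their results — the second BOINC issue mentioned above — can be sketched as a row-band split. This is an illustrative Python sketch; the function names are ours, not part of BOINC's API.

```python
def split_rows(image, n):
    """Divide an image (a list of pixel rows) into n contiguous row bands,
    one band per device, with sizes differing by at most one row."""
    h = len(image)
    base, extra = divmod(h, n)
    bands, start = [], 0
    for i in range(n):
        stop = start + base + (1 if i < extra else 0)
        bands.append(image[start:stop])
        start = stop
    return bands

def merge_rows(bands):
    """Reassemble the processed bands in their original order."""
    merged = []
    for band in bands:
        merged.extend(band)
    return merged
```

Splitting and merging are exact inverses, so any per-band processing that only touches pixels within a band can run on separate devices and be stitched back losslessly.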
Hybrid Parallel-Slant Hole Collimators for SPECT Imaging
NASA Astrophysics Data System (ADS)
Bai, Chuanyong; Shao, Ling; Ye, Jinghan; Durbin, M.; Petrillo, M.
2004-06-01
We propose a new collimator geometry, the hybrid parallel-slant (HPS) hole geometry, to improve sensitivity for SPECT imaging with large field of view (LFOV) gamma cameras. A HPS collimator has one segment with parallel holes and one or more segments with slant holes. The collimator can be mounted on a conventional SPECT LFOV system that uses parallel-beam collimators, and no additional detector or collimator motion is required for data acquisition. The parallel segment of the collimator allows for the acquisition of a complete data set of the organs-of-interest and the slant segments provide additional data. In this work, simulation studies of an MCAT phantom were performed with a HPS collimator with one slant segment. The slant direction points from patient head to patient feet with a slant angle of 30°. We simulated 64 projection views over 180° with the modeling of nonuniform attenuation effects, and then reconstructed images using an MLEM algorithm that incorporated the hybrid geometry. It was shown that sensitivity to the cardiac region of the phantom was increased by approximately 50% when using the HPS collimator compared with a parallel-hole collimator. No visible artifacts were observed in the myocardium and the signal-to-noise ratio (SNR) of the myocardium walls was improved. Compared with collimators with other geometries, using a HPS collimator has the following advantages: (a) significant sensitivity increase; (b) a complete data set obtained from the parallel segment that allows for artifact-free image reconstruction; and (c) no additional collimator or detector motion. This work demonstrates the potential value of hybrid geometry in collimator design for LFOV SPECT imaging.
Accelerating EPI distortion correction by utilizing a modern GPU-based parallel computation.
Yang, Yao-Hao; Huang, Teng-Yi; Wang, Fu-Nien; Chuang, Tzu-Chao; Chen, Nan-Kuei
2013-04-01
The combination of phase demodulation and field mapping is a practical method to correct echo planar imaging (EPI) geometric distortion. However, since phase dispersion accumulates in each phase-encoding step, the calculation complexity of phase demodulation is Ny-fold higher than conventional image reconstructions. Thus, correcting EPI images via phase demodulation is generally a time-consuming task. Parallel computing by employing general-purpose calculations on graphics processing units (GPU) can accelerate scientific computing if the algorithm is parallelized. This study proposes a method that incorporates the GPU-based technique into phase demodulation calculations to reduce computation time. The proposed parallel algorithm was applied to a PROPELLER-EPI diffusion tensor data set. The GPU-based phase demodulation method correctly reduced the EPI distortion and accelerated the computation. The total reconstruction time of the 16-slice PROPELLER-EPI diffusion tensor images with matrix size of 128 × 128 was reduced from 1,754 seconds to 101 seconds by utilizing the parallelized 4-GPU program. GPU computing is a promising method to accelerate EPI geometric correction. The resulting reduction in computation time of phase demodulation should accelerate postprocessing for studies performed with EPI, and should make the PROPELLER-EPI technique practical for clinical use. Copyright © 2011 by the American Society of Neuroimaging.
Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2
Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.
2014-01-01
Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of the Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64 bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 32000² pixels in size, which is well beyond the largest previously reported in the literature. PMID:25598745
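For reference, a minimal serial version of the Goldberg-Tarjan preflow-push algorithm on a dense capacity matrix might look as follows. This is a pure-Python sketch under our own naming — the paper's massively multithreaded XMT-2 implementation and its image-to-graph construction are not reproduced here.

```python
def max_flow_preflow_push(cap, s, t):
    """Serial Goldberg-Tarjan preflow-push; cap[u][v] is edge capacity."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    height = [0] * n            # node labels guiding admissible pushes
    excess = [0] * n            # inflow minus outflow at each node
    height[s] = n
    for v in range(n):          # saturate every edge leaving the source
        if cap[s][v] > 0:
            flow[s][v] = cap[s][v]
            flow[v][s] = -cap[s][v]
            excess[v] += cap[s][v]
            excess[s] -= cap[s][v]
    active = [v for v in range(n) if v not in (s, t) and excess[v] > 0]
    while active:
        u = active[0]
        pushed = False
        for v in range(n):
            residual = cap[u][v] - flow[u][v]
            if residual > 0 and height[u] == height[v] + 1:
                delta = min(excess[u], residual)   # push along admissible edge
                flow[u][v] += delta
                flow[v][u] -= delta
                excess[u] -= delta
                excess[v] += delta
                if v not in (s, t) and v not in active:
                    active.append(v)
                pushed = True
                if excess[u] == 0:
                    break
        if not pushed:
            # Relabel: one above the lowest neighbor in the residual graph.
            height[u] = 1 + min(height[v] for v in range(n)
                                if cap[u][v] - flow[u][v] > 0)
        if excess[u] == 0:
            active.pop(0)
    return sum(flow[s][v] for v in range(n))
```

The XMT-2's appeal is that the push and relabel operations on different active nodes are largely independent, so with per-word hardware synchronization many nodes can be discharged concurrently.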
Yuan, Jie; Xu, Guan; Yu, Yao; Zhou, Yu; Carson, Paul L; Wang, Xueding; Liu, Xiaojun
2013-08-01
Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction, and display for both photoacoustic (PA) and US imaging all in a truly real-time manner. The back-projection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state-of-the-art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real-time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo was achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel, was conducted to verify the performance of this system for imaging fast biological events. The GPU-based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/patrealtime.
Delakis, Ioannis; Hammad, Omer; Kitney, Richard I
2007-07-07
Wavelet-based de-noising has been shown to improve image signal-to-noise ratio in magnetic resonance imaging (MRI) while maintaining spatial resolution. Wavelet-based de-noising techniques typically implemented in MRI require that noise displays uniform spatial distribution. However, images acquired with parallel MRI have spatially varying noise levels. In this work, a new algorithm for filtering images with parallel MRI is presented. The proposed algorithm extracts the edges from the original image and then generates a noise map from the wavelet coefficients at finer scales. The noise map is zeroed at locations where edges have been detected and directional analysis is also used to calculate noise in regions of low-contrast edges that may not have been detected. The new methodology was applied on phantom and brain images and compared with other applicable de-noising techniques. The performance of the proposed algorithm was shown to be comparable with other techniques in central areas of the images, where noise levels are high. In addition, finer details and edges were maintained in peripheral areas, where noise levels are low. The proposed methodology is fully automated and can be applied on final reconstructed images without requiring sensitivity profiles or noise matrices of the receiver coils, therefore making it suitable for implementation in a clinical MRI setting.
NASA Technical Reports Server (NTRS)
Reif, John H.
1987-01-01
A parallel compression algorithm for the 16,384-processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
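The vector-quantization half of such a scheme can be sketched as a codebook trained by simple k-means over sample vectors, with encoding as nearest-codeword lookup. This is illustrative Python under our own naming, not the MPP implementation described in the report.

```python
import random

def train_codebook(vectors, k, iters=10, seed=1):
    """Build a k-entry codebook by k-means over the training vectors."""
    rng = random.Random(seed)
    book = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:       # assign each vector to its nearest codeword
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(v, book[c])))
            groups[i].append(v)
        for c, g in enumerate(groups):   # recenter non-empty codewords
            if g:
                book[c] = [sum(col) / len(g) for col in zip(*g)]
    return book

def encode(vectors, book):
    """Lossy encode: each vector becomes the index of its nearest codeword."""
    return [min(range(len(book)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, book[c])))
            for v in vectors]

def decode(indices, book):
    """Reconstruct by codebook lookup."""
    return [book[i] for i in indices]
```

On a massively parallel machine like the MPP, the nearest-codeword searches for different vectors are independent, which is what makes the scheme attractive for data-parallel hardware.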
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sewell, Christopher Meyer
This is a set of slides from a guest lecture for a class at the University of Texas, El Paso, on visualization and data analysis for high-performance computing. The topics covered are: trends in high-performance computing; scientific visualization, including OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, including in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, and "big data"; and an analysis example.
2D Seismic Imaging of Elastic Parameters by Frequency Domain Full Waveform Inversion
NASA Astrophysics Data System (ADS)
Brossier, R.; Virieux, J.; Operto, S.
2008-12-01
Thanks to recent advances in parallel computing, full waveform inversion is today a tractable seismic imaging method to reconstruct physical parameters of the earth's interior at different scales, ranging from the near-surface to the deep crust. We present a massively parallel 2D frequency-domain full-waveform algorithm for imaging visco-elastic media from multi-component seismic data. The forward problem (i.e., the resolution of the frequency-domain 2D P-SV elastodynamics equations) is based on a low-order Discontinuous Galerkin (DG) method (P0 and/or P1 interpolations). Thanks to triangular unstructured meshes, the DG method allows accurate modeling of both body waves and surface waves in the case of complex topography for a discretization of 10 to 15 cells per shear wavelength. The frequency-domain DG system is solved efficiently for multiple sources with the parallel direct solver MUMPS. The local inversion procedure (i.e., minimization of residuals between observed and computed data) is based on the adjoint-state method, which allows efficient computation of the gradient of the objective function. Applying the inversion hierarchically from the low frequencies to the higher ones defines a multiresolution imaging strategy which helps convergence towards the global minimum. In place of an expensive Newton algorithm, the combined use of the diagonal terms of the approximate Hessian matrix and optimization algorithms based on quasi-Newton methods (Conjugate Gradient, L-BFGS, ...) improves the convergence of the iterative inversion. The distribution of forward-problem solutions over processors, driven by a mesh partitioning performed by METIS, allows most of the inversion to be applied in parallel. We shall present the main features of the parallel modeling/inversion algorithm, assess its scalability, and illustrate its performance with realistic synthetic case studies.
A 32-Channel Combined RF and B0 Shim Array for 3T Brain Imaging
Stockmann, Jason P.; Witzel, Thomas; Keil, Boris; Polimeni, Jonathan R.; Mareyam, Azma; LaPierre, Cristen; Setsompop, Kawin; Wald, Lawrence L.
2016-01-01
Purpose We add user-controllable direct currents (DC) to the individual elements of a 32-channel radio-frequency (RF) receive array to provide B0 shimming ability while preserving the array’s reception sensitivity and parallel imaging performance. Methods Shim performance using constrained DC current (±2.5A) is simulated for brain arrays ranging from 8 to 128 elements. A 32-channel 3-tesla brain array is realized using inductive chokes to bridge the tuning capacitors on each RF loop. The RF and B0 shimming performance is assessed in bench and imaging measurements. Results The addition of DC currents to the 32-channel RF array is achieved with minimal disruption of the RF performance and without negative side effects such as conductor heating or mechanical torques. The shimming results agree well with simulations and show performance superior to third-order spherical harmonic (SH) shimming. Imaging tests show the ability to reduce the standard frontal lobe susceptibility-induced fields and improve echo planar imaging geometric distortion. The simulation of 64- and 128-channel brain arrays suggests that even further shimming improvement is possible (equivalent to up to 6th-order SH shim coils). Conclusion Including user-controlled shim currents on the loops of a conventional highly parallel brain array coil is feasible with modest current levels and produces improved B0 shimming performance over standard second-order SH shimming. PMID:25689977
ImageJ: Image processing and analysis in Java
NASA Astrophysics Data System (ADS)
Rasband, W. S.
2012-06-01
ImageJ is a public domain Java image processing program inspired by NIH Image. It can display, edit, analyze, process, save and print 8-bit, 16-bit and 32-bit images. It can read many image formats including TIFF, GIF, JPEG, BMP, DICOM, FITS and "raw". It supports "stacks", a series of images that share a single window. It is multithreaded, so time-consuming operations such as image file reading can be performed in parallel with other operations.
A survey of GPU-based acceleration techniques in MRI reconstructions
Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou
2018-01-01
Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly complicated, yet diagnosis and treatment require very fast computational procedures. Modern graphics processing unit (GPU) platforms have made high-performance parallel computing available, and attractive to common consumers, for computing massively parallel reconstruction problems at a commodity price. GPUs have also become more and more important for reconstruction computations, especially as deep learning begins to be applied to MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in the MRI community. PMID:29675361
JANUS: A Compilation System for Balancing Parallelism and Performance in OpenVX
NASA Astrophysics Data System (ADS)
Omidian, Hossein; Lemieux, Guy G. F.
2018-04-01
Embedded systems typically do not have enough on-chip memory for an entire image buffer. Programming systems like OpenCV operate on entire image frames at each step, making them use excessive memory bandwidth and power. In contrast, the paradigm used by OpenVX is much more efficient: it uses image tiling, and the compilation system is allowed to analyze and optimize the operation sequence, specified as a compute graph, before doing any pixel processing. In this work, we are building a compilation system for OpenVX that can analyze and optimize the compute graph to take advantage of parallel resources in many-core systems or FPGAs. Using a database of prewritten OpenVX kernels, it automatically adjusts the image tile size and applies kernel duplication and coalescing to meet a defined area (resource) target, or to meet a specified throughput target. This allows a single compute graph to target implementations with a wide range of performance needs or capabilities, e.g., from handheld to datacenter, while using minimal resources and power to reach the performance target.
Concentric Rings K-Space Trajectory for Hyperpolarized 13C MR Spectroscopic Imaging
Jiang, Wenwen; Lustig, Michael; Larson, Peder E.Z.
2014-01-01
Purpose: To develop a robust and rapid imaging technique for hyperpolarized 13C MR Spectroscopic Imaging (MRSI) and investigate its performance. Methods: A concentric rings readout trajectory with constant angular velocity is proposed for hyperpolarized 13C spectroscopic imaging and its properties are analyzed. Quantitative analyses of design tradeoffs are presented for several imaging scenarios. First applications of concentric rings to 13C phantoms and to in vivo animal hyperpolarized 13C MRSI studies were performed to demonstrate the feasibility of the proposed method. Finally, a parallel imaging accelerated concentric rings study is presented. Results: The concentric rings MRSI trajectory offers acquisition-time savings compared with echo-planar spectroscopic imaging (EPSI). It provides sufficient spectral bandwidth with relatively high SNR efficiency compared with EPSI and spiral techniques. Phantom and in vivo animal studies showed good image quality with half the scan time and reduced pulsatile flow artifacts compared with EPSI. Parallel imaging accelerated concentric rings showed advantages over Cartesian sampling in g-factor simulations and demonstrated aliasing-free image quality in a hyperpolarized 13C in vivo study. Conclusion: The concentric rings trajectory is a robust and rapid imaging technique that fits very well with the speed, bandwidth, and resolution requirements of hyperpolarized 13C MRSI. PMID:25533653
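The ring geometry described above can be sketched in a few lines. The parameterization below (uniform radial spacing, constant angular velocity within each ring) is a hypothetical illustration only, not the authors' pulse-sequence design:

```python
import numpy as np

def concentric_rings(n_rings, pts_per_ring, kmax):
    """Concentric-rings k-space trajectory: rings sampled at constant
    angular velocity, radii spaced uniformly out to kmax.  Hypothetical
    parameterization for illustration, not the authors' sequence design."""
    theta = np.linspace(0.0, 2.0 * np.pi, pts_per_ring, endpoint=False)
    radii = (np.arange(n_rings) + 0.5) / n_rings * kmax
    kx = np.outer(radii, np.cos(theta))            # (n_rings, pts_per_ring)
    ky = np.outer(radii, np.sin(theta))
    return kx, ky

kx, ky = concentric_rings(n_rings=8, pts_per_ring=64, kmax=0.5)
r = np.hypot(kx, ky)                               # constant within each ring
```

Each row of kx, ky is one ring, acquired as one readout; scan-time savings come from covering a full annulus of k-space per excitation.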
Non-Cartesian Parallel Imaging Reconstruction
Wright, Katherine L.; Hamilton, Jesse I.; Griswold, Mark A.; Gulani, Vikas; Seiberlich, Nicole
2014-01-01
Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be employed to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the non-homogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian GRAPPA, and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. PMID:24408499
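As a concrete illustration of the principle these methods share, the following minimal Cartesian SENSE unfolding (R = 2, one spatial dimension, hypothetical Gaussian coil sensitivities) solves the small per-pixel least-squares system; the non-Cartesian algorithms in the review generalize exactly this idea to arbitrary trajectories:

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nc, R = 8, 4, 2                      # image rows, coils, acceleration

# hypothetical smooth Gaussian coil sensitivities and a random test image
y = np.linspace(-1.0, 1.0, ny)
sens = np.stack([np.exp(-(y - c) ** 2) for c in (-1.0, -0.3, 0.3, 1.0)])
img = rng.random(ny)

# R = 2 undersampling folds pixel p together with pixel p + ny // R
folded = np.zeros((nc, ny // R))
for p in range(ny // R):
    folded[:, p] = sens[:, p] * img[p] + sens[:, p + ny // R] * img[p + ny // R]

# SENSE unfolding: per aliased location, solve a small least-squares system
recon = np.zeros(ny)
for p in range(ny // R):
    S = sens[:, [p, p + ny // R]]        # (nc, R) encoding matrix
    recon[[p, p + ny // R]] = np.linalg.lstsq(S, folded[:, p], rcond=None)[0]
```

With noiseless data and distinct sensitivities the system is exactly invertible and the image is recovered; CG SENSE, non-Cartesian GRAPPA, and SPIRiT replace this direct per-pixel solve with iterative or kernel-based formulations.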
Performance of the Wavelet Decomposition on Massively Parallel Architectures
NASA Technical Reports Server (NTRS)
El-Ghazawi, Tarek A.; LeMoigne, Jacqueline; Zukor, Dorothy (Technical Monitor)
2001-01-01
Traditionally, Fourier transforms have been utilized for performing signal analysis and representation. Although it is straightforward to reconstruct a signal from its Fourier transform, no local description of the signal is included in its Fourier representation. To alleviate this problem, windowed Fourier transforms and then wavelet transforms were introduced, and it has been shown that wavelets give better localization than traditional Fourier transforms, as well as a better division of the time- or space-frequency plane than windowed Fourier transforms. Because of these properties, and after the development of several fast algorithms for computing the wavelet representation of any signal, in particular the Multi-Resolution Analysis (MRA) developed by Mallat, wavelet transforms have increasingly been applied to signal analysis problems, especially real-life problems in which speed is critical. In this paper we present and compare efficient wavelet decomposition algorithms on different parallel architectures. We report and analyze experimental measurements using NASA remotely sensed images. Results show that our algorithms achieve significant performance gains on current high-performance parallel systems and meet the requirements of scientific and multimedia applications. The extensive performance measurements collected over a number of high-performance computer systems have revealed important architectural characteristics of these systems in relation to the processing demands of the wavelet decomposition of digital images.
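A single level of Mallat's multi-resolution analysis reduces, in the Haar case, to paired averages and differences along rows and then columns. The sketch below is the serial textbook version; the paper's contribution is the parallelization, which is not reproduced here:

```python
import numpy as np

def haar2d(x):
    """One level of a 2-D Haar wavelet decomposition, the building block of
    Mallat's multi-resolution analysis: each adjacent pair is split into a
    scaled average (low-pass) and difference (high-pass), first along rows
    and then along columns, yielding four half-resolution subbands."""
    s = 1.0 / np.sqrt(2.0)
    lo = lambda a: s * (a[:, 0::2] + a[:, 1::2])   # low-pass + downsample
    hi = lambda a: s * (a[:, 0::2] - a[:, 1::2])   # high-pass + downsample
    L, H = lo(x), hi(x)                            # filter along rows
    LL, LH = lo(L.T).T, hi(L.T).T                  # then along columns
    HL, HH = lo(H.T).T, hi(H.T).T
    return LL, LH, HL, HH

x = np.arange(64, dtype=float).reshape(8, 8)
LL, LH, HL, HH = haar2d(x)                         # four 4x4 subbands
```

Because the Haar basis is orthonormal, signal energy is preserved across the four subbands, and the rows (and row-pairs) are independent, which is what makes the transform embarrassingly parallel on the architectures the paper studies.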
Discrimination of portraits using a hybrid parallel joint transform correlator system
NASA Astrophysics Data System (ADS)
Inaba, Rieko; Hashimoto, Asako; Kodate, Kashiko
1999-05-01
A hybrid parallel joint transform correlation system is demonstrated through the introduction of a five-channel binary zone plate array and is applied to the discrimination of portraits for a presumed criminal investigation. In order to improve performance, we adopt pre-processing of the images to a white-area ratio of 20%. Furthermore, we discuss the robustness of the system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berry, K.R.; Hansen, F.R.; Napolitano, L.M.
1992-01-01
DART (DSP Array for Reconfigurable Tasks) is a parallel architecture of two high-performance DSP (digital signal processing) chips with the flexibility to handle a wide range of real-time applications. Each of the 32-bit floating-point DSP processors in DART is programmable in a high-level language (C or Ada). We have added extensions to the real-time operating system used by DART in order to support parallel processing. The combination of high-level language programmability, a real-time operating system, and parallel processing support significantly reduces the development cost of application software for signal processing and control applications. We have demonstrated this capability by using DART to reconstruct images in the prototype VIP (Video Imaging Projectile) groundstation.
NASA Astrophysics Data System (ADS)
Zhang, Lei; Yang, Fengbao; Ji, Linna; Lv, Sheng
2018-01-01
Diverse image fusion methods perform differently; each has advantages and disadvantages compared with the others. One notion is that the advantages of different image fusion methods can be effectively combined. A multiple-algorithm parallel fusion method based on algorithmic complementarity and synergy is proposed. First, in view of the characteristics of the different algorithms and the difference-features among images, an index vector based on feature similarity is proposed to define the degree of complementarity and synergy. This index vector is a reliable evidence indicator for algorithm selection. Second, the algorithms with a high degree of complementarity and synergy are selected. Then, the different degrees of the various features and the infrared intensity images are used as the initial weights for nonnegative matrix factorization (NMF), which avoids the randomness of the NMF initialization parameters. Finally, the fused images of the different algorithms are integrated using the NMF because of its excellent data-fusion performance on independent features. Experimental results demonstrate that the visual effect and objective evaluation indices of the fused images obtained using the proposed method are better than those obtained using traditional methods. The proposed method retains the advantages that the individual fusion algorithms have.
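The NMF integration step can be illustrated with the standard Lee–Seung multiplicative updates. The initial factors are passed in by the caller, mirroring the paper's point that feature-derived initial weights replace a random initialization; this is a generic sketch, not the authors' fusion code:

```python
import numpy as np

def nmf(V, W, H, n_iter=200):
    """Nonnegative matrix factorization V ~ W @ H via the standard
    Lee-Seung multiplicative updates.  W and H are supplied by the caller
    (e.g. derived from image features) rather than drawn at random."""
    eps = 1e-12
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)       # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)       # update basis
    return W, H

rng = np.random.default_rng(0)
V = rng.random((6, 2)) @ rng.random((2, 5))        # nonnegative target
W0, H0 = rng.random((6, 2)) + 0.1, rng.random((2, 5)) + 0.1
e0 = np.linalg.norm(V - W0 @ H0)
W, H = nmf(V, W0.copy(), H0.copy())
e1 = np.linalg.norm(V - W @ H)                     # error decreases
```

The multiplicative form keeps both factors nonnegative throughout, and the reconstruction error is nonincreasing, so the quality of the final factorization depends strongly on the initialization, which is exactly the sensitivity the proposed feature-based weights address.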
Three Element Phased Array Coil for Imaging of Rat Spinal Cord at 7T
Mogatadakala, Kishore V.; Bankson, James A.; Narayana, Ponnada A.
2008-01-01
In order to overcome some of the limitations of an implantable coil, including its invasive nature and limited spatial coverage, a three element phased array coil is described for high resolution magnetic resonance imaging (MRI) of rat spinal cord. This coil allows imaging both thoracic and cervical segments of rat spinal cord. In the current design, coupling between the nearest neighbors was minimized by overlapping the coil elements. A simple capacitive network was used for decoupling the next neighbor elements. The dimensions of individual coils in the array were determined based on the signal-to-noise ratio (SNR) measurements performed on a phantom with three different surface coils. SNR measurements on a phantom demonstrated higher SNR of the phased array coil relative to two different volume coils. In-vivo images acquired on rat spinal cord with our coil demonstrated excellent gray and white matter contrast. To evaluate the performance of the phased array coil under parallel imaging, g-factor maps were obtained for two different acceleration factors of 2 and 3. These simulations indicate that parallel imaging with acceleration factor of 2 would be possible without significant image reconstruction related noise amplifications. PMID:19025892
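The g-factor maps mentioned above follow the standard SENSE noise-amplification formula. The sketch below assumes an identity noise covariance and a 1-D sensitivity profile for brevity, with hypothetical Gaussian coil sensitivities:

```python
import numpy as np

def g_factor(sens, R):
    """SENSE g-factor for a 1-D coil-sensitivity profile, assuming an
    identity noise covariance: g_p = sqrt([(S^H S)^-1]_pp * [S^H S]_pp),
    where S collects the coil sensitivities of the R aliased pixels."""
    nc, n = sens.shape
    g = np.zeros(n)
    for p in range(n // R):
        idx = [p + k * (n // R) for k in range(R)]
        S = sens[:, idx]                          # (nc, R) encoding matrix
        SHS = S.conj().T @ S
        g[idx] = np.sqrt(np.real(np.diag(np.linalg.inv(SHS)) * np.diag(SHS)))
    return g

y = np.linspace(-1.0, 1.0, 16)
sens = np.stack([np.exp(-(y - c) ** 2) for c in (-1.0, -0.3, 0.3, 1.0)])
g1, g2 = g_factor(sens, 1), g_factor(sens, 2)     # R = 1 gives g = 1 exactly
```

The g-factor is always at least 1; values close to 1 at a given acceleration, as reported above for R = 2, indicate that the coil geometry supports that acceleration without significant reconstruction-related noise amplification.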
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10 times performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
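The block-volume idea can be sketched with a thread pool mapping a worker function over slabs of a 3-D array. The real platform's size-adaptive blocks and task scheduler are more elaborate; treat this as a minimal illustration:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def process_blocks(volume, block=16, fn=lambda b: b * 2.0):
    """Split a 3-D volume into slabs along z and process them in parallel.
    Minimal sketch of the block-volume idea: a fixed slab size and a
    thread pool stand in for the platform's size-adaptive blocks and
    load-balancing scheduler."""
    nz = volume.shape[0]
    slabs = [volume[i:i + block] for i in range(0, nz, block)]
    with ThreadPoolExecutor() as ex:
        out = list(ex.map(fn, slabs))             # one task per slab
    return np.concatenate(out, axis=0)

vol = np.arange(64 * 4 * 4, dtype=float).reshape(64, 4, 4)
out = process_blocks(vol, block=16)               # same result as fn(vol)
```

For pointwise operations the slab decomposition is exact; neighborhood operations would additionally need halo (ghost-voxel) exchange at slab boundaries, which is one reason a dedicated block data structure pays off.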
Algorithms and programming tools for image processing on the MPP, part 2
NASA Technical Reports Server (NTRS)
Reeves, Anthony P.
1986-01-01
A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.
Filli, Lukas; Piccirelli, Marco; Kenkel, David; Guggenberger, Roman; Andreisek, Gustav; Beck, Thomas; Runge, Val M; Boss, Andreas
2015-07-01
The aim of this study was to investigate the feasibility of accelerated diffusion tensor imaging (DTI) of skeletal muscle using echo planar imaging (EPI) with simultaneous multislice excitation and the blipped "controlled aliasing in parallel imaging results in higher acceleration" (CAIPIRINHA) unaliasing technique. After federal ethics board approval, the lower leg muscles of 8 healthy volunteers (mean [SD] age, 29.4 [2.9] years) were examined in a clinical 3-T magnetic resonance scanner using a 15-channel knee coil. The EPI was performed at a b value of 500 s/mm2 without slice acceleration (conventional DTI) as well as with 2-fold and 3-fold acceleration. Fractional anisotropy (FA) and mean diffusivity (MD) were measured in all 3 acquisitions. Fiber tracking performance was compared between the acquisitions regarding the number of tracks, average track length, and anatomical precision using multivariate analysis of variance and Mann-Whitney U tests. Acquisition time was 7:24 minutes for conventional DTI, 3:53 minutes for 2-fold acceleration, and 2:38 minutes for 3-fold acceleration. Overall FA and MD values ranged from 0.220 to 0.378 and 1.595 to 1.829 mm2/s, respectively. Two-fold acceleration yielded similar FA and MD values (P ≥ 0.901) and similar fiber tracking performance compared with conventional DTI. Three-fold acceleration resulted in comparable MD (P = 0.199) but higher FA values (P = 0.006) and significantly impaired fiber tracking in the soleus and tibialis anterior muscles (number of tracks, P < 0.001; anatomical precision, P ≤ 0.005). Simultaneous multislice EPI with blipped CAIPIRINHA can remarkably reduce acquisition time in DTI of skeletal muscle with similar image quality and quantification accuracy of diffusion parameters. This may increase the clinical applicability of muscle anisotropy measurements.
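The two diffusion metrics compared above have standard closed forms in the eigenvalues of the diffusion tensor:

```python
import numpy as np

def fa_md(evals):
    """Fractional anisotropy and mean diffusivity from the three
    eigenvalues of a diffusion tensor (standard definitions):
        MD = (l1 + l2 + l3) / 3
        FA = sqrt(3/2) * ||l - MD|| / ||l||
    """
    l = np.asarray(evals, dtype=float)
    md = l.mean()
    fa = np.sqrt(1.5 * ((l - md) ** 2).sum() / (l ** 2).sum())
    return fa, md

fa_iso, md_iso = fa_md([1.0, 1.0, 1.0])    # isotropic diffusion: FA = 0
fa_stick, _ = fa_md([1.0, 0.0, 0.0])       # purely 1-D diffusion: FA = 1
```

FA is dimensionless and bounded in [0, 1], while MD carries the units of the eigenvalues; the elevated FA at 3-fold acceleration reported above reflects noise-driven eigenvalue dispersion rather than a true tissue change.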
Yan, Susu; Bowsher, James; Tough, MengHeng; Cheng, Lin; Yin, Fang-Fang
2014-01-01
Purpose: To construct a robotic SPECT system and to demonstrate its capability to image a thorax phantom on a radiation therapy flat-top couch, as a step toward onboard functional and molecular imaging in radiation therapy. Methods: A robotic SPECT imaging system was constructed utilizing a gamma camera detector (Digirad 2020tc) and a robot (KUKA KR150 L110 robot). An imaging study was performed with a phantom (PET CT Phantom™), which includes five spheres of 10, 13, 17, 22, and 28 mm diameters. The phantom was placed on a flat-top couch. SPECT projections were acquired either with a parallel-hole collimator or a single-pinhole collimator, both without background in the phantom and with background at 1/10th the sphere activity concentration. The imaging trajectories of parallel-hole and pinhole collimated detectors spanned 180° and 228°, respectively. The pinhole detector viewed an off-centered spherical common volume which encompassed the 28 and 22 mm spheres. The common volume for parallel-hole system was centered at the phantom which encompassed all five spheres in the phantom. The maneuverability of the robotic system was tested by navigating the detector to trace the phantom and flat-top table while avoiding collision and maintaining the closest possible proximity to the common volume. The robot base and tool coordinates were used for image reconstruction. Results: The robotic SPECT system was able to maneuver parallel-hole and pinhole collimated SPECT detectors in close proximity to the phantom, minimizing impact of the flat-top couch on detector radius of rotation. Without background, all five spheres were visible in the reconstructed parallel-hole image, while four spheres, all except the smallest one, were visible in the reconstructed pinhole image. 
With background, three spheres of 17, 22, and 28 mm diameters were readily observed with the parallel-hole imaging, and the targeted spheres (22 and 28 mm diameters) were readily observed in the pinhole region-of-interest imaging. Conclusions: Onboard SPECT could be achieved by a robot maneuvering a SPECT detector about patients in position for radiation therapy on a flat-top couch. The robot inherent coordinate frames could be an effective means to estimate detector pose for use in SPECT image reconstruction. PMID:25370663
Mitra, Ayan; Politte, David G; Whiting, Bruce R; Williamson, Jeffrey F; O'Sullivan, Joseph A
2017-01-01
Model-based image reconstruction (MBIR) techniques have the potential to generate high quality images from noisy measurements and a small number of projections, which can reduce the x-ray dose in patients. These MBIR techniques rely on projection and backprojection to refine an image estimate. One of the widely used projectors for these modern MBIR-based techniques is the branchless distance-driven (DD) projector and backprojector. While this method produces superior quality images, the computational cost of iterative updates keeps it from being ubiquitous in clinical applications. In this paper, we provide several new parallelization ideas for concurrent execution of the DD projectors in multi-GPU systems using CUDA programming tools. We have introduced some novel schemes for dividing the projection data and image voxels over multiple GPUs to avoid runtime overhead and inter-device synchronization issues. We have also reduced the complexity of overlap calculation of the algorithm by eliminating the common projection plane and directly projecting the detector boundaries onto image voxel boundaries. To reduce the time required for calculating the overlap between the detector edges and image voxel boundaries, we have proposed a pre-accumulation technique to accumulate image intensities in perpendicular 2D image slabs (from a 3D image) before projection and after backprojection to ensure our DD kernels run faster in parallel GPU threads. For the implementation of our iterative MBIR technique we use a parallel multi-GPU version of the alternating minimization (AM) algorithm with penalized likelihood update. The time performance using our proposed reconstruction method with Siemens Sensation 16 patient scan data shows an average of 24 times speedup using a single TITAN X GPU and 74 times speedup using 3 TITAN X GPUs in parallel for combined projection and backprojection.
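At its core, a distance-driven projector is a boundary-overlap resampling. The 1-D kernel below shows the operation after voxel and detector boundaries have been mapped onto a common axis (the step the authors simplify by projecting detector boundaries directly onto voxel boundaries); it is an illustrative sketch, not the paper's GPU implementation:

```python
import numpy as np

def dd_overlap(src_edges, src_vals, dst_edges):
    """Distance-driven resampling kernel (1-D): distribute each source
    cell's value onto destination cells in proportion to the overlap of
    their boundary intervals.  The full projector applies exactly this
    operation once voxel and detector boundaries share a common axis."""
    dst = np.zeros(len(dst_edges) - 1)
    for i, v in enumerate(src_vals):
        a0, a1 = src_edges[i], src_edges[i + 1]
        for j in range(len(dst)):
            b0, b1 = dst_edges[j], dst_edges[j + 1]
            w = max(0.0, min(a1, b1) - max(a0, b0))     # overlap length
            dst[j] += v * w / (a1 - a0)
    return dst

# two unit source cells with values 3 and 5 resampled onto coarser grids
full = dd_overlap([0.0, 1.0, 2.0], [3.0, 5.0], [0.0, 2.0])
half = dd_overlap([0.0, 1.0, 2.0], [3.0, 5.0], [0.5, 1.5])
```

Because the inner loop is a pure gather over precomputed boundaries, each destination cell can be assigned to its own GPU thread, which is what makes the kernel amenable to the multi-GPU partitioning the paper proposes.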
The Goddard Space Flight Center Program to develop parallel image processing systems
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1972-01-01
Parallel image processing which is defined as image processing where all points of an image are operated upon simultaneously is discussed. Coherent optical, noncoherent optical, and electronic methods are considered parallel image processing techniques.
Zhou, Lili; Clifford Chao, K S; Chang, Jenghwa
2012-11-01
Simulated projection images of digital phantoms constructed from CT scans have been widely used for clinical and research applications but their quality and computation speed are not optimal for real-time comparison with the radiography acquired with an x-ray source of different energies. In this paper, the authors performed polyenergetic forward projections using open computing language (OpenCL) in a parallel computing ecosystem consisting of CPU and general purpose graphics processing unit (GPGPU) for fast and realistic image formation. The proposed polyenergetic forward projection uses a lookup table containing the NIST published mass attenuation coefficients (μ∕ρ) for different tissue types and photon energies ranging from 1 keV to 20 MeV. The CT images of interested sites are first segmented into different tissue types based on the CT numbers and converted to a three-dimensional attenuation phantom by linking each voxel to the corresponding tissue type in the lookup table. The x-ray source can be a radioisotope or an x-ray generator with a known spectrum described as weight w(n) for energy bin E(n). The Siddon method is used to compute the x-ray transmission line integral for E(n) and the x-ray fluence is the weighted sum of the exponential of line integral for all energy bins with added Poisson noise. To validate this method, a digital head and neck phantom constructed from the CT scan of a Rando head phantom was segmented into three (air, gray∕white matter, and bone) regions for calculating the polyenergetic projection images for the Mohan 4 MV energy spectrum. To accelerate the calculation, the authors partitioned the workloads using the task parallelism and data parallelism and scheduled them in a parallel computing ecosystem consisting of CPU and GPGPU (NVIDIA Tesla C2050) using OpenCL only. The authors explored the task overlapping strategy and the sequential method for generating the first and subsequent DRRs. 
A dispatcher was designed to drive the high-degree parallelism of the task overlapping strategy. Numerical experiments were conducted to compare the performance of the OpenCL∕GPGPU-based implementation with the CPU-based implementation. The projection images were similar to typical portal images obtained with a 4 or 6 MV x-ray source. For a phantom size of 512 × 512 × 223, the time for calculating the line integrals for a 512 × 512 image panel was 16.2 ms on GPGPU for one energy bin in comparison to 8.83 s on CPU. The total computation time for generating one polyenergetic projection image of 512 × 512 was 0.3 s (141 s for CPU). The relative difference between the projection images obtained with the CPU-based and OpenCL∕GPGPU-based implementations was on the order of 10^-6 and was virtually indistinguishable. The task overlapping strategy was 5.84 and 1.16 times faster than the sequential method for the first and the subsequent digitally reconstructed radiographs, respectively. The authors have successfully built digital phantoms using anatomic CT images and NIST μ∕ρ tables for simulating realistic polyenergetic projection images and optimized the processing speed with parallel computing using GPGPU∕OpenCL-based implementation. The computation time was fast (0.3 s per projection image) enough for real-time IGRT (image-guided radiotherapy) applications.
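The polyenergetic fluence model described above (a weighted sum, over energy bins, of the exponential of per-bin line integrals, with Poisson noise) can be sketched for a single ray as follows. The spectrum weights and attenuation values are made-up placeholders (the paper looks them up from the NIST μ/ρ tables), and precomputed intersection lengths stand in for the Siddon traversal:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical 3-bin spectrum and per-tissue attenuation values (made up)
w = np.array([0.2, 0.5, 0.3])                     # spectrum weights w(n)
mu = {"air":  np.array([0.00, 0.00, 0.00]),
      "soft": np.array([0.25, 0.20, 0.18]),
      "bone": np.array([0.60, 0.45, 0.40])}       # cm^-1 per energy bin

def poly_fluence(path, lengths, I0=1e5, noise=True):
    """Polyenergetic transmission along one ray: for each energy bin E(n),
    form the line integral over the tissues crossed, then take the
    w(n)-weighted sum of exponentials, with optional Poisson noise.
    `lengths` stands in for the intersection lengths a Siddon traversal
    would return."""
    line = sum(mu[t] * L for t, L in zip(path, lengths))   # per-bin integral
    mean = I0 * np.sum(w * np.exp(-line))
    return float(rng.poisson(mean)) if noise else float(mean)

open_beam = poly_fluence(["air"], [10.0], noise=False)     # ~ I0, no attenuation
```

Since each ray and each energy bin is independent, this weighted sum maps directly onto the per-work-item parallelism the authors exploit in OpenCL.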
INVITED TOPICAL REVIEW: Parallel magnetic resonance imaging
NASA Astrophysics Data System (ADS)
Larkman, David J.; Nunes, Rita G.
2007-04-01
Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts is shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed.
NASA Astrophysics Data System (ADS)
Lazcano, R.; Madroñal, D.; Fabelo, H.; Ortega, S.; Salvador, R.; Callicó, G. M.; Juárez, E.; Sanz, C.
2017-10-01
Hyperspectral Imaging (HI) assembles high resolution spectral information from hundreds of narrow bands across the electromagnetic spectrum, thus generating 3D data cubes in which each pixel gathers the spectral information of the reflectance of every spatial pixel. As a result, each image is composed of large volumes of data, which turns its processing into a challenge, as performance requirements have been continuously tightened. For instance, new HI applications demand real-time responses. Hence, parallel processing becomes a necessity to achieve this requirement, so the intrinsic parallelism of the algorithms must be exploited. In this paper, a spatial-spectral classification approach has been implemented using a dataflow language known as RVC-CAL. This language represents a system as a set of functional units, and its main advantage is that it simplifies the parallelization process by mapping the different blocks over different processing units. The spatial-spectral classification approach aims at refining the classification results previously obtained by using a K-Nearest Neighbors (KNN) filtering process, in which both the pixel spectral value and the spatial coordinates are considered. To do so, KNN needs two inputs: a one-band representation of the hyperspectral image and the classification results provided by a pixel-wise classifier. Thus, the spatial-spectral classification algorithm is divided into three stages: a Principal Component Analysis (PCA) algorithm for computing the one-band representation of the image, a Support Vector Machine (SVM) classifier, and the KNN-based filtering algorithm. The parallelization of these algorithms shows promising results in terms of computational time, as mapping them onto different cores yields a speedup of 2.69x when using 3 cores. Consequently, experimental results demonstrate that real-time processing of hyperspectral images is achievable.
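The KNN filtering stage, the distinctive third block of the pipeline, can be sketched as a majority vote in a joint feature space of intensity and spatial coordinates. The PCA one-band image and the SVM label map are assumed given, and the weighting `lam` is a hypothetical parameter; this is a brute-force illustration, not the paper's dataflow implementation:

```python
import numpy as np

def knn_filter(one_band, labels, k=5, lam=1.0):
    """KNN-based spatial-spectral filtering (sketch): relabel each pixel by
    a majority vote of its k nearest neighbours in a joint feature space of
    one-band intensity and lam-weighted spatial coordinates."""
    h, w = one_band.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([one_band.ravel(), lam * ys.ravel(), lam * xs.ravel()], 1)
    flat, out = labels.ravel(), np.empty(h * w, dtype=labels.dtype)
    for i in range(h * w):
        d = np.sum((feats - feats[i]) ** 2, axis=1)
        nn = np.argsort(d)[:k]                    # includes the pixel itself
        vals, counts = np.unique(flat[nn], return_counts=True)
        out[i] = vals[np.argmax(counts)]
    return out.reshape(h, w)

band = np.zeros((5, 5))
labels = np.zeros((5, 5), dtype=int)
labels[2, 2] = 1                                  # isolated misclassification
clean = knn_filter(band, labels)                  # the outlier is voted away
```

Because each pixel's vote is independent, the per-pixel loop is trivially parallel, which is what the RVC-CAL mapping onto multiple cores exploits.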
NASA Astrophysics Data System (ADS)
Wang, Yang; Wang, Qianqian
2008-12-01
When a laser ranger is transported or used in field operations, its transmitting axis, receiving axis, and aiming axis may not be parallel. Nonparallelism of the three optical axes degrades the range-measuring ability or prevents the laser ranger from being operated accurately, so testing and adjusting three-axis parallelism during the production and maintenance of laser rangers is important for reliable use. This paper proposes a new measurement method using digital image processing, based on a comparison of common measurement methods for three-axis parallelism. It uses a large-aperture off-axis paraboloid reflector to obtain images of the laser spot and a white-light cross line, and then processes the images on the LabVIEW platform. The center of the white-light cross line is found by a matching algorithm in a LabVIEW DLL, and the center of the laser spot by gradation transformation, binarization, and area filtering in turn. The software system can configure the CCD, detect the off-axis paraboloid reflector, measure the parallelism of the transmitting and aiming axes, and control the attenuation device. The hardware system uses the SAA7111A, a programmable video decoding chip, to perform A/D conversion; a FIFO (first-in first-out) buffer is used, and a USB bus transmits data to the PC. The three-axis parallelism is determined from the position bias between the two centers. A device based on this method is already in use; its application shows that the method offers high precision, speed, and automation.
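The laser-spot localization chain (gradation transformation, binarization, area filtering) reduces in essence to thresholding followed by a centroid. The sketch below collapses it to those two steps and omits the area filter, so it is a simplified stand-in rather than the described system:

```python
import numpy as np

def spot_center(img, thresh=128.0):
    """Estimate the laser-spot centre: binarize the image, then take the
    intensity-weighted centroid of the above-threshold pixels.  The
    area filter of the described chain is omitted for brevity."""
    ys, xs = np.nonzero(img > thresh)
    if ys.size == 0:
        return None
    w = img[ys, xs].astype(float)
    return float((ys * w).sum() / w.sum()), float((xs * w).sum() / w.sum())

# synthetic Gaussian spot centred at row 12, column 20 of a 32x32 frame
yy, xx = np.mgrid[0:32, 0:32]
img = 255.0 * np.exp(-((yy - 12) ** 2 + (xx - 20) ** 2) / 8.0)
cy, cx = spot_center(img)
```

The parallelism error then follows from the pixel offset between this centroid and the cross-line center, scaled by the focal length of the paraboloid reflector.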
Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin
2013-12-01
Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is embedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency. Copyright © 2013 Elsevier B.V. All rights reserved.
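The DDP formulation can be sketched as dynamic programming over the joint state (r1, r2) of the two boundary rows, with the distance constraint built into the state set. The brute-force version below is fine for small images and purely illustrative of the constraint structure; the paper's DLD uses a different, line-segment parameterization:

```python
import numpy as np

def dual_dp(cost_top, cost_bot, dmin, dmax, step=1):
    """Dual dynamic programming (DDP) sketch: find two boundaries r1(c) and
    r2(c) with r2 - r1 in [dmin, dmax] and per-column jumps of at most
    `step` rows, minimizing the summed pixel costs along both curves."""
    H, W = cost_top.shape
    states = [(a, b) for a in range(H) for b in range(H) if dmin <= b - a <= dmax]
    D = {s: cost_top[s[0], 0] + cost_bot[s[1], 0] for s in states}
    back = []
    for c in range(1, W):                          # column-by-column recursion
        Dn, bk = {}, {}
        for s in states:
            best, arg = np.inf, None
            for t in states:
                if abs(t[0] - s[0]) <= step and abs(t[1] - s[1]) <= step and D[t] < best:
                    best, arg = D[t], t
            Dn[s] = best + cost_top[s[0], c] + cost_bot[s[1], c]
            bk[s] = arg
        D, back = Dn, back + [bk]
    s = min(D, key=D.get)                          # best final state, then backtrack
    path = [s]
    for bk in reversed(back):
        path.append(bk[path[-1]])
    path.reverse()
    return np.array([p[0] for p in path]), np.array([p[1] for p in path])

# two zero-cost rows 3 apart; DDP should lock onto both simultaneously
ct = np.ones((8, 6)); ct[2, :] = 0.0
cb = np.ones((8, 6)); cb[5, :] = 0.0
r1, r2 = dual_dp(ct, cb, dmin=2, dmax=4)
```

The joint state space is what makes plain DDP expensive (O(H^2) states per column), which motivates the reduced parameterizations of PL-DDP and the proposed DLD.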
Huellner, Martin W; Appenzeller, Philippe; Kuhn, Félix P; Husmann, Lars; Pietsch, Carsten M; Burger, Irene A; Porto, Miguel; Delso, Gaspar; von Schulthess, Gustav K; Veit-Haibach, Patrick
2014-12-01
To assess the diagnostic performance of whole-body non-contrast material-enhanced positron emission tomography (PET)/magnetic resonance (MR) imaging and PET/computed tomography (CT) for staging and restaging of cancers and provide guidance for modality and sequence selection. This study was approved by the institutional review board and national government authorities. One hundred six consecutive patients (median age, 68 years; 46 female and 60 male patients) referred for staging or restaging of oncologic malignancies underwent whole-body imaging with a sequential trimodality PET/CT/MR system. The MR protocol included short inversion time inversion-recovery (STIR), Dixon-type liver accelerated volume acquisition (LAVA; GE Healthcare, Waukesha, Wis), and respiratory-gated periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER; GE Healthcare) sequences. Primary tumors (n = 43), local lymph node metastases (n = 74), and distant metastases (n = 66) were evaluated for conspicuity (scored 0-4), artifacts (scored 0-2), and reader confidence on PET/CT and PET/MR images. Subanalysis for lung lesions (n = 46) was also performed. Relevant incidental findings with both modalities were compared. Interreader agreement was analyzed with intraclass correlation coefficients and κ statistics. Lesion conspicuity, image artifacts, and incidental findings were analyzed with nonparametric tests. Primary tumors were less conspicuous on STIR (3.08, P = .016) and LAVA (2.64, P = .002) images than on CT images (3.49), while findings with the PROPELLER sequence (3.70, P = .436) were comparable to those at CT.
In distant metastases, the PROPELLER sequence (3.84) yielded better results than CT (2.88, P < .001). Subanalysis for lung lesions yielded similar results (primary lung tumors: CT, 3.71; STIR, 3.32 [P = .014]; LAVA, 2.52 [P = .002]; PROPELLER, 3.64 [P = .546]). Readers classified lesions more confidently with PET/MR than with PET/CT. However, PET/CT showed more incidental findings than PET/MR (P = .039), especially in the lung (P < .001). MR images had more artifacts than CT images. PET/MR performs comparably to PET/CT in whole-body oncology and neoplastic lung disease, with the use of appropriate sequences. Further studies are needed to define regionalized PET/MR protocols with sequences tailored to specific tumor entities. © RSNA, 2014. Online supplemental material is available for this article.
Super-resolved Parallel MRI by Spatiotemporal Encoding
Schmidt, Rita; Baishya, Bikash; Ben-Eliezer, Noam; Seginer, Amir; Frydman, Lucio
2016-01-01
Recent studies described an alternative “ultrafast” scanning method based on spatiotemporal encoding (SPEN) principles. SPEN demonstrates numerous potential advantages over EPI-based alternatives, at no additional expense in experimental complexity. To provide a competitive acquisition alternative, however, SPEN still needs to exploit parallel imaging algorithms without compromising its proven capabilities. The present work introduces a combination of multi-band frequency-swept pulses simultaneously encoding multiple, partial fields-of-view, together with a new algorithm merging a super-resolved SPEN image reconstruction with SENSE multiple-receiver methods. The ensuing approach enables one to reduce both the excitation and acquisition times of ultrafast SPEN acquisitions by the customary acceleration factor R, without compromising the ensuing spatial resolution, SAR deposition, or the capability to operate in multi-slice mode. The performance of these new single-shot imaging sequences and their ancillary algorithms was explored on phantoms and human volunteers at 3T. The gains of the parallelized approach were particularly evident when dealing with heterogeneous systems subject to major T2/T2* effects, as is the case in single-scan imaging near tissue/air interfaces. PMID:24120293
The correlation study of parallel feature extractor and noise reduction approaches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dewi, Deshinta Arrova; Sundararajan, Elankovan; Prabuwono, Anton Satria
2015-05-15
This paper presents a literature review of techniques for developing a parallel feature extractor and examines its correlation with noise reduction approaches for low-light-intensity images. Low-light-intensity images typically appear dark and have low contrast. Without proper handling, such images often lead to misperception of objects and textures and an inability to segment them; the resulting visual ambiguity frequently causes disorientation, user fatigue, and poor detection and classification performance for both humans and computer algorithms. Noise reduction (NR) is therefore an essential preprocessing step for other image processing operations such as edge detection, image segmentation, and image compression. A Parallel Feature Extractor (PFE), which captures the visual content of images, involves partitioning images into segments, detecting image overlaps if any, and controlling distributed and redistributed segments to extract features. Working on low-light-intensity images challenges the PFE and makes it closely dependent on the quality of its preprocessing steps. Many papers have proposed well-established NR and PFE strategies, but few have suggested or examined the correlation between them. This paper reviews the best NR and PFE approaches, with a detailed explanation of the suggested correlation. The findings may suggest relevant strategies for PFE development. With the help of knowledge-based reasoning, computational approaches, and algorithms, we present a correlation study between NR and PFE that can be useful for developing and enhancing existing PFEs.
Parallel detecting super-resolution microscopy using correlation based image restoration
NASA Astrophysics Data System (ADS)
Yu, Zhongzhi; Liu, Shaocong; Zhu, Dazhao; Kuang, Cuifang; Liu, Xu
2017-12-01
A novel approach to image restoration is proposed in which each detector's relative position in the detector array is no longer required in advance. Each detector's relative location can be identified by extracting a certain area from one detector's image and scanning it across the other detectors' images. From this location, the point spread function (PSF) for each detector can be generated and deconvolution performed for image restoration. With this method, a microscope with an arbitrarily designed detector array can be constructed easily, without concern for the exact relative locations of the detectors. Simulated and experimental results show an overall resolution improvement by a factor of 1.7 compared with conventional confocal fluorescence microscopy. Given this significant resolution enhancement and the method's ease of application, it has potential for a wide range of applications in fluorescence microscopy based on parallel detection.
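A minimal Python/NumPy sketch of the template-scanning step: a patch from a reference detector's image is swept over another detector's image, and the normalized cross-correlation peak gives the relative offset. The brute-force scan and the synthetic images are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def find_offset(ref, other, patch=16):
    """Estimate one detector's shift relative to a reference detector by
    extracting a central patch from the reference image and scanning it
    over the other image (normalized cross-correlation)."""
    H, W = ref.shape
    cy, cx = H // 2, W // 2
    tpl = ref[cy:cy + patch, cx:cx + patch]
    t = (tpl - tpl.mean()) / (tpl.std() + 1e-12)     # standardized template
    best, arg = -2.0, (0, 0)
    for y in range(other.shape[0] - patch + 1):
        for x in range(other.shape[1] - patch + 1):
            win = other[y:y + patch, x:x + patch]
            w = (win - win.mean()) / (win.std() + 1e-12)
            r = float((t * w).mean())                # correlation coefficient
            if r > best:
                best, arg = r, (y, x)
    return arg[0] - cy, arg[1] - cx                  # (dy, dx) of the other detector

rng = np.random.default_rng(0)
img = rng.random((48, 48))
shifted = np.roll(np.roll(img, 2, axis=0), -3, axis=1)  # known (dy, dx) = (2, -3)
print(find_offset(img, shifted))   # (2, -3)
```

Once each detector's (dy, dx) is known, a shifted PSF can be assigned to it and used in the deconvolution step.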
Development of an optical inspection platform for surface defect detection in touch panel glass
NASA Astrophysics Data System (ADS)
Chang, Ming; Chen, Bo-Cheng; Gabayno, Jacque Lynn; Chen, Ming-Fu
2016-04-01
An optical inspection platform combining parallel image processing with a high-resolution opto-mechanical module was developed for defect inspection of touch panel glass. Dark-field images were acquired using a 12288-pixel line CCD camera with 3.5 µm per-pixel resolution and a 12 kHz line rate. Key features of the glass surface were analyzed by parallel image processing on combined CPU and GPU platforms. Defect inspection of touch panel glass, which produced 386 megapixels of image data per sample, was completed in roughly 5 seconds. A high detection rate for surface scratches on the touch panel glass was achieved, with a minimum detectable defect size of about 10 µm. The implementation of a custom illumination source significantly improved the scattering efficiency at the surface, thereby enhancing the contrast of the acquired images and the overall performance of the inspection system.
Thread concept for automatic task parallelization in image analysis
NASA Astrophysics Data System (ADS)
Lueckenhaus, Maximilian; Eckstein, Wolfgang
1998-09-01
Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when the hardware changes. It is therefore highly desirable to perform the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context while following different threads of execution and working on different data in parallel. In this paper we describe the basics of our thread concept and show how it can serve as the basis of automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs while taking the available hardware into account. Tests with our system prototype show that the thread concept combined with the agent paradigm is suitable for speeding up image processing through automatic parallelization of image analysis tasks.
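In the spirit of the thread concept, the following Python sketch runs several independent image-analysis subtasks on the same shared image object in parallel threads; the subtask functions are hypothetical stand-ins, not operators from the paper's system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-image analysis subtasks (illustrative stand-ins only).
def detect_edges(img):   return ("edges", len(img))
def find_regions(img):   return ("regions", sum(img))
def measure_stats(img):  return ("stats", max(img))

def analyze(img, tasks):
    """Run independent image-analysis subtasks as parallel threads.
    Each thread shares the same image object but follows its own thread
    of execution, mirroring the automatic task-parallelization idea."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task, img) for task in tasks]
        return dict(f.result() for f in futures)

print(analyze([3, 1, 2], [detect_edges, find_regions, measure_stats]))
# {'edges': 3, 'regions': 6, 'stats': 3}
```

An agent-based scheduler, as described in the paper, would additionally decide how many such threads to spawn based on the available hardware.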
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
NASA Astrophysics Data System (ADS)
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable a fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that must be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called the HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of layered parallel software libraries that allow image processing applications to share common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on Amdahl's law for symmetric multicore chips showed the potential for high performance scalability of the HPC 3D-MIP platform when a larger number of cores is available.
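The scalability analysis mentioned above rests on Amdahl's law, which a short Python sketch can make explicit; the 5% serial fraction in the example is an illustrative assumption, not a figure from the paper.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: ideal speedup on `cores` processors when only
    `parallel_fraction` of the work can be parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# A 12-fold speedup on 12 cores (as reported) requires an essentially
# fully parallel workload; even 5% serial work lowers the ceiling sharply.
print(round(amdahl_speedup(1.0, 12), 2))   # 12.0
print(round(amdahl_speedup(0.95, 12), 2))  # 7.74
```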
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Plaza, Javier; Paz, Abel
2010-10-01
Latest generation remote sensing instruments (called hyperspectral imagers) are now able to generate hundreds of images, corresponding to different wavelength channels, for the same area on the surface of the Earth. In previous work, we have reported that the scalability of parallel processing algorithms dealing with these high-dimensional data volumes is affected by the amount of data to be exchanged through the communication network of the system. However, large messages are common in hyperspectral imaging applications since processing algorithms are pixel-based, and each pixel vector to be exchanged through the communication network is made up of hundreds of spectral values. Thus, decreasing the amount of data to be exchanged could improve the scalability and parallel performance. In this paper, we propose a new framework based on intelligent utilization of wavelet-based data compression techniques for improving the scalability of a standard hyperspectral image processing chain on heterogeneous networks of workstations. This type of parallel platform is quickly becoming a standard in hyperspectral image processing due to the distributed nature of collected hyperspectral data as well as its flexibility and low cost. Our experimental results indicate that adaptive lossy compression can lead to improvements in the scalability of the hyperspectral processing chain without sacrificing analysis accuracy, even at sub-pixel precision levels.
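The core idea above, compressing pixel-vector messages before they cross the network, can be sketched in Python; zlib with an optional bit-shift quantization stands in for the paper's wavelet-based lossy compression, and the data and sizes are illustrative.

```python
import zlib
import numpy as np

def pack(pixels, lossy_bits=0):
    """Shrink a block of hyperspectral pixel vectors before a network
    exchange: optionally quantize (lossy) by dropping low-order bits,
    then apply lossless zlib compression. A stand-in sketch for the
    paper's wavelet-based scheme."""
    q = pixels >> lossy_bits if lossy_bits else pixels
    return zlib.compress(q.astype(np.uint16).tobytes())

rng = np.random.default_rng(1)
# 100 pixel vectors x 224 bands; smooth spectra (cumulative sums) compress well
base = np.cumsum(rng.integers(0, 4, (100, 224)), axis=1).astype(np.uint16)
lossless = pack(base)
lossy = pack(base, lossy_bits=2)
print(base.nbytes, len(lossless), len(lossy))   # lossy < lossless < raw
```

The trade-off explored in the paper is exactly this one: more aggressive (lossy) quantization shrinks messages further, improving scalability, as long as analysis accuracy is preserved.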
Novel wavelength diversity technique for high-speed atmospheric turbulence compensation
NASA Astrophysics Data System (ADS)
Arrasmith, William W.; Sullivan, Sean F.
2010-04-01
The defense, intelligence, and homeland security communities are driving a need for software-dominant, real-time or near-real-time atmospheric-turbulence-compensated imagery. Parallel processing capabilities are finding application in diverse areas including image processing, target tracking, pattern recognition, and image fusion, to name a few. A novel approach to the computationally intensive case of software-dominant optical and near-infrared imaging through atmospheric turbulence is addressed in this paper. Previously, the somewhat conventional wavelength diversity method has been used to compensate for atmospheric turbulence with great success. We apply a new correlation-based approach to the wavelength diversity methodology using a parallel processing architecture, enabling high-speed atmospheric turbulence compensation. Methods for optical imaging through distributed turbulence are discussed, simulation results are presented, and computational and performance assessments are provided.
Tracking moving radar targets with parallel, velocity-tuned filters
Bickel, Douglas L.; Harmony, David W.; Bielek, Timothy P.; Hollowell, Jeff A.; Murray, Margaret S.; Martinez, Ana
2013-04-30
Radar data associated with radar illumination of a movable target is processed to monitor motion of the target. A plurality of filter operations are performed in parallel on the radar data so that each filter operation produces target image information. The filter operations are defined to have respectively corresponding velocity ranges that differ from one another. The target image information produced by one of the filter operations represents the target more accurately than the target image information produced by the remainder of the filter operations when a current velocity of the target is within the velocity range associated with the one filter operation. In response to the current velocity of the target being within the velocity range associated with the one filter operation, motion of the target is tracked based on the target image information produced by the one filter operation.
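A toy Python sketch of the selection logic described above: a bank of velocity-tuned filters is evaluated in parallel, and the target is tracked with the filter whose tuned velocity best matches the target's current velocity. The response model and velocity values are hypothetical, purely for illustration.

```python
def filter_response(target_velocity, tuned_velocity, width=2.0):
    """Toy focus metric: the response peaks when the filter's tuned
    velocity matches the target's current velocity (Lorentzian shape,
    an illustrative assumption)."""
    return 1.0 / (1.0 + ((target_velocity - tuned_velocity) / width) ** 2)

def best_filter(target_velocity, bank):
    """Pick the filter in the bank with the strongest response; in the
    patent's scheme, its image output is the one used for tracking."""
    responses = [filter_response(target_velocity, v) for v in bank]
    return bank[responses.index(max(responses))]

bank = [0, 5, 10, 15, 20]          # tuned velocities covered by the bank
print(best_filter(12.3, bank))     # 10
```

In the actual system, the filters run concurrently on the same radar data, and the selection is made from the produced target images rather than from a known target velocity.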
A high performance parallel computing architecture for robust image features
NASA Astrophysics Data System (ADS)
Zhou, Renyan; Liu, Leibo; Wei, Shaojun
2014-03-01
A design of a parallel architecture for image feature detection and description is proposed in this article. The major component of this architecture is a 2D cellular network composed of simple reprogrammable processors, enabling the Hessian blob detector and Haar response calculation, which are the most computation-intensive stages of the Speeded Up Robust Features (SURF) algorithm. Combining this 2D cellular network with dedicated hardware for SURF descriptors, the architecture achieves real-time image feature detection with minimal software in the host processor. A prototype FPGA implementation of the proposed architecture achieves 1318.9 GOPS of general pixel processing at a 100 MHz clock and up to 118 fps in VGA (640 × 480) image feature detection. The proposed architecture is stand-alone and scalable, so it is easy to migrate to a VLSI implementation.
Parallel-multiplexed excitation light-sheet microscopy (Conference Presentation)
NASA Astrophysics Data System (ADS)
Xu, Dongli; Zhou, Weibin; Peng, Leilei
2017-02-01
Laser scanning light-sheet imaging allows fast 3D image of live samples with minimal bleach and photo-toxicity. Existing light-sheet techniques have very limited capability in multi-label imaging. Hyper-spectral imaging is needed to unmix commonly used fluorescent proteins with large spectral overlaps. However, the challenge is how to perform hyper-spectral imaging without sacrificing the image speed, so that dynamic and complex events can be captured live. We report wavelength-encoded structured illumination light sheet imaging (λ-SIM light-sheet), a novel light-sheet technique that is capable of parallel multiplexing in multiple excitation-emission spectral channels. λ-SIM light-sheet captures images of all possible excitation-emission channels in true parallel. It does not require compromising the imaging speed and is capable of distinguish labels by both excitation and emission spectral properties, which facilitates unmixing fluorescent labels with overlapping spectral peaks and will allow more labels being used together. We build a hyper-spectral light-sheet microscope that combined λ-SIM with an extended field of view through Bessel beam illumination. The system has a 250-micron-wide field of view and confocal level resolution. The microscope, equipped with multiple laser lines and an unlimited number of spectral channels, can potentially image up to 6 commonly used fluorescent proteins from blue to red. Results from in vivo imaging of live zebrafish embryos expressing various genetic markers and sensors will be shown. Hyper-spectral images from λ-SIM light-sheet will allow multiplexed and dynamic functional imaging in live tissue and animals.
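The spectral unmixing that such multi-channel acquisition enables can be illustrated with a small Python/NumPy least-squares example; the reference spectra and abundances below are made-up numbers, not data from the instrument.

```python
import numpy as np

# Hypothetical linear unmixing sketch: given per-channel reference spectra
# for each fluorophore, measured multi-channel intensities m = S @ a are
# solved for the abundance vector a, even when the spectra overlap.
S = np.array([[0.9, 0.3],
              [0.4, 0.8],
              [0.1, 0.5]])          # 3 spectral channels x 2 overlapping labels
true_a = np.array([2.0, 1.0])       # true abundances of the two labels
m = S @ true_a                      # noiseless measurement per channel

a, *_ = np.linalg.lstsq(S, m, rcond=None)
print(np.round(a, 6))               # recovers [2. 1.]
```

With more excitation-emission channels than labels (as λ-SIM provides), the system is overdetermined and unmixing becomes robust to noise.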
NASA Astrophysics Data System (ADS)
Roh, Y. H.; Yoon, Y.; Kim, K.; Kim, J.; Kim, J.; Morishita, J.
2016-10-01
Scattered radiation is the main reason for the degradation of image quality and the increased patient exposure dose in diagnostic radiology. In an effort to reduce scattered radiation, a novel structure for an indirect flat panel detector has been proposed. In this study, a performance evaluation of the novel system in terms of image contrast, as well as an estimation of the number of photons incident on the detector and the grid exposure factor, was conducted using Monte Carlo simulations. The image contrast of the proposed system was superior to that of the no-grid system but slightly inferior to that of the parallel-grid system. The number of photons incident on the detector and the grid exposure factor of the novel system were higher than those of the parallel-grid system but lower than those of the no-grid system. The proposed system exhibited the potential for a reduced exposure dose without image quality degradation; additionally, it can be further improved by a structural optimization that considers the manufacturer's specifications of its lead content.
An improved non-uniformity correction algorithm and its GPU parallel implementation
NASA Astrophysics Data System (ADS)
Cheng, Kuanhong; Zhou, Huixin; Qin, Hanlin; Zhao, Dong; Qian, Kun; Rong, Shenghui
2018-05-01
The performance of the SLP-THP based non-uniformity correction algorithm is seriously affected by the result of the SLP filter, which often leads to image blurring and ghosting artifacts. To address this problem, an improved SLP-THP based non-uniformity correction method with a curvature constraint is proposed, along with a new way to estimate the spatial low-frequency component. First, the details and contours of the input image are obtained by minimizing the local Gaussian curvature and the mean curvature of the image surface, respectively. Then, a guided filter is utilized to combine these two parts to estimate the spatial low-frequency component. Finally, this SLP component is fed into the SLP-THP method to achieve non-uniformity correction. The performance of the proposed algorithm was verified on several real and simulated infrared image sequences. The experimental results indicate that the proposed algorithm can reduce the non-uniformity without losing detail. A GPU-based parallel implementation that runs 150 times faster than the CPU version is also presented, showing that the proposed algorithm has great potential for real-time application.
Parallel Implementation of 3-D Iterative Reconstruction With Intra-Thread Update for the jPET-D4
NASA Astrophysics Data System (ADS)
Lam, Chih Fung; Yamaya, Taiga; Obi, Takashi; Yoshida, Eiji; Inadama, Naoko; Shibuya, Kengo; Nishikido, Fumihiko; Murayama, Hideo
2009-02-01
One way to speed up iterative image reconstruction is parallel computing with a computer cluster. However, as the number of computing threads increases, parallel efficiency decreases due to network transfer delay. In this paper, we propose a method to reduce data transfer between computing threads by introducing an intra-thread update. The update factor is collected from each slave thread and a global image is updated as usual during the first K sub-iterations. In the remaining sub-iterations, the global image is updated only at an interval controlled by a parameter L; in between, the intra-thread update is carried out, whereby an image update is performed locally in each slave thread. We investigated combinations of the K and L parameters based on a parallel implementation of RAMLA for the jPET-D4 scanner. Our evaluation used four workstations with a total of 16 slave threads. Each slave thread calculated a different set of LORs, divided according to ring-difference numbers. We assessed image quality of the proposed method with a hotspot simulation phantom; the figures of merit were the full-width-at-half-maximum of hotspots and the background normalized standard deviation. At an optimal K and L setting, we did not find significant change in the output images. We also applied the proposed method to a Hoffman phantom experiment and found that the difference due to the intra-thread update was negligible. With the intra-thread update, computation time could be reduced by about 23%.
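The update schedule controlled by K and L can be sketched in a few lines of Python; the scheduling rule shown (global updates for the first K sub-iterations, then one global update every L sub-iterations, with local updates in between) is a plausible reading of the description above, not the authors' code.

```python
def update_schedule(total_subiters, K, L):
    """Sketch of the proposed schedule: global image updates (which need
    a network gather from all slave threads) for the first K
    sub-iterations, then a global update only every L sub-iterations,
    with cheap local intra-thread updates in between."""
    plan = []
    for s in range(1, total_subiters + 1):
        if s <= K or (s - K) % L == 0:
            plan.append((s, "global"))         # gather factors over the network
        else:
            plan.append((s, "intra-thread"))   # update locally, no transfer
    return plan

plan = update_schedule(10, K=3, L=3)
print([kind for _, kind in plan])
# ['global', 'global', 'global', 'intra-thread', 'intra-thread',
#  'global', 'intra-thread', 'intra-thread', 'global', 'intra-thread']
```

The reported ~23% time saving comes from replacing most of the network-bound global updates with the local ones.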
Price, Anthony N.; Padormo, Francesco; Hajnal, Joseph V.; Malik, Shaihan J.
2017-01-01
Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field (B1+) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a ‘sequence-level’ optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady-state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight-channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single-channel operation, a mean-squared-error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. PMID:28195684
Hesford, Andrew J; Tillett, Jason C; Astheimer, Jeffrey P; Waag, Robert C
2014-08-01
Accurate and efficient modeling of ultrasound propagation through realistic tissue models is important to many aspects of clinical ultrasound imaging. Simplified problems with known solutions are often used to study and validate numerical methods. Greater confidence in a time-domain k-space method and a frequency-domain fast multipole method is established in this paper by analyzing results for realistic models of the human breast. Models of breast tissue were produced by segmenting magnetic resonance images of ex vivo specimens into seven distinct tissue types. After confirming with histologic analysis by pathologists that the model structures mimicked in vivo breast, the tissue types were mapped to variations in sound speed and acoustic absorption. Calculations of acoustic scattering by the resulting model were performed on massively parallel supercomputer clusters using parallel implementations of the k-space method and the fast multipole method. The efficient use of these resources was confirmed by parallel efficiency and scalability studies using large-scale, realistic tissue models. Comparisons between the temporal and spectral results were performed in representative planes by Fourier transforming the temporal results. An RMS field error less than 3% throughout the model volume confirms the accuracy of the methods for modeling ultrasound propagation through human breast.
Parallel MR imaging: a user's guide.
Glockner, James F; Hu, Houchun H; Stanley, David W; Angelos, Lisa; King, Kevin
2005-01-01
Parallel imaging is a recently developed family of techniques that take advantage of the spatial information inherent in phased-array radiofrequency coils to reduce acquisition times in magnetic resonance imaging. In parallel imaging, the number of sampled k-space lines is reduced, often by a factor of two or greater, thereby significantly shortening the acquisition time. Parallel imaging techniques have only recently become commercially available, and the wide range of clinical applications is just beginning to be explored. The potential clinical applications primarily involve reduction in acquisition time, improved spatial resolution, or a combination of the two. Improvements in image quality can be achieved by reducing the echo train lengths of fast spin-echo and single-shot fast spin-echo sequences. Parallel imaging is particularly attractive for cardiac and vascular applications and will likely prove valuable as 3-T body and cardiovascular imaging becomes part of standard clinical practice. Limitations of parallel imaging include reduced signal-to-noise ratio and reconstruction artifacts. It is important to consider these limitations when deciding when to use these techniques. © RSNA, 2005.
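The basic time saving is easy to quantify: with acceleration factor R, only 1/R of the k-space lines are acquired. A small Python sketch, using illustrative line counts and per-line timings rather than values from the article:

```python
import math

def scan_time(ky_lines, time_per_line_ms, acceleration=1):
    """Parallel imaging shortens acquisition by sampling fewer phase-encode
    (ky) lines: with acceleration factor R, roughly ky_lines / R lines are
    acquired. Note the SNR penalty: it drops by at least sqrt(R), plus a
    coil-geometry factor. Numbers here are illustrative."""
    acquired = math.ceil(ky_lines / acceleration)
    return acquired * time_per_line_ms / 1000.0   # seconds

print(scan_time(256, 10))                    # 2.56 s unaccelerated
print(scan_time(256, 10, acceleration=2))    # 1.28 s with R = 2
```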
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cheng, L; Duke University Medical Center, Durham, NC; Fudan University Shanghai Cancer Center, Shanghai
Purpose: To investigate prostate imaging onboard radiation therapy machines using a novel robotic, 49-pinhole Single Photon Emission Computed Tomography (SPECT) system. Methods: Computer-simulation studies were performed for region-of-interest (ROI) imaging using a 49-pinhole SPECT collimator and for broad cross-section imaging using a parallel-hole SPECT collimator. A male XCAT phantom was computer-simulated in supine position with one 12mm-diameter tumor added in the prostate. A treatment couch was added to the phantom. Four-minute detector trajectories for imaging a 7cm-diameter-sphere ROI encompassing the tumor were investigated with different parameters, including pinhole focal length, pinhole diameter and trajectory starting angle. Pseudo-random Poisson noise was included in the simulated projection data, and SPECT images were reconstructed by OSEM with 4 subsets and up to 10 iterations. Images were evaluated by visual inspection, profiles, and Root-Mean-Square-Error (RMSE). Results: The tumor was well visualized above background by the 49-pinhole SPECT system with different pinhole parameters, while it was not visible with parallel-hole SPECT imaging. Minimum RMSEs were 0.30 for 49-pinhole imaging and 0.41 for parallel-hole imaging. For parallel-hole imaging, the detector trajectory from right-to-left yielded slightly lower RMSEs than that from posterior to anterior. For 49-pinhole imaging, near-minimum RMSEs were maintained over a broader range of OSEM iterations with a 5mm pinhole diameter and 21cm focal length versus a 2mm pinhole diameter and 18cm focal length. The detector with 21cm pinhole focal length had the shortest rotation radius averaged over the trajectory. Conclusion: On-board functional and molecular prostate imaging may be feasible in 4-minute scan times by robotic SPECT. A 49-pinhole SPECT system could improve such imaging as compared to broad cross-section parallel-hole collimated SPECT imaging.
Multi-pinhole imaging can be improved by considering pinhole focal length, pinhole diameter, and trajectory starting angle. The project is supported by the NIH grant 5R21-CA156390.
Separating figure from ground with a parallel network.
Kienker, P K; Sejnowski, T J; Hinton, G E; Schumacher, L E
1986-01-01
The differentiation of figure from ground plays an important role in the perceptual organization of visual stimuli. The rapidity with which we can discriminate the inside from the outside of a figure suggests that at least this step in the process may be performed in visual cortex by a large number of neurons in several different areas working together in parallel. We have attempted to simulate this collective computation by designing a network of simple processing units that receives two types of information: bottom-up input from the image containing the outlines of a figure, which may be incomplete, and a top-down attentional input that biases one part of the image to be the inside of the figure. No presegmentation of the image was assumed. Two methods for performing the computation were explored: gradient descent, which seeks locally optimal states, and simulated annealing, which attempts to find globally optimal states by introducing noise into the computation. For complete outlines, gradient descent was faster, but the range of input parameters leading to successful performance was very narrow. In contrast, simulated annealing was more robust: it worked over a wider range of attention parameters and a wider range of outlines, including incomplete ones. Our network model is too simplified to serve as a model of human performance, but it does demonstrate that one global property of outlines can be computed through local interactions in a parallel network. Some features of the model, such as the role of noise in escaping from nonglobal optima, may generalize to more realistic models.
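The contrast between greedy descent and annealing can be reproduced on a toy 1-D energy landscape; the following Python sketch is a generic simulated-annealing loop, not the authors' figure-ground network, and all parameters are illustrative.

```python
import math
import random

def anneal(energy, neighbors, x0, T0=2.0, cooling=0.9995, steps=5000, seed=0):
    """Simulated annealing sketch: accept uphill moves with probability
    exp(-dE/T), letting the search escape nonglobal optima where pure
    gradient (greedy) descent would get stuck. Returns the best state seen."""
    rng = random.Random(seed)
    x, T, best = x0, T0, x0
    for _ in range(steps):
        y = rng.choice(neighbors(x))
        dE = energy(y) - energy(x)
        if dE <= 0 or rng.random() < math.exp(-dE / T):   # noisy acceptance
            x = y
            if energy(x) < energy(best):
                best = x
        T *= cooling                                      # gradual cooling
    return best

# Toy 1-D landscape: local minimum at state 2, global minimum at state 7.
E = [4, 2, 1, 2, 3, 2, 1, 0, 1, 3]
nbrs = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(E)]
print(anneal(lambda i: E[i], nbrs, x0=2))   # reaches state 7; greedy descent stays at 2
```

In the paper's network the states are binary unit activations and the energy encodes outline and attention constraints, but the role of noise is the same.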
Liu, Jingyu; Demirci, Oguz; Calhoun, Vince D.
2009-01-01
Relationships between genomic data and functional brain images are of great interest but require new analysis approaches to integrate the high-dimensional data types. This letter presents an extension of a technique called parallel independent component analysis (paraICA), which enables the joint analysis of multiple modalities including interconnections between them. We extend our earlier work by allowing for multiple interconnections and by providing important overfitting controls. Performance was assessed by simulations under different conditions, and indicated reliable results can be extracted by properly balancing overfitting and underfitting. An application to functional magnetic resonance images and single nucleotide polymorphism array produced interesting findings. PMID:19834575
Medical applications for high-performance computers in SKIF-GRID network.
Zhuchkov, Alexey; Tverdokhlebov, Nikolay
2009-01-01
The paper presents a set of software services for massive mammography image processing by using high-performance parallel computers of SKIF-family which are linked into a service-oriented grid-network. An experience of a prototype system implementation in two medical institutions is also described.
Connectionist Models and Parallelism in High Level Vision.
1985-01-01
[OCR fragments from the report documentation page] Author: Jerome A. Feldman; grant N00014-82-K-0193. ... Connectionist Models, 2.1 Background and Overview: Computer science is just beginning to look seriously at parallel computation: it may turn out that ... the chair. The program includes intermediate-level networks that compute more complex joints and ones that compute parallelograms in the image.
Parallelizing Data-Centric Programs
2013-09-25
[Excerpt fragments] ... results than current techniques, such as ImageWebs [HGO+10], given the same budget of matches performed. 4.2 Scalable Parallel Similarity Search: The work ... algorithms. 5 Data-Driven Applications in the Cloud: In this project, we investigated what happens when data-centric software is moved from expensive custom ... returns appropriate answer tuples. Figure 9(b) shows the mutual constraint satisfaction that takes place in answering for 122. The intent is that ...
Emotional stimuli exert parallel effects on attention and memory.
Talmi, Deborah; Ziegler, Marilyne; Hawksworth, Jade; Lalani, Safina; Herman, C Peter; Moscovitch, Morris
2013-01-01
Because emotional and neutral stimuli typically differ on non-emotional dimensions, it has been difficult to determine conclusively which factors underlie the ability of emotional stimuli to enhance immediate long-term memory. Here we induced arousal by varying participants' goals, a method that removes many potential confounds between emotional and non-emotional items. Hungry and sated participants encoded food and clothing images under divided attention conditions. Sated participants attended to and recalled food and clothing images equivalently. Hungry participants performed worse on the concurrent tone-discrimination task when they viewed food relative to clothing images, suggesting enhanced attention to food images, and they recalled more food than clothing images. A follow-up regression analysis of the factors predicting memory for individual pictures revealed that food images had parallel effects on attention and memory in hungry participants, so that enhanced attention to food images did not predict their enhanced memory. We suggest that immediate long-term memory for food is enhanced in the hungry state because hunger leads to more distinctive processing of food images rendering them more accessible during retrieval.
Samsi, Siddharth; Krishnamurthy, Ashok K.; Gurcan, Metin N.
2012-01-01
Follicular Lymphoma (FL) is one of the most common non-Hodgkin lymphomas in the United States. Diagnosis and grading of FL are based on the review of histopathological tissue sections under a microscope and are influenced by human factors such as fatigue and reader bias. Computer-aided image analysis tools can help improve the accuracy of diagnosis and grading and act as another tool at the pathologist’s disposal. Our group has been developing algorithms for identifying follicles in immunohistochemical images. These algorithms have been tested and validated on small images extracted from whole slide images. However, the use of these algorithms for analyzing the entire whole slide image requires significant changes to the processing methodology since the images are relatively large (on the order of 100k × 100k pixels). In this paper we discuss the challenges involved in analyzing whole slide images and propose potential computational methodologies for addressing these challenges. We discuss the use of parallel computing tools on commodity clusters and compare performance of the serial and parallel implementations of our approach. PMID:22962572
NASA Astrophysics Data System (ADS)
Sourbier, Florent; Operto, Stéphane; Virieux, Jean; Amestoy, Patrick; L'Excellent, Jean-Yves
2009-03-01
This is the first paper in a two-part series that describes a massively parallel code that performs 2D frequency-domain full-waveform inversion of wide-aperture seismic data for imaging complex structures. Full-waveform inversion methods, namely quantitative seismic imaging methods based on the resolution of the full wave equation, are computationally expensive. Therefore, designing efficient algorithms which take advantage of parallel computing facilities is critical for the appraisal of these approaches when applied to representative case studies and for further improvements. Full-waveform modelling requires the resolution of a large sparse system of linear equations which is performed with the massively parallel direct solver MUMPS for efficient multiple-shot simulations. Efficiency of the multiple-shot solution phase (forward/backward substitutions) is improved by using the BLAS3 library. The inverse problem relies on a classic local optimization approach implemented with a gradient method. The direct solver returns the multiple-shot wavefield solutions distributed over the processors according to a domain decomposition driven by the distribution of the LU factors. The domain decomposition of the wavefield solutions is used to compute in parallel the gradient of the objective function and the diagonal Hessian, this latter providing a suitable scaling of the gradient. The algorithm allows one to test different strategies for multiscale frequency inversion ranging from successive mono-frequency inversion to simultaneous multifrequency inversion. These different inversion strategies will be illustrated in the following companion paper. The parallel efficiency and the scalability of the code will also be quantified.
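The "factor once, solve many shots" pattern described above can be sketched with a dense toy LU factorization standing in for the MUMPS sparse direct solver (illustrative only; no pivoting, domain decomposition, or BLAS3 blocking):

```python
import numpy as np

def lu_factor(A):
    """Doolittle LU factorization without pivoting (assumes A is
    diagonally dominant; a stand-in for the MUMPS direct solver)."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

def solve_shots(L, U, B):
    """Forward/backward substitution for many right-hand sides (one
    column per shot). The expensive factorization is done once and
    every shot reuses it, which is the point of the multi-shot phase."""
    n = L.shape[0]
    Y = np.zeros_like(B, dtype=float)
    for i in range(n):                      # forward: L Y = B
        Y[i] = B[i] - L[i, :i] @ Y[:i]
    X = np.zeros_like(Y)
    for i in range(n - 1, -1, -1):          # backward: U X = Y
        X[i] = (Y[i] - U[i, i + 1:] @ X[i + 1:]) / U[i, i]
    return X
```

In the paper the factorization is distributed and the substitutions are batched over shots, which is where the BLAS3 gain comes from; the sketch only shows the reuse structure.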
A data distributed parallel algorithm for ray-traced volume rendering
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu; Painter, James S.; Hansen, Charles D.; Krogh, Michael F.
1993-01-01
This paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along with their implementation and performance on the Connection Machine CM-5, and networked workstations. This algorithm distributes both the data and the computations to individual processing units to achieve fast, high-quality rendering of high-resolution data. The volume data, once distributed, is left intact. The processing nodes perform local ray tracing of their subvolume concurrently. No communication between processing units is needed during this locally ray-tracing process. A subimage is generated by each processing unit and the final image is obtained by compositing subimages in the proper order, which can be determined a priori. Test results on both the CM-5 and a group of networked workstations demonstrate the practicality of our rendering algorithm and compositing method.
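A sketch of the ordered compositing step, assuming premultiplied-alpha RGBA subimages and the a-priori depth order mentioned above (the CM-5 communication schedule is omitted):

```python
import numpy as np

def over(front, back):
    """Porter-Duff 'over': front RGBA composited onto back RGBA
    (premultiplied alpha, values in [0, 1])."""
    a_f = front[..., 3:4]
    return front + (1.0 - a_f) * back

def composite(subimages):
    """Combine per-processor subimages in their a-priori depth order
    (index 0 = nearest the viewer), as in sort-last volume rendering."""
    out = subimages[-1]
    for img in reversed(subimages[:-1]):
        out = over(img, out)
    return out
```

Because `over` is associative, the compositing can itself be parallelized as a reduction tree across processing units, which is what makes the final gather step scale.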
Design of a Low-Light-Level Image Sensor with On-Chip Sigma-Delta Analog-to- Digital Conversion
NASA Technical Reports Server (NTRS)
Mendis, Sunetra K.; Pain, Bedabrata; Nixon, Robert H.; Fossum, Eric R.
1993-01-01
The design and projected performance of a low-light-level active-pixel-sensor (APS) chip with semi-parallel analog-to-digital (A/D) conversion is presented. The individual elements have been fabricated and tested using MOSIS* 2 micrometer CMOS technology, although the integrated system has not yet been fabricated. The imager consists of a 128 x 128 array of active pixels at a 50 micrometer pitch. Each column of pixels shares a 10-bit A/D converter based on first-order oversampled sigma-delta (Sigma-Delta) modulation. The 10-bit outputs of each converter are multiplexed and read out through a single set of outputs. A semi-parallel architecture is chosen to achieve 30 frames/second operation even at low light levels. The sensor is designed for less than 12 e^- rms noise performance.
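The per-column first-order sigma-delta conversion can be sketched as a behavioral model (circuit non-idealities such as kTC noise and comparator offset are ignored; parameters here are illustrative, not the chip's):

```python
def sigma_delta(x, n_samples=4096):
    """First-order sigma-delta modulation of a constant input x in [0, 1]:
    integrate the error between input and 1-bit feedback, emit a bitstream
    whose mean approximates x (decimation here is a plain average)."""
    integ, fb, ones = 0.0, 0.0, 0
    for _ in range(n_samples):
        integ += x - fb            # integrate the quantization error
        bit = 1 if integ >= 0.5 else 0
        fb = float(bit)            # 1-bit DAC feedback
        ones += bit
    return ones / n_samples        # decimated (averaged) output
```

Oversampling is what buys resolution: with a first-order loop the in-band quantization error falls as the oversampling ratio grows, which is why one slow, simple converter per column can still deliver 10 bits.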
Gathmann, Bettina; Schulte, Frank P; Maderwald, Stefan; Pawlikowski, Mirko; Starcke, Katrin; Schäfer, Lena C; Schöler, Tobias; Wolf, Oliver T; Brand, Matthias
2014-03-01
Stress and additional load on the executive system, produced by a parallel working memory task, impair decision making under risk. However, the combination of stress and a parallel task seems to preserve the decision-making performance [e.g., operationalized by the Game of Dice Task (GDT)] from decreasing, probably by a switch from serial to parallel processing. The question remains how the brain manages such demanding decision-making situations. The current study used a 7-tesla magnetic resonance imaging (MRI) system in order to investigate the underlying neural correlates of the interaction between stress (induced by the Trier Social Stress Test), risky decision making (GDT), and a parallel executive task (2-back task) to get a better understanding of those behavioral findings. The results show that on a behavioral level, stressed participants did not show significant differences in task performance. Interestingly, when comparing the stress group (SG) with the control group, the SG showed a greater increase in neural activation in the anterior prefrontal cortex when performing the 2-back task simultaneously with the GDT than when performing each task alone. This brain area is associated with parallel processing. Thus, the results may suggest that in stressful dual-tasking situations, where a decision has to be made when in parallel working memory is demanded, a stronger activation of a brain area associated with parallel processing takes place. The findings are in line with the idea that stress seems to trigger a switch from serial to parallel processing in demanding dual-tasking situations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Einstein, Daniel R.; Kuprat, Andrew P.; Jiao, Xiangmin
2013-01-01
Geometries for organ-scale and multiscale simulations of organ function are now routinely derived from imaging data. However, medical images may also contain spatially heterogeneous information other than geometry that is relevant to such simulations, either as initial conditions or in the form of model parameters. In this manuscript, we present an algorithm for the efficient and robust mapping of such data to imaging-based unstructured polyhedral grids in parallel. We then illustrate the application of our mapping algorithm to three different mapping problems: 1) the mapping of MRI diffusion tensor data to an unstructured ventricular grid; 2) the mapping of serial cryo-section histology data to an unstructured mouse brain grid; and 3) the mapping of CT-derived volumetric strain data to an unstructured multiscale lung grid. Execution times and parallel performance are reported for each case.
Fast encryption of RGB color digital images using a tweakable cellular automaton based schema
NASA Astrophysics Data System (ADS)
Faraoun, Kamel Mohamed
2014-12-01
We propose a new tweakable construction of block ciphers using second-order reversible cellular automata, and we apply it to encipher RGB-colored images. The proposed construction permits parallel encryption of the image content by extending the standard definition of a block cipher to take into account a supplementary parameter used as a tweak (nonce) that controls the behavior of the cipher from one region of the image to another, avoiding the need for slow sequential operating modes. The construction defines a flexible pseudorandom permutation that can be used effectively to solve the electronic-code-book problem without requiring a specific sequential mode. Results from various experiments show that the proposed schema achieves high security and execution performance, and enables an interesting mode of selective-area decryption due to the parallel character of the approach.
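The tweak-per-region idea can be illustrated with a deliberately simple hash-keystream stand-in (this is NOT the paper's second-order cellular-automaton cipher and is not secure cryptography; it only shows how a per-region tweak enables parallel encryption and selective-area decryption):

```python
import hashlib

def crypt_region(key: bytes, tweak: int, data: bytes) -> bytes:
    """XOR a region with a keystream derived from (key, tweak).
    Toy illustration of the tweak idea only -- not a secure cipher."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(
            key + tweak.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def encrypt_image(key: bytes, regions):
    """Regions are enciphered independently (hence in parallel), each
    with its index as tweak, so identical plaintext regions yield
    different ciphertext (no ECB pattern leakage) and any single region
    can be deciphered on its own (selective-area decryption)."""
    return [crypt_region(key, i, r) for i, r in enumerate(regions)]
```

Because each region depends only on (key, tweak), decrypting one region never requires touching its neighbors, which is the property the abstract calls selective-area decryption.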
Ardekani, Siamak; Selva, Luis; Sayre, James; Sinha, Usha
2006-11-01
Single-shot echo-planar based diffusion tensor imaging is prone to geometric and intensity distortions. Parallel imaging is a means of reducing these distortions while preserving spatial resolution. A quantitative comparison at 3 T of parallel imaging for diffusion tensor images (DTI) using k-space (generalized auto-calibrating partially parallel acquisitions; GRAPPA) and image domain (sensitivity encoding; SENSE) reconstructions at different acceleration factors, R, is reported here. Images were evaluated using 8 human subjects with repeated scans for 2 subjects to estimate reproducibility. Mutual information (MI) was used to assess the global changes in geometric distortions. The effects of parallel imaging techniques on random noise and reconstruction artifacts were evaluated by placing 26 regions of interest and computing the standard deviation of apparent diffusion coefficient and fractional anisotropy along with the error of fitting the data to the diffusion model (residual error). The larger positive values in mutual information index with increasing R values confirmed the anticipated decrease in distortions. Further, the MI index of GRAPPA sequences for a given R factor was larger than the corresponding mSENSE images. The residual error was lowest in the images acquired without parallel imaging and among the parallel reconstruction methods, the R = 2 acquisitions had the least error. The standard deviation, accuracy, and reproducibility of the apparent diffusion coefficient and fractional anisotropy in homogenous tissue regions showed that GRAPPA acquired with R = 2 had the least amount of systematic and random noise and of these, significant differences with mSENSE, R = 2 were found only for the fractional anisotropy index. Evaluation of the current implementation of parallel reconstruction algorithms identified GRAPPA acquired with R = 2 as optimal for diffusion tensor imaging.
Local spatio-temporal analysis in vision systems
NASA Astrophysics Data System (ADS)
Geisler, Wilson S.; Bovik, Alan; Cormack, Lawrence; Ghosh, Joydeep; Gildeen, David
1994-07-01
The aims of this project are the following: (1) develop a physiologically and psychophysically based model of low-level human visual processing (a key component of which are local frequency coding mechanisms); (2) develop image models and image-processing methods based upon local frequency coding; (3) develop algorithms for performing certain complex visual tasks based upon local frequency representations, (4) develop models of human performance in certain complex tasks based upon our understanding of low-level processing; and (5) develop a computational testbed for implementing, evaluating and visualizing the proposed models and algorithms, using a massively parallel computer. Progress has been substantial on all aims. The highlights include the following: (1) completion of a number of psychophysical and physiological experiments revealing new, systematic and exciting properties of the primate (human and monkey) visual system; (2) further development of image models that can accurately represent the local frequency structure in complex images; (3) near completion in the construction of the Texas Active Vision Testbed; (4) development and testing of several new computer vision algorithms dealing with shape-from-texture, shape-from-stereo, and depth-from-focus; (5) implementation and evaluation of several new models of human visual performance; and (6) evaluation, purchase and installation of a MasPar parallel computer.
Water-fat separation with parallel imaging based on BLADE.
Weng, Dehe; Pan, Yanli; Zhong, Xiaodong; Zhuo, Yan
2013-06-01
Uniform suppression of fat signal is desired in clinical applications. Based on phase differences introduced by different chemical shift frequencies, the Dixon method and its variations are used as alternatives to fat saturation methods, which are sensitive to B0 inhomogeneities. Iterative Decomposition of water and fat with Echo Asymmetry and Least squares estimation (IDEAL) separates water and fat images with flexible echo shifting. Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction (PROPELLER, alternatively termed BLADE), in conjunction with IDEAL, yields Turboprop IDEAL (TP-IDEAL) and allows for decomposition of water and fat signal with motion correction. However, the flexibility of its parameter setting is limited, and the related phase correction is complicated. To address these problems, a novel method, BLADE-Dixon, is proposed in this study. This method used same-polarity readout gradients (fly-back gradients) to acquire in-phase and opposed-phase images, which led to less complicated phase correction and more flexible parameter setting compared to TP-IDEAL. Parallel imaging and undersampling were integrated to reduce scan time. Phantom, orbit, neck and knee images were acquired with BLADE-Dixon. Water-fat separation results were compared to those measured with conventional turbo spin echo (TSE) Dixon and TSE with fat saturation, respectively, to demonstrate the performance of BLADE-Dixon. Copyright © 2013 Elsevier Inc. All rights reserved.
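The phase relationship that Dixon-type methods exploit reduces, in the basic two-point case, to a sum/difference of the in-phase and opposed-phase echoes (a sketch; BLADE-Dixon's motion correction, phase correction, and parallel-imaging reconstruction are all omitted):

```python
import numpy as np

def two_point_dixon(ip, op):
    """Basic two-point Dixon separation: with water w and fat f,
    the in-phase echo is IP = w + f and the opposed-phase echo is
    OP = w - f, so w = (IP + OP) / 2 and f = (IP - OP) / 2.
    Assumes perfect B0 and no phase errors."""
    water = 0.5 * (ip + op)
    fat = 0.5 * (ip - op)
    return water, fat
```

B0 inhomogeneity adds an unknown phase term to each echo, which is exactly why the practical methods above need a phase-correction step before this arithmetic is valid.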
Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C. M. A.; Saltz, Joel
2017-01-01
We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very compute-demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies. PMID:29081725
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce
NASA Astrophysics Data System (ADS)
Chen, Xi; Zhou, Liqing
2015-12-01
With the development of satellite remote sensing technology and the growth of remote sensing image data, traditional remote sensing image segmentation techniques cannot meet the processing and storage requirements of massive remote sensing images. This article applies cloud computing and parallel computing technology to the remote sensing image segmentation process, building a cheap and efficient computer cluster that parallelizes the mean shift segmentation algorithm using the MapReduce model. This not only preserves the quality of remote sensing image segmentation but also improves segmentation speed, better meeting real-time requirements. The MapReduce-based parallel mean shift segmentation algorithm thus has practical significance and value.
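A minimal mean shift iteration with a flat kernel makes the data-parallel structure concrete (a sketch, not the paper's implementation; the per-point loop stands in for the MapReduce map phase, and mode merging would be the reduce phase):

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, n_iters=20):
    """Mean-shift mode seeking with a flat kernel: each point repeatedly
    moves to the mean of the original points within `bandwidth` of it.
    Every point is updated independently, so a MapReduce 'map' task can
    shift each pixel's feature vector in parallel and a 'reduce' task
    can merge nearby converged modes into segments."""
    shifted = points.astype(float).copy()
    for _ in range(n_iters):
        for i, p in enumerate(shifted):
            d = np.linalg.norm(points - p, axis=1)
            shifted[i] = points[d < bandwidth].mean(axis=0)
    return shifted
```

For imagery the feature vectors would typically combine pixel position and color; points that converge to the same mode are assigned to the same segment.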
Tie Points Extraction for SAR Images Based on Differential Constraints
NASA Astrophysics Data System (ADS)
Xiong, X.; Jin, G.; Xu, Q.; Zhang, H.
2018-04-01
Automatically extracting tie points (TPs) on large-size synthetic aperture radar (SAR) images is still challenging because the efficiency and correct ratio of the image matching need to be improved. This paper proposes an automatic TPs extraction method based on differential constraints for large-size SAR images obtained from approximately parallel tracks, between which the relative geometric distortions are small in azimuth direction and large in range direction. Image pyramids are built firstly, and then corresponding layers of pyramids are matched from the top to the bottom. In the process, the similarity is measured by the normalized cross correlation (NCC) algorithm, which is calculated from a rectangular window with the long side parallel to the azimuth direction. False matches are removed by the differential constrained random sample consensus (DC-RANSAC) algorithm, which appends strong constraints in azimuth direction and weak constraints in range direction. Matching points in the lower pyramid images are predicted with the local bilinear transformation model in range direction. Experiments performed on ENVISAT ASAR and Chinese airborne SAR images validated the efficiency, correct ratio and accuracy of the proposed method.
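The NCC similarity measure used in the matching step can be sketched directly (pyramid logic and the DC-RANSAC constraints are omitted; in the paper the window is rectangular with its long side along the azimuth direction):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equally sized windows.
    Returns a value in [-1, 1]; +1 means the windows are identical up
    to an affine brightness/contrast change, which is why NCC is robust
    to radiometric differences between SAR acquisitions."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom)
```

In practice `ncc` is evaluated over a search area around each candidate tie point and the peak location gives the match.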
Fusion Imaging for Procedural Guidance.
Wiley, Brandon M; Eleid, Mackram F; Thaden, Jeremy J
2018-05-01
The field of percutaneous structural heart interventions has grown tremendously in recent years. This growth has fueled the development of new imaging protocols and technologies in parallel to help facilitate these minimally-invasive procedures. Fusion imaging is an exciting new technology that combines the strength of 2 imaging modalities and has the potential to improve procedural planning and the safety of many commonly performed transcatheter procedures. In this review we discuss the basic concepts of fusion imaging along with the relative strengths and weaknesses of static vs dynamic fusion imaging modalities. This review will focus primarily on echocardiographic-fluoroscopic fusion imaging and its application in commonly performed transcatheter structural heart procedures. Copyright © 2017 Sociedad Española de Cardiología. Published by Elsevier España, S.L.U. All rights reserved.
Method for implementation of recursive hierarchical segmentation on parallel computers
NASA Technical Reports Server (NTRS)
Tilton, James C. (Inventor)
2005-01-01
A method, computer readable storage, and apparatus for implementing a recursive hierarchical segmentation algorithm on a parallel computing platform. The method includes setting a bottom level of recursion that defines where a recursive division of an image into sections stops dividing, and setting an intermediate level of recursion where the recursive division changes from a parallel implementation into a serial implementation. The segmentation algorithm is implemented according to the set levels. The method can also include setting a convergence check level of recursion with which the first level of recursion communicates with when performing a convergence check.
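The recursion control described above can be sketched as follows (hypothetical function names; `run_parallel` stands in for dispatching work to other processors, the leaf segmentation is reduced to returning the region, and the convergence-check level is omitted):

```python
def rhseg(region, level, bottom_level, intermediate_level, run_parallel):
    """Sketch of the control flow: recursively split a region into
    quadrants; above `intermediate_level` the four recursive calls are
    handed to `run_parallel` (parallel phase), below it they run
    serially, and at `bottom_level` the recursion stops dividing."""
    if level == bottom_level:
        return [region]                      # leaf: segment this section
    children = split_quadrants(region)
    recurse = lambda r: rhseg(r, level + 1, bottom_level,
                              intermediate_level, run_parallel)
    if level < intermediate_level:
        results = run_parallel(recurse, children)   # parallel phase
    else:
        results = [recurse(c) for c in children]    # serial phase
    return [leaf for sub in results for leaf in sub]

def split_quadrants(r):
    """Split a (x, y, w, h) region into four equal quadrants (w, h even)."""
    x, y, w, h = r
    hw, hh = w // 2, h // 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
```

Choosing `intermediate_level` trades parallel fan-out against per-task overhead, which is the tunable the patent's two set levels expose.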
Novel 16-channel receive coil array for accelerated upper airway MRI at 3 Tesla.
Kim, Yoon-Chul; Hayes, Cecil E; Narayanan, Shrikanth S; Nayak, Krishna S
2011-06-01
Upper airway MRI can provide a noninvasive assessment of speech and swallowing disorders and sleep apnea. Recent work has demonstrated the value of high-resolution three-dimensional imaging and dynamic two-dimensional imaging and the importance of further improvements in spatio-temporal resolution. The purpose of the study was to describe a novel 16-channel 3 Tesla receive coil that is highly sensitive to the human upper airway and investigate the performance of accelerated upper airway MRI with the coil. In three-dimensional imaging of the upper airway during static posture, 6-fold acceleration is demonstrated using parallel imaging, potentially leading to capturing a whole three-dimensional vocal tract with 1.25 mm isotropic resolution within 9 sec of sustained sound production. Midsagittal spiral parallel imaging of vocal tract dynamics during natural speech production is demonstrated with 2 × 2 mm(2) in-plane spatial and 84 ms temporal resolution. Copyright © 2010 Wiley-Liss, Inc.
Arcmancer: Geodesics and polarized radiative transfer library
NASA Astrophysics Data System (ADS)
Pihajoki, Pauli; Mannerkoski, Matias; Nättilä, Joonas; Johansson, Peter H.
2018-05-01
Arcmancer computes geodesics and performs polarized radiative transfer in user-specified spacetimes. The library supports Riemannian and semi-Riemannian spaces of any dimension and metric; it also supports multiple simultaneous coordinate charts, embedded geometric shapes, local coordinate systems, and automatic parallel propagation. Arcmancer can be used to solve various problems in numerical geometry, such as solving the curve equation of motion using adaptive integration with configurable tolerances and differential equations along precomputed curves. It also provides support for curves with an arbitrary acceleration term and generic tools for generating ray initial conditions and performing parallel computation over the image, among other tools.
A Simple Application of Compressed Sensing to Further Accelerate Partially Parallel Imaging
Miao, Jun; Guo, Weihong; Narayan, Sreenath; Wilson, David L.
2012-01-01
Compressed Sensing (CS) and partially parallel imaging (PPI) enable fast MR imaging by reducing the amount of k-space data required for reconstruction. Past attempts to combine these two have been limited by the incoherent sampling requirement of CS, since PPI routines typically sample on a regular (coherent) grid. Here, we developed a new method, “CS+GRAPPA,” to overcome this limitation. We decomposed sets of equidistant samples into multiple random subsets. Then, we reconstructed each subset using CS and averaged the results to get a final CS k-space reconstruction. We used both a standard CS reconstruction and an edge- and joint-sparsity-guided CS reconstruction. We tested these intermediate results on both synthetic and real MR phantom data, and performed a human observer experiment to determine the effectiveness of decomposition and to optimize the number of subsets. We then used these CS reconstructions to calibrate the GRAPPA complex coil weights. In vivo parallel MR brain and heart data sets were used. An objective image quality evaluation metric, Case-PDM, was used to quantify image quality. Coherent aliasing and noise artifacts were significantly reduced using two decompositions. More decompositions further reduced coherent aliasing and noise artifacts but introduced blurring. However, the blurring was effectively minimized using our new edge- and joint-sparsity-guided CS with two decompositions. Numerical results on parallel data demonstrated that the combined method greatly improved image quality as compared to standard GRAPPA, on average halving Case-PDM scores across a range of sampling rates. The proposed technique allowed the same Case-PDM scores as standard GRAPPA, using about half the number of samples. We conclude that the new method augments GRAPPA by combining it with CS, allowing CS to work even when the k-space sampling pattern is equidistant. PMID:22902065
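The decomposition step, splitting a regular sampling pattern into random subsets before per-subset CS reconstruction and averaging, can be sketched as (the CS solver itself is out of scope here):

```python
import random

def decompose_equidistant(sample_idx, n_subsets=2, seed=0):
    """Split a regular (equidistant) k-space sampling pattern into
    random subsets. Each subset is incoherent enough for a CS
    reconstruction; the per-subset k-space results are then averaged
    before calibrating the GRAPPA coil weights (both steps stubbed)."""
    idx = list(sample_idx)
    random.Random(seed).shuffle(idx)          # randomize assignment
    return [sorted(idx[i::n_subsets]) for i in range(n_subsets)]
```

The abstract's observer study found two subsets to be the sweet spot: more subsets trade residual coherent aliasing for blur.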
Wilson, Robert L.; Frisz, Jessica F.; Hanafin, William P.; Carpenter, Kevin J.; Hutcheon, Ian D.; Weber, Peter K.; Kraft, Mary L.
2014-01-01
The local abundance of specific lipid species near a membrane protein is hypothesized to influence the protein’s activity. The ability to simultaneously image the distributions of specific protein and lipid species in the cell membrane would facilitate testing these hypotheses. Recent advances in imaging the distribution of cell membrane lipids with mass spectrometry have created the desire for membrane protein probes that can be simultaneously imaged with isotope labeled lipids. Such probes would enable conclusive tests of whether specific proteins co-localize with particular lipid species. Here, we describe the development of fluorine-functionalized colloidal gold immunolabels that facilitate the detection and imaging of specific proteins in parallel with lipids in the plasma membrane using high-resolution SIMS performed with a NanoSIMS. First, we developed a method to functionalize colloidal gold nanoparticles with a partially fluorinated mixed monolayer that permitted NanoSIMS detection and rendered the functionalized nanoparticles dispersible in aqueous buffer. Then, to allow for selective protein labeling, we attached the fluorinated colloidal gold nanoparticles to the nonbinding portion of antibodies. By combining these functionalized immunolabels with metabolic incorporation of stable isotopes, we demonstrate that influenza hemagglutinin and cellular lipids can be imaged in parallel using NanoSIMS. These labels enable a general approach to simultaneously imaging specific proteins and lipids with high sensitivity and lateral resolution, which may be used to evaluate predictions of protein co-localization with specific lipid species. PMID:22284327
Bayer image parallel decoding based on GPU
NASA Astrophysics Data System (ADS)
Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua
2012-11-01
In photoelectric tracking systems, Bayer images are traditionally decoded on the CPU. However, this becomes too slow when the images grow large, for example to 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA Graphics Processing Units (GPUs) supporting the CUDA architecture. The decoding procedure can be divided into three parts: a serial part, a task-parallel part, and a data-parallel part comprising inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce execution time, the task-parallel part is optimized with OpenMP techniques, while the data-parallel part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared-memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, the IDWT can be significantly sped up by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. Experimental results show that it achieves a 3 to 5 times speed increase compared to the serial CPU method.
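The 2D-to-1D rewriting of the IDWT exploits separability: a 2D inverse transform is just independent 1D inverse transforms along columns and then rows, each of which is a natural GPU work item. A Haar-wavelet sketch (the paper's actual wavelet filters and CUDA kernels are not specified here):

```python
import numpy as np

def haar1d(x):
    """One-level Haar analysis: signal -> [approx | detail]."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.concatenate([a, d])

def ihaar1d(v):
    """One-level inverse Haar: [approx | detail] -> signal."""
    n = len(v) // 2
    a, d = v[:n], v[n:]
    out = np.empty(2 * n)
    out[0::2] = (a + d) / np.sqrt(2)
    out[1::2] = (a - d) / np.sqrt(2)
    return out

def ihaar2d(c):
    """Separable 2D inverse: independent 1D inverses on every column,
    then on every row. Each line is a data-parallel work item (e.g.
    one CUDA thread block per row/column in a GPU implementation)."""
    tmp = np.apply_along_axis(ihaar1d, 0, c)   # all columns in parallel
    return np.apply_along_axis(ihaar1d, 1, tmp)  # then all rows
```

Because no 1D transform depends on another within a pass, the GPU can launch one thread (or block) per line with coalesced reads, which is where the reported speedup comes from.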
Nguyen, Tuan-Anh; Nakib, Amir; Nguyen, Huy-Nam
2016-06-01
The non-local means denoising filter has become a gold standard for image denoising in general, and in medical imaging in particular, owing to its efficiency. However, its computation time has limited its use in real-world applications, especially in medical imaging. In this paper, a distributed version on a parallel hybrid architecture is proposed to address the computation time problem, together with a new method to compute the filter's coefficients that takes the neighborhood of the current voxel into account more accurately. In terms of implementation, our key contribution is reducing the number of shared-memory accesses. The proposed method was tested on the BrainWeb database at different noise levels. Performance and sensitivity were quantified in terms of speedup, peak signal-to-noise ratio, execution time and number of floating-point operations. The obtained results demonstrate the efficiency of the proposed method. Moreover, the implementation is compared to that of other techniques recently published in the literature. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
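For reference, the serial core of a non-local means filter can be sketched as follows. The parameter names (`patch`, `search`, `h`) are illustrative, not the paper's; the point is that every output pixel is computed independently, which is exactly the loop a GPU distributes across threads.

```python
import numpy as np

def nlm_denoise(img, patch=1, search=3, h=0.5):
    """Minimal non-local means: each output pixel is a weighted average of
    pixels in a search window, weighted by patch similarity. A generic
    sketch, not the paper's optimized GPU implementation."""
    pad = patch + search
    padded = np.pad(img, pad, mode='reflect')
    out = np.zeros_like(img, dtype=np.float64)
    H, W = img.shape
    for i in range(H):            # each (i, j) is independent -> one GPU thread
        for j in range(W):
            ci, cj = i + pad, j + pad
            ref = padded[ci-patch:ci+patch+1, cj-patch:cj+patch+1]
            weights, values = [], []
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    cand = padded[ci+di-patch:ci+di+patch+1,
                                  cj+dj-patch:cj+dj+patch+1]
                    d2 = np.mean((ref - cand) ** 2)   # patch distance
                    weights.append(np.exp(-d2 / (h * h)))
                    values.append(padded[ci + di, cj + dj])
            w = np.array(weights)
            out[i, j] = np.dot(w, values) / w.sum()
    return out
```

In a CUDA implementation the candidate patches for neighboring threads overlap heavily, which is why minimizing redundant shared-memory accesses, as the paper does, pays off.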
Coil Compression for Accelerated Imaging with Cartesian Sampling
Zhang, Tao; Pauly, John M.; Vasanawala, Shreyas S.; Lustig, Michael
2012-01-01
MRI using receiver arrays with many coil elements can provide high signal-to-noise ratio and increase parallel imaging acceleration. At the same time, the growing number of elements results in larger datasets and more computation in the reconstruction. This is of particular concern in 3D acquisitions and in iterative reconstructions. Coil compression algorithms are effective in mitigating this problem by compressing data from many channels into fewer virtual coils. In Cartesian sampling there often are fully sampled k-space dimensions. In this work, a new coil compression technique for Cartesian sampling is presented that exploits the spatially varying coil sensitivities in these non-subsampled dimensions for better compression and computation reduction. Instead of directly compressing in k-space, coil compression is performed separately for each spatial location along the fully sampled directions, followed by an additional alignment process that guarantees the smoothness of the virtual coil sensitivities. This important step provides compatibility with autocalibrating parallel imaging techniques. Its performance is not susceptible to artifacts caused by a tight imaging field-of-view. High-quality compression of in vivo 3D data from a 32-channel pediatric coil into 6 virtual coils is demonstrated. PMID:22488589
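The baseline idea being improved on here, SVD-based channel compression, fits in a few lines. The sketch below is the generic whole-dataset version, not the paper's per-location variant, which additionally applies the compression separately at each position along the fully sampled direction and then aligns the matrices for smooth virtual coil sensitivities.

```python
import numpy as np

def compress_coils(kspace, n_virtual):
    """Generic SVD coil compression: stack all k-space samples as rows of a
    (samples x coils) matrix, find the dominant channel subspace, and
    project the multichannel data onto it."""
    n_coils = kspace.shape[-1]
    data = kspace.reshape(-1, n_coils)
    # Right singular vectors span the coil (channel) space.
    _, s, vh = np.linalg.svd(data, full_matrices=False)
    A = vh[:n_virtual].conj().T          # (coils x n_virtual) compression matrix
    return kspace @ A, s
```

The paper's contribution is to run this compression location-by-location along the non-subsampled readout direction (after an inverse FFT along it), then align the per-location matrices, which keeps the result compatible with GRAPPA-type autocalibration.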
Huiberts, Laura M; Smolders, Karin C H J; de Kort, Yvonne A W
2016-10-01
This study investigated diurnal non-image-forming (NIF) effects of illuminance level on physiological arousal in parallel to NIF effects on vigilance and working memory performance. We employed a counterbalanced within-subjects design in which thirty-nine participants (mean age=21.2; SD=2.1; 11 male) completed three 90-min sessions (165 lx vs. 600 lx vs. 1700 lx at eye level) either in the morning (N=18) or afternoon (N=21). During each session, participants completed four measurement blocks (incl. one baseline block), each consisting of a 10-min Psychomotor Vigilance Task (PVT) and a Backwards Digit-Span Task (BDST) including easy trials (4-6 digits) and difficult trials (7-8 digits). Heart rate (HR), skin conductance level (SCL) and systolic blood pressure (SBP) were measured continuously. The results revealed significant improvements in performance on the BDST difficult trials under 1700 lx vs. 165 lx (p=0.01), while illuminance level did not affect performance on the PVT and BDST easy trials. Illuminance level impacted HR and SCL, but not SBP. In the afternoon sessions, HR was significantly higher under 1700 lx vs. 165 lx during PVT performance (p=0.05), while during BDST performance, HR was only slightly higher under 600 lx vs. 165 lx (p=0.06). SCL was significantly higher under 1700 lx vs. 165 lx during performance on BDST easy trials (p=0.02) and showed similar, but nonsignificant, trends during the PVT and BDST difficult trials. Although both physiology and performance were affected by illuminance level, no consistent pattern emerged with respect to parallel changes in physiology and performance. Rather, physiology and performance seemed to be affected independently, via unique pathways. Copyright © 2016. Published by Elsevier Inc.
Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems
Wang, Kaibo; Huai, Yin; Lee, Rubao; Wang, Fusheng; Zhang, Xiaodong; Saltz, Joel H.
2012-01-01
As an important application of spatial databases in pathology imaging analysis, cross-comparing the spatial boundaries of a huge amount of segmented micro-anatomic objects demands extremely data- and compute-intensive operations, requiring high throughput at an affordable cost. However, the performance of spatial database systems has not been satisfactory since their implementations of spatial operations cannot fully utilize the power of modern parallel hardware. In this paper, we provide a customized software solution that exploits GPUs and multi-core CPUs to accelerate spatial cross-comparison in a cost-effective way. Our solution consists of an efficient GPU algorithm and a pipelined system framework with task migration support. Extensive experiments with real-world data sets demonstrate the effectiveness of our solution, which improves the performance of spatial cross-comparison by over 18 times compared with a parallelized spatial database approach. PMID:23355955
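At its core, spatial cross-comparison reduces to computing an overlap metric for each pair of segmented objects. The sketch below uses rasterized boolean masks and the Jaccard index as an illustrative stand-in for the paper's polygon-boundary GPU pipeline; the per-pair computation is independent, which is what makes the workload GPU-friendly.

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Overlap of two segmented objects as |A intersect B| / |A union B|.
    Rasterized masks are an illustrative stand-in for the polygon
    boundaries compared in the paper."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

# Cross-comparing two whole segmentations is then a data-parallel loop
# over candidate object pairs produced by a spatial join on bounding boxes.
```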
NASA Astrophysics Data System (ADS)
O'Connor, A. S.; Justice, B.; Harris, A. T.
2013-12-01
Graphics Processing Units (GPUs) are high-performance multi-core processors capable of very high computational speeds and large data throughput. Modern GPUs are inexpensive and widely available commercially. They are general-purpose parallel processors with support for a variety of programming interfaces, including industry-standard languages such as C. GPU implementations of algorithms that are well suited to parallel processing can often achieve speedups of several orders of magnitude over optimized CPU codes. Significant speed improvements have been achieved with GPU-based implementations of imagery orthorectification, atmospheric correction, target detection and image transformations such as Independent Component Analysis (ICA). Additional optimizations, factored in with GPU processing capabilities, can provide a 50x-100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA-based GPU processing framework for accelerating those ENVI and IDL processes that can best take advantage of parallelization. Testing by Exelis VIS has shown that orthorectification of a WorldView-1 35,000 x 35,000 pixel image can take as long as two hours; with GPU orthorectification, the same process takes three minutes. By speeding up image processing, imagery can be used effectively by first responders and by scientists making rapid discoveries with near-real-time data, and data centers gain an operational capability to quickly process and disseminate data.
Benkert, Thomas; Tian, Ye; Huang, Chenchan; DiBella, Edward V R; Chandarana, Hersh; Feng, Li
2018-07-01
Golden-angle radial sparse parallel (GRASP) MRI reconstruction requires gridding and regridding to transform data between radial and Cartesian k-space. These operations are repeatedly performed in each iteration, which makes the reconstruction computationally demanding. This work aimed to accelerate GRASP reconstruction using self-calibrating GRAPPA operator gridding (GROG) and to validate its performance in clinical imaging. GROG is an alternative gridding approach based on parallel imaging, in which k-space data acquired on a non-Cartesian grid are shifted onto a Cartesian k-space grid using information from multicoil arrays. For iterative non-Cartesian image reconstruction, GROG is performed only once as a preprocessing step; the subsequent iterative reconstruction can therefore be performed directly in Cartesian space, which significantly reduces the computational burden. Here, a framework combining GROG with GRASP (GROG-GRASP) is first optimized and then compared with standard GRASP reconstruction in 22 prostate patients. GROG-GRASP achieved an approximately 4.2-fold reduction in reconstruction time (∼78 min versus ∼333 min for GRASP) while maintaining image quality (structural similarity index ≈ 0.97 and root-mean-square error ≈ 0.007). Visual image quality assessment by two experienced radiologists did not show significant differences between the two reconstruction schemes. With a graphics processing unit implementation, image reconstruction time can be further reduced to approximately 14 min. GRASP reconstruction can thus be substantially accelerated using GROG. This framework is promising for broader clinical application of GRASP and other iterative non-Cartesian reconstruction methods. Magn Reson Med 80:286-293, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Advances in Parallel Computing and Databases for Digital Pathology in Cancer Research
2016-11-13
…these technologies and how we have used them in the past. We are interested in learning more about the needs of clinical pathologists as we continue to… such as image processing and correlation. Further, High Performance Computing (HPC) paradigms such as the Message Passing Interface (MPI) have been… Libraries such as pMatlab [4] or bcMPI [5] can significantly reduce the need for deep knowledge of parallel computing.
Evaluation of a 3D point cloud tetrahedral tomographic reconstruction method
Pereira, N F; Sitek, A
2011-01-01
Tomographic reconstruction on an irregular grid may be superior to reconstruction on a regular grid. This is achieved through an appropriate choice of the image space model, the selection of an optimal set of points and the use of any available prior information during the reconstruction process. Accordingly, a number of reconstruction-related parameters must be optimized for best performance. In this work, a 3D point cloud tetrahedral mesh reconstruction method is evaluated for quantitative tasks. A linear image model is employed to obtain the reconstruction system matrix and five point generation strategies are studied. The evaluation is performed using the recovery coefficient, as well as voxel- and template-based estimates of bias and variance measures, computed over specific regions in the reconstructed image. A similar analysis is performed for regular grid reconstructions that use voxel basis functions. The maximum likelihood expectation maximization reconstruction algorithm is used. For the tetrahedral reconstructions, of the five point generation methods that are evaluated, three use image priors. For evaluation purposes, an object consisting of overlapping spheres with varying activity is simulated. The exact parallel projection data of this object are obtained analytically using a parallel projector, and multiple Poisson noise realizations of these exact data are generated and reconstructed using the different point generation strategies. The unconstrained nature of point placement in some of the irregular mesh-based reconstruction strategies has superior activity recovery for small, low-contrast image regions. The results show that, with an appropriately generated set of mesh points, the irregular grid reconstruction methods can out-perform reconstructions on a regular grid for mathematical phantoms, in terms of the performance measures evaluated. PMID:20736496
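The MLEM algorithm used for both grid types depends only on the system matrix; the choice of voxel versus tetrahedral basis functions enters solely through that matrix, which is why the same update serves both reconstructions. A minimal dense-matrix sketch of the multiplicative MLEM update:

```python
import numpy as np

def mlem(A, y, n_iter=50):
    """Maximum-likelihood expectation-maximization for y ~ Poisson(A x).
    A is the (n_bins x n_basis) system matrix: rows are projection bins,
    columns are basis functions (regular-grid voxels or tetrahedral mesh
    nodes alike). A toy dense implementation for illustration."""
    x = np.ones(A.shape[1])              # nonnegative initial estimate
    sens = A.sum(axis=0)                 # sensitivity: backprojection of ones
    for _ in range(n_iter):
        proj = A @ x                                  # forward projection
        ratio = y / np.maximum(proj, 1e-12)           # measured / estimated
        x *= (A.T @ ratio) / np.maximum(sens, 1e-12)  # multiplicative update
    return x
```

For the mesh-based reconstructions in the paper, the rows of A come from the linear image model over tetrahedral basis functions rather than from voxel intersection lengths; the iteration itself is unchanged.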
High-performance computing in image registration
NASA Astrophysics Data System (ADS)
Zanin, Michele; Remondino, Fabio; Dalla Mura, Mauro
2012-10-01
Thanks to recent technological advances, a large variety of image data is at our disposal, with variable geometric, radiometric and temporal resolution. In many applications the processing of such images requires high-performance computing techniques in order to deliver timely responses, e.g. for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphical Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging task of processing large amounts of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and monitoring of the environment. For the image alignment procedure, sets of corresponding feature points need to be automatically extracted in order to subsequently compute the geometric transformation that aligns the data. Feature extraction and matching are among the most computationally demanding operations in the processing chain; thus, a great degree of automation and speed is mandatory. The details of the implemented operations (named LARES), exploiting parallel architectures and GPUs, are presented. The innovative aspects of the implementation are (i) its effectiveness on a large variety of unorganized and complex datasets, (ii) its capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and commented on.
Seismic imaging using finite-differences and parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ober, C.C.
1997-12-31
A key to reducing the risks and costs associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions in US onshore areas. Prestack depth migration generally yields the most accurate images, and one approach is to solve the scalar wave equation using finite differences. As part of an ongoing ACTI project funded by the US Department of Energy, a finite-difference, 3-D prestack, depth migration code has been developed. The goal of this work is to demonstrate that massively parallel computers can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite-difference, prestack, depth migration practical for oil and gas exploration. Several problems had to be addressed to obtain an efficient code for the Intel Paragon, including efficient I/O, efficient parallel tridiagonal solves, and high single-node performance. Furthermore, to keep the code portable, the author was restricted to high-level programming languages (C and Fortran) and interprocessor communication using MPI. He has been using the SUNMOS operating system, which has affected many of his programming decisions. He will present images created from two verification datasets (the Marmousi Model and the SEG/EAEG 3D Salt Model). Also, he will show recent images from real datasets and point out locations of improved imaging. Finally, he will discuss areas of current research which will hopefully improve image quality and reduce computational costs.
Tomographic image reconstruction using the cell broadband engine (CBE) general purpose hardware
NASA Astrophysics Data System (ADS)
Knaup, Michael; Steckmann, Sven; Bockenbach, Olivier; Kachelrieß, Marc
2007-02-01
Tomographic image reconstruction, such as the reconstruction of CT projection values, of tomosynthesis data, or of PET or SPECT events, is computationally very demanding. In filtered backprojection as well as in iterative reconstruction schemes, the most time-consuming steps are forward- and backprojection, which are often limited by memory bandwidth. Recently, a novel general-purpose architecture optimized for distributed computing became available: the Cell Broadband Engine (CBE). Its eight synergistic processing elements (SPEs) currently allow a theoretical performance of 192 GFlops (3 GHz, 8 units, 4 floats per vector, 2 instructions, multiply and add, per clock). To maximize image reconstruction speed we modified our parallel-beam and perspective backprojection algorithms, which are highly optimized for standard PCs, and optimized the code for the CBE processor [1-3]. In addition, we implemented an optimized perspective forward projection on the CBE, which allows us to perform statistical image reconstructions such as the ordered subset convex (OSC) algorithm [4]. Performance was measured using simulated data with 512 projections per rotation and 512² detector elements. The data were backprojected into an image of 512³ voxels using our PC-based approaches and the new CBE-based algorithms. Both the PC and the CBE timings were scaled to a 3 GHz clock frequency. On the CBE, we obtain total reconstruction times of 4.04 s for the parallel backprojection, 13.6 s for the perspective backprojection and 192 s for a complete OSC reconstruction, consisting of one initial Feldkamp reconstruction followed by 4 OSC iterations.
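The structure that makes backprojection such a good fit for the SPEs is visible in a serial sketch: every output pixel accumulates independently over views, so the image can be tiled across cores with no synchronization. Below is a minimal (unfiltered) parallel-beam backprojector in NumPy with linear detector interpolation; geometry conventions are illustrative, not the paper's optimized kernels.

```python
import numpy as np

def backproject(sino, angles, n):
    """Unfiltered parallel-beam backprojection of sino (n_views x n_bins)
    into an n x n image: smear each view back along its angle. The work
    per pixel is independent, which is what SPE/GPU tiling exploits."""
    img = np.zeros((n, n))
    cx = (n - 1) / 2.0
    xs, ys = np.meshgrid(np.arange(n) - cx, np.arange(n) - cx)
    nbins = sino.shape[1]
    cb = (nbins - 1) / 2.0
    for a, theta in enumerate(angles):
        # Detector coordinate of every pixel for this view.
        t = xs * np.cos(theta) + ys * np.sin(theta) + cb
        i0 = np.clip(np.floor(t).astype(int), 0, nbins - 2)
        frac = np.clip(t - i0, 0.0, 1.0)
        img += (1 - frac) * sino[a, i0] + frac * sino[a, i0 + 1]
    return img * (np.pi / len(angles))
```

The memory-bandwidth limit the abstract mentions shows up here as the gather `sino[a, i0]`: per view, each pixel reads a view-dependent detector bin, so fast local stores (SPE local memory, GPU shared memory) for the current projection row are the standard optimization.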
Towards Exascale Seismic Imaging and Inversion
NASA Astrophysics Data System (ADS)
Tromp, J.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Lei, W.; Ruan, Y.
2015-12-01
Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns tied to obtaining optimum performance. Several issues are currently being investigated by the HPC community. These include energy consumption, fault resilience, scalability of the current parallel paradigms, workflow management, I/O performance and feature extraction with large datasets. In this presentation, we focus on the last three issues. In the context of seismic imaging and inversion, in particular for simulations based on adjoint methods, workflows are well defined. They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, obtaining a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts comprising it. The usual approach is to speed up the purely computational parts through code optimization in order to reach higher FLOPS and better memory management. This remains an important concern, but larger-scale experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for purely computational data and for seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). Parallel I/O libraries, namely HDF5 and ADIOS, are used to drastically reduce the cost of disk access. Parallel visualization tools, such as VisIt, are able to take advantage of ADIOS metadata to extract features and display massive datasets.
Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the imaging process with the integration of scientific workflow management tools, specifically Pegasus.
Studies in optical parallel processing. [All optical and electro-optic approaches
NASA Technical Reports Server (NTRS)
Lee, S. H.
1978-01-01
Threshold and A/D devices for converting a gray-scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic (OPAL) devices were studied as approaches to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out on each image element of these rows in the IOC substrate, and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing, which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.
Simultaneous orthogonal plane imaging.
Mickevicius, Nikolai J; Paulson, Eric S
2017-11-01
Intrafraction motion can result in a smearing of planned external beam radiation therapy dose distributions, resulting in uncertainty in the dose actually deposited in tissue. The purpose of this paper is to present a pulse sequence that is capable of imaging a moving target at a high frame rate in two orthogonal planes simultaneously for MR-guided radiotherapy. By balancing the zero gradient moment on all axes, slices in two orthogonal planes may be spatially encoded simultaneously. The orthogonal slice groups may be acquired with equal or nonequal echo times. A Cartesian spoiled gradient echo simultaneous orthogonal plane imaging (SOPI) sequence was tested in phantom and in vivo. Multiplexed SOPI acquisitions were performed in which two parallel slices were imaged along two orthogonal axes simultaneously. An autocalibrating phase-constrained 2D-SENSE-GRAPPA (generalized autocalibrating partially parallel acquisition) algorithm was implemented to reconstruct the multiplexed data. SOPI images without intraslice motion artifacts were reconstructed at a maximum frame rate of 8.16 Hz. The 2D-SENSE-GRAPPA reconstruction separated the parallel slices aliased along each orthogonal axis. The high spatiotemporal resolution provided by SOPI has the potential to be beneficial for intrafraction motion management during MR-guided radiation therapy or other MRI-guided interventions. Magn Reson Med 78:1700-1710, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Parallel asynchronous systems and image processing algorithms
NASA Technical Reports Server (NTRS)
Coon, D. D.; Perera, A. G. U.
1989-01-01
A new hardware approach to the implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse-coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently, as in natural vision systems. The research is aimed at the implementation of algorithms, such as the intensity-dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision-system-type architecture. Besides providing a neural network framework for the implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems: conversion to serial format would occur only after the raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so hardware implementation poses intriguing questions involving vision science.
Integrated parallel reception, excitation, and shimming (iPRES).
Han, Hui; Song, Allen W; Truong, Trong-Kha
2013-07-01
To develop a new concept for a hardware platform that enables integrated parallel reception, excitation, and shimming. This concept uses a single coil array rather than separate arrays for parallel excitation/reception and B0 shimming. It relies on a novel design that allows a radiofrequency current (for excitation/reception) and a direct current (for B0 shimming) to coexist independently in the same coil. Proof-of-concept B0 shimming experiments were performed with a two-coil array in a phantom, whereas B0 shimming simulations were performed with a 48-coil array in the human brain. Our experiments show that individually optimized direct currents applied in each coil can reduce the B0 root-mean-square error by 62-81% and minimize distortions in echo-planar images. The simulations show that dynamic shimming with the 48-coil integrated parallel reception, excitation, and shimming array can reduce the B0 root-mean-square error in the prefrontal and temporal regions by 66-79% as compared with static second-order spherical harmonic shimming and by 12-23% as compared with dynamic shimming with a 48-coil conventional shim array. Our results demonstrate the feasibility of the integrated parallel reception, excitation, and shimming concept to perform parallel excitation/reception and B0 shimming with a unified coil system as well as its promise for in vivo applications. Copyright © 2013 Wiley Periodicals, Inc.
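The B0-shimming half of the iPRES concept is, at its core, a linear least-squares problem: choose the per-coil DC currents whose superposed field maps best cancel the measured off-resonance map, which is how the reported root-mean-square error reductions are obtained. A hedged sketch (field-map units, array shapes and the optional current limit are illustrative assumptions, not the paper's procedure):

```python
import numpy as np

def shim_currents(field_maps, b0_error, max_current=None):
    """Least-squares B0 shimming.
    field_maps: (n_voxels x n_coils) field produced per unit DC current
    in each coil (hypothetical units); b0_error: (n_voxels,) measured
    off-resonance to cancel. Returns optimal currents and the residual."""
    currents, *_ = np.linalg.lstsq(field_maps, -b0_error, rcond=None)
    if max_current is not None:          # hardware limit, if any
        currents = np.clip(currents, -max_current, max_current)
    residual = b0_error + field_maps @ currents
    return currents, residual
```

Dynamic (e.g., slice-by-slice) shimming simply re-solves this system with a per-slice `b0_error`, which is why an array of many small coils can outperform low-order spherical harmonic shims.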
Processing large remote sensing image data sets on Beowulf clusters
Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail
2003-01-01
High-performance computing is often concerned with the speed at which floating-point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Land Cover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher-resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.
Spectral X-Ray Diffraction using a 6 Megapixel Photon Counting Array Detector.
Muir, Ryan D; Pogranichniy, Nicholas R; Muir, J Lewis; Sullivan, Shane Z; Battaile, Kevin P; Mulichak, Anne M; Toth, Scott J; Keefe, Lisa J; Simpson, Garth J
2015-03-12
Pixel-array detectors allow single-photon counting to be performed on a massively parallel scale, with several million counting circuits and detectors in the array. Because the number of photoelectrons produced at the detector surface depends on the photon energy, these detectors offer the possibility of spectral imaging. In this work, a statistical model of the instrument response is used to calibrate the detector on a per-pixel basis. In turn, the calibrated sensor was used to separate dual-energy diffraction measurements into two monochromatic images. Target applications include multi-wavelength diffraction to aid in protein structure determination and X-ray diffraction imaging.
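Once the energy response is calibrated, separating a dual-energy measurement into two monochromatic images is per-pixel linear unmixing, independently parallel across the roughly six million pixels. The sketch below assumes a single 2x2 response matrix shared by all pixels; the paper instead calibrates the response per pixel from its statistical model of the instrument.

```python
import numpy as np

def separate_dual_energy(counts_lo, counts_hi, response):
    """Unmix two measurement channels into two monochromatic images.
    response: 2x2 matrix of per-energy detector responses in the two
    channels (assumed identical for all pixels here, an illustrative
    simplification of the per-pixel calibration in the paper)."""
    inv = np.linalg.inv(response)
    stacked = np.stack([counts_lo, counts_hi])    # shape (2, H, W)
    return np.tensordot(inv, stacked, axes=1)     # two (H, W) images
```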
NASA Astrophysics Data System (ADS)
Pozzi, Paolo; Wilding, Dean; Soloviev, Oleg; Vdovin, Gleb; Verhaegen, Michel
2018-02-01
In this work, we present a new confocal laser scanning microscope capable of performing sensorless wavefront optimization in real time. The device is a parallelized laser scanning microscope in which the excitation light is structured into a lattice of spots by a spatial light modulator, while a deformable mirror provides aberration correction and scanning. A binary DMD is positioned in an image plane of the detection optical path, acting as a dynamic array of reflective confocal pinholes imaged by a high-performance CMOS camera. A second camera detects images of the light rejected by the pinholes for sensorless aberration correction.
3D hyperpolarized C-13 EPI with calibrationless parallel imaging
NASA Astrophysics Data System (ADS)
Gordon, Jeremy W.; Hansen, Rie B.; Shin, Peter J.; Feng, Yesu; Vigneron, Daniel B.; Larson, Peder E. Z.
2018-04-01
With the translation of metabolic MRI with hyperpolarized 13C agents into the clinic, imaging approaches will require large volumetric FOVs to support clinical applications. Parallel imaging techniques will be crucial to increasing volumetric scan coverage while minimizing RF requirements and temporal resolution. Calibrationless parallel imaging approaches are well-suited for this application because they eliminate the need to acquire coil profile maps or auto-calibration data. In this work, we explored the utility of a calibrationless parallel imaging method (SAKE) and corresponding sampling strategies to accelerate and undersample hyperpolarized 13C data using 3D blipped EPI acquisitions and multichannel receive coils, and demonstrated its application in a human study of [1-13C]pyruvate metabolism.
Park, Jong Kang; Rowlands, Christopher J; So, Peter T C
2017-01-01
Temporal focusing multiphoton microscopy is a technique for performing highly parallelized multiphoton microscopy while still maintaining depth discrimination. While the conventional wide-field configuration for temporal focusing suffers from sub-optimal axial resolution, line scanning temporal focusing, implemented here using a digital micromirror device (DMD), can provide substantial improvement. The DMD-based line scanning temporal focusing technique dynamically trades off the degree of parallelization, and hence imaging speed, for axial resolution, allowing performance parameters to be adapted to the experimental requirements. We demonstrate this new instrument in calibration specimens and in biological specimens, including a mouse kidney slice.
BPF-type region-of-interest reconstruction for parallel translational computed tomography.
Wu, Weiwen; Yu, Hengyong; Wang, Shaoyu; Liu, Fenglin
2017-01-01
The objective of this study is to present and test a new ultra-low-cost linear-scan-based tomography architecture. Similar to linear tomosynthesis, the source and detector are translated in opposite directions and the data acquisition system targets a region-of-interest (ROI) to acquire data for image reconstruction. This kind of tomographic architecture is named parallel translational computed tomography (PTCT). In previous studies, filtered backprojection (FBP)-type algorithms were developed to reconstruct images from PTCT, but the ROI images reconstructed from truncated projections suffer from severe truncation artefacts. To overcome this limitation, in this study we propose two backprojection filtering (BPF)-type algorithms, named MP-BPF and MZ-BPF, to reconstruct ROI images from truncated PTCT data. A weight function is constructed to deal with data redundancy in multi-linear translation modes. Extensive numerical simulations are performed to evaluate the proposed MP-BPF and MZ-BPF algorithms for PTCT in fan-beam geometry. Qualitative and quantitative results demonstrate that the proposed BPF-type algorithms not only reconstruct ROI images from truncated projections more accurately but also generate high-quality images for the entire image support in some circumstances.
2014-05-01
fusion, space and astrophysical plasmas, but still the general picture can be presented quite well with the fluid approach [6, 7]. The microscopic...purpose computing CPU for algorithms where processing of large blocks of data is done in parallel. The reason for that is the GPU’s highly effective...parallel structure. Most of the image and video processing computations involve heavy matrix and vector op- erations over large amounts of data and
NASA Astrophysics Data System (ADS)
Zhao, Shaoshuai; Ni, Chen; Cao, Jing; Li, Zhengqiang; Chen, Xingfeng; Ma, Yan; Yang, Leiku; Hou, Weizhen; Qie, Lili; Ge, Bangyu; Liu, Li; Xing, Jin
2018-03-01
Remote sensing images are usually degraded by atmospheric components, especially aerosol particles. For quantitative remote sensing applications, radiative-transfer-model-based atmospheric correction is used to retrieve reflectance by decoupling the atmosphere and surface, at the cost of long computation times. Parallel computing is a solution for accelerating this step. A parallel strategy in which multiple CPUs work simultaneously is designed to perform atmospheric correction of a multispectral remote sensing image. The flow of the parallel framework and the main parallel body of the atmospheric correction are described. A multispectral image from the Chinese Gaofen-2 satellite is then used to test the acceleration efficiency. As the number of CPUs increases from 1 to 8, the computational speed increases accordingly; the largest acceleration rate is 6.5. With 8 CPUs, atmospheric correction of the whole image takes 4 minutes.
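The multi-CPU strategy described above can be sketched as follows (a minimal illustration, not the paper's code: the per-tile "correction" is a stand-in function, whereas a real atmospheric correction would invoke a radiative transfer model per pixel and band).

```python
import numpy as np
from multiprocessing import Pool

def correct_tile(tile):
    # Placeholder for the per-tile atmospheric correction
    # (a real implementation would apply a radiative transfer model).
    return tile * 1.1 - 0.05

def parallel_correct(image, n_workers=4):
    # Split rows across CPUs, correct tiles in parallel, reassemble.
    tiles = np.array_split(image, n_workers, axis=0)
    with Pool(n_workers) as pool:
        corrected = pool.map(correct_tile, tiles)
    return np.concatenate(corrected, axis=0)

if __name__ == "__main__":
    img = np.random.rand(512, 512)
    out = parallel_correct(img)
```

Because each image tile is corrected independently, speedup is close to linear until I/O or memory bandwidth dominates, which is consistent with the 6.5x rate reported for 8 CPUs.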
Parallel Implementation of a Frozen Flow Based Wavefront Reconstructor
NASA Astrophysics Data System (ADS)
Nagy, J.; Kelly, K.
2013-09-01
Obtaining high resolution images of space objects from ground based telescopes is challenging, often requiring the use of a multi-frame blind deconvolution (MFBD) algorithm to remove blur caused by atmospheric turbulence. In order for an MFBD algorithm to be effective, it is necessary to obtain a good initial estimate of the wavefront phase. Although wavefront sensors work well in low turbulence situations, they are less effective in high turbulence, such as when imaging in daylight, or when imaging objects that are close to the Earth's horizon. One promising approach, which has been shown to work very well in high turbulence settings, uses a frozen flow assumption on the atmosphere to capture the inherent temporal correlations present in consecutive frames of wavefront data. Exploiting these correlations can lead to more accurate estimation of the wavefront phase, and the associated PSF, which leads to more effective MFBD algorithms. However, with the current serial implementation, the approach can be prohibitively expensive in situations when it is necessary to use a large number of frames. In this poster we describe a parallel implementation that overcomes this constraint. The parallel implementation exploits sparse matrix computations, and uses the Trilinos package developed at Sandia National Laboratories. Trilinos provides a variety of core mathematical software for parallel architectures that has been designed using high quality software engineering practices. The package is open source, and portable to a variety of high-performance computing architectures.
NASA Astrophysics Data System (ADS)
Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh
2015-07-01
This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for the current class of multicore architectures. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism; however, for 3D data migration, the resource requirements of the algorithm grow with the data size. This challenges its practical implementation even on current-generation high performance computing systems, so a smart parallelization approach is essential to handle 3D data for migration. The most compute-intensive part of the Kirchhoff depth migration algorithm is the calculation of traveltime tables, due to its memory/storage and I/O requirements. In the current research work, we target this area and develop a competent parallel algorithm for poststack and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in a parallel imaging space, using optimized traveltime table computations. This concept makes the algorithm flexible by migrating data in a number of depth iterations that depends upon the available node memory and the size of the data to be migrated at runtime. Furthermore, it minimizes the storage, I/O and inter-node communication requirements, making it advantageous over conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM-series supercomputer. Optimization, performance and scalability experiment results, along with the migration outcome, show the effectiveness of the parallel algorithm.
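The "flexi-depth" idea of choosing the number of depth iterations from the memory available at runtime can be illustrated with a small planning sketch (not the authors' code; the slab size and memory budget below are invented figures):

```python
def plan_depth_iterations(n_depth, bytes_per_slab, node_memory_bytes):
    """Return (start, stop) depth-index ranges that each fit in node memory."""
    slabs_per_iter = max(1, node_memory_bytes // bytes_per_slab)
    ranges = []
    start = 0
    while start < n_depth:
        stop = min(start + slabs_per_iter, n_depth)
        ranges.append((start, stop))  # one migration pass per range
        start = stop
    return ranges

# e.g. 1000 depth slices at 3 GB per slice, with 64 GB of node memory
iters = plan_depth_iterations(1000, 3 * 2**30, 64 * 2**30)
```

Each returned range corresponds to one migration pass over the data, so a node with more memory automatically needs fewer passes, which is the flexibility the abstract describes.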
High-density fiber-optic DNA random microsphere array.
Ferguson, J A; Steemers, F J; Walt, D R
2000-11-15
A high-density fiber-optic DNA microarray sensor was developed to monitor multiple DNA sequences in parallel. Microarrays were prepared by randomly distributing DNA probe-functionalized 3.1-microm-diameter microspheres in an array of wells etched in a 500-microm-diameter optical imaging fiber. Registration of the microspheres was performed using an optical encoding scheme and a custom-built imaging system. Hybridization was visualized using fluorescent-labeled DNA targets with a detection limit of 10 fM. Hybridization times of seconds are required for nanomolar target concentrations, and analysis is performed in minutes.
Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D; Volz, Kerstin
2017-06-01
We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space. Copyright © 2017 Elsevier B.V. All rights reserved.
Embedded Implementation of VHR Satellite Image Segmentation
Li, Chao; Balla-Arabé, Souleymane; Ginhac, Dominique; Yang, Fan
2016-01-01
Processing and analysis of Very High Resolution (VHR) satellite images provide a mass of crucial information, which can be used for urban planning, security issues or environmental monitoring. However, they are computationally expensive and, thus, time consuming, while some of the applications, such as natural disaster monitoring and prevention, require high performance. Fortunately, parallel computing techniques and embedded systems have made great progress in recent years, and a series of massively parallel image processing devices, such as digital signal processors or Field Programmable Gate Arrays (FPGAs), have become available to engineers at a very convenient price, demonstrating significant advantages in terms of running cost, embeddability, power consumption, flexibility, etc. In this work, we designed a texture region segmentation method for very high resolution satellite images by using the level set algorithm and multi-kernel theory in a high-abstraction C environment, and realized its register-transfer-level implementation with the help of a newly proposed high-level synthesis-based design flow. The evaluation experiments demonstrate that the proposed design can produce high-quality image segmentation with a significant running-cost advantage. PMID:27240370
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
NASA Astrophysics Data System (ADS)
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
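The AdaBoost combination rule used to assemble weak classifiers into a strong one can be sketched as follows (a minimal illustration, not the paper's Hadoop code: trivial threshold stumps stand in for the BP neural networks, and the toy data are invented):

```python
import numpy as np

def adaboost_train(X, y, stumps, n_rounds):
    """y in {-1, +1}; stumps: list of callables X -> {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))          # sample weights
    ensemble = []
    for _ in range(n_rounds):
        errs = [np.sum(w * (h(X) != y)) for h in stumps]
        h = stumps[int(np.argmin(errs))]        # best weak learner this round
        err = max(min(errs), 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this learner
        ensemble.append((alpha, h))
        w *= np.exp(-alpha * y * h(X))          # boost misclassified samples
        w /= w.sum()
    return ensemble

def adaboost_predict(ensemble, X):
    return np.sign(sum(a * h(X) for a, h in ensemble))

# Toy 1-D data separable by a threshold at 0.
X = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([-1, -1, -1, 1, 1, 1])
stumps = [lambda X, t=t: np.where(X > t, 1, -1) for t in (-1.5, 0.0, 1.5)]
model = adaboost_train(X, y, stumps, n_rounds=3)
```

In the MapReduce setting described above, the weak learners are trained in Map tasks and the weighted votes are combined in the Reduce step; the combination arithmetic is the same.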
NASA Astrophysics Data System (ADS)
Xue, Xinwei; Cheryauka, Arvi; Tubbs, David
2006-03-01
CT imaging in interventional and minimally-invasive surgery requires high-performance computing solutions that meet operating room demands, healthcare business requirements, and the constraints of a mobile C-arm system. The computational requirements of clinical procedures using CT-like data are increasing rapidly, mainly due to the need for rapid access to medical imagery during critical surgical procedures. The highly parallel nature of the Radon transform and CT algorithms enables embedded computing solutions utilizing a parallel processing architecture to realize a significant gain in computational intensity with comparable hardware and program coding/testing expenses. In this paper, using sample 2D and 3D CT problems, we explore the programming challenges and the potential benefits of embedded computing using commodity hardware components. The accuracy and performance results obtained on three computational platforms (a single CPU, a single GPU, and a solution based on FPGA technology) have been analyzed. We have shown that hardware-accelerated CT image reconstruction can achieve noise levels and feature clarity similar to program execution on a CPU, while running one or more orders of magnitude faster. 3D cone-beam or helical CT reconstruction and a variety of volumetric image processing applications will benefit from similar accelerations.
NASA Astrophysics Data System (ADS)
Ying, Jia-ju; Chen, Yu-dan; Liu, Jie; Wu, Dong-sheng; Lu, Jun
2016-10-01
Misalignment of the binocular optical axes of a photoelectric instrument directly degrades observation. A digital calibration system for binocular optical axis parallelism is designed. Based on the principle of binocular optical axis calibration for photoelectric instruments, the system scheme is designed and realized with four modules: a multiband parallel light tube, optical axis translation, an image acquisition system and a software system. According to the different characteristics of thermal infrared imagers and low-light-level night viewers, different algorithms are used to localize the center of the cross reticle. Binocular optical axis parallelism calibration is thus realized for both low-light-level night viewers and thermal infrared imagers.
A permanent MRI magnet for magic angle imaging having its field parallel to the poles.
McGinley, John V M; Ristic, Mihailo; Young, Ian R
2016-10-01
A novel design of open permanent magnet is presented, in which the magnetic field is oriented parallel to the planes of its poles. The paper describes the methods whereby such a magnet can be designed with a field homogeneity suitable for Magnetic Resonance Imaging (MRI). Its primary purpose is to take advantage of the Magic Angle effect in MRI of human extremities, particularly the knee joint, by being capable of rotating the direction of the main magnetic field B0 about two orthogonal axes around a stationary subject and achieving all possible angulations. The magnet comprises a parallel pair of identical profiled arrays of permanent magnets backed by a flat steel yoke such that access in lateral directions is practical. The paper describes the detailed optimization procedure from a target 150 mm DSV to the achievement of a measured uniform field over a 130 mm DSV. Actual performance data of the manufactured magnet, including shimming and a sample image, are presented. The overall magnet system mounting mechanism is presented, including two orthogonal axes of rotation of the magnet about its isocentre. Copyright © 2016 Elsevier Inc. All rights reserved.
IceT users' guide and reference.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.
2011-01-01
The Image Composition Engine for Tiles (IceT) is a high-performance sort-last parallel rendering library. In addition to providing accelerated rendering for a standard display, IceT provides the unique ability to generate images for tiled displays. The overall resolution of the display may be several times larger than any viewport that may be rendered by a single machine. This document is an overview of the user interface to IceT.
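As a conceptual illustration of sort-last compositing, the operation IceT accelerates (this is not IceT's API): each node renders its share of the geometry over the full viewport, and for every pixel the fragment nearest to the camera wins.

```python
import numpy as np

def depth_composite(colors, depths):
    """colors: (N, H, W, 3) images from N nodes; depths: (N, H, W) z-buffers."""
    nearest = np.argmin(depths, axis=0)        # (H, W) index of winning node
    h, w = np.indices(nearest.shape)
    return colors[nearest, h, w]               # (H, W, 3) composited image

# Two 2x2 renders: node 0 is nearer in the left column, node 1 in the right.
depths = np.array([[[0.1, 0.9], [0.1, 0.9]],
                   [[0.5, 0.2], [0.5, 0.2]]])
colors = np.zeros((2, 2, 2, 3))
colors[0] = [1.0, 0.0, 0.0]   # node 0 renders red
colors[1] = [0.0, 0.0, 1.0]   # node 1 renders blue
out = depth_composite(colors, depths)
```

A real sort-last library performs this reduction over the network in a tree or binary-swap pattern, and for tiled displays composites each tile separately, but the per-pixel rule is this depth comparison.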
Three-Dimensional Weighting in Cone Beam FBP Reconstruction and Its Transformation Over Geometries.
Tang, Shaojie; Huang, Kuidong; Cheng, Yunyong; Niu, Tianye; Tang, Xiangyang
2018-06-01
With substantially increased number of detector rows in multidetector CT (MDCT), axial scan with projection data acquired along a circular source trajectory has become the method-of-choice in increasing clinical applications. Recognizing the practical relevance of image reconstruction directly from the projection data acquired in the native cone beam (CB) geometry, especially in scenarios wherein the most achievable in-plane resolution is desirable, we present a three-dimensional (3-D) weighted CB-FBP algorithm in such geometry in this paper. We start the algorithm's derivation in the cone-parallel geometry. Via changing of variables, taking the Jacobian into account and making heuristic and empirical assumptions, we arrive at the formulas for 3-D weighted image reconstruction in the native CB geometry. Using the projection data simulated by computer and acquired by an MDCT scanner, we evaluate and verify performance of the proposed algorithm for image reconstruction directly from projection data acquired in the native CB geometry. The preliminary data show that the proposed algorithm performs as well as the 3-D weighted CB-FBP algorithm in the cone-parallel geometry. The proposed algorithm is anticipated to find its utility in extensive clinical and preclinical applications wherein the reconstruction of images in the native CB geometry, i.e., the geometry for data acquisition, is of relevance.
A 2D MTF approach to evaluate and guide dynamic imaging developments.
Chao, Tzu-Cheng; Chung, Hsiao-Wen; Hoge, W Scott; Madore, Bruno
2010-02-01
As the number and complexity of partially sampled dynamic imaging methods continue to increase, reliable strategies to evaluate performance may prove most useful. In the present work, an analytical framework to evaluate given reconstruction methods is presented. A perturbation algorithm allows the proposed evaluation scheme to perform robustly without requiring knowledge about the inner workings of the method being evaluated. A main output of the evaluation process consists of a two-dimensional modulation transfer function, an easy-to-interpret visual rendering of a method's ability to capture all combinations of spatial and temporal frequencies. Approaches to evaluate noise properties and artifact content at all spatial and temporal frequencies are also proposed. One fully sampled phantom and three fully sampled cardiac cine datasets were subsampled (R = 4 and 8) and reconstructed with the different methods tested here. A hybrid method, which combines the main advantageous features observed in our assessments, was proposed and tested in a cardiac cine application, with acceleration factors of 3.5 and 6.3 (skip factors of 4 and 8, respectively). This approach combines features from methods such as k-t sensitivity encoding, unaliasing by Fourier encoding the overlaps in the temporal dimension-sensitivity encoding, generalized autocalibrating partially parallel acquisition, sensitivity profiles from an array of coils for encoding and reconstruction in parallel, self, hybrid referencing with unaliasing by Fourier encoding the overlaps in the temporal dimension and generalized autocalibrating partially parallel acquisition, and generalized autocalibrating partially parallel acquisition-enhanced sensitivity maps for sensitivity encoding reconstructions.
Specialized Computer Systems for Environment Visualization
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Bashkov, Evgeniy A.; Zori, Sergii A.
2018-06-01
The need for real-time image generation of landscapes arises in various fields as part of tasks solved by virtual and augmented reality systems, as well as geographic information systems. Such systems provide opportunities for collecting, storing, analyzing and graphically visualizing geographic data. Algorithmic and hardware-software tools for increasing the realism and efficiency of environment visualization in 3D visualization systems are proposed. This paper discusses a modified path tracing algorithm with a two-level hierarchy of bounding volumes that finds intersections with Axis-Aligned Bounding Boxes. The proposed algorithm eliminates branching and is hence more suitable for implementation on multi-threaded CPUs and GPUs. A modified ROAM algorithm is used to solve the problem of qualitative visualization of reliefs and landscapes. The algorithm is implemented on parallel systems: clusters and Compute Unified Device Architecture networks. Results show that the implementation on MPI clusters is more efficient than on Graphics Processing Units/Graphics Processing Clusters and allows real-time synthesis. The organization and algorithms of a parallel GPU system for 3D pseudo-stereo image/video synthesis are proposed, after analyzing which stages of 3D pseudo-stereo synthesis can be realized on a parallel GPU architecture. An experimental prototype of a specialized hardware-software system for 3D pseudo-stereo imaging and video was developed on the CPU/GPU. The experimental results show that the proposed adaptation of 3D pseudo-stereo imaging to the architecture of GPU systems is efficient, and that it accelerates the computational procedures of 3D pseudo-stereo synthesis for the anaglyph and anamorphic formats of the 3D stereo frame without performing optimization procedures. The acceleration is on average 11 and 54 times for the test GPUs.
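The branch-free ray/AABB intersection mentioned above is usually the "slab test", which replaces per-axis conditionals with min/max arithmetic and so maps well onto SIMT execution. A sketch (not the paper's GPU code; the ray and box below are invented):

```python
import numpy as np

def ray_aabb(origin, inv_dir, box_min, box_max):
    """Branchless slab test: hit iff the slab-entry/exit interval is non-empty."""
    t1 = (box_min - origin) * inv_dir
    t2 = (box_max - origin) * inv_dir
    t_near = np.max(np.minimum(t1, t2))    # latest entry across the 3 slabs
    t_far = np.min(np.maximum(t1, t2))     # earliest exit
    return t_far >= max(t_near, 0.0)

# Ray from z = -5 toward a unit box centered at the origin.
o = np.array([0.0, 0.0, -5.0])
d = np.array([0.1, 0.1, 1.0])
hit = ray_aabb(o, 1.0 / d, np.array([-1.0, -1.0, -1.0]), np.ones(3))
```

Precomputing the reciprocal direction once per ray, as above, removes the division from the inner loop, a standard trick in bounding volume hierarchy traversal.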
An automated workflow for parallel processing of large multiview SPIM recordings
Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel
2016-01-01
Summary: Selective Plane Illumination Microscopy (SPIM) makes it possible to image developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data require extensive processing, performed interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multiillumination time-lapse SPIM data on a single workstation or in parallel on an HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment for processing SPIM data in a fraction of the time required to collect it. Availability and implementation: The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT. The source code can be downloaded from github: https://github.com/mpicbg-scicomp/snakemake-workflows. Documentation can be found here: http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction. Contact: schmied@mpi-cbg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26628585
The parallel algorithm for the 2D discrete wavelet transform
NASA Astrophysics Data System (ADS)
Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel
2018-04-01
The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of points at which data must be exchanged, and these points often form a performance bottleneck. Our approach appropriately rearranges the calculations inside the transform and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluated on multi-core CPUs, our scheme consistently outperforms the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
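For readers unfamiliar with lifting, here is one level of the scheme for the Haar wavelet, the simplest case of the separable lifting the paper builds on (an illustrative sketch, not the authors' code). Each step updates one polyphase component from the other; these cross-component dependencies are exactly the data-exchange points counted above.

```python
import numpy as np

def haar_lift(x):
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    d = odd - even            # predict step: detail coefficients
    s = even + d / 2          # update step: approximation coefficients
    return s, d

def haar_unlift(s, d):
    # Inverse lifting: run the same steps backwards with opposite signs.
    even = s - d / 2
    odd = d + even
    out = np.empty(2 * len(s))
    out[0::2], out[1::2] = even, odd
    return out

s, d = haar_lift(np.array([2.0, 4.0, 6.0, 6.0]))
```

Longer filters such as CDF 5/3 or 9/7 add more predict/update pairs, and in a multi-core setting every extra pair is a synchronization point, which is the overhead the proposed rearrangement reduces.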
Satellite on-board real-time SAR processor prototype
NASA Astrophysics Data System (ADS)
Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François
2017-11-01
A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. 
Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and size are reviewed.
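The optical Fourier processing described above has a direct digital analogue: range compression is a matched filter applied in the Fourier domain, the transform a lens performs in constant time for all range lines at once. A minimal NumPy sketch of the digital equivalent (illustrative only; the function name and chirp parameters are assumptions, not the processor's actual design):

```python
import numpy as np

def range_compress(raw_line, chirp):
    """Matched-filter one range line against the transmitted chirp via
    the FFT; an optical Fourier processor performs the equivalent
    transform with a lens, for every range line in parallel."""
    n = len(raw_line)
    H = np.conj(np.fft.fft(chirp, n))          # matched-filter spectrum
    return np.fft.ifft(np.fft.fft(raw_line, n) * H)

# demo: an echo of the chirp delayed by 40 samples compresses to a
# sharp peak at index 40
t = np.arange(32)
chirp = np.exp(1j * np.pi * 0.05 * t ** 2)     # toy linear FM pulse
raw = np.zeros(128, dtype=complex)
raw[40:72] = chirp
compressed = range_compress(raw, chirp)
```

The correlation peak marks the scatterer's range; in the optical processor this focusing happens for the whole signal history simultaneously.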
A New Joint-Blade SENSE Reconstruction for Accelerated PROPELLER MRI
Lyu, Mengye; Liu, Yilong; Xie, Victor B.; Feng, Yanqiu; Guo, Hua; Wu, Ed X.
2017-01-01
The PROPELLER technique is widely used in MRI examinations because of its insensitivity to motion, but it prolongs scan time and is restricted mainly to T2 contrast. Parallel imaging can accelerate PROPELLER and enable more flexible contrasts. Here, we propose a multi-step joint-blade (MJB) SENSE reconstruction to reduce the noise amplification in parallel imaging accelerated PROPELLER. MJB SENSE utilizes the fact that PROPELLER blades contain sharable information and that blade-combined images can serve as regularization references. It consists of three steps. First, conventional blade-combined images are obtained using simple single-blade (SSB) SENSE, which reconstructs each blade separately. Second, the blade-combined images are employed as regularization for blade-wise noise reduction. Last, with virtual high-frequency data resampled from the previous step, all blades are jointly reconstructed to form the final images. Simulations were performed to evaluate the proposed MJB SENSE for noise reduction and motion correction. MJB SENSE was also applied to both T2-weighted and T1-weighted in vivo brain data. Compared to SSB SENSE, MJB SENSE greatly reduced the noise amplification at various acceleration factors, leading to increased image SNR in all simulation and in vivo experiments, including T1-weighted imaging with short echo trains. Furthermore, it preserved motion correction capability and was computationally efficient. PMID:28205602
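As background for the single-blade building block, plain Cartesian SENSE unfolding at acceleration R = 2 reduces to a small per-pixel least-squares solve. A minimal NumPy sketch (unregularized and illustrative; this is the generic SENSE step, not the paper's MJB implementation):

```python
import numpy as np

def sense_unfold_r2(aliased, smaps):
    """Cartesian SENSE for R = 2: each aliased pixel is a coil-weighted
    sum of two true pixels half a FOV apart; invert the small
    (n_coil x 2) encoding system pixel by pixel."""
    n_coil, ny2, nx = aliased.shape            # ny2 = ny // 2 aliased rows
    img = np.zeros((2 * ny2, nx), dtype=complex)
    for y in range(ny2):
        for x in range(nx):
            # encoding matrix: coil sensitivities at the two folded pixels
            E = np.stack([smaps[:, y, x], smaps[:, y + ny2, x]], axis=1)
            rho, *_ = np.linalg.lstsq(E, aliased[:, y, x], rcond=None)
            img[y, x], img[y + ny2, x] = rho
    return img

# demo: simulate R = 2 folding of a known image, then unfold it
rng = np.random.default_rng(0)
smaps = rng.standard_normal((4, 8, 6)) + 1j * rng.standard_normal((4, 8, 6))
truth = rng.standard_normal((8, 6)) + 1j * rng.standard_normal((8, 6))
weighted = smaps * truth                       # coil images
aliased = weighted[:, :4, :] + weighted[:, 4:, :]
recovered = sense_unfold_r2(aliased, smaps)
```

With noiseless, consistent data the overdetermined per-pixel system is solved exactly; the noise amplification the abstract targets appears when the folded sensitivities are nearly collinear.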
Robson, Philip M; Grant, Aaron K; Madhuranthakam, Ananth J; Lattanzi, Riccardo; Sodickson, Daniel K; McKenzie, Charles A
2008-10-01
Parallel imaging reconstructions result in spatially varying noise amplification characterized by the g-factor, precluding conventional measurements of noise from the final image. A simple Monte Carlo based method is proposed for all linear image reconstruction algorithms, which allows measurement of signal-to-noise ratio and g-factor and is demonstrated for SENSE and GRAPPA reconstructions for accelerated acquisitions that have not previously been amenable to such assessment. Only a simple "prescan" measurement of noise amplitude and correlation in the phased-array receiver, and a single accelerated image acquisition are required, allowing robust assessment of signal-to-noise ratio and g-factor. The "pseudo multiple replica" method has been rigorously validated in phantoms and in vivo, showing excellent agreement with true multiple replica and analytical methods. This method is universally applicable to the parallel imaging reconstruction techniques used in clinical applications and will allow pixel-by-pixel image noise measurements for all parallel imaging strategies, allowing quantitative comparison between arbitrary k-space trajectories, image reconstruction, or noise conditioning techniques. (c) 2008 Wiley-Liss, Inc.
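The pseudo multiple replica idea can be sketched in a few lines: synthesize noise with the prescan coil covariance, push it through the reconstruction many times, and read the noise map off the replica stack. A toy NumPy version (the root-sum-of-squares combiner here is a simple stand-in for a linear SENSE/GRAPPA reconstruction, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_multiple_replica(kspace, recon, psi, n_replicas=200):
    """Pixelwise noise map for a reconstruction `recon`: inject
    synthetic coil noise drawn with the prescan covariance `psi`,
    reconstruct each replica, and take the pixelwise standard
    deviation across the stack."""
    L = np.linalg.cholesky(psi)                # correlate noise across coils
    stack = []
    for _ in range(n_replicas):
        n = rng.standard_normal(kspace.shape) + 1j * rng.standard_normal(kspace.shape)
        stack.append(recon(kspace + np.einsum('ij,j...->i...', L, n)))
    return np.std(stack, axis=0)

def rss_recon(ksp):
    """Toy combiner: per-coil inverse FFT, root-sum-of-squares."""
    imgs = np.fft.ifft2(ksp, axes=(-2, -1))
    return np.sqrt((np.abs(imgs) ** 2).sum(axis=0))

coils, ny, nx = 4, 16, 16
noise_map = pseudo_multiple_replica(
    np.zeros((coils, ny, nx), dtype=complex), rss_recon, np.eye(coils))
```

Dividing a signal image by this noise map gives pixelwise SNR, and comparing accelerated to unaccelerated noise maps (scaled by the square root of the acceleration factor) gives the g-factor.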
Robot acting on moving bodies (RAMBO): Preliminary results
NASA Technical Reports Server (NTRS)
Davis, Larry S.; Dementhon, Daniel; Bestul, Thor; Ziavras, Sotirios; Srinivasan, H. V.; Siddalingaiah, Madju; Harwood, David
1989-01-01
A robot system called RAMBO is being developed. It is equipped with a camera and, given a sequence of simple tasks, can perform these tasks on a moving object. RAMBO is given a complete geometric model of the object. A low-level vision module extracts and groups characteristic features in images of the object. The positions of the object are determined in a sequence of images, and a motion estimate of the object is obtained. This motion estimate is used to plan trajectories of the robot tool to locations near the object suitable for achieving the tasks. More specifically, low-level vision uses parallel algorithms for image enhancement by symmetric nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by sector filtering. The object pose estimation is a Hough transform method accumulating position hypotheses obtained by matching triples of image features (corners) to triples of model features. To maximize computing speed, the estimate of the position in space of a triple of features is obtained by decomposing its perspective view into a product of rotations and a scaled orthographic projection. This allows the use of 2-D lookup tables at each stage of the decomposition. The position hypotheses for each possible match of model feature triples and image feature triples are calculated in parallel. Trajectory planning combines heuristic and dynamic programming techniques. Trajectories are then created using parametric cubic splines between initial and goal trajectories. All the parallel algorithms run on a Connection Machine CM-2 with 16K processors.
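The symmetric nearest neighbor filter mentioned above is simple to state: for each of the four diametrically opposed neighbor pairs in a 3x3 window, keep the member closer in value to the center pixel, then average the four picks. A serial sketch (each output pixel is independent, which is what makes the step parallelize so well on a SIMD machine):

```python
import numpy as np

def snn_filter(img):
    """Symmetric nearest neighbor smoothing over a 3x3 neighborhood:
    edge-preserving because near a step edge each opposed pair
    contributes its same-side member. Border pixels are passed through
    unchanged in this sketch."""
    out = img.astype(float).copy()
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)),
             ((-1, 1), (1, -1)), ((0, -1), (0, 1))]
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y, x]
            picks = []
            for (dy1, dx1), (dy2, dx2) in pairs:
                a, b = img[y + dy1, x + dx1], img[y + dy2, x + dx2]
                picks.append(a if abs(a - c) <= abs(b - c) else b)
            out[y, x] = sum(picks) / 4.0
    return out
```

On a step edge the filter smooths each side without mixing the two, unlike a plain box filter.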
Real-time FPGA architectures for computer vision
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel; Torres-Huitzil, Cesar
2000-03-01
This paper presents an architecture for real-time generic convolution of a mask and an image, intended for fast low-level image processing. The FPGA-based architecture takes advantage of the availability of registers in FPGAs to implement an efficient and compact module to process the convolutions. The architecture is designed to minimize the number of accesses to image memory and is based on parallel modules with internal pipeline operation to improve performance. The architecture is prototyped on an FPGA, but it can be implemented as a dedicated VLSI circuit to reach higher clock frequencies. Complexity issues, FPGA resource utilization, FPGA limitations, and real-time performance are discussed. Some results are presented and discussed.
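The register/line-buffer organization such architectures rely on can be modeled in software: each pixel is read from image memory exactly once, while buffered rows supply the rest of the 3x3 window. A behavioral Python model of the dataflow (illustrative of the memory-access pattern, not of the HDL):

```python
import numpy as np

def streamed_convolve3x3(img, kernel):
    """Model of the FPGA dataflow: rows arrive in raster order, each
    pixel is fetched from image memory exactly once, and two line
    buffers hold the previous rows so a full 3x3 window is always
    available ('valid' output, so the result is (h-2, w-2))."""
    k = np.asarray(kernel, dtype=float)
    line_buf = []                       # models the on-chip line buffers
    out_rows = []
    for row in img.astype(float):       # one memory read per pixel
        line_buf.append(row)
        if len(line_buf) == 3:
            r0, r1, r2 = line_buf
            out = np.empty(len(row) - 2)
            for x in range(len(row) - 2):
                win = np.array([r0[x:x + 3], r1[x:x + 3], r2[x:x + 3]])
                out[x] = (win * k).sum()
            out_rows.append(out)
            line_buf.pop(0)             # discard the oldest buffered row
    return np.array(out_rows)
```

In hardware the inner window lives in registers and the multiply-accumulate tree is fully pipelined; the point of the model is that external memory traffic is one read per pixel rather than nine.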
Enhancing Image Processing Performance for PCID in a Heterogeneous Network of Multi-core Processors
2009-09-01
TFLOPS of Playstation 3 (PS3) nodes with IBM Cell Broadband Engine multi-cores and 15 dual-quad Xeon head nodes. The interconnect fabric includes...
Automated inspection of hot steel slabs
Martin, R.J.
1985-12-24
The disclosure relates to a real time digital image enhancement system for performing the image enhancement segmentation processing required for a real time automated system for detecting and classifying surface imperfections in hot steel slabs. The system provides for simultaneous execution of edge detection processing and intensity threshold processing in parallel on the same image data produced by a sensor device such as a scanning camera. The results of each process are utilized to validate the results of the other process and a resulting image is generated that contains only corresponding segmentation that is produced by both processes. 5 figs.
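The two-process cross-validation scheme is easy to sketch: run an edge detector and an intensity threshold concurrently on the same frame and intersect the results. A Python sketch with hypothetical operator choices (the patent does not specify these exact detectors or parameters):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def edge_map(img, thresh=4.0):
    """Local-gradient edge detector (simple central differences)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy) > thresh

def intensity_map(img, level=128.0):
    """Intensity-threshold segmentation."""
    return img > level

def inspect(img):
    """Run the two segmentation processes concurrently on the same
    frame and keep only segmentation produced by BOTH, mirroring the
    mutual-validation idea (illustrative sketch)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        edges = pool.submit(edge_map, img)
        bright = pool.submit(intensity_map, img)
        return edges.result() & bright.result()
```

The intersection suppresses false positives that either process alone would report: uniform hot spots pass the threshold but have no edges, while gradual shading gradients may trigger edges without exceeding the intensity level.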
NASA Astrophysics Data System (ADS)
Raphael, David T.; McIntee, Diane; Tsuruda, Jay S.; Colletti, Patrick; Tatevossian, Raymond; Frazier, James
2006-03-01
We explored multiple image processing approaches by which to display the segmented adult brachial plexus in a three-dimensional manner. Magnetic resonance neurography (MRN) 1.5-Tesla scans with STIR sequences, which preferentially highlight nerves, were performed in adult volunteers to generate high-resolution raw images. Using multiple software programs, the raw MRN images were then manipulated so as to achieve segmentation of plexus neurovascular structures, which were incorporated into three different visualization schemes: rotating upper thoracic girdle skeletal frames, dynamic fly-throughs parallel to the clavicle, and thin slab volume-rendered composite projections.
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.
1989-01-01
Several techniques to perform static and dynamic load balancing techniques for vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.
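A common way to act on such data-driven cost estimates is greedy longest-processing-time scheduling: sort tasks by their measured cost (e.g., the number of features a window produced) and repeatedly give the next task to the least-loaded processor. A sketch of only this core idea (the paper's own balancing strategies are more elaborate):

```python
import heapq

def balance(tasks, n_procs):
    """Greedy longest-processing-time assignment. `tasks` are
    (name, cost) pairs whose costs are measured from the data as it is
    produced, capturing each task's computational requirements at
    data-production time (illustrative sketch)."""
    heap = [(0.0, p, []) for p in range(n_procs)]   # (load, proc, tasks)
    heapq.heapify(heap)
    for name, cost in sorted(tasks, key=lambda t: -t[1]):
        load, p, assigned = heapq.heappop(heap)     # least-loaded processor
        assigned.append(name)
        heapq.heappush(heap, (load + cost, p, assigned))
    return sorted(heap, key=lambda e: e[1])

# demo: five windows with measured feature counts, two processors
schedule = balance([('a', 5), ('b', 4), ('c', 3), ('d', 3), ('e', 3)], 2)
```

LPT is a classic approximation with a tight worst-case bound of 4/3 of the optimal makespan, which is usually good enough to keep a hypercube's nodes evenly busy.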
Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob
2013-01-01
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
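Partitioning a spatial index across cluster nodes is commonly done with a space-filling curve: interleaving the bits of the cuboid coordinates yields a Morton (Z-order) key, and range-partitioning the keys preserves spatial locality. A sketch of this idea (the production system's exact partitioning scheme is not specified here):

```python
def morton3(x, y, z, bits=10):
    """Interleave the bits of a 3-d cuboid index into a Morton
    (Z-order) key; spatially nearby cuboids get numerically nearby
    keys, so contiguous key ranges map to compact spatial regions."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (3 * b)
        key |= ((y >> b) & 1) << (3 * b + 1)
        key |= ((z >> b) & 1) << (3 * b + 2)
    return key

def node_for_cuboid(x, y, z, n_nodes, bits=10):
    """Route a cuboid to a cluster node by equal-width ranges of the
    Z-order key space (illustrative placement policy)."""
    return morton3(x, y, z, bits) * n_nodes >> (3 * bits)
```

Because a 3-d range query touches cuboids whose keys cluster into a few contiguous runs, most of a query's I/O lands on a small subset of nodes while the full key space still spreads evenly across the cluster.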
Hernández, Moisés; Guerrero, Ginés D.; Cecilia, José M.; García, José M.; Inuggi, Alberto; Jbabdi, Saad; Behrens, Timothy E. J.; Sotiropoulos, Stamatios N.
2013-01-01
With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation. PMID:23658616
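The parallel structure being exploited is "one light-weight chain per voxel": each MCMC update is an elementwise array operation across all voxels at once. A NumPy stand-in for the CUDA kernels, with a toy Gaussian posterior in place of the ball & stick model (all names and parameters here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_per_voxel(data, log_post, n_iter=4000, step=0.5):
    """One Metropolis chain per voxel, advanced in lockstep: every
    operation below is elementwise over all voxels, the same
    data-parallel shape a GPU kernel executes. Returns the posterior
    mean over the second half of each chain."""
    theta = np.zeros(data.shape[0])
    lp = log_post(theta, data)
    total = np.zeros_like(theta)
    kept = 0
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop, data)
        # per-voxel accept/reject, vectorized across all chains
        accept = np.log(rng.random(theta.shape)) < lp_prop - lp
        theta = np.where(accept, prop, theta)
        lp = np.where(accept, lp_prop, lp)
        if i >= n_iter // 2:                   # discard burn-in
            total += theta
            kept += 1
    return total / kept

# toy posterior: independent unit Gaussians centered on `data`
data = np.array([1.0, -2.0, 0.5])
post_mean = metropolis_per_voxel(data, lambda t, d: -0.5 * (t - d) ** 2)
```

Because voxels are conditionally independent given the data, no communication is needed between chains, which is why the speed-up scales so well with GPU core count.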
Parallel Guessing: A Strategy for High-Speed Computation
1984-09-19
for using additional hardware to obtain higher processing speed). In this paper we argue that parallel guessing for image analysis is a useful... from a true solution, or the correctness of a guess, can be readily checked. We review image-analysis algorithms having a parallel guessing or
Wavelet Transforms in Parallel Image Processing
1994-01-27
Keywords: object segmentation, texture segmentation, image compression, image halftoning, neural networks, parallel algorithms, 2D and 3D vector quantization of wavelet transform coefficients, adaptive image halftoning based on wavelets. ...application has been directed to the adaptive image halftoning. The gray information at a pixel, including its gray value and gradient, is represented by
NASA Astrophysics Data System (ADS)
García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun
2016-10-01
This paper describes a new web platform for the classification of satellite images called Hypergim. The current implementation of this platform enables users to classify satellite images from any part of the world, thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms such as Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of the classes present in the images for use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm, a modification of the well-known CURFIL software package. The use of this type of algorithm for image classification is widespread today thanks to its precision and ease of training. The Random Forest implementation was developed on the CUDA platform, which enables us to exploit several models of NVIDIA graphics processing units for general-purpose computing tasks such as image classification. Alongside CUDA, we use other parallel libraries such as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed on a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool concurrently. The experimental results indicate that the new algorithm substantially outperforms the previous unsupervised algorithms implemented in Hypergim, in both runtime and classification precision.
Integrated microfluidic devices for combinatorial cell-based assays.
Yu, Zeta Tak For; Kamei, Ken-ichiro; Takahashi, Hiroko; Shu, Chengyi Jenny; Wang, Xiaopu; He, George Wenfu; Silverman, Robert; Radu, Caius G; Witte, Owen N; Lee, Ki-Bum; Tseng, Hsian-Rong
2009-06-01
The development of miniaturized cell culture platforms for performing parallel cultures and combinatorial assays is important in cell biology from the single-cell level to the system level. In this paper we developed an integrated microfluidic cell-culture platform, Cell-microChip (Cell-microChip), for parallel analyses of the effects of microenvironmental cues (i.e., culture scaffolds) on different mammalian cells and their cellular responses to external stimuli. As a model study, we demonstrated the ability of culturing and assaying several mammalian cells, such as NIH 3T3 fibroblast, B16 melanoma and HeLa cell lines, in a parallel way. For functional assays, first we tested drug-induced apoptotic responses from different cell lines. As a second functional assay, we performed "on-chip" transfection of a reporter gene encoding an enhanced green fluorescent protein (EGFP) followed by live-cell imaging of transcriptional activation of cyclooxygenase 2 (Cox-2) expression. Collectively, our Cell-microChip approach demonstrated the capability to carry out parallel operations and the potential to further integrate advanced functions and applications in the broader space of combinatorial chemistry and biology.
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called the 'dataway processor', designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates 8 bits in parallel in full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON'. The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.
Massively parallel information processing systems for space applications
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1979-01-01
NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
NASA Technical Reports Server (NTRS)
Dorband, John E.
1987-01-01
Generating graphics to faithfully represent information can be a computationally intensive task. A way of using the Massively Parallel Processor to generate images by ray tracing is presented. This technique uses sort computation, a method of performing generalized routing interspersed with computation on a single-instruction-multiple-data (SIMD) computer.
Detecting Planar Surfaces in Outdoor Urban Environments
2008-09-01
coplanar or parallel scene points and lines. Sturm and Maybank (18) perform 3D reconstruction given user-provided coplanarity, perpendicularity, and... Maybank , S. J. A method for intactive 3d reconstruction of piercewise planar objects from single images. in BMVC, 1999, 265–274 [19] Schaffalitzky, F
Improved parallel image reconstruction using feature refinement.
Cheng, Jing; Jia, Sen; Ying, Leslie; Liu, Yuanyuan; Wang, Shanshan; Zhu, Yanjie; Li, Ye; Zou, Chao; Liu, Xin; Liang, Dong
2018-07-01
The aim of this study was to develop a novel feature-refinement MR reconstruction method for highly undersampled multichannel acquisitions that improves image quality and preserves more detail. The feature refinement technique, which uses a feature descriptor to pick up useful features from the residual image discarded by sparsity constraints, is applied to preserve image detail in compressed sensing and parallel imaging in MRI (CS-pMRI). A texture descriptor and a structure descriptor recognizing different types of features are required for forming the feature descriptor. Feasibility of the feature refinement was validated using three different multicoil reconstruction methods on in vivo data. Experimental results show that reconstruction methods with feature refinement improve the quality of the reconstructed image and restore image details more accurately than the original methods, which is also verified by lower values of the root mean square error and high-frequency error norm. A simple and effective way to preserve more useful detailed information in CS-pMRI is proposed. This technique can effectively improve reconstruction quality and has superior performance in terms of detail preservation compared with the original version without feature refinement. Magn Reson Med 80:211-223, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Implementation of total focusing method for phased array ultrasonic imaging on FPGA
NASA Astrophysics Data System (ADS)
Guo, JianQiang; Li, Xi; Gao, Xiaorong; Wang, Zeyong; Zhao, Quanke
2015-02-01
This paper describes a multi-FPGA imaging system dedicated to real-time imaging using the Total Focusing Method (TFM) and Full Matrix Capture (FMC). The system was entirely described in the Verilog HDL language and implemented on an Altera Stratix IV GX FPGA development board. The algorithm proceeds as follows: establish a coordinate system over the image and divide it into a grid; for each focus point, calculate the complete acoustic path between each transmitting and receiving element pair and transform it into an index value; use that index to fetch sound pressure values from ROM and superimpose them to obtain the pixel value of the focus point; and repeat for all focus points to form the final image. The imaging results show that this algorithm achieves a high SNR for defect imaging, and the parallel processing capability of the FPGA provides high-speed performance, so the system delivers a fully functional, high-performance imaging interface.
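The distance-to-index pipeline described above is the standard delay-and-sum form of TFM. A NumPy sketch with a synthetic point scatterer (array geometry, sampling rate, and sound speed are illustrative values, not the paper's hardware parameters):

```python
import numpy as np

def tfm_image(fmc, elem_x, grid_x, grid_z, c, fs):
    """Total Focusing Method on Full Matrix Capture data: for every
    image grid point, sum the FMC A-scan samples at the transmit +
    receive time-of-flight for every tx/rx element pair.
    fmc[tx, rx, t] are the captured A-scans; the time-of-flight to
    sample-index conversion mirrors the distance-to-index step."""
    n_el, _, n_t = fmc.shape
    tx, rx = np.meshgrid(np.arange(n_el), np.arange(n_el), indexing='ij')
    img = np.zeros((len(grid_z), len(grid_x)))
    for iz, z in enumerate(grid_z):
        for ix, x in enumerate(grid_x):
            d = np.hypot(elem_x - x, z)            # element-to-pixel distances
            tof = (d[:, None] + d[None, :]) / c    # tx path + rx path
            idx = np.clip((tof * fs).astype(int), 0, n_t - 1)
            img[iz, ix] = np.abs(fmc[tx, rx, idx].sum())
    return img

# synthetic check: a point scatterer at (x=0, z=5) puts an impulse in
# every tx/rx A-scan at the corresponding round-trip delay
elem_x = np.linspace(-2.0, 2.0, 4)
c, fs, n_t = 1.0, 10.0, 128
d = np.hypot(elem_x - 0.0, 5.0)
idx = np.clip((((d[:, None] + d[None, :]) / c) * fs).astype(int), 0, n_t - 1)
fmc = np.zeros((4, 4, n_t))
fmc[np.arange(4)[:, None], np.arange(4)[None, :], idx] = 1.0
img = tfm_image(fmc, elem_x, np.array([-1.0, 0.0, 1.0]),
                np.array([4.0, 5.0, 6.0]), c, fs)
```

Every focus point is independent, which is why the computation maps naturally onto parallel FPGA pipelines with the delay laws held in lookup memory.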
Accelerated Adaptive MGS Phase Retrieval
NASA Technical Reports Server (NTRS)
Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang
2011-01-01
The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm that allows parallel processing of applications which apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time, performing the matrix calculations on NVIDIA graphics cards. The graphics processing unit (GPU) is hardware specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model called CUDA is used to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of NVIDIA GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these technologies to accelerate optical phase error characterization. With a single PC that contains four NVIDIA GTX-280 graphics cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
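The iteration that MGS builds on, and that the GPU FFTs accelerate, is the classic Gerchberg-Saxton loop: alternate between pupil and focal planes, keeping the evolving phase estimate but enforcing the known amplitude in each plane. A small NumPy sketch of that core loop only (not the MGS modifications themselves):

```python
import numpy as np

def gerchberg_saxton(pupil_amp, focal_amp, n_iter=200, seed=0):
    """Recover the pupil phase from the pupil amplitude and the
    measured focal-plane amplitude; the two FFTs per iteration are
    the operations a GPU parallelizes. Returns the phase estimate and
    the focal-amplitude error history."""
    rng = np.random.default_rng(seed)
    field = pupil_amp * np.exp(1j * rng.uniform(-np.pi, np.pi, pupil_amp.shape))
    errs = []
    for _ in range(n_iter):
        focal = np.fft.fft2(field)
        errs.append(np.linalg.norm(np.abs(focal) - focal_amp))
        focal = focal_amp * np.exp(1j * np.angle(focal))    # focal constraint
        field = np.fft.ifft2(focal)
        field = pupil_amp * np.exp(1j * np.angle(field))    # pupil constraint
    return np.angle(field), errs

# demo: a smooth synthetic wavefront error over a uniform pupil
pupil = np.ones((16, 16))
true_phase = np.add.outer(np.linspace(0, 1, 16), np.linspace(0, 0.5, 16)) ** 2
focal_amp = np.abs(np.fft.fft2(pupil * np.exp(1j * true_phase)))
phase_est, errs = gerchberg_saxton(pupil, focal_amp)
```

The per-iteration cost is dominated by the two 2-D FFTs, which is exactly why moving them onto stream processors yields the large speed-ups reported.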
Klix, Sabrina; Hezel, Fabian; Fuchs, Katharina; Ruff, Jan; Dieringer, Matthias A.; Niendorf, Thoralf
2014-01-01
Purpose: Design, validation and application of an accelerated fast spin-echo (FSE) variant that uses a split-echo approach for self-calibrated parallel imaging. Methods: For self-calibrated, split-echo FSE (SCSE-FSE), extra displacement gradients were incorporated into FSE to decompose odd and even echo groups, which were independently phase encoded to derive coil sensitivity maps and to generate undersampled data (reduction factor up to R = 3). Reference and undersampled data were acquired simultaneously. SENSE reconstruction was employed. Results: The feasibility of SCSE-FSE was demonstrated in phantom studies. Point spread function performance of SCSE-FSE was found to be competitive with traditional FSE variants. The immunity of SCSE-FSE to motion-induced mis-registration between reference and undersampled data was shown using a dynamic left ventricular model and cardiac imaging. The applicability of black-blood prepared SCSE-FSE for cardiac imaging was demonstrated in healthy volunteers, including accelerated multi-slice per breath-hold imaging and accelerated high spatial resolution imaging. Conclusion: SCSE-FSE obviates the need for external reference scans for SENSE-reconstructed parallel imaging with FSE. SCSE-FSE reduces the risk of mis-registration between reference scans and accelerated acquisitions. SCSE-FSE is feasible for imaging of the heart and of large cardiac vessels but also meets the needs of brain, abdominal and liver imaging. PMID:24728341
Optical/MRI Multimodality Molecular Imaging
NASA Astrophysics Data System (ADS)
Ma, Lixin; Smith, Charles; Yu, Ping
2007-03-01
Multimodality molecular imaging that combines anatomical and functional information has shown promise in the development of tumor-targeted pharmaceuticals for cancer detection or therapy. We present a new multimodality imaging technique that combines fluorescence molecular tomography (FMT) and magnetic resonance imaging (MRI) for in vivo molecular imaging of preclinical tumor models. Unlike other optical/MRI systems, the new molecular imaging system uses parallel phase acquisition based on the heterodyne principle. The system has a higher accuracy of phase measurements, reduced noise bandwidth, and an efficient modulation of the fluorescence diffuse density waves. Fluorescent bombesin probes were developed for targeting breast cancer cells and prostate cancer cells. Tissue phantom and small animal experiments were performed for calibration of the imaging system and validation of the targeting probes.
Hinton-Yates, Denise P; Cury, Ricardo C; Wald, Lawrence L; Wiggins, Graham C; Keil, Boris; Seethmaraju, Ravi; Gangadharamurthy, Dakshinamurthy; Ogilvy, Christopher S; Dai, Guangping; Houser, Stuart L; Stone, James R; Furie, Karen L
2007-10-01
The aim of this article is to evaluate 3.0 T magnetic resonance imaging for characterization of vessel morphology and plaque composition. Emphasis is placed on early and moderate stages of carotid atherosclerosis, where increases in signal-to-noise (SNR) and contrast-to-noise (CNR) ratios compared with 1.5 T are sought. Comparison of in vivo 3.0 T imaging to histopathology is performed for validation. Parallel acceleration methods applied with an 8-channel carotid array are investigated, as well as higher-field ex vivo imaging to explore even further gains. The overall endeavor is to improve prospective assessment of atherosclerosis stage and stability for reduction of atherothrombotic event risk. A total of 10 male and female subjects ranging in age from 22 to 72 years (5 healthy and 5 with cardiovascular disease) participated. Custom-built array coils were used with endogenous and exogenous multicontrast bright- and black-blood protocols for 3.0 T carotid imaging. Comparisons were performed to 1.5 T, and ex vivo plaque was stained with hematoxylin and eosin for histology. Imaging (9.4 T) was also performed on intact specimens. The factor of 2 gain in SNR is realized compared with 1.5 T, along with improved wall-lumen and plaque-component CNR. Post-contrast black-blood imaging within 5-10 minutes of gadolinium injection is optimal for detection of the necrotic lipid component. In a preliminary 18-month follow-up study, this method provided measurement of a 50% reduction in lipid content with minimal change in plaque size in a subject receiving aggressive statin therapy. Parallel imaging applied with signal averaging further improves 3.0 T black-blood vessel wall imaging. The use of 3.0 T for carotid plaque imaging has demonstrated increases in SNR and CNR compared with 1.5 T. Quantitative prospective studies of moderate and early plaques are feasible at 3.0 T. 
Continued improvements in coil arrays, 3-dimensional pulse sequences, and the use of novel molecular imaging agents implemented at high field will further improve magnetic resonance plaque characterization.
High performance computing environment for multidimensional image analysis
Rao, A Ravishankar; Cecchi, Guillermo A; Magnasco, Marcelo
2007-01-01
Background The processing of images acquired through microscopy is a challenging task due to the large size of datasets (several gigabytes) and the fast turnaround time required. If the throughput of the image processing stage is significantly increased, it can have a major impact in microscopy applications. Results We present a high performance computing (HPC) solution to this problem. This involves decomposing the spatial 3D image into segments that are assigned to unique processors, and matched to the 3D torus architecture of the IBM Blue Gene/L machine. Communication between segments is restricted to the nearest neighbors. When running on a 2 GHz Intel CPU, the task of 3D median filtering on a typical 256 megabyte dataset takes two and a half hours, whereas by using 1024 nodes of Blue Gene, this task can be performed in 18.8 seconds, a 478× speedup. Conclusion Our parallel solution dramatically improves the performance of image processing, feature extraction and 3D reconstruction tasks. This increased throughput permits biologists to conduct unprecedented large scale experiments with massive datasets. PMID:17634099
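The segment decomposition with nearest-neighbor (halo) exchange described above can be sketched serially in Python. This is an illustrative toy, not the Blue Gene/L code, and the function names are assumptions: each z-segment borrows `radius` boundary slices from its neighbors, so the reassembled result matches a global filter exactly.

```python
import numpy as np

def median_filter_3d(vol, radius=1):
    """Naive 3D median filter with edge clamping (brute force, for illustration)."""
    out = np.empty_like(vol)
    z, y, x = vol.shape
    for k in range(z):
        for j in range(y):
            for i in range(x):
                zs = slice(max(k - radius, 0), min(k + radius + 1, z))
                ys = slice(max(j - radius, 0), min(j + radius + 1, y))
                xs = slice(max(i - radius, 0), min(i + radius + 1, x))
                out[k, j, i] = np.median(vol[zs, ys, xs])
    return out

def filter_by_segments(vol, n_segments, radius=1):
    """Split the volume along z, filter each segment with a halo of `radius`
    slices borrowed from its neighbours (the nearest-neighbour communication
    step), then reassemble only the interior results."""
    z = vol.shape[0]
    bounds = np.linspace(0, z, n_segments + 1, dtype=int)
    parts = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        halo_lo = max(lo - radius, 0)
        halo_hi = min(hi + radius, z)
        filtered = median_filter_3d(vol[halo_lo:halo_hi], radius)
        # Drop the halo slices; keep only this segment's own voxels.
        parts.append(filtered[lo - halo_lo : filtered.shape[0] - (halo_hi - hi)])
    return np.concatenate(parts, axis=0)
```

In the real system each segment would live on its own processor, with the halo slices exchanged over the 3D torus network rather than read from shared memory.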
Garty, Guy; Chen, Youhua; Turner, Helen; Zhang, Jian; Lyulko, Oleksandra; Bertucci, Antonella; Xu, Yanping; Wang, Hongliang; Simaan, Nabil; Randers-Pehrson, Gerhard; Yao, Y. Lawrence; Brenner, David J.
2011-01-01
Purpose Over the past five years the Center for Minimally Invasive Radiation Biodosimetry at Columbia University has developed the Rapid Automated Biodosimetry Tool (RABiT), a completely automated, ultra-high throughput biodosimetry workstation. This paper describes recent upgrades and reliability testing of the RABiT. Materials and methods The RABiT analyzes fingerstick-derived blood samples to estimate past radiation exposure or to identify individuals exposed above or below a cutoff dose. Through automated robotics, lymphocytes are extracted from fingerstick blood samples into filter-bottomed multi-well plates. Depending on the time since exposure, the RABiT scores either micronuclei or phosphorylation of the histone H2AX, in an automated robotic system, using filter-bottomed multi-well plates. Following lymphocyte culturing, fixation and staining, the filter bottoms are removed from the multi-well plates and sealed prior to automated high-speed imaging. Image analysis is performed online using dedicated image processing hardware. Both the sealed filters and the images are archived. Results We have developed a new robotic system for lymphocyte processing, making use of increased laser power and parallel processing of four capillaries at once. This system has allowed acceleration of lymphocyte isolation, the main bottleneck of the RABiT operation, from 12 to 2 sec/sample. Reliability tests have been performed on all robotic subsystems. Conclusions Parallel handling of multiple samples through the use of dedicated, purpose-built robotics and high speed imaging allows analysis of up to 30,000 samples per day. PMID:21557703
NASA Astrophysics Data System (ADS)
He, Xiaojun; Ma, Haotong; Luo, Chuanxin
2016-10-01
The optical multi-aperture imaging system is an effective way to enlarge the aperture and increase the resolution of a telescope optical system, the difficulty of which lies in detecting and correcting the co-phase error. This paper presents a method based on the stochastic parallel gradient descent (SPGD) algorithm to correct the co-phase error. Compared with current methods, the SPGD method avoids having to detect the co-phase error directly. This paper analyzes the influence of piston error and tilt error on image quality for a double-aperture imaging system, introduces the basic principle of the SPGD algorithm, and discusses the influence of the algorithm's key parameters (the gain coefficient and the disturbance amplitude) on error-control performance. The results show that SPGD can efficiently correct the co-phase error. The convergence speed of the SPGD algorithm improves as the gain coefficient and disturbance amplitude increase, but the stability of the algorithm is reduced. An adaptive gain coefficient can solve this problem appropriately. These results provide a theoretical reference for co-phase error correction in multi-aperture imaging systems.
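The SPGD update described above can be sketched in its common two-sided-perturbation form. This is a minimal illustration, not the paper's implementation; the function names and the toy quadratic metric standing in for the measured image-quality metric are assumptions:

```python
import numpy as np

def spgd_minimize(J, u0, gain=0.5, amplitude=0.1, iters=500, seed=0):
    """Stochastic parallel gradient descent (two-sided perturbation).

    J         -- scalar metric to minimize (e.g. residual wavefront error)
    u0        -- initial control vector (e.g. piston/tilt actuator commands)
    gain      -- gain coefficient (larger -> faster but less stable)
    amplitude -- disturbance (perturbation) amplitude
    """
    rng = np.random.default_rng(seed)
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(iters):
        # Random +/- perturbation applied to all channels in parallel.
        delta = amplitude * rng.choice([-1.0, 1.0], size=u.shape)
        dJ = J(u + delta) - J(u - delta)   # measured metric difference
        u -= gain * dJ * delta             # descend along the stochastic gradient
    return u

# Toy co-phase problem: drive two piston errors (radians) to zero.
J = lambda u: float(np.sum(u ** 2))
u_final = spgd_minimize(J, [0.8, -0.5])
```

Increasing `gain` or `amplitude` speeds convergence at the cost of stability, mirroring the parameter trade-off reported above.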
Lyu, Tao; Yao, Suying; Nie, Kaiming; Xu, Jiangtao
2014-11-17
A 12-bit high-speed column-parallel two-step single-slope (SS) analog-to-digital converter (ADC) for CMOS image sensors is proposed. The proposed ADC employs a single ramp voltage and multiple reference voltages, and the conversion is divided into a coarse phase and a fine phase to improve the conversion rate. An error calibration scheme is proposed to correct errors caused by offsets among the reference voltages. The digital-to-analog converter (DAC) used for the ramp generator is based on a split-capacitor array with an attenuation capacitor. Analysis of the DAC's linearity performance versus capacitor mismatch and parasitic capacitance is presented. A prototype 1024 × 32 Time Delay Integration (TDI) CMOS image sensor with the proposed ADC architecture has been fabricated in a standard 0.18 μm CMOS process. The proposed ADC has an average power consumption of 128 μW and a conversion rate 6 times higher than that of the conventional SS ADC. A high-quality image, captured at a line rate of 15.5 k lines/s, shows that the proposed ADC is suitable for high-speed CMOS image sensors.
Assessing clutter reduction in parallel coordinates using image processing techniques
NASA Astrophysics Data System (ADS)
Alhamaydh, Heba; Alzoubi, Hussein; Almasaeid, Hisham
2018-01-01
Information visualization has emerged as an important research field for multidimensional data and correlation analysis in recent years. Parallel coordinates (PCs) are one of the popular techniques for visualizing high-dimensional data. A problem with the PC technique is that it suffers from crowding: clutter that hides important data and obscures the information. Earlier research has sought to reduce clutter without loss of data content. We introduce the use of image processing techniques as an approach for assessing the performance of clutter reduction techniques in PCs. We use histogram analysis as our first measure, where the mean feature of the color histograms of the possible alternative orderings of coordinates for the PC images is calculated and compared. The second measure is the contrast feature extracted from the texture of PC images based on gray-level co-occurrence matrices. The results show that the best PC image is the one that has the minimal mean value of the color histogram feature and the maximal contrast value of the texture feature. In addition to its simplicity, the proposed assessment method has the advantage of objectively assessing alternative orderings of PC visualizations.
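The two measures can be sketched with plain NumPy. This is a minimal illustration of the general technique, not the paper's code; the bin count, gray-level count, and the single horizontal GLCM offset are assumptions:

```python
import numpy as np

def histogram_mean(img, bins=16):
    """First-moment (mean) feature of the intensity histogram of a plot image
    with values in [0, 1]; a lower mean suggests a darker, less crowded plot."""
    hist, edges = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    return float(np.sum(centers * p))

def glcm_contrast(img, levels=8):
    """Contrast feature of a gray-level co-occurrence matrix built from
    horizontally adjacent pixel pairs."""
    q = np.minimum((img * levels).astype(int), levels - 1)  # quantize to levels
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    return float(np.sum(glcm * (i - j) ** 2))

flat = np.full((8, 8), 0.5)                                   # featureless image
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)  # high-contrast image
```

By the paper's criterion, the preferred axis ordering is the one whose rendered PC image minimizes the histogram mean and maximizes the GLCM contrast.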
Samant, Sanjiv S; Xia, Junyi; Muyan-Ozcelik, Pinar; Owens, John D
2008-08-01
The advent of readily available temporal imaging or time series volumetric (4D) imaging has become an indispensable component of treatment planning and adaptive radiotherapy (ART) at many radiotherapy centers. Deformable image registration (DIR) is also used in other areas of medical imaging, including motion corrected image reconstruction. Due to long computation times, clinical applications of DIR in radiation therapy and elsewhere have been limited and consequently relegated to offline analysis. With recent advances in hardware and software, graphics processing unit (GPU) based computing is an emerging technology for general purpose computation, including DIR, and is suitable for highly parallelized computing. However, traditional general purpose computation on the GPU is limited by the constraints of the available programming platforms. Moreover, compared to CPU programming, the GPU currently has reduced dedicated processor memory, which can limit the useful working data set for parallelized processing. We present an implementation of the demons algorithm using the NVIDIA 8800 GTX GPU and the new CUDA programming language. The GPU performance is compared with single-threading and multithreading CPU implementations on an Intel dual core 2.4 GHz CPU using the C programming language. CUDA provides a C-like language programming interface and allows direct access to the highly parallel compute units in the GPU. Comparisons for volumetric clinical lung images acquired using 4DCT were carried out. Computation times for 100 iterations in the range of 1.8-13.5 s were observed for the GPU, with image sizes ranging from 2.0 x 10^6 to 14.2 x 10^6 pixels. The GPU registration was 55-61 times faster than the CPU for the single-threading implementation, and 34-39 times faster for the multithreading implementation. For CPU based computing, the computational time generally has a linear dependence on image size for medical imaging data. 
Computational efficiency is characterized in terms of time per megapixel per iteration (TPMI), with units of seconds per megapixel per iteration (spmi). For the demons algorithm, our CPU implementation yielded largely invariant values of TPMI. The mean TPMIs were 0.527 spmi and 0.335 spmi for the single-threading and multithreading cases, respectively, with <2% variation over the considered image data range. For GPU computing, we achieved TPMI = 0.00916 spmi with 3.7% variation, indicating optimized memory handling under CUDA. The paradigm of GPU-based real-time DIR opens up a host of clinical applications for medical imaging.
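The TPMI normalization is simple to restate in code; the numbers below are the GPU figures quoted in this abstract, and the function name is an assumption:

```python
def tpmi(seconds, pixels, iterations):
    """Time per megapixel per iteration, in spmi (s per megapixel per iteration)."""
    return seconds / ((pixels / 1e6) * iterations)

# End points of the reported GPU timings, 100 iterations each:
small = tpmi(1.8, 2.0e6, 100)    # smallest image considered
large = tpmi(13.5, 14.2e6, 100)  # largest image considered
```

Both end points land within a few percent of the reported mean of 0.00916 spmi, consistent with the claimed near-invariance of TPMI across image sizes.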
Nana, Roger; Hu, Xiaoping
2010-01-01
k-space-based reconstruction in parallel imaging depends on the reconstruction kernel setting, including its support. An optimal choice of the kernel depends on the calibration data, coil geometry and signal-to-noise ratio, as well as the criterion used. In this work, data consistency, imposed by the shift invariance requirement of the kernel, is introduced as a goodness measure of k-space-based reconstruction in parallel imaging and demonstrated. Data consistency error (DCE) is calculated as the sum of squared difference between the acquired signals and their estimates obtained based on the interpolation of the estimated missing data. A resemblance between DCE and the mean square error in the reconstructed image was found, demonstrating DCE's potential as a metric for comparing or choosing reconstructions. When used for selecting the kernel support for generalized autocalibrating partially parallel acquisition (GRAPPA) reconstruction and the set of frames for calibration as well as the kernel support in temporal GRAPPA reconstruction, DCE led to improved images over existing methods. Data consistency error is efficient to evaluate, robust for selecting reconstruction parameters and suitable for characterizing and optimizing k-space-based reconstruction in parallel imaging.
Parallel Processing Systems for Passive Ranging During Helicopter Flight
NASA Technical Reports Server (NTRS)
Sridhar, Bavavar; Suorsa, Raymond E.; Showman, Robert D. (Technical Monitor)
1994-01-01
The complexity of rotorcraft missions involving operations close to the ground results in high pilot workload. In order to allow a pilot time to perform mission-oriented tasks, sensor-aiding and automation of some of the guidance and control functions are highly desirable. Images from an electro-optical sensor provide a covert way of detecting objects in the flight path of a low-flying helicopter. Passive ranging consists of processing a sequence of images using techniques based on optical flow computation and recursive estimation. The passive ranging algorithm has to extract obstacle information from imagery at rates varying from five to thirty or more frames per second, depending on the helicopter speed. We have implemented and tested the passive ranging algorithm off-line using helicopter-collected images. However, the real-time data and computation requirements of the algorithm are beyond the capability of any off-the-shelf microprocessor or digital signal processor. This paper describes the computational requirements of the algorithm and uses parallel processing technology to meet them. Various issues in the selection of a parallel processing architecture are discussed, and four different computer architectures are evaluated for their suitability to process the algorithm in real time. Based on this evaluation, we conclude that real-time passive ranging is a realistic goal and can be achieved within a short time.
Automation of a Wave-Optics Simulation and Image Post-Processing Package on Riptide
NASA Astrophysics Data System (ADS)
Werth, M.; Lucas, J.; Thompson, D.; Abercrombie, M.; Holmes, R.; Roggemann, M.
Detailed wave-optics simulations and image post-processing algorithms are computationally expensive and benefit from the massively parallel hardware available at supercomputing facilities. We created an automated system that interfaces with the Maui High Performance Computing Center (MHPCC) Distributed MATLAB® Portal interface to submit massively parallel wave-optics simulations to the IBM iDataPlex (Riptide) supercomputer. This system subsequently post-processes the output images with an improved version of physically constrained iterative deconvolution (PCID) and analyzes the results using a series of modular algorithms written in Python. With this architecture, a single person can simulate thousands of unique scenarios and produce analyzed, archived, and briefing-compatible output products with very little effort. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
NASA Astrophysics Data System (ADS)
Dong, Bing; Ren, De-Qing; Zhang, Xi
2011-08-01
An adaptive optics (AO) system based on a stochastic parallel gradient descent (SPGD) algorithm is proposed to reduce the speckle noise in the optical system of a stellar coronagraph in order to further improve the contrast. The principle of the SPGD algorithm is described briefly, and a metric suitable for point-source imaging optimization is given. The feasibility and good performance of the SPGD algorithm are demonstrated by an experimental system featuring a 140-actuator deformable mirror and a Shack-Hartmann wavefront sensor. The SPGD-based AO is then applied to a liquid crystal array (LCA) based coronagraph to improve the contrast. The LCA can modulate the incoming light to generate a pupil apodization mask of any pattern. A circular stepped pattern is used in our preliminary experiment, and the image contrast shows improvement from 10^-3 to 10^-4.5 at an angular distance of 2λ/D after correction by the SPGD-based AO.
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). 
Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
Massively parallel processor computer
NASA Technical Reports Server (NTRS)
Fung, L. W. (Inventor)
1983-01-01
An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
Compact holographic optical neural network system for real-time pattern recognition
NASA Astrophysics Data System (ADS)
Lu, Taiwei; Mintzer, David T.; Kostrzewski, Andrew A.; Lin, Freddie S.
1996-08-01
One of the important characteristics of artificial neural networks is their capability for massive interconnection and parallel processing. Recently, specialized electronic neural network processors and VLSI neural chips have been introduced in the commercial market. The number of parallel channels they can handle is limited because of the limited parallel interconnections that can be implemented with 1D electronic wires. High-resolution pattern recognition problems can require a large number of neurons for parallel processing of an image. This paper describes a holographic optical neural network (HONN) that is based on high- resolution volume holographic materials and is capable of performing massive 3D parallel interconnection of tens of thousands of neurons. A HONN with more than 16,000 neurons packaged in an attache case has been developed. Rotation- shift-scale-invariant pattern recognition operations have been demonstrated with this system. System parameters such as the signal-to-noise ratio, dynamic range, and processing speed are discussed.
A survey of GPU-based medical image computing techniques
Shi, Lin; Liu, Wen; Zhang, Heye; Xie, Yongming
2012-01-01
Medical imaging currently plays a crucial role throughout clinical practice, from medical research to diagnostics and treatment planning. However, medical imaging procedures are often computationally demanding due to the large three-dimensional (3D) medical datasets to process in practical clinical applications. With the rapidly improving performance of graphics processors, improved programming support, and an excellent price-to-performance ratio, the graphics processing unit (GPU) has emerged as a competitive parallel computing platform for computationally expensive and demanding tasks in a wide range of medical imaging applications. The major purpose of this survey is to provide a comprehensive reference source for newcomers and researchers involved in GPU-based medical image processing. Within this survey, the continuous advancement of GPU computing is reviewed and the existing traditional applications in three areas of medical image processing, namely segmentation, registration and visualization, are surveyed. The potential advantages and associated challenges of current GPU-based medical imaging are also discussed to inspire future applications in medicine. PMID:23256080
A review of snapshot multidimensional optical imaging: measuring photon tags in parallel
Gao, Liang; Wang, Lihong V.
2015-01-01
Multidimensional optical imaging has seen remarkable growth in the past decade. Rather than measuring only the two-dimensional spatial distribution of light, as in conventional photography, multidimensional optical imaging captures light in up to nine dimensions, providing unprecedented information about incident photons’ spatial coordinates, emittance angles, wavelength, time, and polarization. Multidimensional optical imaging can be accomplished either by scanning or parallel acquisition. Compared with scanning-based imagers, parallel acquisition—also dubbed snapshot imaging—has a prominent advantage in maximizing optical throughput, particularly when measuring a datacube of high dimensions. Here, we first categorize snapshot multidimensional imagers based on their acquisition and image reconstruction strategies, then highlight the snapshot advantage in the context of optical throughput, and finally we discuss their state-of-the-art implementations and applications. PMID:27134340
Parallel task processing of very large datasets
NASA Astrophysics Data System (ADS)
Romig, Phillip Richardson, III
This research concerns the use of distributed computing technologies for the analysis and management of very large datasets. Improvements in sensor technology, an emphasis on global change research, and greater access to data warehouses all increase the number of non-traditional users of remotely sensed data. We present a framework for distributed solutions to the challenges of datasets which exceed the online storage capacity of individual workstations. This framework, called parallel task processing (PTP), incorporates both the task- and data-level parallelism exemplified by many image processing operations. An implementation based on the principles of PTP, called Tricky, is also presented. Additionally, we describe the challenges and practical issues in modeling the performance of parallel task processing with large datasets. We present a mechanism for estimating the running time of each unit of work within a system and an algorithm that uses these estimates to simulate the execution environment and produce estimated runtimes. Finally, we describe and discuss experimental results which validate the design. Specifically, the system (a) is able to perform computation on datasets which exceed the capacity of any one disk, (b) provides reduction of overall computation time as a result of the task distribution even with the additional cost of data transfer and management, and (c) in the simulation mode accurately predicts the performance of the real execution environment.
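The runtime-estimation idea above can be sketched as a list-scheduling simulation. This is a hypothetical sketch, not Tricky's actual model: per-unit runtime estimates are scheduled greedily onto workers, with a fixed per-unit data-transfer charge standing in for the data-management cost.

```python
import heapq

def simulate_ptp(task_times, n_workers, transfer_time=0.0):
    """Estimate the makespan of independent work units scheduled greedily
    (longest-first) onto n_workers, charging a fixed data-transfer cost
    before each unit begins."""
    finish = [0.0] * n_workers
    heapq.heapify(finish)
    for t in sorted(task_times, reverse=True):
        earliest = heapq.heappop(finish)              # least-loaded worker
        heapq.heappush(finish, earliest + transfer_time + t)
    return max(finish)
```

Comparing the simulated makespan with and without `transfer_time` shows whether task distribution still pays off once data movement is charged, which is claim (b) above.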
NASA Technical Reports Server (NTRS)
Sargent, Jeff Scott
1988-01-01
A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits, and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing equivalent quality placement. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
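Heuristic Cell-Coloring amounts to coloring the cell-interaction graph so that each color class contains only mutually non-interacting cells. A greedy sketch (the graph representation and function name are assumptions, not the paper's code):

```python
def color_cells(n_cells, interactions):
    """Greedy heuristic coloring of the cell-interaction graph: cells that
    interact (e.g. share nets) receive different colors, so each color class
    is a set of cells that can be moved in parallel without error buildup."""
    adj = {c: set() for c in range(n_cells)}
    for a, b in interactions:
        adj[a].add(b)
        adj[b].add(a)
    colors = {}
    for c in range(n_cells):
        used = {colors[n] for n in adj[c] if n in colors}
        colors[c] = next(k for k in range(n_cells) if k not in used)
    return colors
```

A parallel annealer would then repeatedly pick one color class and propose moves for all of its cells simultaneously, since by construction those moves do not interact.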
High resolution human diffusion tensor imaging using 2-D navigated multi-shot SENSE EPI at 7 Tesla
Jeong, Ha-Kyu; Gore, John C.; Anderson, Adam W.
2012-01-01
The combination of parallel imaging with partial Fourier acquisition has greatly improved the performance of diffusion-weighted single-shot EPI and is the preferred method for acquisitions at low to medium magnetic field strength such as 1.5 or 3 Tesla. Increased off-resonance effects and reduced transverse relaxation times at 7 Tesla, however, generate more significant artifacts than at lower magnetic field strength and limit data acquisition. Additional acceleration of k-space traversal using a multi-shot approach, which acquires a subset of k-space data after each excitation, reduces these artifacts relative to conventional single-shot acquisitions. However, corrections for motion-induced phase errors are not straightforward in accelerated, diffusion-weighted multi-shot EPI because of phase aliasing. In this study, we introduce a simple acquisition and corresponding reconstruction method for diffusion-weighted multi-shot EPI with parallel imaging suitable for use at high field. The reconstruction uses a simple modification of the standard SENSE algorithm to account for shot-to-shot phase errors; the method is called Image Reconstruction using Image-space Sampling functions (IRIS). Using this approach, reconstruction from highly aliased in vivo image data using 2-D navigator phase information is demonstrated for human diffusion-weighted imaging studies at 7 Tesla. The final reconstructed images show submillimeter in-plane resolution with no ghosts and much reduced blurring and off-resonance artifacts. PMID:22592941
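For context, the core SENSE unaliasing step that IRIS builds on reduces to a small least-squares solve per set of overlapped pixels. This sketch shows only that standard step, not the paper's shot-phase modification; the names and toy data are assumptions:

```python
import numpy as np

def sense_unalias(aliased, sens):
    """Least-squares estimate of the R overlapped pixel values from the
    Nc aliased coil values, given the (Nc, R) coil sensitivity matrix."""
    return np.linalg.pinv(sens) @ aliased

# Toy example: 4 coils, reduction factor R = 2, noise-free data.
rng = np.random.default_rng(1)
sens = rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))  # sensitivities
x_true = np.array([0.7 + 0.1j, -0.3 + 0.5j])  # the two folded pixel values
aliased = sens @ x_true                        # simulated coil measurements
x_hat = sense_unalias(aliased, sens)
```

The multi-shot method described above augments this system with shot-to-shot phase terms estimated from the 2-D navigators, so the same least-squares machinery also resolves the motion-induced phase errors.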
NASA Astrophysics Data System (ADS)
López, M.; Vázquez, F.; Solís-Nájera, S.; Rodriguez, A. O.
2015-01-01
The travelling wave approach to high-field magnetic resonance imaging has been used recently with very promising results. This approach offers images with a greater field-of-view and a reasonable signal-to-noise ratio using a circular waveguide. The scheme has proved successful at 7 T and 9.4 T with whole-body imagers. Images have also been acquired with clinical magnetic resonance imaging systems whose resonant frequencies were 64 MHz and 128 MHz. These results motivated the use of remote detection of the magnetic resonance signal with a parallel-plate waveguide and a 3 T clinical scanner to acquire human leg images. The cut-off frequency of this waveguide is zero for the principal mode, allowing us to overcome the barrier of transmitting waves at frequencies lower than 300 MHz (7 T for protons). This motivated the study of remote detection outside the actual magnet. We performed electromagnetic field simulations of a parallel-plate waveguide and a phantom. The signal transmission was done at 128 MHz using a circular surface coil located almost 200 cm away from the magnet isocentre. Numerical simulations demonstrated that the magnetic field of the principal mode propagates inside a waveguide outside the magnet. Numerical results were compared with previously acquired experimental image data obtained under similar conditions.
Parallel volume ray-casting for unstructured-grid data on distributed-memory architectures
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu
1995-01-01
As computing technology continues to advance, computational modeling of scientific and engineering problems produces data of increasing complexity: large in size and unstructured in shape. Volume visualization of such data is a challenging problem. This paper proposes a distributed parallel solution that makes ray-casting volume rendering of unstructured-grid data practical. Both the data and the rendering process are distributed among processors. At each processor, ray-casting of local data is performed independently of the other processors. The global image compositing processes, which require inter-processor communication, are overlapped with the local ray-casting processes to achieve maximum parallel efficiency. This algorithm differs from previous ones in four ways: it is completely distributed, less view-dependent, reasonably scalable, and flexible. Without using dynamic load balancing, test results on the Intel Paragon using from two to 128 processors show, on average, about 60% parallel efficiency.
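The image compositing that overlaps with local ray-casting typically uses the associative "over" operator on premultiplied (color, alpha) partial images; associativity is what lets processors combine partial results in any grouping that respects depth order. A minimal sketch with scalar colors (the specific operator is a standard choice, not stated in the abstract):

```python
def over(front, back):
    """Porter-Duff 'over' with premultiplied color: composite the
    nearer fragment (front) over the farther one (back)."""
    cf, af = front
    cb, ab = back
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)

def composite(partials):
    """Fold depth-sorted partial images (front to back) into one.
    Associativity of 'over' lets processors combine partial results
    in any grouping that respects this order."""
    img = (0.0, 0.0)
    for p in partials:
        img = over(img, p)
    return img
```

The associativity check below is the property that makes tree-structured, overlapped compositing valid.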
Xu, Jian; Mcgorty, Kelly Anne; Lim, Ruth. P.; Bruno, Mary; Babb, James S.; Srichai, Monvadi B.; Kim, Daniel; Sodickson, Daniel K.
2011-01-01
OBJECTIVE To evaluate the feasibility of performing single breath-hold 3D thoracic non-contrast magnetic resonance angiography (NC-MRA) using highly-accelerated parallel imaging. MATERIALS AND METHODS We developed a single breath-hold NC-MRA pulse sequence using a balanced steady-state free precession (SSFP) readout and highly-accelerated parallel imaging. In 17 subjects, highly-accelerated NC-MRA was compared against electrocardiogram (ECG)-triggered contrast-enhanced MRA (CE-MRA). Anonymized images were randomized for blinded review by two independent readers for image quality, artifact severity in 8 defined vessel segments, and aortic dimensions at 6 standard sites. NC-MRA and CE-MRA were compared in terms of these measures using paired-sample t and Wilcoxon tests. RESULTS The overall image quality (3.21±0.68 for NC-MRA vs. 3.12±0.71 for CE-MRA) and artifact (2.87±1.01 for NC-MRA vs. 2.92±0.87 for CE-MRA) scores were not significantly different, but there were significant differences for the great vessel and coronary artery origins. NC-MRA demonstrated significantly lower aortic diameter measurements compared to CE-MRA; however, this difference reached the threshold for clinical relevance (>3 mm) in less than 12% of segments, most commonly at the sinotubular junction. Mean total scan time was significantly lower for NC-MRA compared to CE-MRA (18.2 ± 6.0s vs. 28.1 ± 5.4s, respectively; p < 0.05). CONCLUSION Single breath-hold NC-MRA is feasible and can be a useful alternative for evaluation and follow-up of thoracic aortic diseases. PMID:22147589
NASA Astrophysics Data System (ADS)
Wu, Z.; Gao, K.; Wang, Z. L.; Shao, Q. G.; Hu, R. F.; Wei, C. X.; Zan, G. B.; Wali, F.; Luo, R. H.; Zhu, P. P.; Tian, Y. C.
2017-06-01
In X-ray grating-based phase contrast imaging, information retrieval is necessary for quantitative research, especially for phase tomography. However, numerous and repetitive processes have to be performed for tomographic reconstruction. In this paper, we report a novel information retrieval method, which enables retrieving phase and absorption information by means of a linear combination of two mutually conjugate images. Thanks to the distributive law of multiplication and the commutative and associative laws of addition, the information retrieval can be performed after tomographic reconstruction, thus simplifying the information retrieval procedure dramatically. The theoretical model of this method is established in both parallel beam geometry for the Talbot interferometer and fan beam geometry for the Talbot-Lau interferometer. Numerical experiments are also performed to confirm the feasibility and validity of the proposed method. In addition, we discuss its applicability to cone beam geometry and its advantages compared with other methods. Moreover, this method can also be employed in other differential phase contrast imaging methods, such as diffraction enhanced imaging, non-interferometric imaging, and edge illumination.
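As a toy illustration of retrieval by linear combination, assume two mutually conjugate images of the form I± = A(1 ± Vp), with A the absorption image, p the (small) differential-phase term, and V the fringe visibility; sums and differences then separate the two channels. This simple model is an assumption for illustration, not the paper's exact formulation:

```python
def retrieve(i_plus, i_minus, visibility=1.0):
    """Toy linear-combination retrieval, assuming conjugate images
    I+/- = A * (1 +/- V * p), where A is the absorption image and
    p the differential-phase term. Both follow from sum/difference."""
    absorption = 0.5 * (i_plus + i_minus)
    phase_term = (i_plus - i_minus) / (i_plus + i_minus) / visibility
    return absorption, phase_term
```

Because both outputs are linear in the inputs, the same combination can be applied before or after a linear reconstruction step, which is the commutation the abstract exploits.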
Fast Face-Recognition Optical Parallel Correlator Using High Accuracy Correlation Filter
NASA Astrophysics Data System (ADS)
Watanabe, Eriko; Kodate, Kashiko
2005-11-01
We designed and fabricated a fully automatic fast face recognition optical parallel correlator [E. Watanabe and K. Kodate: Appl. Opt. 44 (2005) 5666] based on the VanderLugt principle. The implementation of an as-yet unattained ultra-high-speed system was aided by reconfiguring the system to make it suitable for easier parallel processing, as well as by composing a higher accuracy correlation filter and a high-speed ferroelectric liquid crystal spatial light modulator (FLC-SLM). In running trial experiments using this system (dubbed FARCO), we succeeded in acquiring remarkably low error rates of 1.3% for false match rate (FMR) and 2.6% for false non-match rate (FNMR). Given the results of our experiments, the aim of this paper is to examine methods of designing correlation filters and arranging database image arrays for even faster parallel correlation, underlining the issues of calculation technique, quantization bit rate, pixel size and shift from the optical axis. The correlation filter has demonstrated excellent performance and higher precision than classical correlation and the joint transform correlator (JTC). Moreover, the arrangement of multi-object reference images leads to 10-channel correlation signals, as sharply marked as those of a single channel. This experimental result demonstrates great potential for achieving a processing speed of 10,000 faces/s.
Optimized Laplacian image sharpening algorithm based on graphic processing unit
NASA Astrophysics Data System (ADS)
Ma, Tinghuai; Li, Lu; Ji, Sai; Wang, Xin; Tian, Yuan; Al-Dhelaan, Abdullah; Al-Rodhaan, Mznah
2014-12-01
In classical Laplacian image sharpening, all pixels are processed one by one, which leads to a large amount of computation. Traditional Laplacian sharpening performed on a CPU is considerably time-consuming, especially for large images. In this paper, we propose a parallel implementation of Laplacian sharpening based on the Compute Unified Device Architecture (CUDA), a computing platform for Graphic Processing Units (GPUs), and analyze the impact of picture size on performance as well as the relationship between data transfer time and parallel computing time. Further, according to the features of the different memory types, an improved scheme of our method is developed, which exploits shared memory in the GPU instead of global memory and further increases efficiency. Experimental results prove that the two novel algorithms outperform the traditional sequential method based on OpenCV in terms of computing speed.
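A serial NumPy reference for the per-pixel work each CUDA thread would perform, using the classic 4-neighbour kernel [[0,1,0],[1,-4,1],[0,1,0]] and g = f − ∇²f (zero-padded borders; the kernel choice and border handling are assumptions, since the abstract does not specify them):

```python
import numpy as np

def laplacian_sharpen(img):
    """Sharpen: g = f - lap(f) with the 4-neighbour Laplacian kernel
    [[0,1,0],[1,-4,1],[0,1,0]] and zero-padded borders. Each output
    pixel depends only on a 3x3 neighbourhood, which is why the
    operation maps so naturally onto one GPU thread per pixel."""
    f = np.pad(img.astype(float), 1)
    lap = (f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
           - 4.0 * f[1:-1, 1:-1])
    return img - lap
```

The shared-memory variant in the paper caches each thread block's 3x3 halo so neighbouring threads do not re-read global memory.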
NASA Astrophysics Data System (ADS)
Yin, Guoyan; Zhang, Limin; Zhang, Yanqi; Liu, Han; Du, Wenwen; Ma, Wenjuan; Zhao, Huijuan; Gao, Feng
2018-02-01
Pharmacokinetic diffuse fluorescence tomography (DFT) can describe the metabolic processes of fluorescent agents in biomedical tissue and provide helpful information for tumor differentiation. In this paper, a dynamic DFT system was developed by employing digital lock-in photon-counting with square wave modulation, which offers ultra-high sensitivity and measurement parallelism. In this system, 16 frequency-encoded laser diodes (LDs) driven by a self-designed light source system were distributed evenly in the imaging plane and irradiated simultaneously. Meanwhile, 16 detection fibers collected emission light in parallel via the digital lock-in photon-counting module. The fundamental performance of the proposed system was assessed with phantom experiments in terms of stability, linearity, anti-crosstalk capability, and image reconstruction. The results validated the availability of the proposed dynamic DFT system.
Li, Bingyi; Chen, Liang; Yu, Wenyue; Xie, Yizhuang; Bian, Mingming; Zhang, Qingjun; Pang, Long
2018-01-01
With the development of satellite load technology and very large-scale integrated (VLSI) circuit technology, on-board real-time synthetic aperture radar (SAR) imaging systems have facilitated rapid response to disasters. A key goal of the on-board SAR imaging system design is to achieve high real-time processing performance under severe size, weight, and power consumption constraints. This paper presents a multi-node prototype system for real-time SAR imaging processing. We decompose the commonly used chirp scaling (CS) SAR imaging algorithm into two parts according to their computing features. The linearization and logic-memory optimum allocation methods are adopted to realize the nonlinear part in a reconfigurable structure, and the two-part bandwidth balance method is used to realize the linear part. Thus, floating-point SAR imaging processing can be integrated into a single Field Programmable Gate Array (FPGA) chip instead of relying on distributed technologies. A single processing node requires 10.6 s and consumes 17 W to focus 25-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384. The design methodology of the multi-FPGA parallel accelerating system under the real-time principle is introduced. As a proof of concept, a prototype with four processing nodes and one master node is implemented using a Xilinx xc6vlx315t FPGA. The weight and volume of a single machine are 10 kg and 32 cm × 24 cm × 20 cm, respectively, and the power consumption is under 100 W. The real-time performance of the proposed design is demonstrated on Chinese Gaofen-3 stripmap continuous imaging. PMID:29495637
Parallel and Serial Grouping of Image Elements in Visual Perception
ERIC Educational Resources Information Center
Houtkamp, Roos; Roelfsema, Pieter R.
2010-01-01
The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some…
CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms
Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W.
2011-01-01
As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. PMID:21159404
CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms.
Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W
2012-06-01
As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Scalable Parallel Density-based Clustering and Applications
NASA Astrophysics Data System (ADS)
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have received significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise data. These algorithms have several applications requiring high performance computing, including finding halos and subhalos (clusters) in massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms is extremely challenging, as they exhibit an inherently sequential data access order and unbalanced workloads, resulting in low parallel efficiency. To break the sequentiality of data access and to achieve high parallelism, we develop new parallel algorithms, for both DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups of up to 27.5× on 40 cores on a shared-memory architecture and up to 5,765× using 8,192 cores on a distributed-memory architecture. In our experiments, we found that while achieving this scalability, our algorithms produce clustering results of comparable quality to the classical algorithms.
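The DBSCAN/connected-components analogy mentioned above can be sketched serially with a union-find over core points (an O(n²) toy with dense distance checks, not the paper's scalable parallel algorithm):

```python
import numpy as np

def dbscan_cc(points, eps, min_pts):
    """DBSCAN phrased as connected components: core points within eps
    of each other are unioned into one cluster; border points attach
    to a neighbouring core's component; the rest are noise (-1)."""
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    neigh = d <= eps
    core = neigh.sum(axis=1) >= min_pts          # neighbour count includes self
    parent = list(range(n))
    def find(i):                                  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if core[i] and core[j] and neigh[i, j]:
                parent[find(i)] = find(j)
    labels = [-1] * n
    for i in range(n):
        if core[i]:
            labels[i] = find(i)
        else:                                     # border point: adopt a core neighbour
            for j in range(n):
                if core[j] and neigh[i, j]:
                    labels[i] = find(j)
                    break
    return labels
```

The union operations are order-independent, which is the property the parallel algorithm exploits.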
Qi, Xin; Xing, Fuyong; Foran, David J.; Yang, Lin
2013-01-01
Automated image analysis of histopathology specimens could potentially provide support for early detection and improved characterization of breast cancer. Automated segmentation of the cells comprising imaged tissue microarrays (TMA) is a prerequisite for any subsequent quantitative analysis. Unfortunately, crowding and overlapping of cells present significant challenges for most traditional segmentation algorithms. In this paper, we propose a novel algorithm which can reliably separate touching cells in hematoxylin stained breast TMA specimens which have been acquired using a standard RGB camera. The algorithm is composed of two steps. It begins with a fast, reliable object center localization approach which utilizes single-path voting followed by mean-shift clustering. Next, the contour of each cell is obtained using a level set algorithm based on an interactive model. We compared the experimental results with those reported in the most current literature. Finally, performance was evaluated by comparing the pixel-wise accuracy provided by human experts with that produced by the new automated segmentation algorithm. The method was systematically tested on 234 image patches exhibiting dense overlap and containing more than 2200 cells. It was also tested on whole slide images including blood smears and tissue microarrays containing thousands of cells. Since the voting step of the seed detection algorithm is well suited for parallelization, a parallel version of the algorithm was implemented using graphic processing units (GPU) which resulted in significant speed-up over the C/C++ implementation. PMID:22167559
NASA Astrophysics Data System (ADS)
Rahman, Tasneem; Tahtali, Murat; Pickering, Mark R.
2014-09-01
The purpose of this study is to derive optimized parameters for a detector module employing an off-the-shelf X-ray camera and a pinhole array collimator applicable to a range of different SPECT systems. Monte Carlo simulations using the Geant4 application for tomographic emission (GATE) were performed to estimate the performance of the pinhole array collimators, which was compared to that of a low energy high resolution (LEHR) parallel-hole collimator in a four-head SPECT system. A detector module was simulated to have a 48 mm by 48 mm active area along with 1 mm, 1.6 mm and 2 mm pinhole aperture sizes at 0.48 mm pitch on a tungsten plate. Perpendicular lead septa were employed to verify overlapping and non-overlapping projections against a proper acceptance angle without lead septa. A uniform cylindrical water phantom was used to evaluate the performance of the proposed four-head SPECT system with the pinhole array detector module. For each head, 100 pinhole configurations were evaluated based on sensitivity and detection efficiency for 140 keV γ-rays, and compared to the LEHR parallel-hole collimator. SPECT images were reconstructed with a filtered back projection (FBP) algorithm in which neither scatter nor attenuation corrections were performed. Development of a better reconstruction algorithm for this specific system is in progress. Nevertheless, the activity distribution was well visualized using the backprojection algorithm. In this study, we have presented several quantitative and comparative analyses of a pinhole array imaging system providing high detection efficiency and better system sensitivity over a large FOV, compared to the conventional four-head SPECT system. The proposed detector module is expected to provide improved performance in various SPECT imaging applications.
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmalz, Mark S
2011-07-24
Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation G′ for a high-performance architecture. Key computational and data movement kernels of the application were analyzed and optimized for parallel execution using the mapping G → G′, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy.
Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.
Ji, Jim; Wright, Steven
2005-01-01
Parallel imaging using multiple phased-array coils and receiver channels has become an effective approach to high-speed magnetic resonance imaging (MRI). To obtain high spatiotemporal resolution, the k-space is subsampled and later interpolated using multiple-channel data. Higher subsampling factors result in faster image acquisition. However, the subsampling factors are upper-bounded by the number of parallel channels. Phase constraints have been previously proposed to overcome this limitation with some success. In this paper, we demonstrate that in certain applications it is possible to obtain acceleration factors potentially up to twice the channel number by using a real-image constraint. Data acquisition and processing methods to manipulate and estimate the image phase information are presented for improving image reconstruction. In-vivo brain MRI experimental results show that accelerations of up to 6 are feasible with 4-channel data.
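The real-image constraint rests on conjugate symmetry of k-space: if the image is real, F(−k) = conj(F(k)), so only half the samples carry independent information. A 1-D NumPy sketch of this redundancy (an illustration of the underlying symmetry, not the authors' reconstruction pipeline):

```python
import numpy as np

def recover_from_half_kspace(half_spectrum, n):
    """If the image is real, k-space is conjugate-symmetric:
    F[-k] = conj(F[k]). irfft fills in the missing half implicitly,
    so half the samples reconstruct the full signal exactly --
    the redundancy behind doubling the acceleration bound."""
    return np.fft.irfft(half_spectrum, n)

# usage: keep only the non-negative-frequency half of a real signal
x = np.array([1.0, 3.0, -2.0, 0.5, 4.0, -1.0])
half = np.fft.rfft(x)    # 4 complex samples instead of 6
```

In practice the image phase is never exactly zero, which is why the paper pairs this constraint with phase estimation and correction.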
Parallel image logical operations using cross correlation
NASA Technical Reports Server (NTRS)
Strong, J. P., III
1972-01-01
Methods are presented for counting areas in an image in a parallel manner using noncoherent optical techniques. The techniques presented include the Levialdi algorithm for counting, optical techniques for binary operations, and cross-correlation.
Mediaprocessors in medical imaging for high performance and flexibility
NASA Astrophysics Data System (ADS)
Managuli, Ravi; Kim, Yongmin
2002-05-01
New high performance programmable processors, called mediaprocessors, have been emerging since the early 1990s for various digital media applications, such as digital TV, set-top boxes, desktop video conferencing, and digital camcorders. Modern mediaprocessors, e.g., TI's TMS320C64x and Hitachi/Equator Technologies MAP-CA, can offer high performance utilizing both instruction-level and data-level parallelism. During this decade, with continued performance improvement and cost reduction, we believe that the mediaprocessors will become a preferred choice in designing imaging and video systems due to their flexibility in incorporating new algorithms and applications via programming and faster-time-to-market. In this paper, we will evaluate the suitability of these mediaprocessors in medical imaging. We will review the core routines of several medical imaging modalities, such as ultrasound and DR, and present how these routines can be mapped to mediaprocessors and their resultant performance. We will analyze the architecture of several leading mediaprocessors. By carefully mapping key imaging routines, such as 2D convolution, unsharp masking, and 2D FFT, to the mediaprocessor, we have been able to achieve comparable (if not better) performance to that of traditional hardwired approaches. Thus, we believe that future medical imaging systems will benefit greatly from these advanced mediaprocessors, offering significantly increased flexibility and adaptability, reducing the time-to-market, and improving the cost/performance ratio compared to the existing systems while meeting the high computing requirements.
GPU accelerated fuzzy connected image segmentation by using CUDA.
Zhuge, Ying; Cao, Yong; Miller, Robert W
2009-01-01
Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem of these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides highly parallel computing power. In this paper, we present a parallel fuzzy connected image segmentation algorithm on Nvidia's Compute Unified Device Architecture (CUDA) platform for segmenting large medical image data sets. Our experiments based on three data sets of small, medium, and large size demonstrate the efficiency of the parallel algorithm, which achieves speed-up factors of 7.2×, 7.3×, and 14.4×, respectively, for the three data sets over the sequential implementation of the fuzzy connected image segmentation algorithm on a CPU.
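Fuzzy connectedness assigns each voxel the strength of its best path to a seed, where a path is only as strong as its weakest affinity link. A serial Dijkstra-style sketch of this max-min propagation (the graph adjacency is hypothetical; the paper's contribution is parallelizing this propagation on CUDA):

```python
import heapq

def fuzzy_connectedness(affinity, seed):
    """Fuzzy connectedness on a graph: a path's strength is its
    weakest affinity; a node's connectedness is the strongest path
    from the seed. Max-min analogue of Dijkstra's algorithm.
    affinity: {node: [(neighbour, affinity_in_(0,1]), ...]}"""
    conn = {seed: 1.0}
    heap = [(-1.0, seed)]                 # max-heap via negation
    while heap:
        s, u = heapq.heappop(heap)
        s = -s
        if s < conn.get(u, 0.0):          # stale entry
            continue
        for v, a in affinity.get(u, []):
            new = min(s, a)               # path strength = weakest link
            if new > conn.get(v, 0.0):
                conn[v] = new
                heapq.heappush(heap, (-new, v))
    return conn
```

Thresholding the resulting connectedness map yields the segmented object.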
NASA Technical Reports Server (NTRS)
Huck, Friedrich O.; Fales, Carl L.
1990-01-01
Researchers are concerned with the end-to-end performance of image gathering, coding, and processing. The applications range from high-resolution television to vision-based robotics, wherever the resolution, efficiency and robustness of visual information acquisition and processing are critical. For the presentation at this workshop, it is convenient to divide research activities into the following two overlapping areas: The first is the development of focal-plane processing techniques and technology to effectively combine image gathering with coding, with an emphasis on low-level vision processing akin to the retinal processing in human vision. The approach includes the familiar Laplacian pyramid, the new intensity-dependent spatial summation, and parallel sensing/processing networks. Three-dimensional image gathering is attained by combining laser ranging with sensor-array imaging. The second is the rigorous extension of information theory and optimal filtering to visual information acquisition and processing. The goal is to provide a comprehensive methodology for quantitatively assessing the end-to-end performance of image gathering, coding, and processing.
Lattice Boltzmann Model of 3D Multiphase Flow in Artery Bifurcation Aneurysm Problem
Abas, Aizat; Mokhtar, N. Hafizah; Ishak, M. H. H.; Abdullah, M. Z.; Ho Tian, Ang
2016-01-01
This paper simulates and predicts the laminar flow inside the 3D aneurysm geometry, since the hemodynamic situation in the blood vessels is difficult to determine and visualize using standard imaging techniques, for example, magnetic resonance imaging (MRI). Three different types of Lattice Boltzmann (LB) models are computed, namely, single relaxation time (SRT), multiple relaxation time (MRT), and regularized BGK models. The results obtained using these different versions of the LB-based code will then be validated with ANSYS FLUENT, a commercially available finite volume- (FV-) based CFD solver. The simulated flow profiles that include velocity, pressure, and wall shear stress (WSS) are then compared between the two solvers. The predicted outcomes show that all the LB models are comparable and in good agreement with the FVM solver for complex blood flow simulation. The findings also show minor differences in their WSS profiles. The performance of the parallel implementation for each solver is also included and discussed in this paper. In terms of parallelization, it was shown that LBM-based code performed better in terms of the computation time required. PMID:27239221
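A minimal single-relaxation-time (SRT/BGK) D2Q9 step, the simplest of the three LB variants compared above: collide each distribution toward its local equilibrium, then stream along the lattice directions (periodic boundaries here for brevity; MRT and regularized BGK differ only in the collision step):

```python
import numpy as np

# D2Q9 lattice: discrete velocities and weights
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4/9] + [1/9]*4 + [1/36]*4)

def bgk_step(f, tau=0.8):
    """One SRT/BGK lattice Boltzmann step on distributions f with
    shape (9, ny, nx): relax toward the second-order equilibrium,
    then stream each population along its lattice direction."""
    rho = f.sum(axis=0)
    ux = (f * C[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * C[:, 1, None, None]).sum(axis=0) / rho
    cu = C[:, 0, None, None] * ux + C[:, 1, None, None] * uy
    usq = ux**2 + uy**2
    feq = W[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)
    f = f - (f - feq) / tau                       # BGK collision
    for i in range(9):                            # periodic streaming
        f[i] = np.roll(f[i], (C[i, 1], C[i, 0]), axis=(0, 1))
    return f
```

Each site updates from purely local data plus nearest neighbours, which is why LBM codes parallelize so well and outperformed the FV solver in computation time.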
Efficient testing methodologies for microcameras in a gigapixel imaging system
NASA Astrophysics Data System (ADS)
Youn, Seo Ho; Marks, Daniel L.; McLaughlin, Paul O.; Brady, David J.; Kim, Jungsang
2013-04-01
Multiscale parallel imaging--based on a monocentric optical design--promises revolutionary advances in diverse imaging applications by enabling high resolution, real-time image capture over a wide field-of-view (FOV), including sport broadcast, wide-field microscopy, astronomy, and security surveillance. The recently demonstrated AWARE-2 is a gigapixel camera consisting of an objective lens and 98 microcameras spherically arranged to capture an image over a FOV of 120° by 50°, using computational image processing to form a composite image of 0.96 gigapixels. Since the microcameras are capable of individually adjusting exposure, gain, and focus, true parallel imaging is achieved with a high dynamic range. From the integration perspective, manufacturing and verifying consistent quality of microcameras is key to the successful realization of AWARE cameras. We have developed an efficient testing methodology that utilizes a precisely fabricated dot grid chart as a calibration target to extract critical optical properties such as optical distortion, veiling glare index, and modulation transfer function to validate the imaging performance of microcameras. This approach utilizes an AWARE objective lens simulator which mimics the actual objective lens but operates with a short object distance, suitable for a laboratory environment. Here we describe the principles of the methodologies developed for AWARE microcameras and discuss the experimental results with our prototype microcameras. Reference: Brady, D. J., Gehm, M. E., Stack, R. A., Marks, D. L., Kittle, D. S., Golish, D. R., Vera, E. M., and Feller, S. D., "Multiscale gigapixel photography," Nature 486, 386-389 (2012).
Parallel algorithm for determining motion vectors in ice floe images by matching edge features
NASA Technical Reports Server (NTRS)
Manohar, M.; Ramapriyan, H. K.; Strong, J. P.
1988-01-01
A parallel algorithm is described to determine motion vectors of ice floes using time sequences of images of the Arctic Ocean obtained from the Synthetic Aperture Radar (SAR) instrument flown on board the SEASAT spacecraft. The algorithm, implemented on the MPP, locates corresponding objects based on their translationally and rotationally invariant features. It first approximates the edges in the images by polygons or sets of connected straight-line segments. Each such edge structure is then reduced to a seed point. Associated with each seed point are the descriptions (lengths, orientations and sequence numbers) of the lines constituting the corresponding edge structure. A parallel matching algorithm is used to match packed arrays of such descriptions to identify corresponding seed points in the two images. The matching algorithm is designed such that fragmentation and merging of ice floes are taken into account by accepting partial matches. The technique has been demonstrated to work on synthetic test patterns and real image pairs from SEASAT in times ranging from 0.5 to 0.7 seconds for 128 x 128 images.
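The partial-match idea can be sketched in a few lines: each seed point carries the (length, orientation) sequence of its edge structure, and two structures are accepted as matching when a sufficient fraction of their segments agree within tolerances, so fragmented or merged floes still pair up. The tolerances, the greedy pairing, and the test values below are illustrative assumptions, not the paper's parameters.

```python
# Each edge structure is summarized as a list of (length, orientation)
# segment descriptors attached to its seed point.

def segment_match(s1, s2, len_tol=0.15, ang_tol=10.0):
    """Do two (length, orientation) segments agree within tolerance?"""
    (l1, a1), (l2, a2) = s1, s2
    return abs(l1 - l2) <= len_tol * max(l1, l2) and abs(a1 - a2) <= ang_tol

def partial_match_score(desc_a, desc_b):
    """Fraction of the shorter structure's segments matched greedily."""
    unused = list(desc_b)
    hits = 0
    for seg in desc_a:
        for other in unused:
            if segment_match(seg, other):
                unused.remove(other)
                hits += 1
                break
    return hits / min(len(desc_a), len(desc_b))

floe_t0 = [(10.0, 30.0), (7.5, 95.0), (12.0, 170.0)]
floe_t1 = [(10.4, 28.0), (7.2, 99.0)]    # floe only partially visible later
score = partial_match_score(floe_t0, floe_t1)
print(score)  # 1.0: accepted as a (partial) match despite the missing segment
```

In the MPP implementation such descriptor arrays are packed and compared in parallel across all seed-point pairs; orientation would additionally be taken relative to the structure to preserve rotational invariance.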
The parallel-sequential field subtraction techniques for nonlinear ultrasonic imaging
NASA Astrophysics Data System (ADS)
Cheng, Jingwei; Potter, Jack N.; Drinkwater, Bruce W.
2018-04-01
Nonlinear imaging techniques have recently emerged which have the potential to detect cracks at a much earlier stage and are particularly sensitive to closed defects. This study utilizes two modes of focusing: parallel, in which the elements are fired together with a delay law, and sequential, in which elements are fired independently. In parallel focusing, a high intensity ultrasonic beam is formed in the specimen at the focal point. In sequential focusing, however, only low intensity signals from individual elements enter the sample, and the full matrix of transmit-receive signals is recorded; under linear elastic assumptions, the parallel and sequential images are expected to be identical. Here we measure the difference between these images formed from the coherent component of the field and use this to characterize the nonlinearity of closed fatigue cracks. In particular, we monitor the reduction in amplitude at the fundamental frequency at each focal point and use this metric to form images of the spatial distribution of nonlinearity. The results suggest the subtracted image can suppress linear features (e.g., back wall or large scatterers) and allow damage to be detected at an early stage.
Optical ranked-order filtering using threshold decomposition
Allebach, Jan P.; Ochoa, Ellen; Sweeney, Donald W.
1990-01-01
A hybrid optical/electronic system performs median filtering and related ranked-order operations using threshold decomposition to encode the image. Threshold decomposition transforms the nonlinear neighborhood ranking operation into a linear space-invariant filtering step followed by a point-to-point threshold comparison step. Spatial multiplexing allows parallel processing of all the threshold components as well as recombination by a second linear, space-invariant filtering step. An incoherent optical correlation system performs the linear filtering, using a magneto-optic spatial light modulator as the input device and a computer-generated hologram in the filter plane. Thresholding is done electronically. By adjusting the value of the threshold, the same architecture is used to perform median, minimum, and maximum filtering of images. A totally optical system is also disclosed.
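The threshold-decomposition property the optical system exploits is easy to verify in software: per threshold level, the nonlinear 3x3 median reduces to a linear box sum (the correlator's job) followed by a point threshold (the electronic stage), and summing the binary results over levels recovers the gray-scale median. This digital sketch illustrates the principle only; the optics are not modeled, and the 8-level toy image is an assumption.

```python
def median_by_threshold_decomposition(img, levels=8):
    """3x3 median filter computed level-by-level via threshold decomposition."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]       # borders left at 0 for brevity
    for t in range(1, levels):              # one binary slice per gray level
        binary = [[1 if img[y][x] >= t else 0 for x in range(w)]
                  for y in range(h)]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                s = sum(binary[y + dy][x + dx]      # linear 3x3 box filter
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
                out[y][x] += 1 if s >= 5 else 0     # threshold: majority of 9
    return out

img = [[0, 1, 2, 1],
       [3, 4, 5, 2],
       [1, 6, 7, 3],
       [0, 2, 3, 1]]
out = median_by_threshold_decomposition(img)
print(out[1][1], out[2][2])  # 3 3 -- equal to the direct 3x3 medians
```

Replacing the `s >= 5` majority rule with `s >= 1` or `s >= 9` gives the maximum and minimum filters, mirroring how the optical architecture switches ranked-order operations by adjusting a single threshold.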
New scheme for image edge detection using the switching mechanism of nonlinear optical material
NASA Astrophysics Data System (ADS)
Pahari, Nirmalya; Mukhopadhyay, Sourangshu
2006-03-01
The limitations of electronics in performing parallel arithmetic, algebraic, and logic processing are well known. Very high-speed (terahertz) performance cannot be expected from conventional electronic mechanisms. To achieve such performance we can introduce optics instead of electronics for information processing, computing, and data handling. Nonlinear optical material (NOM) is a strong candidate in this regard, playing a major role in the domain of optically controlled switching systems. Some NOMs reflect the probe beam in the presence of two read beams (or pump beams) exciting the material from opposite directions, following the principle of four-wave mixing. In image processing, edge extraction from an image is an important and essential task, and several optical methods of digital image processing are used to evaluate image edges properly. We propose here a new method of image edge detection, extraction, and enhancement using AND-based switching operations with NOM. In this process we use the optically inverted image of a supplied image, which can be obtained by the EXOR switching operation of the NOM.
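A digital analog of the logic-gate pipeline makes the idea concrete for a binary image: an EXOR with an all-ones plane plays the role of the optical inverter, and edges emerge from an AND between the image and the inverse of its one-pixel shrink. The 4-neighbor erosion used as the "shrink" is an assumed, minimal choice for illustration, not the paper's optical configuration.

```python
def invert(img):                  # EXOR with an all-ones plane = inversion
    return [[p ^ 1 for p in row] for row in img]

def erode4(img):                  # pixel survives only if all 4-neighbors are 1
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y][x] & img[y - 1][x] & img[y + 1][x]
                         & img[y][x - 1] & img[y][x + 1])
    return out

def edges(img):                   # AND of the image with its inverted erosion
    er = invert(erode4(img))
    return [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(img, er)]

square = [[0] * 6] + [[0, 1, 1, 1, 1, 0] for _ in range(4)] + [[0] * 6]
for row in edges(square):
    print(row)   # only the one-pixel-wide outline of the square remains
```

Every operation here is a pointwise AND/EXOR, which is exactly the class of operations the NOM switching elements implement in parallel across the whole image plane.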
NASA Astrophysics Data System (ADS)
Massambone de Oliveira, Rafael; Salomão Helou, Elias; Fontoura Costa, Eduardo
2016-11-01
We present a method for non-smooth convex minimization which is based on subgradient directions and string-averaging techniques. In this approach, the set of available data is split into sequences (strings) and a given iterate is processed independently along each string, possibly in parallel, by an incremental subgradient method (ISM). The end-points of all strings are averaged to form the next iterate. The method is useful for solving sparse and large-scale non-smooth convex optimization problems, such as those arising in tomographic imaging. A convergence analysis is provided under realistic, standard conditions. Numerical tests are performed in a tomographic image reconstruction application, showing good convergence speed, measured as the decrease of the objective function, in comparison to classical ISM.
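The string-averaging structure can be sketched on a scalar toy problem: the terms of a non-smooth objective (here sum of absolute deviations) are split into strings, each string is traversed by incremental subgradient steps starting from the common iterate, and the string end-points are averaged. The objective, string split, and diminishing step rule below are illustrative assumptions.

```python
def sign(v):
    return (v > 0) - (v < 0)

def sa_ism(b, strings, x0=0.0, iters=200):
    """String-averaging incremental subgradient for f(x) = sum_i |x - b[i]|."""
    x = x0
    for k in range(1, iters + 1):
        step = 1.0 / k                      # diminishing step size
        ends = []
        for s in strings:                   # strings are independent,
            y = x                           # so this loop could run in parallel
            for i in s:                     # incremental subgradient pass
                y -= step * sign(y - b[i])
            ends.append(y)
        x = sum(ends) / len(ends)           # average the string end-points
    return x

b = [1.0, 2.0, 3.0, 10.0, 11.0]             # f(x) = sum |x - b_i|
x = sa_ism(b, strings=[[0, 1], [2, 3], [4]])
f = sum(abs(x - bi) for bi in b)
print(round(f, 2))   # close to the optimal value f(3) = 18
```

In the tomographic setting each string holds a block of projection data, so the inner passes map naturally onto parallel workers while the averaging step provides the synchronization point.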
Real-time field programmable gate array architecture for computer vision
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel; Torres-Huitzil, Cesar
2001-01-01
This paper presents an architecture for real-time generic convolution of a mask and an image. The architecture is intended for fast low-level image processing. The field programmable gate array (FPGA)-based architecture takes advantage of the availability of registers in FPGAs to implement an efficient and compact module to process the convolutions. The architecture is designed to minimize the number of accesses to the image memory and is based on parallel modules with internal pipeline operation in order to improve its performance. The architecture is prototyped in an FPGA, but it can be implemented on dedicated very-large-scale integration (VLSI) devices to reach higher clock frequencies. Complexity issues, FPGA resource utilization, FPGA limitations, and real-time performance are discussed. Some results are presented and discussed.
Moslemi, Vahid; Ashoor, Mansour
2017-05-01
In addition to the trade-off between resolution and sensitivity, which is a common problem among all types of parallel-hole collimators (PCs), images obtained with high-energy PCs (HEPCs) suffer from a hole-pattern artifact (HPA) due to their greater septal thickness. In this study, a new collimator design has been proposed to improve the trade-off between resolution and sensitivity and to eliminate the HPA. A novel PC, namely the high-energy extended PC (HEEPC), is proposed and compared to HEPCs. In the new PC, trapezoidal denticles were added upon the septa on the detector side. The performance of the HEEPCs was evaluated and compared to that of HEPCs using a Monte Carlo N-Particle version 5 (MCNP5) simulation. The point spread functions (PSFs) of HEPCs and HEEPCs were obtained, along with various parameters such as resolution, sensitivity, scattering and penetration ratios, and the HPA of the collimators was assessed. Furthermore, a Picker phantom study was performed to examine the effects of the collimators on the quality of planar images. It was found that the HEEPC D, with a resolution identical to that of HEPC C, increased sensitivity by 34.7%, improved the trade-off between resolution and sensitivity, and eliminated the HPA. In the Picker phantom study, the HEEPC D showed the hot and cold lesions with higher contrast, lower noise, and a higher contrast-to-noise ratio (CNR). Since the HEEPCs modify the shape of the PSFs, they are able to improve the trade-off between resolution and sensitivity; consequently, planar images can be achieved with higher contrast resolution. Furthermore, because the HEEPCs reduce the HPA and produce images with a higher CNR compared to HEPCs, the images obtained by HEEPCs have a higher quality, which can help physicians provide better diagnoses.
Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU
NASA Astrophysics Data System (ADS)
Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang
2017-10-01
An imaging spectrometer can acquire a two-dimensional spatial image and a one-dimensional spectrum at the same time, which is highly useful in color and spectral measurement, true-color image synthesis, military reconnaissance, and so on. In order to realize fast reconstruction of Fourier transform imaging spectrometer data, this paper designs an optimized reconstruction algorithm with OpenMP parallel computing technology, which was further used to optimize processing for the Hyperspectral Imager of the Chinese `HJ-1' satellite. The results show that the method based on multi-core parallel computing technology can manage the multi-core CPU hardware resources competently and significantly enhance the efficiency of the spectrum reconstruction processing. If the technology is applied to parallel computing on workstations with more cores, it will become possible to complete real-time data processing for a Fourier transform imaging spectrometer with a single computer.
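The parallelization pattern is embarrassingly parallel: each pixel's interferogram is transformed independently, so rows of the data cube can be handed to separate workers, much as OpenMP distributes loop iterations over cores. The sketch below uses Python threads as a stand-in for OpenMP threads; the toy cube and the naive DFT are assumptions for illustration.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor

def dft_magnitude(samples):
    """Naive DFT magnitude spectrum of one pixel's interferogram."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) for k in range(n)]

def reconstruct_row(row):
    return [dft_magnitude(pixel) for pixel in row]

# cube[y][x] is an 8-sample interferogram for pixel (x, y)
cube = [[[float((x + y + t) % 4) for t in range(8)] for x in range(4)]
        for y in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    spectra = list(pool.map(reconstruct_row, cube))   # rows in parallel

serial = [reconstruct_row(row) for row in cube]
print(spectra == serial)   # True: the parallel result matches the serial one
```

Because the per-pixel transforms share no state, the decomposition scales with core count, which is the effect the paper reports for its OpenMP implementation.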
Creating a Parallel Version of VisIt for Microsoft Windows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitlock, B J; Biagas, K S; Rawson, P L
2011-12-07
VisIt is a popular, free, interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers, from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release.
We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.
Real-time image dehazing using local adaptive neighborhoods and dark-channel-prior
NASA Astrophysics Data System (ADS)
Valderrama, Jesus A.; Díaz-Ramírez, Víctor H.; Kober, Vitaly; Hernandez, Enrique
2015-09-01
A real-time algorithm for single image dehazing is presented. The algorithm is based on the calculation of local neighborhoods of a hazed image inside a moving window. The local neighborhoods are constructed by computing rank-order statistics. Next, the dark-channel-prior approach is applied to the local neighborhoods to estimate the transmission function of the scene. With the suggested approach there is no need to apply a refining algorithm, such as soft matting, to the estimated transmission. To achieve high-rate signal processing, the proposed algorithm is implemented exploiting massive parallelism on a graphics processing unit (GPU). Computer simulations are carried out to test the performance of the proposed algorithm in terms of dehazing efficiency and speed of processing. These tests are performed using several synthetic and real images. The obtained results are analyzed and compared with those obtained with existing dehazing algorithms.
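The dark-channel-prior step over local neighborhoods can be sketched compactly: the dark channel is the windowed minimum over color channels (an order statistic, hence the rank-order framing above), and the transmission follows directly from it with no soft-matting refinement. The window size, the haze-retention factor `OMEGA`, and the airlight value below are illustrative assumptions, not the paper's parameters.

```python
OMEGA, WIN = 0.95, 1          # haze retention factor, window half-size (assumed)

def dark_channel(img):        # img[y][x] = (r, g, b) in [0, 1]
    h, w = len(img), len(img[0])
    return [[min(min(img[yy][xx])
                 for yy in range(max(0, y - WIN), min(h, y + WIN + 1))
                 for xx in range(max(0, x - WIN), min(w, x + WIN + 1)))
             for x in range(w)] for y in range(h)]

def transmission(img, airlight):
    """t(x) = 1 - omega * dark_channel(I / A), per the dark channel prior."""
    norm = [[tuple(c / a for c, a in zip(px, airlight)) for px in row]
            for row in img]
    return [[1.0 - OMEGA * d for d in row] for row in dark_channel(norm)]

hazy = [[(0.8, 0.8, 0.9), (0.3, 0.4, 0.5)],
        [(0.7, 0.7, 0.8), (0.2, 0.2, 0.3)]]
t = transmission(hazy, airlight=(0.9, 0.9, 0.9))
print(round(t[0][0], 3))  # 0.789 (uniform here: each window spans the 2x2 image)
```

Each output pixel depends only on its own window, which is why the computation maps so directly onto one GPU thread per pixel.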
Fuzzy Matching Based on Gray-scale Difference for Quantum Images
NASA Astrophysics Data System (ADS)
Luo, GaoFeng; Zhou, Ri-Gui; Liu, XingAo; Hu, WenWen; Luo, Jia
2018-05-01
Quantum image processing has recently emerged as an essential topic in practical tasks, e.g. real-time image matching. Previous studies have shown that quantum superposition and entanglement can greatly improve the efficiency of complex image processing. In this paper, a fuzzy quantum image matching scheme based on gray-scale difference is proposed to find the target region in a reference image that is most similar to the template image. Firstly, we employ the novel enhanced quantum representation (NEQR) to store digital images. Then certain quantum operations are used to evaluate the gray-scale difference between two quantum images by thresholding. If all of the obtained gray-scale differences are not greater than the threshold value, it indicates a successful fuzzy matching of quantum images. Theoretical analysis and experiments show that the proposed scheme performs fuzzy matching at a low cost and also enables a significant speedup via quantum parallel computation.
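The matching criterion itself is classical and easy to state in code: slide the template over the reference and declare a fuzzy match wherever every pixelwise gray-scale difference stays within the threshold. The quantum scheme evaluates these differences in superposition; here they are enumerated explicitly. The images and threshold are toy assumptions.

```python
def fuzzy_match_positions(ref, tpl, thresh):
    """All top-left offsets where every |ref - tpl| difference <= thresh."""
    th, tw = len(tpl), len(tpl[0])
    hits = []
    for y in range(len(ref) - th + 1):
        for x in range(len(ref[0]) - tw + 1):
            if all(abs(ref[y + dy][x + dx] - tpl[dy][dx]) <= thresh
                   for dy in range(th) for dx in range(tw)):
                hits.append((y, x))
    return hits

reference = [[10, 12, 50, 52],
             [11, 13, 51, 53],
             [90, 90, 90, 90]]
template = [[49, 53],
            [52, 52]]
matches = fuzzy_match_positions(reference, template, thresh=3)
print(matches)  # [(0, 2)]: the only region within the gray-scale tolerance
```

Classically the cost grows with the number of candidate offsets; the paper's point is that a quantum register holding all offsets in superposition can evaluate the thresholded differences for every position at once.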
Sharif, Behzad; Bresler, Yoram
2013-01-01
Patient-Adaptive Reconstruction and Acquisition Dynamic Imaging with Sensitivity Encoding (PARADISE) is a dynamic MR imaging scheme that optimally combines parallel imaging and model-based adaptive acquisition. In this work, we propose the application of PARADISE to real-time cardiac MRI. We introduce a physiologically improved version of a realistic four-dimensional cardiac-torso (NCAT) phantom, which incorporates natural beat-to-beat heart rate and motion variations. Cardiac cine imaging using PARADISE is simulated and its performance is analyzed by virtue of the improved phantom. Results verify the effectiveness of PARADISE for high resolution un-gated real-time cardiac MRI and its superiority over conventional acquisition methods. PMID:24398475
Graphics Processing Unit Assisted Thermographic Compositing
NASA Technical Reports Server (NTRS)
Ragasa, Scott; Russell, Samuel S.
2012-01-01
Objective: Develop a software application utilizing high performance computing techniques, including general purpose graphics processing units (GPGPUs), for the analysis and visualization of large thermographic data sets. Over the past several years, an increasing effort among scientists and engineers to utilize graphics processing units (GPUs) in a more general purpose fashion has allowed for previously unobtainable levels of computation by individual workstations. As data sets grow, the methods to process them must grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU, which yield significant increases in performance. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Image processing is one area where GPUs are being used to greatly increase the performance of certain analysis and visualization techniques.
Real-time object tracking based on scale-invariant features employing bio-inspired hardware.
Yasukawa, Shinsuke; Okuno, Hirotsugu; Ishii, Kazuo; Yagi, Tetsuya
2016-09-01
We developed a vision sensor system that performs a scale-invariant feature transform (SIFT) in real time. To apply the SIFT algorithm efficiently, we focus on a two-fold process performed by the visual system: whole-image parallel filtering and frequency-band parallel processing. The vision sensor system comprises an active pixel sensor, a metal-oxide semiconductor (MOS)-based resistive network, a field-programmable gate array (FPGA), and a digital computer. We employed the MOS-based resistive network for instantaneous spatial filtering and a configurable filter size. The FPGA is used to pipeline process the frequency-band signals. The proposed system was evaluated by tracking the feature points detected on an object in a video. Copyright © 2016 Elsevier Ltd. All rights reserved.
A novel highly parallel algorithm for linearly unmixing hyperspectral images
NASA Astrophysics Data System (ADS)
Guerra, Raúl; López, Sebastián.; Callico, Gustavo M.; López, Jose F.; Sarmiento, Roberto
2014-10-01
Endmember extraction and abundance calculation represent critical steps within the process of linearly unmixing a given hyperspectral image for two main reasons. The first is the need to compute a set of accurate endmembers in order to further obtain confident abundance maps. The second refers to the huge number of operations involved in these time-consuming processes. This work proposes an algorithm to estimate the endmembers of a hyperspectral image under analysis and its abundances at the same time. The main advantages of this algorithm are its high degree of parallelization and the mathematical simplicity of the operations implemented. The algorithm estimates the endmembers as virtual pixels. In particular, the proposed algorithm performs a gradient descent method to iteratively refine the endmembers and the abundances, reducing the mean square error in accordance with the linear unmixing model. Some mathematical restrictions must be added so that the method converges to a unique and realistic solution. Owing to the nature of the algorithm, these restrictions can be easily implemented. The results obtained with synthetic images demonstrate the good behavior of the proposed algorithm. Moreover, the results obtained with the well-known Cuprite dataset also corroborate the benefits of our proposal.
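A toy rendering of the joint refinement loop: endmembers E and abundances A are nudged down the gradient of the squared reconstruction error of the linear mixing model X ≈ EA, with a nonnegativity clip standing in for the kind of easily implemented restriction the abstract mentions. The problem sizes, learning rate, and data are assumptions, and this is a generic sketch rather than the authors' algorithm.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def refine(X, E, A, lr=0.1, iters=2000):
    """Joint gradient descent on ||X - E A||^2 with nonnegative abundances."""
    for _ in range(iters):
        R = [[X[i][j] - ea for j, ea in enumerate(row)]
             for i, row in enumerate(matmul(E, A))]        # residual X - EA
        dE, dA = matmul(R, transpose(A)), matmul(transpose(E), R)
        E = [[e + lr * g for e, g in zip(re, rg)] for re, rg in zip(E, dE)]
        A = [[max(0.0, a + lr * g) for a, g in zip(ra, rg)]
             for ra, rg in zip(A, dA)]                     # nonnegativity clip
    return E, A

X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3 bands x 2 pixels
E0 = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.6]]  # initial endmember guesses
A0 = [[0.5, 0.5], [0.5, 0.5]]
E, A = refine(X, E0, A0)
R = [[x - ea for x, ea in zip(rx, rea)] for rx, rea in zip(X, matmul(E, A))]
err = sum(v * v for row in R for v in row)
print(round(err, 4))   # reconstruction error after refinement (small)
```

Every update is a matrix product of residuals, which is why the method parallelizes so cleanly: each pixel's abundance column and each endmember row can be updated independently.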
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davis, E.L.
A novel method for performing real-time acquisition and processing of Landsat/EROS data covers all aspects including radiometric and geometric corrections of multispectral scanner or return-beam vidicon inputs, image enhancement, statistical analysis, feature extraction, and classification. Radiometric transformations include bias/gain adjustment, noise suppression, calibration, scan angle compensation, and illumination compensation, including topography and atmospheric effects. Correction or compensation for geometric distortion includes sensor-related distortions, such as centering, skew, size, scan nonlinearity, radial symmetry, and tangential symmetry. Also included are object image-related distortions such as aspect angle (altitude), scale distortion (altitude), terrain relief, and earth curvature. Ephemeral corrections are also applied to compensate for satellite forward movement, earth rotation, altitude variations, satellite vibration, and mirror scan velocity. Image enhancement includes high-pass, low-pass, and Laplacian mask filtering and data restoration for intermittent losses. Resource classification is provided by statistical analysis including histograms, correlational analysis, matrix manipulations, and determination of spectral responses. Feature extraction includes spatial frequency analysis, which is used in parallel discriminant functions in each array processor for rapid determination. The technique uses integrated parallel array processors that divide up the tasks and execute them concurrently under supervision of a control processor. The operator-machine interface is optimized for programming ease and graphics image windowing.
Parallel architecture for rapid image generation and analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nerheim, R.J.
1987-01-01
A multiprocessor architecture inspired by the Disney multiplane camera is proposed. For many applications, this approach produces a natural mapping of processors to objects in a scene. Such a mapping promotes parallelism and reduces the hidden-surface work with minimal interprocessor communication and low overhead cost. Existing graphics architectures store the final picture as a monolithic entity. The architecture here stores each object's image separately. It assembles the final composite picture from component images only when the video display needs to be refreshed. This organization simplifies the work required to animate moving objects that occlude other objects. In addition, the architecture has multiple processors that generate the component images in parallel. This further shortens the time needed to create a composite picture. In addition to generating images for animation, the architecture has the ability to decompose images.
Wu, Xiaoping; Tian, Jinfeng; Schmitter, Sebastian; Vaughan, J Tommy; Uğurbil, Kâmil; Van de Moortele, Pierre-François
2016-06-01
We explore the advantages of using a double-ring radiofrequency (RF) array and slice orientation to design parallel transmission (pTx) multiband (MB) pulses for simultaneous multislice (SMS) imaging with whole-brain coverage at 7 Tesla (T). A double-ring head array with 16 elements split evenly between two rings stacked in the z-direction was modeled and compared with two single-ring arrays consisting of 8 or 16 elements. The array performance was evaluated by designing band-specific pTx MB pulses with local specific absorption rate (SAR) control. The impact of slice orientation was also investigated. The double-ring array consistently and significantly outperformed the two single-ring arrays, with peak local SAR reduced by up to 40% at a fixed excitation error of 0.024. For all three arrays, exciting sagittal or coronal slices yielded better RF performance than exciting axial or oblique slices. A double-ring RF array can be used to drastically improve the SAR versus excitation fidelity tradeoff for pTx MB pulse design for brain imaging at 7 T; therefore, it is preferable to single-ring RF array designs when pursuing various biomedical applications of pTx SMS imaging. Among the stripline arrays compared, coronal and sagittal slices are more advantageous than axial and oblique slices for pTx MB pulses. Magn Reson Med 75:2464-2472, 2016. © 2016 Wiley Periodicals, Inc.
An application of the MPP to the interactive manipulation of stereo images of digital terrain models
NASA Technical Reports Server (NTRS)
Pol, Sanjay; Mcallister, David; Davis, Edward
1987-01-01
Massively Parallel Processor algorithms were developed for the interactive manipulation of flat-shaded digital terrain models defined over grids. The emphasis is on real-time manipulation of stereo images. Standard graphics transformations are applied to a 128 x 128 grid of elevations, followed by shading and a perspective projection to produce the right eye image. The surface is then rendered using a simple painter's algorithm for hidden surface removal. The left eye image is produced by rotating the surface 6 degrees about the viewer's y axis, followed by a perspective projection and rendering of the image as described above. The left and right eye images are then presented on a graphics device using standard stereo technology. Performance evaluations and comparisons are presented.
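The stereo-generation geometry can be shown in miniature: project a surface point for the right eye, rotate it about the viewer's y axis by 6 degrees, and project again for the left eye. A single 3-D point stands in for the 128 x 128 elevation grid, and the viewing distance `D` is an assumed parameter.

```python
import math

D = 10.0                                   # distance to the projection plane

def perspective(p):
    """Simple pinhole perspective projection onto the z = 0 plane."""
    x, y, z = p
    return (D * x / (D + z), D * y / (D + z))

def rotate_y(p, degrees):
    x, y, z = p
    a = math.radians(degrees)
    return (x * math.cos(a) + z * math.sin(a), y,
            -x * math.sin(a) + z * math.cos(a))

point = (1.0, 2.0, 5.0)                    # one vertex of the terrain surface
right_eye = perspective(point)
left_eye = perspective(rotate_y(point, 6.0))
disparity = left_eye[0] - right_eye[0]
print(round(disparity, 3))                 # positive horizontal disparity
```

On the MPP each grid cell undergoes these transformations independently, so the whole surface is rotated and projected in lockstep across the processor array.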
Watanabe, Yuuki; Takahashi, Yuhei; Numazawa, Hiroshi
2014-02-01
We demonstrate intensity-based optical coherence tomography (OCT) angiography using the squared difference of two sequential frames with bulk-tissue-motion (BTM) correction. This motion correction was performed by minimization of the sum of the pixel values using axial- and lateral-pixel-shifted structural OCT images. We extract the BTM-corrected image from a total of 25 calculated OCT angiographic images. Image processing was accelerated by a graphics processing unit (GPU) with many stream processors to optimize the parallel processing procedure. The GPU processing rate was faster than that of a line scan camera (46.9 kHz). Our OCT system provides the means of displaying structural OCT images and BTM-corrected OCT angiographic images in real time.
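The BTM correction can be sketched in miniature: try axial and lateral pixel shifts of the second structural frame, keep the shift that minimizes the sum of pixel differences, and use the squared difference at that shift as the angiographic signal. The frame contents and the ±2-pixel search range are toy assumptions (the paper selects among 25 candidate angiographic images).

```python
def shifted(img, dy, dx):
    """Circularly shift a 2-D list by (dy, dx) pixels."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]

def btm_corrected_angiogram(f1, f2, search=2):
    best = None
    for dy in range(-search, search + 1):       # axial candidates
        for dx in range(-search, search + 1):   # lateral candidates
            g = shifted(f2, dy, dx)
            ssd = sum((a - b) ** 2 for ra, rb in zip(f1, g)
                      for a, b in zip(ra, rb))
            if best is None or ssd < best[0]:
                best = (ssd, dy, dx, g)
    ssd, dy, dx, g = best
    angio = [[(a - b) ** 2 for a, b in zip(ra, rb)] for ra, rb in zip(f1, g)]
    return (dy, dx), angio                      # squared-difference angiogram

frame1 = [[0, 0, 0, 0], [0, 9, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
frame2 = shifted(frame1, 1, 0)                  # simulated bulk axial motion
shift, angio = btm_corrected_angiogram(frame1, frame2)
print(shift)   # (-1, 0): the bulk motion is found and undone
```

Each candidate shift is evaluated independently, which is exactly the kind of work the paper fans out across GPU stream processors to keep up with the camera line rate.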
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.
1989-01-01
Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
NASA Astrophysics Data System (ADS)
Wang, Zhi-shan; Zhao, Yue-jin; Li, Zhuo; Dong, Liquan; Chu, Xuhong; Li, Ping
2010-11-01
The comparison goniometer is widely used to measure and inspect small angles, angle differences, and the parallelism of two surfaces. However, the common way to read a comparison goniometer is for the operator to inspect the ocular of the goniometer with one eye. Reading an old goniometer that is equipped with only one adjustable ocular is difficult. In the fabrication of an IR reflecting-mirror assembly, a common comparison goniometer is used to measure the angle errors between two neighboring assembled mirrors. In this paper, an image-based quick-reading technique for the comparison goniometer used to inspect the parallelism of mirrors in a mirror assembly is proposed. A digital camera, a comparison goniometer, and a computer are used to construct the reading system; the image of the field of view in the comparison goniometer is extracted and recognized to obtain the angle positions of the reflection surfaces to be measured. In order to obtain the interval distance between the scale lines, a particular technique, the left-peak-first method, based on the local peak values of intensity in the true-color image, is proposed. A program written in VC++6.0 has been developed to perform the color digital image processing.
Robot Acting on Moving Bodies (RAMBO): Interaction with tumbling objects
NASA Technical Reports Server (NTRS)
Davis, Larry S.; Dementhon, Daniel; Bestul, Thor; Ziavras, Sotirios; Srinivasan, H. V.; Siddalingaiah, Madhu; Harwood, David
1989-01-01
Interaction with tumbling objects will become more common as human activities in space expand. Attempting to interact with a large complex object translating and rotating in space, a human operator using only his visual and mental capacities may not be able to estimate the object motion, plan actions or control those actions. A robot system (RAMBO) equipped with a camera, which, given a sequence of simple tasks, can perform these tasks on a tumbling object, is being developed. RAMBO is given a complete geometric model of the object. A low level vision module extracts and groups characteristic features in images of the object. The positions of the object are determined in a sequence of images, and a motion estimate of the object is obtained. This motion estimate is used to plan trajectories of the robot tool to locations near the object sufficient for achieving the tasks. More specifically, low level vision uses parallel algorithms for image enhancement by symmetric nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by sector filtering. The object pose estimation is a Hough transform method accumulating position hypotheses obtained by matching triples of image features (corners) to triples of model features. To maximize computing speed, the estimate of the position in space of a triple of features is obtained by decomposing its perspective view into a product of rotations and a scaled orthographic projection. This allows use of 2-D lookup tables at each stage of the decomposition. The position hypotheses for each possible match of model feature triples and image feature triples are calculated in parallel. Trajectory planning combines heuristic and dynamic programming techniques. Then trajectories are created using dynamic interpolations between initial and goal trajectories. All the parallel algorithms run on a Connection Machine CM-2 with 16K processors.
Integrated Parallel Reception, Excitation, and Shimming (iPRES)
Han, Hui; Song, Allen W.; Truong, Trong-Kha
2013-01-01
Purpose To develop a new concept for a hardware platform that enables integrated parallel reception, excitation, and shimming (iPRES). Theory This concept uses a single coil array rather than separate arrays for parallel excitation/reception and B0 shimming. It relies on a novel design that allows a radiofrequency current (for excitation/reception) and a direct current (for B0 shimming) to coexist independently in the same coil. Methods Proof-of-concept B0 shimming experiments were performed with a two-coil array in a phantom, whereas B0 shimming simulations were performed with a 48-coil array in the human brain. Results Our experiments show that individually optimized direct currents applied in each coil can reduce the B0 root-mean-square error by 62–81% and minimize distortions in echo-planar images. The simulations show that dynamic shimming with the 48-coil iPRES array can reduce the B0 root-mean-square error in the prefrontal and temporal regions by 66–79% as compared to static 2nd-order spherical harmonic shimming and by 12–23% as compared to dynamic shimming with a 48-coil conventional shim array. Conclusion Our results demonstrate the feasibility of the iPRES concept to perform parallel excitation/reception and B0 shimming with a unified coil system as well as its promise for in vivo applications. PMID:23629974
Real-time intraoperative fluorescence imaging system using light-absorption correction.
Themelis, George; Yoo, Jung Sun; Soh, Kwang-Sup; Schulz, Ralf; Ntziachristos, Vasilis
2009-01-01
We present a novel fluorescence imaging system developed for real-time interventional imaging applications. The system implements a correction scheme that improves the accuracy of epi-illumination fluorescence images by accounting for light intensity variation in tissues. The implementation is based on the use of three cameras operating in parallel behind a common lens, which allows for the concurrent collection of color, fluorescence, and light attenuation images at the excitation wavelength from the same field of view. The correction is based on a ratio approach of fluorescence over light attenuation images. Color images and video are used for surgical guidance and for registration with the corrected fluorescence images. We showcase the performance metrics of this system on phantoms and animals, and discuss the advantages over conventional epi-illumination systems developed for real-time applications as well as the limits of validity of corrected epi-illumination fluorescence imaging.
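The ratio correction described above divides each fluorescence pixel by the co-registered attenuation image captured at the excitation wavelength. A minimal sketch of that idea, assuming simple co-registered arrays (the function name and epsilon floor are illustrative, not the authors' implementation):

```python
import numpy as np

def correct_fluorescence(fluor, attenuation, eps=1e-6):
    """Ratio-based correction: divide the raw fluorescence image by the
    co-registered light-attenuation image so that spatial variation of the
    excitation light in tissue cancels out."""
    attenuation = np.maximum(attenuation.astype(float), eps)  # guard against division by zero
    return fluor.astype(float) / attenuation
```

For a uniformly fluorescent target under non-uniform illumination, the corrected image comes out flat even though the raw fluorescence image does not.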
Parallel ICA and its hardware implementation in hyperspectral image analysis
NASA Astrophysics Data System (ADS)
Du, Hongtao; Qi, Hairong; Peterson, Gregory D.
2004-04-01
Advances in hyperspectral imaging have dramatically boosted remote sensing applications by providing abundant information using hundreds of contiguous spectral bands. However, the high volume of information also results in an excessive computation burden. Since most materials have specific characteristics only at certain bands, much of this information is redundant. This property of hyperspectral images has motivated many researchers to study various dimensionality reduction algorithms, including Projection Pursuit (PP), Principal Component Analysis (PCA), wavelet transform, and Independent Component Analysis (ICA), where ICA is one of the most popular techniques. It searches for a linear or nonlinear transformation which minimizes the statistical dependence between spectral bands. Through this process, ICA can eliminate superfluous information while retaining practical information, given only the observations of hyperspectral images. One hurdle in applying ICA to hyperspectral image (HSI) analysis, however, is its long computation time, especially for high-volume hyperspectral data sets. Even the most efficient method, FastICA, is very time-consuming. In this paper, we present a parallel ICA (pICA) algorithm derived from FastICA. During the unmixing process, pICA divides the estimation of the weight matrix into sub-processes which can be conducted in parallel on multiple processors. The decorrelation process is decomposed into internal decorrelation and external decorrelation, which perform weight vector decorrelations within individual processors and between cooperating processors, respectively. In order to further improve the performance of pICA, we seek hardware solutions for the implementation of pICA. To date, there are very few hardware designs for ICA-related processes, due to their complicated and iterative computation. This paper discusses the capacity limitations of FPGA implementations of pICA for HSI analysis. 
An Application-Specific Integrated Circuit (ASIC) synthesis is designed for pICA-based dimensionality reduction in HSI analysis. The pICA design is implemented using standard-height cells and targets the TSMC 0.18 micron process. During the synthesis procedure, three ICA-related reconfigurable components are developed for reuse and retargeting purposes. Preliminary results show that the standard-height-cell-based ASIC synthesis provides an effective solution for pICA and ICA-related processes in HSI analysis.
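The internal/external split of the decorrelation step can be illustrated with a deflation-style orthogonalization. This is a hedged sketch of the idea only, not the authors' pICA code; the per-processor chunk layout and function names are assumptions:

```python
import numpy as np

def deflate(w, basis):
    """Remove from w its projections onto already-accepted unit vectors,
    then renormalize (deflation-style decorrelation, as in FastICA)."""
    for b in basis:
        w = w - np.dot(w, b) * b
    n = np.linalg.norm(w)
    return w / n if n > 0 else w

def pica_decorrelation(chunks):
    """chunks: one list of weight vectors per processor.
    Internal decorrelation orthogonalizes vectors within each chunk (the
    chunk loops are independent, hence parallelizable across processors);
    external decorrelation then orthogonalizes vectors across chunks."""
    internal = []
    for vecs in chunks:                 # runs per-processor in pICA
        basis = []
        for w in vecs:
            basis.append(deflate(w, basis))
        internal.append(basis)
    final = []                          # cross-processor pass
    for vecs in internal:
        for w in vecs:
            final.append(deflate(w, final))
    return final
```

After both passes, every returned weight vector is unit-length and orthogonal to all the others, regardless of which processor originally held it.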
Algorithms and programming tools for image processing on the MPP
NASA Technical Reports Server (NTRS)
Reeves, A. P.
1985-01-01
Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
GPU-completeness: theory and implications
NASA Astrophysics Data System (ADS)
Lin, I.-Jong
2011-01-01
This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We define this class of algorithms as "GPU-Complete" and postulate the necessary properties of an algorithm for admission into this class. We also formally relate this algorithmic space to the space of imaging algorithms. This concept is based upon our experience in the print production area, where GPUs (Graphics Processing Units) have shown a substantial cost/performance advantage within the context of HP-delivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for the CPU, language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for the GPU, video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs are establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation informs the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrates the trade-offs against competing multi-core, FPGA, and ASIC architectures. 
While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a well-defined subset of algorithms, GPU-Completeness is intended to connect parallelism, algorithms, and efficient architectures into a unified framework, showing that multiple layers of parallel implementation are guided by the same underlying trade-off.
Smeets, Julien; Roellinghoff, Frauke; Janssens, Guillaume; Perali, Irene; Celani, Andrea; Fiorini, Carlo; Freud, Nicolas; Testa, Etienne; Prieels, Damien
2016-01-01
More and more camera concepts are being investigated to try and seize the opportunity for instantaneous range verification of proton therapy treatments offered by prompt gammas emitted along the proton tracks. Focusing on one-dimensional imaging with a passive collimator, the present study experimentally compared, in combination with the first clinically compatible dedicated camera device, the performance of the two main options: a knife-edge slit (KES) and a multi-parallel slit (MPS) design. These two options were experimentally assessed in this specific context because previous analytical and numerical studies had demonstrated that, in a general context, they allow similar performance in terms of Bragg peak retrieval precision and spatial resolution. Both collimators were prototyped according to the conclusions of Monte Carlo optimization studies under constraints of equal weight (40 mm tungsten alloy equivalent thickness) and of the specificities of the camera device under consideration (in particular, 4 mm segmentation along the beam axis and no time-of-flight discrimination, both of which are less favorable to the MPS performance than to the KES one). Acquisitions of proton pencil beams of 100, 160, and 230 MeV in a PMMA target revealed that, in order to reach a given level of statistical precision on Bragg peak depth retrieval, the KES collimator requires only half the dose the present MPS collimator needs, making the KES collimator a preferred option for a compact camera device aimed at imaging only the Bragg peak position. On the other hand, the present MPS collimator proves more effective at retrieving the entrance of the beam into the target in the context of an extended camera device aimed at imaging the whole proton track within the patient.
Pandit, Prachi; Rivoire, Julien; King, Kevin; Li, Xiaojuan
2016-03-01
Quantitative T1ρ imaging is beneficial for early detection of osteoarthritis but has seen limited clinical use due to long scan times. In this study, we evaluated the feasibility of accelerated T1ρ mapping for knee cartilage quantification using a combination of compressed sensing (CS) and data-driven parallel imaging (ARC: Autocalibrating Reconstruction for Cartesian sampling). A sequential combination of ARC and CS, both during data acquisition and reconstruction, was used to accelerate the acquisition of T1ρ maps. Phantom, ex vivo (porcine knee), and in vivo (human knee) imaging was performed on a GE 3T MR750 scanner. T1ρ quantification after CS-accelerated acquisition was compared with non-CS-accelerated acquisition for various cartilage compartments. Accelerating image acquisition using CS did not introduce major deviations in quantification. The coefficient of variation for the root mean squared error increased with increasing acceleration, but for in vivo measurements it stayed under 5% for a net acceleration factor of up to 2, where the acquisition was 25% faster than the reference (ARC only). To the best of our knowledge, this is the first implementation of CS for in vivo T1ρ quantification. These early results show that this technique holds great promise in making quantitative imaging techniques more accessible for clinical applications. © 2015 Wiley Periodicals, Inc.
Design of k-Space Channel Combination Kernels and Integration with Parallel Imaging
Beatty, Philip J.; Chang, Shaorong; Holmes, James H.; Wang, Kang; Brau, Anja C. S.; Reeder, Scott B.; Brittain, Jean H.
2014-01-01
Purpose In this work, a new method is described for producing local k-space channel combination kernels using a small amount of low-resolution multichannel calibration data. Additionally, this work describes how these channel combination kernels can be combined with local k-space unaliasing kernels produced by the calibration phase of parallel imaging methods such as GRAPPA, PARS and ARC. Methods Experiments were conducted to evaluate both the image quality and computational efficiency of the proposed method compared to a channel-by-channel parallel imaging approach with image-space sum-of-squares channel combination. Results Results indicate comparable image quality overall, with some very minor differences seen in reduced field-of-view imaging. It was demonstrated that this method enables a speed up in computation time on the order of 3–16X for 32-channel data sets. Conclusion The proposed method enables high quality channel combination to occur earlier in the reconstruction pipeline, reducing computational and memory requirements for image reconstruction. PMID:23943602
Radio Synthesis Imaging - A High Performance Computing and Communications Project
NASA Astrophysics Data System (ADS)
Crutcher, Richard M.
The National Science Foundation has funded a five-year High Performance Computing and Communications project at the National Center for Supercomputing Applications (NCSA) for the direct implementation of several of the computing recommendations of the Astronomy and Astrophysics Survey Committee (the "Bahcall report"). This paper is a summary of the project goals and a progress report. The project will implement a prototype of the next generation of astronomical telescope systems - remotely located telescopes connected by high-speed networks to very high performance, scalable architecture computers and on-line data archives, which are accessed by astronomers over Gbit/sec networks. Specifically, a data link has been installed between the BIMA millimeter-wave synthesis array at Hat Creek, California and NCSA at Urbana, Illinois for real-time transmission of data to NCSA. Data are automatically archived, and may be browsed and retrieved by astronomers using the NCSA Mosaic software. In addition, an on-line digital library of processed images will be established. BIMA data will be processed on a very high performance distributed computing system, with I/O, user interface, and most of the software system running on the NCSA Convex C3880 supercomputer or Silicon Graphics Onyx workstations connected by HiPPI to the high performance, massively parallel Thinking Machines Corporation CM-5. The very computationally intensive algorithms for calibration and imaging of radio synthesis array observations will be optimized for the CM-5 and new algorithms which utilize the massively parallel architecture will be developed. Code running simultaneously on the distributed computers will communicate using the Data Transport Mechanism developed by NCSA. 
The project will also use the BLANCA Gbit/s testbed network between Urbana and Madison, Wisconsin to connect an Onyx workstation in the University of Wisconsin Astronomy Department to the NCSA CM-5, for development of long-distance distributed computing. Finally, the project is developing 2D and 3D visualization software as part of the international AIPS++ project. This research and development project is being carried out by a team of experts in radio astronomy, algorithm development for massively parallel architectures, high-speed networking, database management, and Thinking Machines Corporation personnel. The development of this complete software, distributed computing, and data archive and library solution to the radio astronomy computing problem will advance our expertise in high performance computing and communications technology and the application of these techniques to astronomical data processing.
NASA Astrophysics Data System (ADS)
Wang, Yihan; Lu, Tong; Wan, Wenbo; Liu, Lingling; Zhang, Songhe; Li, Jiao; Zhao, Huijuan; Gao, Feng
2018-02-01
To fully realize the potential of photoacoustic tomography (PAT) in preclinical and clinical applications, rapid measurements and robust reconstructions are needed. Sparse-view measurements have been adopted effectively to accelerate data acquisition. However, since reconstruction from sparse-view sampling data is challenging, both the effectiveness of the measurement and the appropriateness of the reconstruction should be taken into account. In this study, we present an iterative sparse-view PAT reconstruction scheme in which a virtual parallel-projection concept matched to the proposed measurement condition is introduced to help achieve the "compressive sensing" procedure of the reconstruction, while spatially adaptive filtering, which fully exploits the a priori information of mutually similar blocks in natural images, is introduced to effectively recover the partially unknown coefficients in the transformed domain. As a result, sparse-view PAT images can be reconstructed with higher quality than the results obtained by the universal back-projection (UBP) algorithm in the same sparse-view cases. The proposed approach has been validated by simulation experiments and exhibits desirable performance in image fidelity, even with a small number of measuring positions.
Automated analysis and classification of melanocytic tumor on skin whole slide images.
Xu, Hongming; Lu, Cheng; Berendt, Richard; Jha, Naresh; Mandal, Mrinal
2018-06-01
This paper presents a computer-aided technique for automated analysis and classification of melanocytic tumors in skin whole slide biopsy images. The proposed technique consists of four main modules. First, skin epidermis and dermis regions are segmented by a multi-resolution framework. Next, epidermis analysis is performed, in which a set of epidermis features reflecting nuclear morphologies and spatial distributions is computed. In parallel with epidermis analysis, dermis analysis is also performed, in which dermal cell nuclei are segmented and a set of textural and cytological features is computed. Finally, the skin melanocytic image is classified into different categories such as melanoma, nevus, or normal tissue by using a multi-class support vector machine (mSVM) with the extracted epidermis and dermis features. Experimental results on 66 skin whole slide images indicate that the proposed technique achieves more than 95% classification accuracy, which suggests that it has the potential to be used to assist pathologists in skin biopsy image analysis and classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
The parallel-sequential field subtraction technique for coherent nonlinear ultrasonic imaging
NASA Astrophysics Data System (ADS)
Cheng, Jingwei; Potter, Jack N.; Drinkwater, Bruce W.
2018-06-01
Nonlinear imaging techniques have recently emerged which have the potential to detect cracks at a much earlier stage than was previously possible and have sensitivity to partially closed defects. This study explores a coherent imaging technique based on the subtraction of two modes of focusing: parallel, in which the elements are fired together with a delay law, and sequential, in which elements are fired independently. In parallel focusing, a high-intensity ultrasonic beam is formed in the specimen at the focal point. In sequential focusing, only low-intensity signals from individual elements enter the sample, and the full matrix of transmit-receive signals is recorded and post-processed to form an image. Under linear elastic assumptions, the parallel and sequential images are expected to be identical. Here we measure the difference between these images and use it to characterise the nonlinearity of small closed fatigue cracks. In particular, we monitor the change in relative phase and amplitude at the fundamental frequencies for each focal point and use this nonlinear coherent imaging metric to form images of the spatial distribution of nonlinearity. The results suggest the subtracted image can suppress linear features (e.g. the back wall or large scatterers) effectively when instrumentation noise compensation is applied, thereby allowing damage to be detected at an early stage (c. 15% of fatigue life) and reliably quantified in later fatigue life.
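The core of the subtraction approach is that, for a linear medium, the parallel-focused and sequentially focused (post-processed) responses coincide, so their difference isolates nonlinearity. A hedged numerical sketch; the normalization choice and names are ours, not the paper's:

```python
import numpy as np

def nonlinearity_metric(par, seq, eps=1e-12):
    """par, seq: complex focal-point amplitudes from parallel and sequential
    focusing. Linear propagation gives par == seq, so the normalized residual
    is zero; amplitude or relative-phase differences raise the metric."""
    return np.abs(par - seq) / (np.abs(par) + eps)
```

Evaluating the metric over a grid of focal points yields an image of the spatial distribution of nonlinearity, with linear features suppressed.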
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-12-01
Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldkamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scan were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and a Reduce function was used to aggregate the partial backprojections into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed in a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. The speedup of reconstruction time was found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. The root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10⁻⁷. 
Our study also demonstrated that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault-tolerant means of solving large-scale computing problems in a cloud computing environment.
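The Map/Reduce decomposition described above — Map tasks filter and backproject disjoint subsets of projections, and a Reduce step sums the partial volumes — can be sketched as follows. This is a toy illustration, not the authors' Hadoop code: the "backprojection" simply smears each 1-D projection along one axis, where a real FDK map task would apply ramp filtering and cone-beam weighting.

```python
import numpy as np
from functools import reduce

def map_backproject(projection_subset, volume_shape):
    """Map step: backproject one subset of projections into a partial volume."""
    partial = np.zeros(volume_shape)
    for proj in projection_subset:
        partial += np.tile(proj, (volume_shape[0], 1))  # toy smear along rows
    return partial

def reduce_volumes(a, b):
    """Reduce step: aggregate partial backprojections into the whole volume."""
    return a + b

def reconstruct(projections, volume_shape, n_workers=4):
    """Split projections into per-worker subsets; each Map call is
    independent, so each could run on a separate cluster node."""
    subsets = [projections[i::n_workers] for i in range(n_workers)]
    partials = [map_backproject(s, volume_shape) for s in subsets]
    return reduce(reduce_volumes, partials)
```

Because addition is associative and commutative, the distributed result matches a single-machine reconstruction up to floating-point summation order, consistent with the small residual error the authors report.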
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-01-01
Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldkamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scan were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and a Reduce function was used to aggregate the partial backprojections into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed in a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: The speedup of reconstruction time was found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. The root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10⁻⁷. 
Our study also demonstrated that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault-tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
Measuring signal-to-noise ratio in partially parallel imaging MRI
Goerner, Frank L.; Clarke, Geoffrey D.
2011-01-01
Purpose: To assess five different methods of signal-to-noise ratio (SNR) measurement for partially parallel imaging (PPI) acquisitions. Methods: Measurements were performed on a spherical phantom and three volunteers using a multichannel head coil on a clinical 3T MRI system to produce echo planar, fast spin echo, gradient echo, and balanced steady state free precession image acquisitions. Two different PPI acquisitions, the generalized autocalibrating partially parallel acquisition algorithm and modified sensitivity encoding, with acceleration factors (R) of 2–4, were evaluated and compared to nonaccelerated acquisitions. Five standard SNR measurement techniques were investigated, and Bland–Altman analysis was used to determine agreement between the various SNR methods. The estimated g-factor values associated with each method of SNR calculation and PPI reconstruction method were also subjected to assessments that considered the effects on SNR due to reconstruction method, phase encoding direction, and R-value. Results: Only two SNR measurement methods produced g-factors in agreement with theoretical expectations (g ≥ 1). Bland–Altman tests demonstrated that these two methods also gave the most similar results relative to the other three measurements. R-value was the only factor of the three considered that showed a significant influence on SNR changes. Conclusions: Non-signal methods used in SNR evaluation do not produce results consistent with expectations in the investigated PPI protocols. Two of the methods studied provided the most accurate and useful results. Of these two, it is recommended that the image subtraction method be used for SNR calculations when evaluating PPI protocols, due to its relative accuracy and ease of implementation. PMID:21978049
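The image-subtraction method recommended above estimates noise from the difference of two repeated acquisitions. A minimal sketch, assuming identical repeated scans with independent noise (the whole-image ROI and function name are our simplifications):

```python
import numpy as np

def snr_subtraction(img1, img2):
    """SNR via image subtraction: signal is the mean of the averaged images;
    noise is the standard deviation of the difference image divided by
    sqrt(2), since subtracting two independent noise fields doubles the
    variance."""
    signal = 0.5 * (img1 + img2).mean()
    noise = (img1 - img2).std() / np.sqrt(2)
    return signal / noise
```

On simulated data with mean 100 and noise standard deviation 5, the estimate recovers the expected SNR of about 20.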
Performance of external and internal coil configurations for prostate investigations at 7 Tesla
Metzger, Gregory J.; van de Moortele, Pierre-Francois; Akgun, Can; Snyder, Carl J.; Moeller, Steen; Strupp, John; Andersen, Peter; Shrivastava, Devashish; Vaughan, Tommy; Ugurbil, Kamil; Adriany, Gregor
2010-01-01
Three different coil configurations were evaluated through simulation and experiment to determine safe operating limits and to evaluate subject-size-dependent performance for prostate imaging at 7 Tesla. The coils included a transceiver endorectal coil (trERC), a 16-channel transceiver external surface array (trESA), and a trESA combined with a receive-only ERC (trESA+roERC). While the transmit B1 (B1+) homogeneity was far superior for the trESA, the maximum achievable B1+ is subject-size dependent and limited by transmit chain losses and amplifier performance. For the trERC, limitations in transmit homogeneity greatly compromised image quality and limited coverage of the prostate. Despite these challenges, the high peak B1+ close to the trERC and its subject-size-independent performance provide potential advantages, especially for spectroscopic localization, where high-bandwidth RF pulses are required. On the receive side, the combined trESA+roERC provided the highest SNR and improved homogeneity over the trERC, resulting in better visualization of the prostate and surrounding anatomy. In addition, the parallel imaging performance of the trESA+roERC holds strong promise for diffusion weighted imaging and dynamic contrast enhanced MRI. PMID:20740657
Blanc-Garin, J; Faure, S; Sabio, P
1993-05-01
The objective of this study was to analyze dynamic aspects of right hemisphere implementation in processing visual images. Two tachistoscopic, divided visual field experiments were carried out on a partial split-brain patient with no damage to the right hemisphere. In the first experiment, image generation performance for letters presented in the right visual field (/left hemisphere) was undeniably optimal. In the left visual field (/right hemisphere), performance was no better than chance level at first, but then improved dramatically across stimulation blocks, in each of five successive sessions. This was interpreted as revealing the progressive spontaneous activation of the right hemisphere's competence not shown initially. The aim of the second experiment was to determine some conditions under which this pattern was obtained. The experimental design contrasted stimuli (words and pictures) and representational activity (phonologic and visuo-imaged processing). The right visual field (/left hemisphere: LH) elicited higher performance than the left visual field (/right hemisphere, RH) in the three situations where verbal activity was required. No superiority could be found when visual images were to be generated from pictures: parallel and weak improvement of both hemispheres was observed across sessions. Two other patterns were obtained: improvement in RH performance (although LH performance remained superior) and an unexpectedly large decrease in RH performance. These data are discussed in terms of RH cognitive competence and hemisphere implementation.
High Performance Computing for Medical Image Interpretation
1993-10-01
programme for some key companies in the health care industries (Dumay et al., 1993). With the modules developed for the "Smart Surgeon" an anatomical model... The "High Performance Medical Image Interpretation System" (HIPMI 2S) gives FEL-TNO the opportunity to offer its expertise to the Dutch and European civilian medical industry. Contents include: Spin-off; High Performance Computing: Introduction; Parallel processing; Artificial Neural Networks; European Industry.
Rotation and scale change invariant point pattern relaxation matching by the Hopfield neural network
NASA Astrophysics Data System (ADS)
Sang, Nong; Zhang, Tianxu
1997-12-01
Relaxation matching is one of the most relevant methods for image matching. The original relaxation matching technique using point patterns is sensitive to rotations and scale changes. We improve the original point pattern relaxation matching technique to be invariant to rotations and scale changes. A method that makes the Hopfield neural network perform this matching process is discussed. An advantage of this is that the relaxation matching process can be performed in real time with the neural network's massively parallel capability to process information. Experimental results with large simulated images demonstrate the effectiveness and feasibility of the method for point pattern relaxation matching invariant to rotations and scale changes, and of the method for performing this matching with the Hopfield neural network. In addition, we show that the method presented can be tolerant to small random errors.
High Performance Compression of Science Data
NASA Technical Reports Server (NTRS)
Storer, James A.; Carpentieri, Bruno; Cohn, Martin
1994-01-01
Two papers make up the body of this report. One presents a single-pass adaptive vector quantization algorithm that learns a codebook of variable size and shape entries; the authors present experiments on a set of test images showing that with no training or prior knowledge of the data, for a given fidelity, the compression achieved typically equals or exceeds that of the JPEG standard. The second paper addresses motion compensation, one of the most effective techniques used in interframe data compression. A parallel block-matching algorithm for estimating interframe displacement of blocks with minimum error is presented. The algorithm is designed for a simple parallel architecture to process video in real time.
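A block-matching motion estimator of the kind described in the second paper is naturally parallel: every block's displacement search is independent of every other block's, which is what makes it amenable to a simple parallel architecture. A hedged serial sketch (block size, search range, and the sum-of-absolute-differences cost are illustrative choices, not the report's design):

```python
import numpy as np

def block_match(ref, cur, block=8, search=4):
    """Exhaustive block matching: for each block of the current frame, find
    the displacement within +/-search pixels that minimizes the sum of
    absolute differences (SAD) against the reference frame. Each block is
    independent, so blocks map directly onto parallel processing elements."""
    h, w = cur.shape
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            tile = cur[by:by + block, bx:bx + block]
            best, best_err = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        err = np.abs(ref[y:y + block, x:x + block] - tile).sum()
                        if err < best_err:
                            best_err, best = err, (dy, dx)
            vectors[(by, bx)] = best
    return vectors
```

Shifting a frame by a known displacement and recovering that displacement for an interior block is a quick sanity check of the estimator.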
Density-based parallel skin lesion border detection with webCL
2015-01-01
Background Dermoscopy is a highly effective and noninvasive imaging technique used in the diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Methods A fast density-based skin lesion border detection method has been implemented in parallel with a new parallel technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multiple cores and GPUs. The developed WebCL-parallel density-based skin lesion border detection method runs efficiently from internet browsers. Results Previous research indicates that one of the highest accuracy rates can be achieved using density-based clustering techniques for skin lesion border detection. While these algorithms have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, the density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms. We describe WebCL and our parallel algorithm design. 
In addition, we tested the parallel code on 100 dermoscopy images and measured the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) and serial versions of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images, with a mean border error of 6.94%, mean recall of 76.66%, and mean precision of 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages around 491.2. Conclusions When the large number of high-resolution dermoscopy images processed in a typical clinical setting is considered, along with the critical importance of detecting and diagnosing melanoma before metastasis, the importance of fast processing of dermoscopy images becomes obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL that takes advantage of GPU computing from a web browser. The WebCL-parallel version of density-based skin lesion border detection introduced in this study can therefore supplement expert dermatologists and aid them in the early diagnosis of skin lesions. While WebCL is currently an emerging technology, full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources, including multiple cores and GPUs on a local machine, and allows compiled code to run directly from the Web browser. PMID:26423836
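The per-pixel density computation at the heart of such a method is what makes it a good fit for WebCL-style data parallelism. A minimal pure-Python sketch (hypothetical, not the authors' implementation; `density_map`, `border_pixels`, and the thresholds are illustrative):

```python
# Each pixel's neighbour count depends only on read-only input, so every
# iteration of the outer loop could become an independent WebCL/OpenCL
# work-item.
def density_map(mask, radius=1):
    """Count lesion pixels within a (2*radius+1)^2 window around each pixel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):                      # <- parallel over work-items
        for x in range(w):
            c = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                        c += 1
            out[y][x] = c
    return out

def border_pixels(mask, radius=1, lo=1, hi=8):
    """A pixel is 'border-like' if its local density is neither empty nor full."""
    dens = density_map(mask, radius)
    return {(y, x) for y, row in enumerate(dens) for x, d in enumerate(row)
            if mask[y][x] and lo <= d <= hi}

lesion = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
print(sorted(border_pixels(lesion)))
```

On this toy mask the interior pixel has a full 3x3 neighbourhood and is excluded, leaving only the eight perimeter lesion pixels.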
Density-based parallel skin lesion border detection with webCL.
Lemon, James; Kockara, Sinan; Halic, Tansel; Mete, Mutlu
2015-01-01
The design of multi-core DSP parallel model based on message passing and multi-level pipeline
NASA Astrophysics Data System (ADS)
Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong
2017-10-01
Currently, the design of embedded signal processing systems is often based on a specific application, an approach that is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on a multi-core DSP platform is designed; it is mainly suited to complex algorithms composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and incorporates the advantages of the mainstream multi-core DSP models (the Master-Slave model and the Data Flow model), giving it better performance. This paper uses a three-dimensional image generation algorithm to validate the efficiency of the proposed model by comparing it with the Master-Slave and Data Flow models.
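The message-passing, multi-level-pipeline idea can be sketched with ordinary threads and queues (a toy analogue, not the paper's DSP code; the stage functions are illustrative):

```python
# Each pipeline stage is a worker that receives messages from an input queue
# and forwards results downstream, so different modules of a larger algorithm
# run concurrently on different "cores".
import queue
import threading

def stage(fn, q_in, q_out):
    while True:
        msg = q_in.get()
        if msg is None:            # poison pill terminates the stage
            q_out.put(None)
            break
        q_out.put(fn(msg))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
t1 = threading.Thread(target=stage, args=(lambda x: x * 2, q1, q2))
t2 = threading.Thread(target=stage, args=(lambda x: x + 1, q2, q3))
t1.start(); t2.start()

for item in [1, 2, 3]:
    q1.put(item)
q1.put(None)

results = []
while (msg := q3.get()) is not None:
    results.append(msg)
t1.join(); t2.join()
print(results)   # [3, 5, 7]
```

Because each stage has a single worker and the queues are FIFO, message order is preserved end to end, which is the property a multi-level pipeline relies on.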
Interactive Fringe Analysis System: Applications To Moire Contourogram And Interferogram
NASA Astrophysics Data System (ADS)
Yatagai, T.; Idesawa, M.; Yamaashi, Y.; Suzuki, M.
1982-10-01
A general-purpose fringe pattern processing facility was developed in order to analyze moire photographs used for scoliosis diagnosis and interferometric patterns in optical shops. A TV camera reads the fringe profile to be analyzed, and the peaks of the fringes are detected by a microcomputer. Fringe peak correction and fringe order determination are performed with the interactive man-machine software developed for this purpose. A light pen facility and an image digitizer are employed for interaction. In the case of two-dimensional fringe analysis, we independently analyze a set of mutually parallel analysis lines and a reference line perpendicular to them. Fringe orders of the parallel analysis lines are then uniquely determined using the fringe order of the reference line. Some results of the analysis of moire contourograms, interferometric testing of silicon wafers, and holographic measurement of thermal deformation are presented.
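The first automated step described, detecting fringe peaks along a scan line, can be sketched as a simple local-maxima search (a hypothetical illustration; the threshold and the synthetic profile are made up):

```python
# A peak is a sample strictly greater than its left neighbour, at least as
# large as its right neighbour, and above a noise threshold; in the real
# system these raw peaks would then be corrected interactively.
import math

def fringe_peaks(profile, threshold=0.0):
    return [i for i in range(1, len(profile) - 1)
            if profile[i] > threshold
            and profile[i] > profile[i - 1]
            and profile[i] >= profile[i + 1]]

# Synthetic fringe profile: three bright fringes on a dark background.
profile = [math.sin(0.5 * i) ** 2 for i in range(20)]
print(fringe_peaks(profile, threshold=0.5))
```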
NASA Technical Reports Server (NTRS)
2002-01-01
(Released 08 April 2002) This image shows the cratered highlands of Terra Sirenum in the southern hemisphere. Near the center of the image, running from left to right, one can see long parallel to semi-parallel fractures or troughs called graben. Mars Global Surveyor initially discovered gullies on the south-facing walls of these fractures. This image is located at 38°S, 174°W (186°E).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, S; Touch, M; Bowsher, J
Purpose: To construct a robotic SPECT system and demonstrate its capability to image a thorax phantom on a radiation therapy flat-top couch. The system has potential for on-board functional and molecular imaging in radiation therapy. Methods: A robotic SPECT imaging system was developed utilizing a Digirad 2020tc detector and a KUKA KR150-L110 robot. An imaging study was performed with the PET CT Phantom, which includes 5 spheres: 10, 13, 17, 22 and 28 mm in diameter. The sphere-to-background concentration ratio of Tc-99m was 6:1. The phantom was placed on a flat-top couch. SPECT projections were acquired with a parallel-hole collimator and a single-pinhole collimator. The robotic system navigated the detector, tracing the flat-top table to maintain the closest possible proximity to the phantom. For image reconstruction, detector trajectories were described by six parameters: radius-of-rotation, x and z detector shifts, and detector rotation θ, tilt ϕ, and twist γ. These six parameters were obtained from the robotic system by calibrating the robot base and tool coordinates. Results: The robotic SPECT system was able to maneuver parallel-hole and pinhole collimated SPECT detectors in close proximity to the phantom, minimizing the impact of the flat-top couch on detector-to-COR (center-of-rotation) distance. In acquisitions with background at 1/6th sphere activity concentration, photopeak contamination was heavy, yet the 17, 22, and 28 mm diameter spheres were readily observed with parallel-hole imaging, and the single targeted sphere (28 mm diameter) was readily observed in the pinhole region-of-interest (ROI) imaging. Conclusion: On-board SPECT could be achieved by a robot maneuvering a SPECT detector about patients in position for radiation therapy on a flat-top couch. The robot's inherent coordinate frame could be an effective means to estimate detector pose for use in SPECT image reconstruction. PHS/NIH/NCI grant R21-CA156390-01A1.
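Turning the six trajectory parameters into a detector pose matrix might look as follows; the axis conventions, the rotation order, and the omission of tilt and twist here are assumptions for illustration, not the system's actual calibration:

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rot_y(a):
    # Rotation about the vertical axis through the COR (assumed convention).
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def translate(x, y, z):
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def detector_pose(radius, dx, dz, theta):
    # Rotate about the COR by theta, then push the detector out to `radius`
    # and apply the in-plane shifts (tilt phi and twist gamma omitted here).
    return mat_mul(rot_y(theta), translate(dx, 0.0, radius + dz))

pose = detector_pose(radius=25.0, dx=0.0, dz=0.0, theta=math.pi / 2)
position = [row[3] for row in pose[:3]]  # detector origin in room coordinates
```

A quarter-turn about the rotation axis carries a detector sitting 25 cm out along one horizontal axis onto the other, as expected of a circular-orbit pose.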
Hirokawa, Yuusuke; Isoda, Hiroyoshi; Maetani, Yoji S; Arizono, Shigeki; Shimada, Kotaro; Okada, Tomohisa; Shibata, Toshiya; Togashi, Kaori
2009-05-01
To evaluate the effectiveness of the periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER) technique for superparamagnetic iron oxide (SPIO)-enhanced T2-weighted magnetic resonance (MR) imaging with respiratory compensation with the prospective acquisition correction (PACE) technique in the detection of hepatic lesions. The institutional human research committee approved this prospective study, and all patients provided written informed consent. Eighty-one patients (mean age, 58 years) underwent hepatic 1.5-T MR imaging. Fat-saturated T2-weighted turbo spin-echo images were acquired with the PACE technique and with and without the PROPELLER method after administration of SPIO. Images were qualitatively evaluated for image artifacts, depiction of liver edge and intrahepatic vessels, overall image quality, and presence of lesions. Three radiologists independently assessed these characteristics with a five-point confidence scale. Diagnostic performance was assessed with receiver operating characteristic (ROC) curve analysis. Quantitative analysis was conducted by measuring the liver signal-to-noise ratio (SNR) and the lesion-to-liver contrast-to-noise ratio (CNR). The Wilcoxon signed rank test and two-tailed Student t test were used, and P < .05 indicated a significant difference. MR imaging with the PROPELLER and PACE techniques resulted in significantly improved image quality, higher sensitivity, and greater area under the ROC curve for hepatic lesion detection than did MR imaging with the PACE technique alone (P < .001). The mean liver SNR and the lesion-to-liver CNR were higher with the PROPELLER technique than without it (P < .001). T2-weighted MR imaging with the PROPELLER and PACE technique and SPIO enhancement is a promising method with which to improve the detection of hepatic lesions. (c) RSNA, 2009.
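The two quantitative measures used in the study reduce to simple ratios; a minimal sketch with made-up ROI statistics (the numbers are illustrative, not the paper's data):

```python
# SNR and lesion-to-liver CNR as they are conventionally computed from ROI
# means and a background noise standard deviation.
def snr(mean_signal, sd_noise):
    return mean_signal / sd_noise

def cnr(mean_lesion, mean_liver, sd_noise):
    """Lesion-to-liver contrast-to-noise ratio."""
    return abs(mean_lesion - mean_liver) / sd_noise

liver_mean, lesion_mean, noise_sd = 240.0, 410.0, 17.0
print(snr(liver_mean, noise_sd))                  # liver SNR
print(cnr(lesion_mean, liver_mean, noise_sd))     # lesion-to-liver CNR
```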
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters
Bajaj, Chandrajit
2009-01-01
Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools for studying large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction and rendering, in conjunction with a new specialized piece of image compositing hardware called the Metabuffer. In this paper, we focus on back-end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data across parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations needed to load large data from disk. We also describe an isosurface compression scheme that is efficient for progressive extraction, transmission, and storage of isosurfaces. PMID:19756231
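The culling idea behind the external interval trees can be illustrated in a few lines: each disk block stores its scalar (min, max) range, and only blocks whose interval contains the isovalue need to be loaded. A real implementation indexes the intervals for I/O-optimal queries; a linear scan shows the logic (block ranges are illustrative):

```python
# Return the indices of volume blocks whose value range "stabs" the isovalue.
def blocks_to_load(block_ranges, isovalue):
    return [i for i, (lo, hi) in enumerate(block_ranges) if lo <= isovalue <= hi]

ranges = [(0.0, 0.4), (0.3, 0.9), (0.5, 0.7), (0.8, 1.0)]
print(blocks_to_load(ranges, 0.6))   # only blocks 1 and 2 intersect 0.6
```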
Optical Processing of Speckle Images with Bacteriorhodopsin for Pattern Recognition
NASA Technical Reports Server (NTRS)
Downie, John D.; Tucker, Deanne (Technical Monitor)
1994-01-01
Logarithmic processing of images with multiplicative noise characteristics can be utilized to transform the image into one with an additive noise distribution. This simplifies subsequent image processing steps for applications such as image restoration or correlation for pattern recognition. One particularly common form of multiplicative noise is speckle, for which the logarithmic operation not only produces additive noise, but also makes it of constant variance (signal-independent). We examine the optical transmission properties of some bacteriorhodopsin films here and find them well suited to implement such a pointwise logarithmic transformation optically in a parallel fashion. We present experimental results of the optical conversion of speckle images into transformed images with additive, signal-independent noise statistics using the real-time photochromic properties of bacteriorhodopsin. We provide an example of improved correlation performance in terms of correlation peak signal-to-noise for such a transformed speckle image.
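The statistical claim, that a logarithm turns multiplicative speckle into additive noise of constant variance, can be checked numerically (a sketch with synthetic log-normal speckle, not the optical experiment):

```python
# Multiplying a signal by speckle gives signal-dependent fluctuations, while
# taking the log makes the noise additive and its variance signal-independent.
import math
import random

random.seed(0)

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

noise = [random.lognormvariate(0.0, 0.3) for _ in range(20000)]
var_raw, var_log = [], []
for s in (1.0, 10.0):                 # two signal levels
    speckled = [s * n for n in noise]
    var_raw.append(sample_var(speckled))                         # grows as s^2
    var_log.append(sample_var([math.log(x) for x in speckled]))  # constant
```

In the raw domain the variance scales with the square of the signal; after the log, the signal contributes only a constant shift, so the variance is identical at both levels.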
Iterative image reconstruction for PROPELLER-MRI using the nonuniform fast fourier transform.
Tamhane, Ashish A; Anastasio, Mark A; Gui, Minzhi; Arfanakis, Konstantinos
2010-07-01
To investigate an iterative image reconstruction algorithm using the nonuniform fast Fourier transform (NUFFT) for PROPELLER (Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction) MRI. Numerical simulations, as well as experiments on a phantom and a healthy human subject were used to evaluate the performance of the iterative image reconstruction algorithm for PROPELLER, and compare it with that of conventional gridding. The trade-off between spatial resolution, signal to noise ratio, and image artifacts, was investigated for different values of the regularization parameter. The performance of the iterative image reconstruction algorithm in the presence of motion was also evaluated. It was demonstrated that, for a certain range of values of the regularization parameter, iterative reconstruction produced images with significantly increased signal to noise ratio, reduced artifacts, for similar spatial resolution, compared with gridding. Furthermore, the ability to reduce the effects of motion in PROPELLER-MRI was maintained when using the iterative reconstruction approach. An iterative image reconstruction technique based on the NUFFT was investigated for PROPELLER MRI. For a certain range of values of the regularization parameter, the new reconstruction technique may provide PROPELLER images with improved image quality compared with conventional gridding. (c) 2010 Wiley-Liss, Inc.
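The difference between gridding and iterative reconstruction can be illustrated on a toy real-valued problem (not PROPELLER itself): gridding corresponds roughly to applying the adjoint of the sampling operator, while iterative reconstruction solves a regularized least-squares problem, here by plain gradient descent:

```python
# Solve min ||A x - b||^2 + lam ||x||^2 for a tiny over-determined system and
# compare with the adjoint ("gridding-like") image A^T b.
def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def reconstruct(A, b, lam=0.01, steps=500, lr=0.05):
    At = transpose(A)
    x = [0.0] * len(A[0])
    for _ in range(steps):
        r = [ai - bi for ai, bi in zip(mat_vec(A, x), b)]
        grad = [g + lam * xi for g, xi in zip(mat_vec(At, r), x)]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy over-determined "sampling"
b = mat_vec(A, [1.0, 2.0])                  # data from true object [1, 2]
x_grid = mat_vec(transpose(A), b)           # adjoint alone: not the object
x_hat = reconstruct(A, b)                   # iterative: close to [1, 2]
```

The fixed point of the iteration satisfies (A^T A + lam I) x = A^T b, so for small lam the estimate approaches the true object, while the bare adjoint does not. The regularization parameter plays the same resolution/noise trade-off role discussed in the abstract.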
Supercomputer algorithms for efficient linear octree encoding of three-dimensional brain images.
Berger, S B; Reis, D J
1995-02-01
We designed and implemented algorithms for three-dimensional (3-D) reconstruction of brain images from serial sections using two important supercomputer architectures, vector and parallel. These architectures were represented by the Cray YMP and Connection Machine CM-2, respectively. The programs operated on linear octree representations of the brain data sets, and achieved 500-800 times acceleration when compared with a conventional laboratory workstation. As the need for higher resolution data sets increases, supercomputer algorithms may offer a means of performing 3-D reconstruction well above current experimental limits.
Optical ranked-order filtering using threshold decomposition
Allebach, J.P.; Ochoa, E.; Sweeney, D.W.
1987-10-09
A hybrid optical/electronic system performs median filtering and related ranked-order operations using threshold decomposition to encode the image. Threshold decomposition transforms the nonlinear neighborhood ranking operation into a linear space-invariant filtering step followed by a point-to-point threshold comparison step. Spatial multiplexing allows parallel processing of all the threshold components as well as recombination by a second linear, space-invariant filtering step. An incoherent optical correlation system performs the linear filtering, using a magneto-optic spatial light modulator as the input device and a computer-generated hologram in the filter plane. Thresholding is done electronically. By adjusting the value of the threshold, the same architecture is used to perform median, minimum, and maximum filtering of images. A totally optical system is also disclosed. 3 figs.
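The threshold-decomposition identity the system exploits can be demonstrated directly: decompose the signal into binary threshold slices, filter each slice with a window sum (linear, space-invariant) followed by a point threshold, and stack the slices back up; the result equals a direct median filter (1-D sketch, interior samples only):

```python
def median_filter(sig, half=1):
    out = list(sig)                       # edges left unfiltered
    for i in range(half, len(sig) - half):
        out[i] = sorted(sig[i - half: i + half + 1])[half]
    return out

def median_by_threshold_decomposition(sig, half=1):
    out = list(sig)
    for i in range(half, len(sig) - half):
        out[i] = 0
    for t in range(1, max(sig) + 1):      # one binary slice per grey level
        binary = [1 if v >= t else 0 for v in sig]
        for i in range(half, len(sig) - half):
            win = binary[i - half: i + half + 1]
            # window sum (linear filtering) + point threshold = binary median
            out[i] += 1 if sum(win) > half else 0
    return out

sig = [3, 1, 4, 1, 5, 9, 2, 6]
```

Changing the point threshold from the majority value to 1 or to the full window size turns the same structure into a maximum or minimum filter, which is exactly the flexibility the patented architecture claims.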
Development of IR imaging system simulator
NASA Astrophysics Data System (ADS)
Xiang, Xinglang; He, Guojing; Dong, Weike; Dong, Lu
2017-02-01
To overcome the disadvantages of traditional semi-physical simulation and injection simulation equipment in the performance evaluation of infrared imaging systems (IRIS), a low-cost and reconfigurable IRIS simulator, which simulates the realistic physical process of infrared imaging, is proposed to test and evaluate IRIS performance. According to the theoretical simulation framework and the theoretical models of the IRIS, the architecture of the IRIS simulator is constructed. The 3D scenes are generated and the infrared atmospheric transmission effects are simulated in real time on the computer using OGRE. The physical effects of the IRIS are classified as the signal response characteristic, modulation transfer characteristic, and noise characteristic, and they are simulated in real time on a single-board signal processing platform built around an FPGA, using high-speed parallel computation.
STAMPS: Software Tool for Automated MRI Post-processing on a supercomputer.
Bigler, Don C; Aksu, Yaman; Miller, David J; Yang, Qing X
2009-08-01
This paper describes a Software Tool for Automated MRI Post-processing (STAMP) of multiple types of brain MRIs on a workstation and for parallel processing on a supercomputer (STAMPS). This software tool enables the automation of nonlinear registration for a large image set and for multiple MR image types. The tool uses standard brain MRI post-processing tools (such as SPM, FSL, and HAMMER) for multiple MR image types in a pipeline fashion. It also contains novel MRI post-processing features. The STAMP image outputs can be used to perform brain analysis using Statistical Parametric Mapping (SPM) or single-/multi-image modality brain analysis using Support Vector Machines (SVMs). Since STAMPS is PBS-based, the supercomputer may be a multi-node computer cluster or one of the latest multi-core computers.
A Semi-flexible 64-channel Receive-only Phased Array for Pediatric Body MRI at 3T
Zhang, Tao; Grafendorfer, Thomas; Cheng, Joseph Y.; Ning, Peigang; Rainey, Bob; Giancola, Mark; Ortman, Sarah; Robb, Fraser J.; Calderon, Paul D.; Hargreaves, Brian A.; Lustig, Michael; Scott, Greig C.; Pauly, John M.; Vasanawala, Shreyas S.
2015-01-01
Purpose To design, construct, and validate a semi-flexible 64-channel receive-only phased array for pediatric body MRI at 3T. Methods A 64-channel receive-only phased array was developed and constructed. The designed flexible coil easily conforms to different patient sizes with non-overlapping coil elements in the transverse plane. It can cover a field of view of up to 44 × 28 cm2 and removes the need for coil repositioning for body MRI patients with multiple clinical concerns. The 64-channel coil was compared with a 32-channel standard coil for signal-to-noise ratio (SNR) and parallel imaging performance on different phantoms. With IRB approval and informed consent/assent, the designed coil was validated on 21 consecutive pediatric patients. Results The pediatric coil provided higher SNR than the standard coil on different phantoms, with an average SNR gain of at least 23% over a depth of 7 cm along the cross-section of the phantoms. It also achieved better parallel imaging performance under moderate acceleration factors. Good image quality (average score 4.6 out of 5) was achieved using the developed pediatric coil in the clinical studies. Conclusion A 64-channel semi-flexible receive-only phased array has been developed and validated to facilitate high-quality pediatric body MRI at 3T. PMID:26418283
Basic research planning in mathematical pattern recognition and image analysis
NASA Technical Reports Server (NTRS)
Bryant, J.; Guseman, L. F., Jr.
1981-01-01
Fundamental problems encountered while attempting to develop automated techniques for applications of remote sensing are discussed under the following categories: (1) geometric and radiometric preprocessing; (2) spatial, spectral, temporal, syntactic, and ancillary digital image representation; (3) image partitioning, proportion estimation, and error models in object scene inference; (4) parallel processing and image data structures; and (5) continuing studies in polarization; computer architectures and parallel processing; and the applicability of "expert systems" to interactive analysis.
Image stack alignment in full-field X-ray absorption spectroscopy using SIFT_PyOCL.
Paleo, Pierre; Pouyet, Emeline; Kieffer, Jérôme
2014-03-01
Full-field X-ray absorption spectroscopy experiments allow the acquisition of millions of spectra within minutes. However, the construction of the hyperspectral image requires an image alignment procedure with sub-pixel precision. While the image correlation algorithm was originally used for image re-alignment via translation, the Scale Invariant Feature Transform (SIFT) algorithm (which is by design robust to rotation, illumination change, translation, and scaling) presents an additional advantage: the alignment can be limited to a region of interest of any arbitrary shape. In this context, a Python module named SIFT_PyOCL has been developed. It implements a parallel version of the SIFT algorithm in OpenCL, providing high-speed image registration and alignment on both processors and graphics cards. The performance of the algorithm allows online processing of large datasets.
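The translation-only correlation approach the article contrasts with SIFT can be sketched by brute force in 1-D (real codes use FFT-based correlation or SIFT keypoints; the signals here are illustrative):

```python
# Recover the shift between a reference profile and a moved copy by maximizing
# the cross-correlation over candidate integer shifts.
def best_shift(ref, moved, max_shift=5):
    def score(s):
        pairs = [(ref[i], moved[i + s]) for i in range(len(ref))
                 if 0 <= i + s < len(moved)]
        return sum(a * b for a, b in pairs)
    return max(range(-max_shift, max_shift + 1), key=score)

ref   = [0, 0, 1, 3, 1, 0, 0, 0, 0]
moved = [0, 0, 0, 0, 1, 3, 1, 0, 0]   # ref shifted right by 2
print(best_shift(ref, moved))
```

Fitting a parabola through the correlation scores around the best integer shift would give the sub-pixel precision the alignment procedure requires.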
Dupas, Laura; Massire, Aurélien; Amadon, Alexis; Vignaud, Alexandre; Boulant, Nicolas
2015-06-01
The spokes method combined with parallel transmission is a promising technique to mitigate the B1+ inhomogeneity at ultra-high field in 2D imaging. To date however, the spokes placement optimization combined with the magnitude least squares pulse design has never been done in direct conjunction with the explicit Specific Absorption Rate (SAR) and hardware constraints. In this work, the joint optimization of 2-spoke trajectories and RF subpulse weights is performed under these constraints explicitly and in the small tip angle regime. The problem is first considerably simplified by making the observation that only the vector between the 2 spokes is relevant in the magnitude least squares cost-function, thereby reducing the size of the parameter space and allowing a more exhaustive search. The algorithm starts from a set of initial k-space candidates and performs in parallel for all of them optimizations of the RF subpulse weights and the k-space locations simultaneously, under explicit SAR and power constraints, using an active-set algorithm. The dimensionality of the spoke placement parameter space being low, the RF pulse performance is computed for every location in k-space to study the robustness of the proposed approach with respect to initialization, by looking at the probability to converge towards a possible global minimum. Moreover, the optimization of the spoke placement is repeated with an increased pulse bandwidth in order to investigate the impact of the constraints on the result. Bloch simulations and in vivo T2*-weighted images acquired at 7 T validate the approach. The algorithm returns simulated normalized root mean square errors systematically smaller than 5% in 10 s. Copyright © 2015 Elsevier Inc. All rights reserved.
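The magnitude-least-squares step can be sketched with the classic variable-exchange trick: since only |A w| is constrained, one alternately fixes a target phase and solves an ordinary linear problem. The 2x2 system below is a made-up toy (square, so each solve is exact) with no SAR or power constraints:

```python
import cmath

def solve_2x2(A, b):
    # Cramer's rule for a complex 2x2 system.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

A = [[1.0 + 0.2j, 0.3 - 0.1j],
     [0.2 + 0.5j, 0.9 + 0.1j]]      # made-up "spatial system" matrix
target = [1.0, 1.0]                  # desired |magnetization| at two voxels

phase = [0.0, 0.0]
for _ in range(50):                  # variable-exchange iterations
    b = [t * cmath.exp(1j * p) for t, p in zip(target, phase)]
    w = solve_2x2(A, b)              # linear solve with the phase fixed
    m = [A[i][0] * w[0] + A[i][1] * w[1] for i in range(2)]
    phase = [cmath.phase(mi) for mi in m]

err = max(abs(abs(mi) - t) for mi, t in zip(m, target))
```

In the paper's setting A is tall (many voxels, few channels), each exchange step is a constrained least-squares solve rather than an exact inversion, and the k-space spoke positions are optimized jointly with w.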
Next Generation Parallelization Systems for Processing and Control of PDS Image Node Assets
NASA Astrophysics Data System (ADS)
Verma, R.
2017-06-01
We present next-generation parallelization tools to help Planetary Data System (PDS) Imaging Node (IMG) better monitor, process, and control changes to nearly 650 million file assets and over a dozen machines on which they are referenced or stored.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ban, H. Y.; Kavuri, V. C., E-mail: venk@physics.up
Purpose: The authors introduce a state-of-the-art all-optical clinical diffuse optical tomography (DOT) imaging instrument which collects spatially dense, multispectral, frequency-domain breast data in the parallel-plate geometry. Methods: The instrument utilizes a CCD-based heterodyne detection scheme that permits massively parallel detection of diffuse photon density wave amplitude and phase for a large number of source-detector pairs (10^6). The stand-alone clinical DOT instrument thus offers high spatial resolution with reduced crosstalk between absorption and scattering. Other novel features include a fringe profilometry system for breast boundary segmentation, real-time data normalization, and a patient bed design which permits both axial and sagittal breast measurements. Results: The authors validated the instrument using tissue-simulating phantoms with two different chromophore-containing targets and one scattering target. The authors also demonstrated the instrument in a case study of a breast cancer patient; the reconstructed 3D image of endogenous chromophores and scattering gave tumor localization in agreement with MRI. Conclusions: Imaging with a novel parallel-plate DOT breast imager that employs highly parallel, high-resolution CCD detection in the frequency domain was demonstrated.
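The heterodyne detection principle, recovering diffuse-wave amplitude and phase from a slowly beating intensity sampled over one beat period, can be sketched with I/Q demodulation (the frequencies, amplitudes, and sample counts are illustrative, not the instrument's values):

```python
# Each CCD pixel sees a sinusoid at the beat frequency; correlating the frame
# sequence against cos and sin references recovers amplitude and phase.
import math

true_amp, true_phase = 2.5, 0.8
n = 64   # frames over one full beat period

samples = [true_amp * math.cos(2 * math.pi * k / n + true_phase)
           for k in range(n)]

i_sum = sum(s * math.cos(2 * math.pi * k / n) for k, s in enumerate(samples))
q_sum = sum(s * math.sin(2 * math.pi * k / n) for k, s in enumerate(samples))
amp = 2.0 * math.hypot(i_sum, q_sum) / n
phase = math.atan2(-q_sum, i_sum)
```

Because the same frame sequence serves every pixel, this demodulation runs independently per pixel, which is what makes the 10^6 source-detector pairs tractable.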
Parallel fuzzy connected image segmentation on GPU
Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.
2011-01-01
Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implemented on NVIDIA's Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. Methods: In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on the GPU. A dramatic improvement in speed for both tasks is achieved as a result. Results: Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, respectively, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, respectively, for the three data sets. Conclusions: The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on NVIDIA GPUs, which are far more cost- and speed-effective than both clusters of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set. PMID:21859037
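The two computational tasks named in the abstract can be sketched sequentially: a fuzzy affinity between neighboring pixels, and fuzzy connectedness as the maximum over paths of the minimum affinity along each path, computed with a Dijkstra-style propagation (a toy serial analogue of what the paper maps onto CUDA kernels; the affinity function is illustrative):

```python
import heapq

def affinity(a, b):
    return 1.0 / (1.0 + abs(a - b))          # simple intensity-based choice

def fuzzy_connectedness(img, seed):
    h, w = len(img), len(img[0])
    conn = {seed: 1.0}
    pq = [(-1.0, seed)]                       # max-heap via negated strengths
    while pq:
        neg, (y, x) = heapq.heappop(pq)
        if -neg < conn.get((y, x), 0.0):
            continue                          # stale entry
        for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                # path strength = min affinity along the path
                s = min(-neg, affinity(img[y][x], img[ny][nx]))
                if s > conn.get((ny, nx), 0.0):
                    conn[(ny, nx)] = s
                    heapq.heappush(pq, (-s, (ny, nx)))
    return conn

img = [[10, 10, 10],
       [10, 10, 50],
       [50, 50, 50]]
conn = fuzzy_connectedness(img, (0, 0))
```

Pixels in the seed's intensity region keep connectedness 1.0, while pixels across the 10-to-50 boundary are limited by the single weak edge every path must cross.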
Spatio-temporal Hotelling observer for signal detection from image sequences
Caucci, Luca; Barrett, Harrison H.; Rodríguez, Jeffrey J.
2010-01-01
Detection of signals in noisy images is necessary in many applications, including astronomy and medical imaging. The optimal linear observer for performing a detection task, called the Hotelling observer in the medical literature, can be regarded as a generalization of the familiar prewhitening matched filter. Performance on the detection task is limited by randomness in the image data, which stems from randomness in the object, randomness in the imaging system, and randomness in the detector outputs due to photon and readout noise, and the Hotelling observer accounts for all of these effects in an optimal way. If multiple temporal frames of images are acquired, the resulting data set is a spatio-temporal random process, and the Hotelling observer becomes a spatio-temporal linear operator. This paper discusses the theory of the spatio-temporal Hotelling observer and estimation of the required spatio-temporal covariance matrices. It also presents a parallel implementation of the observer on a cluster of Sony PLAYSTATION 3 gaming consoles. As an example, we consider the use of the spatio-temporal Hotelling observer for exoplanet detection. PMID:19550494
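Concretely, the Hotelling template is the prewhitened signal, w = K⁻¹Δs, and the detection statistic is its inner product with the image data. A small numpy sketch with a simulated covariance (illustrative only, not the paper's spatio-temporal estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 16                       # pixels of a tiny flattened image
delta_s = np.zeros(n)
delta_s[5:8] = 1.0           # known signal profile to be detected

# simulated noise covariance K (symmetric positive definite)
A = rng.normal(size=(n, n))
K = A @ A.T / n + np.eye(n)

# Hotelling template: w = K^{-1} delta_s, a prewhitening matched filter
w = np.linalg.solve(K, delta_s)

# observer detectability: SNR^2 = delta_s^T K^{-1} delta_s
snr2 = float(delta_s @ w)

# apply the template to a signal-absent and a signal-present image
noise = np.linalg.cholesky(K) @ rng.normal(size=n)  # noise with covariance K
t_absent = float(w @ noise)
t_present = float(w @ (noise + delta_s))
```

For image sequences the same construction applies with K replaced by the spatio-temporal covariance, which is what makes its estimation the central practical problem.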
Spatio-temporal Hotelling observer for signal detection from image sequences.
Caucci, Luca; Barrett, Harrison H; Rodriguez, Jeffrey J
2009-06-22
An object-based storage model for distributed remote sensing images
NASA Astrophysics Data System (ADS)
Yu, Zhanwu; Li, Zhongmin; Zheng, Sheng
2006-10-01
It is very difficult to design an integrated storage solution for distributed remote sensing images that offers high-performance network storage services and secure data sharing across platforms using current network storage models such as direct attached storage, network attached storage, and storage area networks. Object-based storage, a new generation of network storage technology that has emerged recently, separates the data path, the control path, and the management path, which removes the metadata bottleneck of traditional storage models and provides parallel data access, data sharing across platforms, intelligent storage devices, and secure data access. We apply object-based storage to the storage management of remote sensing images and construct an object-based storage model for distributed remote sensing images. In this model, remote sensing images are organized as remote sensing objects stored in object-based storage devices. Based on the storage model, we present the architecture of a distributed remote sensing image application system built on object-based storage, and give test results comparing the write performance of the traditional network storage model and the object-based storage model.
Lu, Xiaofeng; Song, Li; Shen, Sumin; He, Kang; Yu, Songyu; Ling, Nam
2013-01-01
The Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from long execution times and high resource requirements. Field Programmable Gate Arrays (FPGAs) provide a competitive alternative for hardware acceleration with tremendous computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and an associated FPGA architecture framework for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most false edges and obtain more accurate candidate edge pixels for the subsequent accelerated computation. Then, a novel PHT algorithm exploiting spatial angle-level parallelism is proposed to improve computational accuracy by refining the minimum computational step. Moreover, the FPGA-based multi-level pipelined PHT architecture, optimized by spatial parallelism, ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. This framework is evaluated on the ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it can calculate straight line parameters in 15.59 ms on average for one frame. Qualitative and quantitative evaluation results have validated the system performance regarding data throughput, memory bandwidth, resource usage, speed, and robustness. PMID:23867746
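The angle-level parallelism is easy to see in software: each θ column of the (θ, ρ) accumulator can be updated independently of the others. A numpy sketch of the voting step, vectorized over angles (the paper's contribution is the FPGA pipeline, which this does not model):

```python
import numpy as np

def hough_lines(points, shape, n_theta=180):
    """Accumulate votes in (theta, rho) space. Each theta column of the
    accumulator is independent of the others, which is the angle-level
    parallelism a hardware pipeline can exploit."""
    h, w = shape
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((n_theta, 2 * diag + 1), dtype=np.int32)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in points:                        # candidate edge pixels
        rho = np.round(x * cos_t + y * sin_t).astype(int) + diag
        acc[np.arange(n_theta), rho] += 1      # vote at every angle at once
    return acc, thetas, diag

# edge pixels lying on the horizontal line y = 5
pts = [(x, 5) for x in range(40)]
acc, thetas, diag = hough_lines(pts, (48, 48))
t_idx, r_idx = np.unravel_index(acc.argmax(), acc.shape)
theta_deg = float(np.degrees(thetas[t_idx]))
rho = int(r_idx) - diag
```

The accumulator peak recovers the line's normal parameters (θ = 90°, ρ = 5 for this horizontal line).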
Lu, Xiaofeng; Song, Li; Shen, Sumin; He, Kang; Yu, Songyu; Ling, Nam
2013-07-17
Chang, Gregory; Wiggins, Graham C.; Xia, Ding; Lattanzi, Riccardo; Madelin, Guillaume; Raya, Jose G.; Finnerty, Matthew; Fujita, Hiroyuki; Recht, Michael P.; Regatte, Ravinder R.
2011-01-01
Purpose To compare a new birdcage-transmit, 28 channel-receive array (28 Ch) coil and a quadrature volume coil for 7 Tesla morphologic MRI and T2 mapping of knee cartilage. Methods The right knees of ten healthy subjects were imaged on a 7 Tesla whole body MR scanner using both coils. 3-dimensional fast low-angle shot (3D-FLASH) and multi-echo spin-echo (MESE) sequences were implemented. Cartilage signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), thickness, and T2 values were assessed. Results SNR/CNR was 17–400% greater for the 28 Ch compared to the quadrature coil (p≤0.005). Bland-Altman plots show mean differences between measurements of tibial/femoral cartilage thickness and T2 values obtained with each coil to be small (−0.002±0.009 cm/0.003±0.011 cm) and large (−6.8±6.7 ms/−8.2±9.7 ms), respectively. For the 28 Ch coil, when parallel imaging with acceleration factors (AF) 2, 3, and 4 was performed, SNR retained was: 62–69%, 51–55%, and 39–45%. Conclusion A 28 Ch knee coil provides increased SNR/CNR for 7T cartilage morphologic imaging and T2 mapping. Coils should be switched with caution during clinical studies because T2 values may differ. The greater SNR of the 28 Ch coil could be used to perform parallel imaging with AF2 and obtain similar SNR as the quadrature coil. PMID:22095723
Teodoro, George; Kurc, Tahsin; Andrade, Guilherme; Kong, Jun; Ferreira, Renato; Saltz, Joel
2015-01-01
We carry out a comparative performance study of multi-core CPUs, GPUs, and the Intel Xeon Phi (Many Integrated Core, MIC) with a microscopy image analysis application. We experimentally evaluate the performance of these computing devices on the core operations of the application. We correlate the observed performance with the characteristics of the computing devices and with the data access patterns, computation complexities, and parallelization forms of the operations. The results show significant variability in the performance of operations with respect to the device used. The performance of operations with regular data access on a MIC is comparable to, and sometimes better than, that on a GPU. GPUs are more efficient than MICs for operations that access data irregularly, because of the lower bandwidth of the MIC for random data accesses. We propose new performance-aware scheduling strategies that consider variabilities in operation speedups. Our scheduling strategies significantly improve application performance compared to classic strategies in hybrid configurations. PMID:28239253
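A minimal sketch of what "performance-aware" can mean here: give each operation to the device where it finishes earliest, handing out the operations with the largest best-to-worst speedup spread first. The operation names and timings below are hypothetical, not the paper's measurements:

```python
# hypothetical per-device execution times (seconds) for core operations
op_times = {
    "morph_recon":  {"cpu": 9.0, "gpu": 1.0, "mic": 4.0},  # irregular access: GPU wins
    "color_deconv": {"cpu": 6.0, "gpu": 2.0, "mic": 2.0},  # regular access: MIC ~ GPU
    "watershed":    {"cpu": 8.0, "gpu": 2.0, "mic": 5.0},
    "features":     {"cpu": 5.0, "gpu": 2.5, "mic": 2.0},
}

def schedule(op_times, devices=("cpu", "gpu", "mic")):
    """Greedy performance-aware list scheduling: place the operation with
    the largest speedup spread first, on the device that yields the
    earliest finish time."""
    ready = {d: 0.0 for d in devices}          # time each device frees up
    order = sorted(op_times,
                   key=lambda o: max(op_times[o].values()) /
                                 min(op_times[o].values()),
                   reverse=True)
    plan = {}
    for op in order:
        d = min(devices, key=lambda d: ready[d] + op_times[op][d])
        plan[op] = d
        ready[d] += op_times[op][d]
    return plan, max(ready.values())

plan, makespan = schedule(op_times)
```

With these numbers the irregular operation lands on the GPU, the regular ones on the MIC, and the makespan drops well below any single-device run.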
Block iterative restoration of astronomical images with the massively parallel processor
NASA Technical Reports Server (NTRS)
Heap, Sara R.; Lindler, Don J.
1987-01-01
A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
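The block iterative idea can be sketched at toy scale: split the unknowns into blocks, repeatedly solve each small diagonal block while holding the others fixed, and note that the small solves are independent and hence parallelizable. Two blocks here stand in for the paper's 4900 systems, and numpy for the MPP:

```python
import numpy as np

rng = np.random.default_rng(1)
n, nb = 8, 4                    # 8 unknowns, blocks of 4 (toy scale)

# diagonally dominant system, so the block iteration converges
A = rng.normal(size=(n, n)) + 20.0 * np.eye(n)
x_true = rng.normal(size=n)
b = A @ x_true

# Block Jacobi: each sweep solves every small diagonal block while holding
# the other blocks fixed; the block solves are mutually independent, so
# they can run on parallel processors.
blocks = [slice(0, nb), slice(nb, n)]
x = np.zeros(n)
for _ in range(60):
    x_new = x.copy()
    for blk in blocks:
        # right-hand side with the other blocks' current values moved over
        r = b[blk] - A[blk] @ x + A[blk, blk] @ x[blk]
        x_new[blk] = np.linalg.solve(A[blk, blk], r)
    x = x_new
```

The restoration problem in the paper has the same shape: many small (121 × 121) solves per sweep instead of one intractable 250,000 × 250,000 solve.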
Non-parametric analysis of LANDSAT maps using neural nets and parallel computers
NASA Technical Reports Server (NTRS)
Salu, Yehuda; Tilton, James
1991-01-01
Nearest neighbor approaches and a new neural network, the Binary Diamond, are used for the classification of images of ground pixels obtained by the LANDSAT satellite. Their performance is evaluated by comparing classifications of a scene in the vicinity of Washington, DC. The problem of optimal selection of categories is addressed as a step in the classification process.
CFHT data processing and calibration ESPaDOnS pipeline: Upena and OPERA (optical spectropolarimetry)
NASA Astrophysics Data System (ADS)
Martioli, Eder; Teeple, D.; Manset, Nadine
2011-03-01
CFHT is responsible for processing raw ESPaDOnS images, removing instrument-related artifacts, and delivering science-ready data to the PIs. Here we describe the Upena pipeline, the software used to reduce the echelle spectro-polarimetric data obtained with the ESPaDOnS instrument. Upena is an automated pipeline that performs calibration and reduction of raw images. It can both perform real-time reduction on an image-by-image basis and a complete reduction after the observing night. Upena produces polarization and intensity spectra in FITS format. The pipeline is designed to use parallel computing for improved speed, which ensures that the final products are delivered to the PIs before noon HST after each night of observations. We also present the OPERA project, an open-source pipeline to reduce ESPaDOnS data that will be developed as a collaboration between CFHT and the scientific community. OPERA will match the core capabilities of Upena and, in addition, will be open-source, flexible, and extensible.
Kameda, Hiroyuki; Kudo, Kohsuke; Matsuda, Tsuyoshi; Harada, Taisuke; Iwadate, Yuji; Uwano, Ikuko; Yamashita, Fumio; Yoshioka, Kunihiro; Sasaki, Makoto; Shirato, Hiroki
2017-12-04
Respiration-induced phase shift affects B0/B1+ mapping repeatability in parallel transmission (pTx) calibration for 7T brain MRI, but is improved by breath-holding (BH). However, BH cannot be applied during long scans. To examine whether interleaved acquisition during calibration scanning could improve pTx repeatability and image homogeneity. Prospective. Nine healthy subjects. 7T MRI with a two-channel RF transmission system was used. Calibration scanning for B0/B1+ mapping was performed under sequential acquisition/free-breathing (Seq-FB), Seq-BH, and interleaved acquisition/FB (Int-FB) conditions. The B0 map was calculated with two echo times, and the B1+ map was obtained using the Bloch-Siegert method. Actual flip-angle imaging (AFI) and gradient echo (GRE) imaging were performed using pTx and quadrature-Tx (qTx). All scans were acquired in five sessions. Repeatability was evaluated using intersession standard deviation (SD) or coefficient of variance (CV), and in-plane homogeneity was evaluated using in-plane CV. A paired t-test with Bonferroni correction for multiple comparisons was used. The intersession CV/SDs for the B0/B1+ maps were significantly smaller in Int-FB than in Seq-FB (Bonferroni-corrected P < 0.05 for all). The intersession CVs for the AFI and GRE images were also significantly smaller in Int-FB, Seq-BH, and qTx than in Seq-FB (Bonferroni-corrected P < 0.05 for all). The in-plane CVs for the AFI and GRE images in Seq-FB, Int-FB, and Seq-BH were significantly smaller than in qTx (Bonferroni-corrected P < 0.01 for all). Using interleaved acquisition during calibration scans of pTx for 7T brain MRI improved the repeatability of B0/B1+ mapping, AFI, and GRE images, without BH. Level of Evidence: 1. Technical Efficacy: Stage 1. J. Magn. Reson. Imaging 2017. © 2017 International Society for Magnetic Resonance in Medicine.
Parallel workflow tools to facilitate human brain MRI post-processing
Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang
2015-01-01
Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043
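The subject-level parallelism these workflow tools exploit can be sketched with the standard library alone. The step names below are hypothetical stubs, not any specific tool's API; a real workflow tool would wrap command-line neuroimaging programs:

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical per-subject post-processing steps, stubbed out
def skull_strip(subj):
    return f"{subj}:stripped"

def register(data):
    return data + ":registered"

def segment(data):
    return data + ":segmented"

def process_subject(subj):
    # steps for one subject run in sequence: each depends on the last
    return segment(register(skull_strip(subj)))

subjects = [f"sub-{i:02d}" for i in range(1, 5)]

# independent subjects run in parallel, the core idea of these tools
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_subject, subjects))
```

Workflow engines add to this picture dependency tracking between steps, so that independent steps within one subject can also run concurrently.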
NASA Astrophysics Data System (ADS)
Newman, Gregory A.; Commer, Michael
2009-07-01
Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
Real-time text extraction based on the page layout analysis system
NASA Astrophysics Data System (ADS)
Soua, M.; Benchekroun, A.; Kachouri, R.; Akil, M.
2017-05-01
Several approaches have been proposed to extract text from scanned documents. However, text extraction in heterogeneous documents is still a real challenge. Indeed, text extraction in this context is a difficult task because of variations in text size, style, and orientation, as well as the complexity of the document region background. Recently, we proposed the improved hybrid binarization based on Kmeans method (I-HBK) to extract text suitably from heterogeneous documents. In this method, the Page Layout Analysis (PLA), part of the Tesseract OCR engine, is used to identify text and image regions. Afterwards, our hybrid binarization is applied separately on each kind of region. On one side, gamma correction is applied before processing image regions. On the other side, binarization is performed directly on text regions. Then, a foreground and background color study is performed to correct inverted region colors. Finally, characters are located in the binarized regions using the PLA algorithm. In this work, we extend the integration of the PLA algorithm within the I-HBK method. In addition, to speed up the text and image separation step, we employ an efficient GPU acceleration. Through the performed experiments, we demonstrate the high F-measure accuracy of the PLA algorithm, reaching 95% on the LRDE dataset. In addition, we compare the sequential and parallel PLA versions. The obtained results give a speedup of 3.7x when comparing the parallel PLA implementation on a GTX 660 GPU to the CPU version.
Reese, Timothy G.; Jackowski, Marcel P.; Cauley, Stephen F.; Setsompop, Kawin; Bhat, Himanshu; Sosnovik, David E.
2017-01-01
Purpose To develop a clinically feasible whole-heart free-breathing diffusion-tensor (DT) magnetic resonance (MR) imaging approach with an imaging time of approximately 15 minutes to enable three-dimensional (3D) tractography. Materials and Methods The study was compliant with HIPAA and the institutional review board and required written consent from the participants. DT imaging was performed in seven healthy volunteers and three patients with pulmonary hypertension by using a stimulated echo sequence. Twelve contiguous short-axis sections and six four-chamber sections that covered the entire left ventricle were acquired by using simultaneous multisection (SMS) excitation with a blipped-controlled aliasing in parallel imaging readout. Rate 2 and rate 3 SMS excitation was defined as two and three times accelerated in the section axis, respectively. Breath-hold and free-breathing images with and without SMS acceleration were acquired. Diffusion-encoding directions were acquired sequentially, spatiotemporally registered, and retrospectively selected by using an entropy-based approach. Myofiber helix angle, mean diffusivity, fractional anisotropy, and 3D tractograms were analyzed by using paired t tests and analysis of variance. Results No significant differences (P > .63) were seen between breath-hold rate 3 SMS and free-breathing rate 2 SMS excitation in transmural myofiber helix angle, mean diffusivity (mean ± standard deviation, [0.89 ± 0.09] × 10⁻³ mm²/sec vs [0.9 ± 0.09] × 10⁻³ mm²/sec), or fractional anisotropy (0.43 ± 0.05 vs 0.42 ± 0.06). Three-dimensional tractograms of the left ventricle with no SMS and rate 2 and rate 3 SMS excitation were qualitatively similar.
Conclusion Free-breathing DT imaging of the entire human heart can be performed in approximately 15 minutes without section gaps by using SMS excitation with a blipped-controlled aliasing in parallel imaging readout, followed by spatiotemporal registration and entropy-based retrospective image selection. This method may lead to clinical translation of whole-heart DT imaging, enabling broad application in patients with cardiac disease. © RSNA, 2016 Online supplemental material is available for this article. PMID:27681278
Mekkaoui, Choukri; Reese, Timothy G; Jackowski, Marcel P; Cauley, Stephen F; Setsompop, Kawin; Bhat, Himanshu; Sosnovik, David E
2017-03-01
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away
NASA Astrophysics Data System (ADS)
Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.
2012-09-01
By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. 
Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory-bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyberinfrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing) involves establishing a distributed environment, where issues of, e.g., remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler.
Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
Sequential and parallel image restoration: neural network implementations.
Figueiredo, M T; Leitao, J N
1994-01-01
Sequential and parallel image restoration algorithms and their implementations on neural networks are proposed. For images degraded by linear blur and contaminated by additive white Gaussian noise, maximum a posteriori (MAP) estimation and regularization theory lead to the same high-dimensional convex optimization problem. The commonly adopted strategy (in using neural networks for image restoration) is to map the objective function of the optimization problem into the energy of a predefined network, taking advantage of its energy minimization properties. Departing from this approach, we propose neural implementations of iterative minimization algorithms which are first proved to converge. The developed schemes are based on modified Hopfield (1985) networks of graded elements, with both sequential and parallel updating schedules. An algorithm based on a fully standard Hopfield network (binary elements and zero autoconnections) is also considered. Robustness with respect to finite numerical precision is studied, and examples with real images are presented.
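The shared objective is a convex quadratic, ||Hx − y||² + λ||x||², whose minimizer can be reached by synchronous (parallel-update) gradient iterations. A toy numpy sketch with a small circulant blur, illustrative of the iteration the networks implement rather than of the paper's Hopfield scheme itself:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# H: small circulant blur (3-tap moving average), y: blurred noisy signal
H = sum(np.roll(np.eye(n), k, axis=1) for k in (-1, 0, 1)) / 3.0
x_true = rng.normal(size=n)
y = H @ x_true + 0.01 * rng.normal(size=n)

lam = 0.1    # regularization weight (prior strength in the MAP view)

# synchronous ("parallel") iteration: every component is updated at once
# by a gradient step on the convex objective ||Hx - y||^2 + lam ||x||^2
x = np.zeros(n)
step = 0.5   # below 2/L for this Hessian, so the iteration converges
for _ in range(2000):
    grad = 2.0 * (H.T @ (H @ x - y) + lam * x)
    x = x - step * grad

# the iterates converge to the unique minimizer (H^T H + lam I)^{-1} H^T y
x_map = np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)
```

A sequential schedule would update one component at a time (Gauss-Seidel style); both schedules minimize the same energy, which is why convergence proofs for each are the paper's focus.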
Imaging Polarimetry in Central Serous Chorioretinopathy
MIURA, MASAHIRO; ELSNER, ANN E.; WEBER, ANKE; CHENEY, MICHAEL C.; OSAKO, MASAHIRO; USUI, MASAHIKO; IWASAKI, TAKUYA
2006-01-01
PURPOSE To evaluate a noninvasive technique to detect the leakage point of central serous chorioretinopathy (CSR), using a polarimetry method. DESIGN Prospective cohort study. METHODS SETTING Institutional practice. PATIENTS We examined 30 eyes of 30 patients with CSR. MAIN OUTCOME MEASURES Polarimetry images were recorded using the GDx-N (Laser Diagnostic Technologies). We computed four images that differed in their polarization content: a depolarized light image, an average reflectance image, a parallel polarized light image, and a birefringence image. Each polarimetry image was compared with abnormalities seen on fluorescein angiography. RESULTS In all eyes, leakage area could be clearly visualized as a bright area in the depolarized light images. Michelson contrasts for the leakage areas were 0.58 ± 0.28 in the depolarized light images, 0.17 ± 0.11 in the average reflectance images, 0.09 ± 0.09 in the parallel polarized light images, and 0.11 ± 0.21 in the birefringence images from the same raw data. Michelson contrasts in depolarized light images were significantly higher than for the other three images (P < .0001, for all tests, paired t test). The fluid accumulated in the retina was well-visualized in the average and parallel polarized light images. CONCLUSIONS Polarization-sensitive imaging could readily localize the leakage point and area of fluid in CSR. This may assist with the rapid, noninvasive assessment of CSR. PMID:16376644
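Michelson contrast is (Imax − Imin)/(Imax + Imin). The intensities below are hypothetical, chosen only to reproduce the reported mean contrasts for two of the image types:

```python
def michelson_contrast(i_max, i_min):
    """Michelson contrast: (Imax - Imin) / (Imax + Imin), in [0, 1]."""
    return (i_max - i_min) / (i_max + i_min)

# hypothetical leakage-vs-background intensities (arbitrary units)
depolarized = michelson_contrast(79.0, 21.0)   # bright leak, dark background
average_ref = michelson_contrast(117.0, 83.0)  # same leak, less distinct
```

The leakage area stands out in the depolarized image because the contrast denominator stays small while the numerator is large, matching the reported 0.58 vs 0.17 means.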
NASA Technical Reports Server (NTRS)
1994-01-01
CESDIS, the Center of Excellence in Space Data and Information Sciences was developed jointly by NASA, Universities Space Research Association (USRA), and the University of Maryland in 1988 to focus on the design of advanced computing techniques and data systems to support NASA Earth and space science research programs. CESDIS is operated by USRA under contract to NASA. The Director, Associate Director, Staff Scientists, and administrative staff are located on-site at NASA's Goddard Space Flight Center in Greenbelt, Maryland. The primary CESDIS mission is to increase the connection between computer science and engineering research programs at colleges and universities and NASA groups working with computer applications in Earth and space science. Research areas of primary interest at CESDIS include: 1) High performance computing, especially software design and performance evaluation for massively parallel machines; 2) Parallel input/output and data storage systems for high performance parallel computers; 3) Data base and intelligent data management systems for parallel computers; 4) Image processing; 5) Digital libraries; and 6) Data compression. CESDIS funds multiyear projects at U. S. universities and colleges. Proposals are accepted in response to calls for proposals and are selected on the basis of peer reviews. Funds are provided to support faculty and graduate students working at their home institutions. Project personnel visit Goddard during academic recess periods to attend workshops, present seminars, and collaborate with NASA scientists on research projects. Additionally, CESDIS takes on specific research tasks of shorter duration for computer science research requested by NASA Goddard scientists.
Design and DSP implementation of star image acquisition and star point fast acquiring and tracking
NASA Astrophysics Data System (ADS)
Zhou, Guohui; Wang, Xiaodong; Hao, Zhihang
2006-02-01
A star sensor is a special high-accuracy photoelectric sensor. Attitude acquisition time is an important performance index of a star sensor. In this paper, the design target is a dynamic performance of 10 samples per second. On the basis of analyzing CCD signal timing and star image processing, a new design with a special parallel architecture for improving star image processing is presented. In the design, the operation of moving the data in expanded windows containing the stars to the on-chip memory of the DSP is scheduled in the invalid period of the CCD frame signal. While the CCD saves the star image to memory, the DSP processes the data already in its on-chip memory. This parallelism greatly improves processing efficiency, and the scheme results in enormous savings of the memory normally required. In the scheme, the DSP HOLD mode and CPLD technology are used to create a memory shared between the CCD and the DSP. Processing efficiency is evaluated in numerical tests. The five brightest stars are acquired in only 3.5 ms in the star acquisition stage. In 43 us, the data in five expanded windows containing stars are moved into the internal memory of the DSP, and five star coordinates are obtained in 1.6 ms in the star tracking stage.
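Star point extraction of this kind typically reduces to finding bright windows and computing intensity-weighted centroids inside them. A generic sketch of the centroid step (not the paper's DSP code; the window size and test frame are illustrative):

```python
def centroid(img, r, c, win=1):
    """Intensity-weighted centroid in a (2*win+1)^2 window around (r, c),
    giving sub-pixel star coordinates."""
    tot = sr = sc = 0.0
    for i in range(r - win, r + win + 1):
        for j in range(c - win, c + win + 1):
            v = img[i][j]
            tot += v
            sr += v * i
            sc += v * j
    return sr / tot, sc / tot

# a tiny frame with one star whose light spreads over two pixels
img = [[0] * 7 for _ in range(7)]
img[3][4] = 100
img[3][5] = 100   # equal split -> centroid midway between columns 4 and 5
cr, cc = centroid(img, 3, 4, win=1)
```

Running this per expanded window is what the paper's architecture overlaps with the CCD readout.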
Design and manufacture of the integrated field unit for the NIRSpec spectrometer on JWST
NASA Astrophysics Data System (ADS)
Lobb, Daniel; Robertson, David; Closs, Martin; Barnes, Andy
2008-09-01
The NIRSpec imaging spectrometer, which forms part of the James Webb Space Telescope instrumentation, will include an integrated field unit (IFU). The IFU will be tasked specifically with efficient analysis of extended objects, including galaxies; it will accept a square image area at the spectrometer entrance field, dissect this area into 30 parallel sub-slits, and image the sub-slits end-to-end, forming a single virtual entrance slit. The IFU uses all-mirror optics to operate over the spectral range 700 nm to 5000 nm. Its 95 mirrors and the main support structure are made from a common aluminium alloy to achieve athermal performance down to an operating temperature of around 30 K. Relatively complex mirror surface shapes are produced by diamond machining. The IFU has been designed and constructed by SSTL, with optics produced by CfAI; the unit is currently undergoing performance tests. This paper describes the IFU optical design and performance, and outlines the mirror manufacturing methods and alignment procedures.
Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P.
2016-01-01
Purpose Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. Theory and Methods The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly-accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely-used calibrationless uniformly-undersampled trajectories. Results Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high-acceleration rates relative to existing state-of-the-art reconstruction approaches. Conclusion The SENSE-LORAKS framework provides promising new opportunities for highly-accelerated MRI. PMID:27037836
Xu, Jian; Kim, Daniel; Otazo, Ricardo; Srichai, Monvadi B; Lim, Ruth P; Axel, Leon; Mcgorty, Kelly Anne; Niendorf, Thoralf; Sodickson, Daniel K
2013-07-01
To evaluate the feasibility and perform initial comparative evaluations of a 5-minute comprehensive whole-heart magnetic resonance imaging (MRI) protocol with four image acquisition types: perfusion (PERF), function (CINE), coronary artery imaging (CAI), and late gadolinium enhancement (LGE). This study protocol was Health Insurance Portability and Accountability Act (HIPAA)-compliant and Institutional Review Board-approved. A 5-minute comprehensive whole-heart MRI examination protocol (Accelerated) using 6-8-fold-accelerated volumetric parallel imaging was incorporated into and compared with a standard 2D clinical routine protocol (Standard). Following informed consent, 20 patients were imaged with both protocols. Datasets were reviewed for image quality using a 5-point Likert scale (0 = non-diagnostic, 4 = excellent) in blinded fashion by two readers. Good image quality with full whole-heart coverage was achieved using the accelerated protocol, particularly for CAI, although significant degradations in quality, as compared with traditional lengthy examinations, were observed for the other image types. Mean total scan time was significantly lower for the Accelerated protocol than for the Standard protocol (1.82 ± 0.05 min vs. 28.99 ± 4.59 min, P < 0.05). Overall image quality for the Standard vs. Accelerated protocol was 3.67 ± 0.29 vs. 1.5 ± 0.51 (P < 0.005) for PERF, 3.48 ± 0.64 vs. 2.6 ± 0.68 (P < 0.005) for CINE, 2.35 ± 1.01 vs. 2.48 ± 0.68 (P = 0.75) for CAI, and 3.67 ± 0.42 vs. 2.67 ± 0.84 (P < 0.005) for LGE. Diagnostic image quality for Standard vs. Accelerated protocols was 20/20 (100%) vs. 10/20 (50%) for PERF, 20/20 (100%) vs. 18/20 (90%) for CINE, 18/20 (90%) vs. 18/20 (90%) for CAI, and 20/20 (100%) vs. 18/20 (90%) for LGE.
This study demonstrates the technical feasibility and promising image quality of 5-minute comprehensive whole-heart cardiac examinations, with simplified scan prescription and high spatial and temporal resolution enabled by highly parallel imaging technology. The study also highlights technical hurdles that remain to be addressed. Although image quality remained diagnostic for most scan types, the reduced image quality of PERF, CINE, and LGE scans in the Accelerated protocol remain a concern. Copyright © 2012 Wiley Periodicals, Inc.
Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop
NASA Astrophysics Data System (ADS)
Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.
2018-04-01
The data center is a new concept of data processing and application proposed in recent years. It is a new processing approach based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes the computing nodes of cluster resources and improves the efficiency of parallel data applications. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it calls on many computing nodes to process image storage blocks and pyramids in the background, improving the efficiency of image reading and application and solving the need for concurrent multi-user high-speed access to remotely sensed data. The rationality, reliability and superiority of the system design were verified by testing the storage efficiency for different image data and multiple users, and by analyzing how the distributed storage architecture improves the application efficiency of remote sensing images through an actual Hadoop service system.
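The block-and-pyramid processing pattern described above can be sketched as a toy map/reduce pair: the map step emits keyed image tiles, and the reduce step computes one coarser pyramid value per tile. This runs serially here purely for illustration; in Hadoop, the keyed tiles would be distributed across many computing nodes. All function names are hypothetical and are not from the paper's system.

```python
def split_into_blocks(image, bs):
    """Map step: emit ((row, col), block) pairs for each bs x bs tile."""
    h, w = len(image), len(image[0])
    for r in range(0, h, bs):
        for c in range(0, w, bs):
            block = [row[c:c + bs] for row in image[r:r + bs]]
            yield (r // bs, c // bs), block

def block_mean(block):
    """Reduce step: one pyramid value per tile (a crude downsample)."""
    vals = [v for row in block for v in row]
    return sum(vals) / len(vals)

def build_pyramid_level(image, bs=2):
    # In Hadoop these map/reduce steps run on many nodes in parallel;
    # here they run serially as a minimal illustration.
    keyed = dict(split_into_blocks(image, bs))
    rows = max(k[0] for k in keyed) + 1
    cols = max(k[1] for k in keyed) + 1
    return [[block_mean(keyed[(r, c)]) for c in range(cols)] for r in range(rows)]
```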
Parallel ptychographic reconstruction
Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; ...
2014-12-19
Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by the scattering strength of the object and the detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.
High data volume and transfer rate techniques used at NASA's image processing facility
NASA Technical Reports Server (NTRS)
Heffner, P.; Connell, E.; Mccaleb, F.
1978-01-01
Data storage and transfer operations at a new image processing facility are described. The equipment includes high density digital magnetic tape drives and specially designed controllers to provide an interface between the tape drives and computerized image processing systems. The controller performs the functions necessary to convert the continuous serial data stream from the tape drive to a word-parallel blocked data stream which then goes to the computer-based system. With regard to the tape packing density, 1.8 × 10¹⁰ data bits are stored on a reel of one-inch tape. System components and their operation are surveyed, and studies on advanced storage techniques are summarized.
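The controller's serial-to-word-parallel conversion can be illustrated in software as packing a bit stream into fixed-width words. A minimal sketch, assuming MSB-first ordering (the actual controller's bit ordering and blocking scheme are not specified in the abstract):

```python
def serial_to_words(bits, word_size=16):
    """Pack a serial bit stream (MSB first) into fixed-width parallel words,
    a software analogue of the tape controller's serial-to-parallel step."""
    if len(bits) % word_size:
        raise ValueError("stream length must be a multiple of the word size")
    words = []
    for i in range(0, len(bits), word_size):
        w = 0
        for b in bits[i:i + word_size]:
            w = (w << 1) | b  # shift in one bit at a time
        words.append(w)
    return words
```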
Li, Yiming; Ishitsuka, Yuji; Hedde, Per Niklas; Nienhaus, G Ulrich
2013-06-25
In localization-based super-resolution microscopy, individual fluorescent markers are stochastically photoactivated and subsequently localized within a series of camera frames, yielding a final image with a resolution far beyond the diffraction limit. Yet, before localization can be performed, the subregions within the frames where the individual molecules are present have to be identified, oftentimes in the presence of high background. In this work, we address the importance of reliable molecule identification for the quality of the final reconstructed super-resolution image. We present a fast and robust algorithm (a-livePALM) that vastly improves the molecule detection efficiency while minimizing false assignments that can lead to image artifacts.
Reconstruction of multiple-pinhole micro-SPECT data using origin ensembles.
Lyon, Morgan C; Sitek, Arkadiusz; Metzler, Scott D; Moore, Stephen C
2016-10-01
The authors are currently developing a dual-resolution multiple-pinhole micro-SPECT imaging system based on three large NaI(Tl) gamma cameras. Two multiple-pinhole tungsten collimator tubes will be used sequentially for whole-body "scout" imaging of a mouse, followed by high-resolution (hi-res) imaging of an organ of interest, such as the heart or brain. Ideally, the whole-body image will be reconstructed in real time such that data need only be acquired until the area of interest can be visualized well enough to determine positioning for the hi-res scan. The authors investigated the utility of the origin ensemble (OE) algorithm for online and offline reconstructions of the scout data. This algorithm operates directly in image space, and can provide estimates of image uncertainty along with reconstructed images. Techniques for accelerating the OE reconstruction were also introduced and evaluated. System matrices were calculated for our 39-pinhole scout collimator design. SPECT projections were simulated for a range of count levels using the MOBY digital mouse phantom. Simulated data were used for a comparison of OE and maximum-likelihood expectation maximization (MLEM) reconstructions. The OE algorithm convergence was evaluated by calculating the total-image entropy and by measuring the counts in a volume-of-interest (VOI) containing the heart. Total-image entropy was also calculated for simulated MOBY data reconstructed using OE with various levels of parallelization. For VOI measurements in the heart, liver, bladder, and soft tissue, MLEM and OE reconstructed images agreed within 6%. Image entropy converged after ∼2000 iterations of OE, while the counts in the heart converged earlier, at ∼200 iterations of OE. An accelerated version of OE completed 1000 iterations in <9 min for a 6.8M-count data set, with some loss of image entropy performance, whereas the same dataset required ∼79 min to complete 1000 iterations of conventional OE.
A combination of the two methods showed decreased reconstruction time and no loss of performance when compared to conventional OE alone. OE-reconstructed images were found to be quantitatively and qualitatively similar to MLEM, yet OE also provided estimates of image uncertainty. Some acceleration of the reconstruction can be gained through the use of parallel computing. The OE algorithm is useful for reconstructing multiple-pinhole SPECT data and can be easily modified for real-time reconstruction.
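The core of the OE algorithm — each detected event holds a current origin voxel on its line of response (LOR), and a Markov chain proposes moves to other voxels on the same LOR — can be sketched as follows. This is a heavily simplified illustration with a toy acceptance ratio; the real OE acceptance rule also accounts for system sensitivity and voxel geometry, and the authors' parallelized implementation is not reproduced here.

```python
import random

def origin_ensemble(events, n_voxels, n_iters=2000, seed=0):
    """Toy origin-ensemble sketch: `events` is a list of LORs, each LOR a
    list of candidate voxel indices. Events hop between their candidate
    voxels under a count-dependent acceptance rule."""
    rng = random.Random(seed)
    # initial state: each event starts at a random voxel on its LOR
    origin = [rng.choice(lor) for lor in events]
    counts = [0] * n_voxels
    for v in origin:
        counts[v] += 1
    for _ in range(n_iters):
        e = rng.randrange(len(events))
        old = origin[e]
        new = rng.choice(events[e])          # propose another voxel on the LOR
        if new == old:
            continue
        # simplified acceptance ratio favouring already-populated voxels
        if rng.random() < min(1.0, (counts[new] + 1) / counts[old]):
            counts[old] -= 1
            counts[new] += 1
            origin[e] = new
    return counts  # the ensemble state doubles as the image estimate
```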
Fellner, C; Doenitz, C; Finkenzeller, T; Jung, E M; Rennert, J; Schlaier, J
2009-01-01
Geometric distortions and low spatial resolution are current limitations in functional magnetic resonance imaging (fMRI). The aim of this study was to evaluate if application of parallel imaging or significant reduction of voxel size in combination with a new 32-channel head array coil can reduce those drawbacks at 1.5 T for a simple hand motor task. Therefore, maximum t-values (tmax) in different regions of activation, time-dependent signal-to-noise ratios (SNR(t)) as well as distortions within the precentral gyrus were evaluated. Comparing fMRI with and without parallel imaging in 17 healthy subjects revealed significantly reduced geometric distortions in anterior-posterior direction. Using parallel imaging, tmax only showed a mild reduction (7-11%) although SNR(t) was significantly diminished (25%). In 7 healthy subjects high-resolution (2 x 2 x 2 mm3) fMRI was compared with standard fMRI (3 x 3 x 3 mm3) in a 32-channel coil and with high-resolution fMRI in a 12-channel coil. The new coil yielded a clear improvement for tmax (21-32%) and SNR(t) (51%) in comparison with the 12-channel coil. Geometric distortions were smaller due to the smaller voxel size. Therefore, the reduction in tmax (8-16%) and SNR(t) (52%) in the high-resolution experiment seems to be tolerable with this coil. In conclusion, parallel imaging is an alternative to reduce geometric distortions in fMRI at 1.5 T. Using a 32-channel coil, reduction of the voxel size might be the preferable way to improve spatial accuracy.
Study on parallel and distributed management of RS data based on spatial database
NASA Astrophysics Data System (ADS)
Chen, Yingbiao; Qian, Qinglan; Wu, Hongqiao; Liu, Shijin
2009-10-01
With the rapid development of current Earth-observing technology, RS image data storage, management and information publication have become a bottleneck for its application and popularization. There are two prominent problems in RS image data storage and management systems. First, the background server can hardly handle the heavy processing load of the great volume of RS data stored at different nodes in a distributed environment; a heavy burden is placed on the background server. Second, there is no unified, standard and rational organization of multi-sensor RS data for storage and management, and much information is lost or not captured at storage time. Facing these two problems, this paper puts forward a framework for a parallel and distributed RS image data management and storage system, aimed at an RS data information system based on a parallel background server and a distributed data management system. Toward these goals, the paper studies the following key techniques and draws some instructive conclusions. It puts forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With this solid index mechanism, a rational organization of multi-sensor RS image data of different resolutions, areas, bands and periods is achieved. For data storage, RS data is not divided into binary large objects stored in a relational database system; instead, it is reconstructed through the above solid index mechanism, and a logical image database for the RS image data files is constructed. For the system architecture, the paper sets up a framework based on a parallel server of several commodity computers, under which the background process is divided into two parts: the common Web process and the parallel process.
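The "Pyramid, Block, Layer, Epoch" solid index can be illustrated as a composite tile key. The encoding below is hypothetical (the paper does not specify a concrete key format); it simply shows how the four index dimensions combine into a single addressable identifier that a logical image database could use.

```python
def make_tile_key(pyramid_level, block_row, block_col, layer, epoch):
    """Compose a 'Pyramid, Block, Layer, Epoch' style key.
    The field widths and separators here are illustrative assumptions."""
    return f"P{pyramid_level:02d}/B{block_row:04d}_{block_col:04d}/L{layer:02d}/E{epoch}"

def parse_tile_key(key):
    """Invert make_tile_key back into its four index dimensions."""
    p, b, l, e = key.split("/")
    row, col = b[1:].split("_")
    return int(p[1:]), int(row), int(col), int(l[1:]), e[1:]
```

A key such as `P03/B0007_0012/L02/E2009-10` addresses one block at one pyramid level, in one band layer, for one acquisition epoch, which is exactly the lookup a multi-resolution, multi-band, multi-period store needs.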
Study on parallel and distributed management of RS data based on spatial data base
NASA Astrophysics Data System (ADS)
Chen, Yingbiao; Qian, Qinglan; Liu, Shijin
2006-12-01
With the rapid development of current Earth-observing technology, RS image data storage, management and information publication have become a bottleneck for its application and popularization. There are two prominent problems in RS image data storage and management systems. First, the background server can hardly handle the heavy processing load of the great volume of RS data stored at different nodes in a distributed environment; a heavy burden is placed on the background server. Second, there is no unified, standard and rational organization of multi-sensor RS data for storage and management, and much information is lost or not captured at storage time. Facing these two problems, this paper puts forward a framework for a parallel and distributed RS image data management and storage system, aimed at an RS data information system based on a parallel background server and a distributed data management system. Toward these goals, the paper studies the following key techniques and draws some instructive conclusions. It puts forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With this solid index mechanism, a rational organization of multi-sensor RS image data of different resolutions, areas, bands and periods is achieved. For data storage, RS data is not divided into binary large objects stored in a relational database system; instead, it is reconstructed through the above solid index mechanism, and a logical image database for the RS image data files is constructed. For the system architecture, the paper sets up a framework based on a parallel server of several commodity computers, under which the background process is divided into two parts: the common Web process and the parallel process.
Mürtz, Petra; Kaschner, Marius; Träber, Frank; Kukuk, Guido M; Büdenbender, Sarah M; Skowasch, Dirk; Gieseke, Jürgen; Schild, Hans H; Willinek, Winfried A
2012-11-01
To evaluate the use of dual-source parallel RF excitation (TX) for diffusion-weighted whole-body MRI with background body signal suppression (DWIBS) at 3.0 T. Forty consecutive patients were examined on a clinical 3.0-T MRI system using a diffusion-weighted (DW) spin-echo echo-planar imaging sequence with a combination of short TI inversion recovery and slice-selective gradient reversal fat suppression. DWIBS of the neck (n=5), thorax (n=8), abdomen (n=6) and pelvis (n=21) was performed both with TX (2:56 min) and with standard single-source RF excitation (4:37 min). The quality of DW images and reconstructed inverted maximum intensity projections was visually judged by two readers (blinded to acquisition technique). Signal homogeneity and fat suppression were scored as "improved", "equal", "worse" or "ambiguous". Moreover, the apparent diffusion coefficient (ADC) values were measured in muscles, urinary bladder, lymph nodes and lesions. By the use of TX, signal homogeneity was "improved" in 25/40 and "equal" in 15/40 cases. Fat suppression was "improved" in 17/40 and "equal" in 23/40 cases. These improvements were statistically significant (p<0.001, Wilcoxon signed-rank test). In five patients, fluid-related dielectric shading was present, which improved remarkably. The ADC values did not significantly differ for the two RF excitation methods (p=0.630 over all data, pairwise Student's t-test). Dual-source parallel RF excitation improved image quality of DWIBS at 3.0 T with respect to signal homogeneity and fat suppression, reduced scan time by approximately one-third, and did not influence the measured ADC values. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Nakazato, Ryo; Slomka, Piotr J; Fish, Mathews; Schwartz, Ronald G; Hayes, Sean W; Thomson, Louise E J; Friedman, John D; Lemley, Mark; Mackin, Maria L; Peterson, Benjamin; Schwartz, Arielle M; Doran, Jesse A; Germano, Guido; Berman, Daniel S
2015-04-01
Obesity is a common source of artifact on conventional SPECT myocardial perfusion imaging (MPI). We evaluated image quality and diagnostic performance of high-efficiency (HE) cadmium-zinc-telluride parallel-hole SPECT MPI for coronary artery disease (CAD) in obese patients. 118 consecutive obese patients at three centers (BMI 43.6 ± 8.9 kg·m(-2), range 35-79.7 kg·m(-2)) had upright/supine HE-SPECT and invasive coronary angiography > 6 months (n = 67) or low likelihood of CAD (n = 51). Stress quantitative total perfusion deficit (TPD) for upright (U-TPD), supine (S-TPD), and combined acquisitions (C-TPD) was assessed. Image quality (IQ; 5 = excellent; < 3 nondiagnostic) was compared among BMI 35-39.9 (n = 58), 40-44.9 (n = 24) and ≥45 (n = 36) groups. ROC curve area for CAD detection (≥50% stenosis) for U-TPD, S-TPD, and C-TPD were 0.80, 0.80, and 0.87, respectively. Sensitivity/specificity was 82%/57% for U-TPD, 74%/71% for S-TPD, and 80%/82% for C-TPD. C-TPD had highest specificity (P = .02). C-TPD normalcy rate was higher than U-TPD (88% vs 75%, P = .02). Mean IQ was similar among BMI 35-39.9, 40-44.9 and ≥45 groups [4.6 vs 4.4 vs 4.5, respectively (P = .6)]. No patient had a nondiagnostic stress scan. In obese patients, HE-SPECT MPI with dedicated parallel-hole collimation demonstrated high image quality, normalcy rate, and diagnostic accuracy for CAD by quantitative analysis of combined upright/supine acquisitions.
Nakazato, Ryo; Slomka, Piotr J.; Fish, Mathews; Schwartz, Ronald G.; Hayes, Sean W.; Thomson, Louise E.J.; Friedman, John D.; Lemley, Mark; Mackin, Maria L.; Peterson, Benjamin; Schwartz, Arielle M.; Doran, Jesse A.; Germano, Guido; Berman, Daniel S.
2014-01-01
Background Obesity is a common source of artifact on conventional SPECT myocardial perfusion imaging (MPI). We evaluated image quality and diagnostic performance of high-efficiency (HE) cadmium-zinc-telluride (CZT) parallel-hole SPECT-MPI for coronary artery disease (CAD) in obese patients. Methods and Results 118 consecutive obese patients at 3 centers (BMI 43.6±8.9 kg/m2, range 35–79.7 kg/m2) had upright/supine HE-SPECT and ICA >6 months (n=67) or low-likelihood of CAD (n=51). Stress quantitative total perfusion deficit (TPD) for upright (U-TPD), supine (S-TPD) and combined acquisitions (C-TPD) was assessed. Image quality (IQ; 5=excellent; <3 nondiagnostic) was compared among BMI 35–39.9 (n=58), 40–44.9 (n=24) and ≥45 (n=36) groups. ROC-curve area for CAD detection (≥50% stenosis) for U-TPD, S-TPD, and C-TPD were 0.80, 0.80, and 0.87, respectively. Sensitivity/specificity was 82%/57% for U-TPD, 74%/71% for S-TPD, and 80%/82% for C-TPD. C-TPD had highest specificity (P=.02). C-TPD normalcy rate was higher than U-TPD (88% vs. 75%, P=.02). Mean IQ was similar among BMI 35–39.9, 40–44.9 and ≥45 groups [4.6 vs. 4.4 vs. 4.5, respectively (P=.6)]. No patient had a non-diagnostic stress scan. Conclusions In obese patients, HE-SPECT MPI with dedicated parallel-hole collimation demonstrated high image quality, normalcy rate, and diagnostic accuracy for CAD by quantitative analysis of combined upright/supine acquisitions. PMID:25388380
Pulse-coupled neural network implementation in FPGA
NASA Astrophysics Data System (ADS)
Waldemark, Joakim T. A.; Lindblad, Thomas; Lindsey, Clark S.; Waldemark, Karina E.; Oberg, Johnny; Millberg, Mikael
1998-03-01
Pulse Coupled Neural Networks (PCNN) are biologically inspired neural networks, mainly based on studies of the visual cortex of small mammals. The PCNN is very well suited as a pre-processor for image processing, particularly in connection with object isolation, edge detection and segmentation. Several implementations of PCNN on von Neumann computers, as well as on special parallel processing hardware devices (e.g. SIMD), exist. However, these implementations are not as flexible as required for many applications. Here we present an implementation in Field Programmable Gate Arrays (FPGA) together with a performance analysis. The FPGA hardware implementation may be considered a platform for further, extended implementations and easily expanded into various applications. The latter may include advanced on-line image analysis with close to real-time performance.
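A minimal software sketch of a single-layer PCNN iteration is shown below: each neuron's pulse output feeds a 4-neighbour linking field, and an exponentially decaying threshold jumps after each firing. Parameter values are illustrative only and are not taken from the paper; an FPGA implementation would evaluate all neurons in parallel rather than looping.

```python
import math

def pcnn_segment(image, n_iters=10, beta=0.2, alpha_theta=0.5,
                 v_theta=20.0, theta0=5.0):
    """Toy PCNN: returns the iteration at which each pixel first fires
    (-1 if it never fires). Brighter regions tend to fire earlier."""
    h, w = len(image), len(image[0])
    theta = [[theta0] * w for _ in range(h)]      # dynamic thresholds
    Y = [[0] * w for _ in range(h)]               # pulse outputs
    fire_iter = [[-1] * w for _ in range(h)]
    for t in range(n_iters):
        # linking input from the previous iteration's pulses (4-neighbourhood)
        L = [[sum(Y[x][y] for x, y in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                  if 0 <= x < h and 0 <= y < w)
              for j in range(w)] for i in range(h)]
        newY = [[0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                U = image[i][j] * (1.0 + beta * L[i][j])   # internal activity
                # threshold decays, and jumps by v_theta after a pulse
                theta[i][j] = theta[i][j] * math.exp(-alpha_theta) + v_theta * Y[i][j]
                if U > theta[i][j]:
                    newY[i][j] = 1
                    if fire_iter[i][j] < 0:
                        fire_iter[i][j] = t
        Y = newY
    return fire_iter
```

Grouping pixels by first-firing iteration yields a coarse segmentation, which is why the PCNN works well as a pre-processor for object isolation.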
Fenchel, Michael; Nael, Kambiz; Deshpande, Vibhas S; Finn, J Paul; Kramer, Ulrich; Miller, Stephan; Ruehm, Stefan; Laub, Gerhard
2006-09-01
The aim of the present study was to assess the feasibility of renal magnetic resonance angiography at 3.0 T using a phased-array coil system with 32 coil elements. Specifically, high parallel imaging factors were used for increased spatial resolution and anatomic coverage of the whole abdomen. Signal-to-noise values and the g-factor distribution of the 32-element coil were examined in phantom studies for the magnetic resonance angiography (MRA) sequence. Eleven volunteers (6 men, median age of 30.0 years) were examined on a 3.0-T MR scanner (Magnetom Trio, Siemens Medical Solutions, Malvern, PA) using a 32-element phased-array coil (prototype from In vivo Corp.). Contrast-enhanced 3D-MRA (TR 2.95 milliseconds, TE 1.12 milliseconds, flip angle 25-30 degrees, bandwidth 650 Hz/pixel) was acquired with integrated generalized autocalibrating partially parallel acquisition (GRAPPA), in both phase- and slice-encoding directions. Images were assessed by 2 independent observers with regard to image quality, noise and presence of artifacts. Signal-to-noise levels of 22.2 +/- 22.0 and 57.9 +/- 49.0 were measured with (GRAPPAx6) and without parallel imaging, respectively. The mean g-factor of the 32-element coil for GRAPPA with an acceleration of 3 and 2 in the phase-encoding and slice-encoding directions, respectively, was 1.61. High image quality was found in 9 of 11 volunteers (2.6 +/- 0.8) with good overall interobserver agreement (k = 0.87). Relatively low image quality with higher noise levels was encountered in 2 volunteers. MRA at 3.0 T using a 32-element phased-array coil is feasible in healthy volunteers. High diagnostic image quality and extended anatomic coverage could be achieved with application of high parallel imaging factors.
High performance compression of science data
NASA Technical Reports Server (NTRS)
Storer, James A.; Cohn, Martin
1994-01-01
Two papers make up the body of this report. One presents a single-pass adaptive vector quantization algorithm that learns a codebook of variable size and shape entries; the authors present experiments on a set of test images showing that, with no training or prior knowledge of the data, for a given fidelity the compression achieved typically equals or exceeds that of the JPEG standard. The second paper addresses motion compensation, one of the most effective techniques used in interframe data compression. A parallel block-matching algorithm for estimating interframe displacement of blocks with minimum error is presented. The algorithm is designed for a simple parallel architecture to process video in real time.
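The parallel block-matching idea can be sketched as an exhaustive sum-of-absolute-differences (SAD) search in which each block's search is independent and therefore trivially parallelizable. A minimal thread-based sketch (the paper targets a dedicated parallel architecture, not threads; all names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def best_displacement(ref, cur, top, left, bs, search):
    """Exhaustive SAD search for one block within +/-search pixels."""
    block = [row[left:left + bs] for row in cur[top:top + bs]]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(ref) - bs and 0 <= x <= len(ref[0]) - bs:
                cand = [row[x:x + bs] for row in ref[y:y + bs]]
                err = sad(block, cand)
                if best is None or err < best[0]:
                    best = (err, dy, dx)
    return (top, left), best[1:]

def motion_field(ref, cur, bs=2, search=1):
    # each block's search is independent, so blocks are matched in parallel
    coords = [(r, c) for r in range(0, len(cur), bs)
                     for c in range(0, len(cur[0]), bs)]
    with ThreadPoolExecutor() as pool:
        futs = [pool.submit(best_displacement, ref, cur, r, c, bs, search)
                for r, c in coords]
        return dict(f.result() for f in futs)
```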
NASA Astrophysics Data System (ADS)
Ramirez, Andres; Rahnemoonfar, Maryam
2017-04-01
A hyperspectral image provides a data-rich multidimensional array consisting of hundreds of spectral bands. Analyzing the spectral and spatial information of such an image with linear and non-linear algorithms results in long computation times. In order to overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyze a hyperspectral image through the use of parallel hardware and a parallel programming model, which is simpler to handle than other low-level parallel programming models. Additionally, Hadoop was used as an open-source version of the MapReduce parallel programming model. This research compared classification accuracy and timing results between the Hadoop and GPU systems, tested against the following cases: a combined CPU and GPU case, a CPU-only case, and a case where no dimensionality reduction was applied.
Integration in PACS of DICOM with TCP/IP, SQL, and X Windows
NASA Astrophysics Data System (ADS)
Reijns, Gerard L.
1994-05-01
The DICOM standard (Digital Imaging and Communications in Medicine) has been developed in order to obtain compatibility at the higher OSI levels. This paper describes the implementation of DICOM in our low-cost PACS, which uses as much standard software and as many standard protocols as possible, such as SQL, X Windows and TCP/IP. We adopted the requirement that all messages on the communication network have to be DICOM compatible. The translation between DICOM messages and SQL commands, which drive the relational database, has been accommodated in our PACS supervisor. The translation takes only between 10 and 20 milliseconds. Images that will be used during the current day are stored in a distributed, parallel operating image base for reasons of performance. Extensive use has been made of X Windows to visualize images. A maximum of 12 images can be displayed simultaneously, of which one selected image can be manipulated (e.g., magnified, rotated, etc.) without affecting the other displayed images. The emphasis of future work will be on performance measurements and modeling of our PACS, and on bringing the results of both methods into agreement with each other.
Durham extremely large telescope adaptive optics simulation platform.
Basden, Alastair; Butterley, Timothy; Myers, Richard; Wilson, Richard
2007-03-01
Adaptive optics systems are essential on all large telescopes for which image quality is important. These are complex systems with many design parameters requiring optimization before good performance can be achieved. Simulation of adaptive optics systems is therefore necessary to characterize the expected performance. We describe an adaptive optics simulation platform, developed at Durham University, which can be used to simulate adaptive optics systems on the largest proposed future extremely large telescopes as well as on current systems. This platform is modular, object oriented, and has the benefit of hardware application acceleration that can be used to improve the simulation performance, essential for ensuring that the run time of a given simulation is acceptable. The simulation platform described here can be highly parallelized using parallelization techniques suited for adaptive optics simulation, while still offering the user complete control during the run. The results from the simulation of a ground layer adaptive optics system are provided as an example to demonstrate the flexibility of this simulation platform.
Fusion of local and global detection systems to detect tuberculosis in chest radiographs.
Hogeweg, Laurens; Mol, Christian; de Jong, Pim A; Dawson, Rodney; Ayles, Helen; van Ginneken, Bram
2010-01-01
Automatic detection of tuberculosis (TB) on chest radiographs is a difficult problem because of the diverse presentation of the disease. A combination of detection systems for abnormalities and normal anatomy is used to improve detection performance. A textural abnormality detection system operating at the pixel level is combined with a clavicle detection system to suppress false positive responses. The output of a shape abnormality detection system operating at the image level is combined in a subsequent step to further improve performance by reducing false negatives. Strategies for combining systems based on serial and parallel configurations were evaluated using the minimum, maximum, product, and mean probability combination rules. The performance of TB detection increased, as measured using the area under the ROC curve, from 0.67 for the textural abnormality detection system alone to 0.86 when the three systems were combined. The best result was achieved using the sum and product rules in a parallel combination of outputs.
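The probability combination rules named above (minimum, maximum, product, mean) can be sketched as a pixel-wise fusion of the detectors' probability maps. A minimal illustration, not the authors' implementation:

```python
def combine(probs, rule):
    """Combine the per-system probabilities for one pixel with a fixed rule."""
    if rule == "min":
        return min(probs)
    if rule == "max":
        return max(probs)
    if rule == "mean":
        return sum(probs) / len(probs)
    if rule == "product":
        p = 1.0
        for q in probs:
            p *= q
        return p
    raise ValueError(f"unknown rule: {rule}")

def fuse_maps(maps, rule):
    """Pixel-wise fusion of several detector probability maps (parallel
    combination of outputs); each map is a flat list of probabilities."""
    return [combine(ps, rule) for ps in zip(*maps)]
```

In a serial configuration one system's output would instead gate or re-weight the next system's input; the parallel configuration sketched here simply fuses the final probabilities.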
Quantitative image reconstruction for total-body PET imaging using the 2-meter long EXPLORER scanner
NASA Astrophysics Data System (ADS)
Zhang, Xuezhu; Zhou, Jian; Cherry, Simon R.; Badawi, Ramsey D.; Qi, Jinyi
2017-03-01
The EXPLORER project aims to build a 2-meter-long total-body PET scanner, which will provide extremely high sensitivity for imaging the entire human body. It will possess a range of capabilities currently unavailable to state-of-the-art clinical PET scanners with a limited axial field-of-view. The huge number of lines-of-response (LORs) of the EXPLORER poses a challenge to data handling and image reconstruction. The objective of this study is to develop a quantitative image reconstruction method for the EXPLORER and compare its performance with current whole-body scanners. Fully 3D image reconstruction was performed using time-of-flight list-mode data with parallel computation. To recover the resolution loss caused by the parallax error between crystal pairs at a large axial ring difference or transaxial radial offset, we applied an image domain resolution model estimated from point source data. To evaluate the image quality, we conducted computer simulations using the SimSET Monte-Carlo toolkit and XCAT 2.0 anthropomorphic phantom to mimic a 20 min whole-body PET scan with an injection of 25 MBq 18F-FDG. We compare the performance of the EXPLORER with a current clinical scanner that has an axial FOV of 22 cm. The comparison results demonstrated superior image quality from the EXPLORER, with a 6.9-fold reduction in noise standard deviation compared with multi-bed imaging using the clinical scanner.
Microscope mode secondary ion mass spectrometry imaging with a Timepix detector.
Kiss, Andras; Jungmann, Julia H; Smith, Donald F; Heeren, Ron M A
2013-01-01
In-vacuum active pixel detectors enable high sensitivity, highly parallel time- and space-resolved detection of ions from complex surfaces. For the first time, a Timepix detector assembly was combined with a secondary ion mass spectrometer for microscope mode secondary ion mass spectrometry (SIMS) imaging. Time resolved images from various benchmark samples demonstrate the imaging capabilities of the detector system. The main advantages of the active pixel detector are the higher signal-to-noise ratio and parallel acquisition of arrival time and position. Microscope mode SIMS imaging of biomolecules is demonstrated from tissue sections with the Timepix detector.
NASA Astrophysics Data System (ADS)
Yu, H.; Wang, Z.; Zhang, C.; Chen, N.; Zhao, Y.; Sawchuk, A. P.; Dalsing, M. C.; Teague, S. D.; Cheng, Y.
2014-11-01
Existing research on patient-specific computational hemodynamics (PSCH) relies heavily on software for anatomical extraction of blood arteries. Because of the gap between medical image processing and CFD, data reconstruction and mesh generation must be done with existing commercial software, which increases the computational burden, introduces inaccuracy during data transformation, and thus limits the medical applications of PSCH. We use the lattice Boltzmann method (LBM) to solve the level-set equation over an Eulerian distance field and implicitly and dynamically segment the artery surfaces from radiological CT/MRI imaging data. The segments feed seamlessly into the LBM-based CFD computation of PSCH, so explicit mesh construction and extra data management are avoided. The LBM is ideally suited to GPU (graphics processing unit)-based parallel computing, and parallel acceleration on the GPU achieves excellent performance in PSCH computation. An application study is presented that segments an aortic artery from a chest CT dataset and models the PSCH of the segmented artery.
Single-spin stochastic optical reconstruction microscopy
Pfender, Matthias; Aslam, Nabeel; Waldherr, Gerald; Neumann, Philipp; Wrachtrup, Jörg
2014-01-01
We experimentally demonstrate precision addressing of single-quantum emitters by combined optical microscopy and spin resonance techniques. To this end, we use nitrogen vacancy (NV) color centers in diamond, confined within a few tens of nanometers, as individually resolvable quantum systems. By developing a stochastic optical reconstruction microscopy (STORM) technique for NV centers, we are able to simultaneously perform sub-diffraction-limit imaging and optically detected magnetic resonance (ODMR) measurements on NV spins. This allows the assignment of spin resonance spectra to individual NV center locations with nanometer-scale resolution and thus further improves spatial discrimination. For example, we resolved formerly indistinguishable emitters by their spectra. Furthermore, ODMR spectra contain metrology information allowing for sub-diffraction-limit sensing of, for instance, magnetic or electric fields with inherently parallel data acquisition. As an example, we have detected nuclear spins with nanometer-scale precision. Finally, we give prospects of how this technique can evolve into a fully parallel quantum sensor for nanometer-resolution imaging of delocalized quantum correlations. PMID:25267655
In vivo and ex vivo imaging with ultrahigh resolution full-field OCT
NASA Astrophysics Data System (ADS)
Grieve, Kate; Moneron, Gael; Schwartz, Wilfrid; Boccara, Albert C.; Dubois, Arnaud
2005-08-01
Imaging of in vivo and ex vivo biological samples using full-field optical coherence tomography is demonstrated. Three variations on the original full-field optical coherence tomography instrument are presented, and evaluated in terms of performance. The instruments are based on the Linnik interferometer illuminated by a white light source. Images in the en face orientation are obtained in real-time without scanning by using a two-dimensional parallel detector array. An isotropic resolution capability better than 1 μm is achieved thanks to the use of a broad spectrum source and high numerical aperture microscope objectives. Detection sensitivity up to 90 dB is demonstrated. Image acquisition times as short as 10 μs per en face image are possible. A variety of in vivo and ex vivo imaging applications is explored, particularly in the fields of embryology, ophthalmology and botany.
Image Harvest: an open-source platform for high-throughput plant image processing and analysis
Knecht, Avi C.; Campbell, Malachy T.; Caprez, Adam; Swanson, David R.; Walia, Harkamal
2016-01-01
High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable to processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest was developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyze phenomics datasets. PMID:27141917
Fast GPU-based Monte Carlo code for SPECT/CT reconstructions generates improved 177Lu images.
Rydén, T; Heydorn Lagerlöf, J; Hemmingsson, J; Marin, I; Svensson, J; Båth, M; Gjertsson, P; Bernhardt, P
2018-01-04
Full Monte Carlo (MC)-based SPECT reconstructions have a strong potential for correcting for image-degrading factors, but the reconstruction times are long. The objective of this study was to develop a highly parallel Monte Carlo code for fast, ordered subset expectation maximization (OSEM) reconstructions of SPECT/CT images. The MC code was written in the Compute Unified Device Architecture language for a computer with four graphics processing units (GPUs) (GeForce GTX Titan X, Nvidia, USA). This enabled simulations of parallel photon emissions from the voxel matrix (128³ or 256³). Each computed tomography (CT) number was converted to attenuation coefficients for photoabsorption, coherent scattering, and incoherent scattering. For photon scattering, the deflection angle was determined by the differential scattering cross sections. An angular response function was developed and used to model the accepted angles for photon interaction with the crystal, and a detector scattering kernel was used for modeling the photon scattering in the detector. Predefined energy and spatial resolution kernels for the crystal were used. The MC code was implemented in the OSEM reconstruction of clinical and phantom ¹⁷⁷Lu SPECT/CT images. The Jaszczak image quality phantom was used to evaluate the performance of the MC reconstruction in comparison with attenuation-corrected (AC) OSEM reconstructions and attenuation-corrected OSEM reconstructions with resolution recovery corrections (RRC). The performance of the MC code was 3200 million photons/s. The required number of photons emitted per voxel to obtain a sufficiently low noise level in the simulated image was 200 for a 128³ voxel matrix. With this number of emitted photons per voxel, the MC-based OSEM reconstruction with ten subsets was performed within 20 s/iteration. The images converged after around six iterations; the reconstruction time was therefore around 3 min.
The activity recovery for the spheres in the Jaszczak phantom was clearly improved with MC-based OSEM reconstruction: e.g., the activity recovery was 88% for the largest sphere, versus 66% for AC-OSEM and 79% for RRC-OSEM. The GPU-based MC code generated an MC-based SPECT/CT reconstruction within a few minutes, and reconstructed patient images of ¹⁷⁷Lu-DOTATATE treatments revealed clearly improved resolution and contrast.
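For context, OSEM is the ordered-subsets accelerated variant of the MLEM update (forward-project, take the measured/estimated ratio, back-project, normalize by sensitivity). A toy, dependency-free MLEM sketch on a tiny system matrix; the names and sizes are illustrative only, not the paper's GPU code:

```python
def mlem(A, y, n_iter=10):
    """Toy MLEM: A is a list-of-rows system matrix, y the measured counts."""
    n_det, n_pix = len(A), len(A[0])
    x = [1.0] * n_pix                                   # uniform start
    sens = [sum(A[i][j] for i in range(n_det)) for j in range(n_pix)]
    for _ in range(n_iter):
        # forward-project the current estimate
        proj = [sum(A[i][j] * x[j] for j in range(n_pix)) for i in range(n_det)]
        # back-project the measured/estimated ratios and apply the update
        x = [x[j] * sum(A[i][j] * y[i] / proj[i] for i in range(n_det)) / sens[j]
             for j in range(n_pix)]
    return x
```

OSEM applies the same update using only a subset of the detector rows per sub-iteration, cycling through the subsets.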
Xu, Zheng; Wang, Sheng; Li, Yeqing; Zhu, Feiyun; Huang, Junzhou
2018-02-08
The recent history of parallel Magnetic Resonance Imaging (pMRI) has in large part been devoted to finding ways to reduce acquisition time. The joint total variation (JTV) regularized model has been demonstrated as a powerful tool for increasing sampling speed in pMRI, but the major bottleneck is the inefficiency of its optimization: present state-of-the-art optimizers for the JTV model reach only a sublinear convergence rate. In this paper, we squeeze out further performance by proposing a linearly convergent optimization method for the JTV model. The proposed method is based on the Iterative Reweighted Least Squares algorithm. Due to the complexity of the tangled JTV objective, we design a novel preconditioner to further accelerate the proposed method. Extensive experiments demonstrate the superior performance of the proposed algorithm for pMRI in both accuracy and efficiency compared with state-of-the-art methods.
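As a toy illustration of the Iterative Reweighted Least Squares idea the method builds on: minimize the 1-D L1 objective Σᵢ|x − aᵢ| (whose minimizer is the median) by repeatedly solving a reweighted least-squares step. This is purely illustrative and far simpler than the paper's JTV objective:

```python
def irls_l1(a, n_iter=100, eps=1e-8):
    """Minimize sum_i |x - a_i| via iteratively reweighted least squares."""
    x = sum(a) / len(a)                                   # L2 (mean) start
    for _ in range(n_iter):
        # reweight: |x - a_i| ~ w_i * (x - a_i)^2 with w_i = 1/|x - a_i|
        w = [1.0 / max(abs(x - ai), eps) for ai in a]
        # closed-form weighted least-squares solve in 1-D
        x = sum(wi * ai for wi, ai in zip(w, a)) / sum(w)
    return x
```

Each iteration replaces the non-smooth L1 term with a weighted quadratic surrogate; the weights are refreshed from the current iterate, which is the core IRLS mechanism.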
A parallel-processing approach to computing for the geographic sciences
Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Haga, Jim; Maddox, Brian; Feller, Mark
2001-01-01
The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting research into various areas, such as advanced computer architecture, algorithms to meet the processing needs for real-time image and data processing, the creation of custom datasets from seamless source data, rapid turn-around of products for emergency response, and support for computationally intense spatial and temporal modeling.
Adaptive proxy map server for efficient vector spatial data rendering
NASA Astrophysics Data System (ADS)
Sayar, Ahmet
2013-01-01
The rapid transmission of vector map data over the Internet is becoming a bottleneck of spatial data delivery and visualization in web-based environments because of increasing data volumes and limited network bandwidth. In order to improve both the transmission and rendering performance of vector spatial data over the Internet, we propose a proxy map server enabling parallel vector data fetching as well as caching, to improve the performance of web-based map servers in a dynamic environment. The proxy map server is placed seamlessly anywhere between the client and the final services, intercepting users' requests. It employs an efficient parallelization technique based on spatial proximity and data density when distributed replicas exist for the same spatial data. The effectiveness of the proposed technique is demonstrated at the end of the article through an application that creates map images enriched with earthquake seismic data records.
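A toy sketch of the two mechanisms combined in such a proxy, parallel fetching and caching; the fetch callable, tile keys, and class interface are invented for illustration, and the real server additionally exploits spatial proximity and data density when scheduling requests:

```python
from concurrent.futures import ThreadPoolExecutor

class ProxyCache:
    """Caching proxy: serve repeated tile requests locally, fetch misses in parallel."""

    def __init__(self, fetch):
        self.fetch = fetch              # callable: tile key -> rendered tile
        self.cache = {}

    def get_tiles(self, keys, max_workers=4):
        # deduplicated cache misses, preserving first-seen order
        missing = list(dict.fromkeys(k for k in keys if k not in self.cache))
        # fetch the misses in parallel, one task per tile
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for k, data in zip(missing, pool.map(self.fetch, missing)):
                self.cache[k] = data
        return [self.cache[k] for k in keys]
```

Repeated requests for the same tile then cost a dictionary lookup instead of a network round trip.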
Parallel Polarization State Generation
NASA Astrophysics Data System (ADS)
She, Alan; Capasso, Federico
2016-05-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e., a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially separated polarization components of a laser with a digital micromirror device and subsequently beam-combining them. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics, with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
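The serial-versus-parallel distinction can be illustrated numerically with 2×2 Jones matrices: crossed polarizers composed in series (a matrix product) block all light, while their sum (the parallel architecture, with the components recombined) passes both. A toy numeric contrast only, not the paper's experiment:

```python
def matmul2(A, B):
    """Product of two 2x2 matrices (serial composition)."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd2(A, B):
    """Sum of two 2x2 matrices (parallel composition)."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def matvec2(A, v):
    """Apply a 2x2 Jones matrix to a Jones vector."""
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

H = [[1.0, 0.0], [0.0, 0.0]]   # horizontal polarizer (Jones matrix)
V = [[0.0, 0.0], [0.0, 1.0]]   # vertical polarizer

serial = matmul2(H, V)     # product: crossed polarizers block everything
parallel = matadd2(H, V)   # sum: both components pass and recombine
```

Weighting each term of the sum with a time-varying intensity modulation is, schematically, how the parallel architecture sweeps through states of polarization.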
Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy
2017-10-06
The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to GitHub, and (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h:mm) using one core, and 1:04 (h:mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.
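Object-level parallelization of the kind benchmarked above can be sketched as a worker pool mapping one independent task per tumor; the feature function below is a stand-in for illustration, not part of the QIFE API:

```python
from concurrent.futures import ThreadPoolExecutor

def compute_features(obj):
    # stand-in for per-object 3D radiomics feature computation
    voxels = obj["voxels"]
    return {"id": obj["id"], "mean": sum(voxels) / len(voxels)}

def run_parallel(objects, workers=4):
    # one independent task per object (tumor); results keep the input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compute_features, objects))
```

Because the objects are independent, the parallel map must return exactly what a serial loop would, which also makes the speedup/memory trade-off the only tunable.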
NASA Astrophysics Data System (ADS)
Ren, Zhong; Liu, Guodong; Huang, Zhen
2012-11-01
Image reconstruction is a key step in medical imaging (MI), and the performance of the reconstruction algorithm determines the quality and resolution of the reconstructed image. Although other algorithms are available, filtered back-projection (FBP) remains the classical and most commonly used algorithm in clinical MI. In FBP, filtering the original projection data is a key step in suppressing artifacts in the reconstructed image. Simple use of classical filters, such as the Shepp-Logan (SL) and Ram-Lak (RL) filters, has drawbacks and limitations in practice, especially for projection data polluted by non-stationary random noise. We therefore use improved wavelet denoising combined with the parallel-beam FBP algorithm to enhance the quality of the reconstructed image. In the experiments, reconstruction quality was compared between the improved wavelet denoising and other methods (direct FBP, mean-filter FBP, and median-filter FBP). To determine the optimum reconstruction, different algorithms and different wavelet bases combined with three filters were each tested. Experimental results show that the reconstruction quality of the improved FBP algorithm is better than that of the others. Comparing the results of the different algorithms under two evaluation measures, mean-square error (MSE) and peak signal-to-noise ratio (PSNR), the improved FBP based on db2 and the Hanning filter at decomposition scale 2 performed best: its MSE was lower and its PSNR higher than the others. This improved FBP algorithm therefore has potential value in medical imaging.
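The two evaluation measures used above can be stated compactly: MSE = (1/N)·Σ(aᵢ − bᵢ)² and, for 8-bit images, PSNR = 10·log₁₀(255²/MSE). A minimal sketch on flattened pixel lists:

```python
import math

def mse(a, b):
    """Mean-square error between two equal-length flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * math.log10(max_val ** 2 / m)
```

Lower MSE and higher PSNR both indicate a reconstruction closer to the reference, which is exactly how the filter/wavelet combinations were ranked.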
Choe, Leila H; Lee, Kelvin H
2003-10-01
We investigate one approach to assess the quantitative variability in two-dimensional gel electrophoresis (2-DE) separations based on gel-to-gel variability, sample preparation variability, sample load differences, and the effect of automation on image analysis. We observe that 95% of spots present in three out of four replicate gels exhibit less than a 0.52 coefficient of variation (CV) in fluorescent stain intensity (% volume) for a single sample run on multiple gels. When four parallel sample preparations are performed, this value increases to 0.57. We do not observe any significant change in quantitative value for an increase or decrease in sample load of 30% when using appropriate image analysis variables. Increasing use of automation, while necessary in modern 2-DE experiments, does change the observed level of quantitative and qualitative variability among replicate gels. The number of spots that change qualitatively for a single sample run in parallel varies from a CV = 0.03 for fully manual analysis to CV = 0.20 for a fully automated analysis. We present a systematic method by which a single laboratory can measure gel-to-gel variability using only three gel runs.
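The coefficient of variation used throughout the analysis is the sample standard deviation divided by the mean; a one-line sketch over spot intensities (% volume values):

```python
import statistics

def cv(values):
    """Coefficient of variation: sample standard deviation over mean."""
    return statistics.stdev(values) / statistics.mean(values)
```

A CV of 0.52 for replicate gels thus means the spot-intensity spread is roughly half the mean intensity.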
ACCELERATING MR PARAMETER MAPPING USING SPARSITY-PROMOTING REGULARIZATION IN PARAMETRIC DIMENSION
Velikina, Julia V.; Alexander, Andrew L.; Samsonov, Alexey
2013-01-01
MR parameter mapping requires sampling along an additional (parametric) dimension, which often limits its clinical appeal due to a several-fold increase in scan times compared to conventional anatomic imaging. Data undersampling combined with parallel imaging is an attractive way to reduce scan time in such applications. However, inherent SNR penalties of parallel MRI due to noise amplification often limit its utility even at moderate acceleration factors, requiring regularization by prior knowledge. In this work, we propose a novel regularization strategy, which utilizes smoothness of signal evolution in the parametric dimension within a compressed sensing framework (p-CS) to provide accurate and precise estimation of parametric maps from undersampled data. The performance of the method was demonstrated with variable flip angle T1 mapping and compared favorably to two representative reconstruction approaches, image space-based total variation regularization and an analytical model-based reconstruction. The proposed p-CS regularization was found to provide efficient suppression of noise amplification and preservation of parameter mapping accuracy without explicit utilization of analytical signal models. The developed method may facilitate acceleration of quantitative MRI techniques that are not amenable to model-based reconstruction because of complex signal models or when signal deviations from the expected analytical model exist. PMID:23213053
Fast disk array for image storage
NASA Astrophysics Data System (ADS)
Feng, Dan; Zhu, Zhichun; Jin, Hai; Zhang, Jiangling
1997-01-01
A fast disk array is designed for large-scale continuous image storage. It includes a high-speed data architecture and technology for data striping and organization on the disk array. The high-speed data path, constructed from two dual-port RAMs and some control circuitry, transfers data between a host system and a plurality of disk drives; the bandwidth can exceed 100 MB/s if the data path is based on PCI (peripheral component interconnect). The organization of data stored on the disk array is similar to RAID 4. Data are striped across the disks, with each striping unit equal to a track, and I/O instructions are performed in parallel on the disk drives. An independent disk stores the parity information in the fast disk array architecture. By placing the parity-generation circuit directly on the SCSI (or SCSI-2) bus, the parity information can be generated on the fly, with little effect on the parallel writes to the other disks. The fast disk array architecture designed in this paper can meet the demands of image storage.
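The RAID-4-style parity described above is a bytewise XOR across the data stripes; the same XOR is what lets any single lost stripe be rebuilt from the survivors plus the parity disk. A minimal sketch (stripe contents and sizes are illustrative):

```python
def parity(stripes):
    """Bytewise XOR across equal-length data stripes (RAID-4 parity)."""
    out = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            out[i] ^= b
    return bytes(out)

def rebuild(surviving, parity_stripe):
    """Recover a single missing stripe: XOR of survivors and parity."""
    return parity(list(surviving) + [parity_stripe])
```

Because XOR is associative and each byte depends only on the same offset in every stripe, the parity can indeed be generated "on the fly" as data streams across the bus.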
Deformation and fracture of explosion-welded Ti/Al plates: A synchrotron-based study
DOE Office of Scientific and Technical Information (OSTI.GOV)
E, J. C.; Huang, J. Y.; Bie, B. X.
Here, explosion-welded Ti/Al plates are characterized with energy dispersive spectroscopy and x-ray computed tomography, and exhibit a smooth, well-jointed interface. We perform dynamic and quasi-static uniaxial tension experiments on Ti/Al with the loading direction either perpendicular or parallel to the Ti/Al interface, using a mini split-Hopkinson tension bar and a material testing system in conjunction with time-resolved synchrotron x-ray imaging. X-ray imaging and strain-field mapping reveal different deformation mechanisms responsible for anisotropic bulk-scale responses, including yield strength, ductility, and rate sensitivity. Deformation and fracture occur predominantly in the Al layer for perpendicular loading, but both the Ti and Al layers as well as the interface play a role for parallel loading. The rate sensitivity of Ti/Al follows those of the constituent metals. For perpendicular loading, a single deformation band develops in the Al layer under quasi-static loading, while multiple deformation bands nucleate simultaneously under dynamic loading, leading to a higher dynamic fracture strain. For parallel loading, the interface impedes the growth of deformation and results in increased ductility of Ti/Al under quasi-static loading, while interface fracture occurs under dynamic loading due to the disparity in Poisson's contraction.
Cache write generate for parallel image processing on shared memory architectures.
Wittenbrink, C M; Somani, A K; Chen, C H
1996-01-01
We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
Breast ultrasound tomography with two parallel transducer arrays: preliminary clinical results
NASA Astrophysics Data System (ADS)
Huang, Lianjie; Shin, Junseob; Chen, Ting; Lin, Youzuo; Intrator, Miranda; Hanson, Kenneth; Epstein, Katherine; Sandoval, Daniel; Williamson, Michael
2015-03-01
Ultrasound tomography has great potential to provide quantitative estimates of the physical properties of breast tumors for accurate characterization of breast cancer. We design and manufacture a new synthetic-aperture breast ultrasound tomography system with two parallel transducer arrays. The distance between the two transducer arrays is adjustable for scanning breasts of different sizes. The ultrasound transducer arrays are translated vertically to scan the entire breast slice by slice, acquiring ultrasound transmission and reflection data for whole-breast ultrasound imaging and tomographic reconstruction. We use the system to acquire patient data at the University of New Mexico Hospital for clinical studies. We present preliminary imaging results from in vivo patient ultrasound data. Our preliminary clinical imaging results show the promise of our breast ultrasound tomography system with two parallel transducer arrays for breast cancer imaging and characterization.
The 2nd Symposium on the Frontiers of Massively Parallel Computations
NASA Technical Reports Server (NTRS)
Mills, Ronnie (Editor)
1988-01-01
Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
Moody, Katherine Lynn; Hollingsworth, Neal A.; Zhao, Feng; Nielsen, Jon-Fredrik; Noll, Douglas C.; Wright, Steven M.; McDougall, Mary Preston
2014-01-01
Parallel transmit is an emerging technology to address the technical challenges associated with MR imaging at high field strengths. When developing arrays for parallel transmit systems, one of the primary factors to be considered is the mechanism to manage coupling and create independently operating channels. Recent work has demonstrated the use of amplifiers to provide some or all of the channel-to-channel isolation, reducing the need for on-coil decoupling networks in a manner analogous to the use of isolation preamplifiers with receive coils. This paper discusses an eight-channel transmit/receive head array for use with an ultra-low output impedance (ULOI) parallel transmit system. The ULOI amplifiers eliminated the need for a complex lumped element network to decouple the eight rung array. The design and construction details of the array are discussed in addition to the measurement considerations required for appropriately characterizing an array when using ULOI amplifiers. B1 maps and coupling matrices are used to verify the performance of the system. PMID:25072190
SU-E-J-91: FFT Based Medical Image Registration Using a Graphics Processing Unit (GPU).
Luce, J; Hoggarth, M; Lin, J; Block, A; Roeske, J
2012-06-01
To evaluate the efficiency gains obtained from using a Graphics Processing Unit (GPU) to perform a Fourier Transform (FT) based image registration. Fourier-based image registration involves obtaining the FT of the component images and analyzing them in Fourier space to determine the translations and rotations of one image set relative to another. An important property of FT registration is that by enlarging the images (adding additional pixels), one can obtain translations and rotations with sub-pixel resolution; the expense, however, is increased computational time. GPUs may decrease the computational time associated with FT image registration by taking advantage of their parallel architecture to perform matrix computations much more efficiently than a Central Processing Unit (CPU). In order to evaluate the computational gains produced by a GPU, images with known translational shifts were utilized. A program was written in the Interactive Data Language (IDL; Exelis, Boulder, CO) to perform CPU-based calculations. Subsequently, the program was modified using GPU bindings (Tech-X, Boulder, CO) to perform GPU-based computation on the same system. Multiple image sizes were used, ranging from 256×256 to 2304×2304. The time required to complete the full algorithm on the CPU and GPU was benchmarked, and the speed increase was defined as the ratio of the CPU-to-GPU computational time. The ratio of the CPU-to-GPU time was greater than 1.0 for all images, which indicates the GPU performed the algorithm faster than the CPU. The smallest improvement, a 1.21 ratio, was found with the smallest image size of 256×256, and the largest speedup, a 4.25 ratio, was observed with the largest image size of 2304×2304. GPU programming resulted in a significant decrease in the computational time associated with an FT image registration algorithm. The inclusion of the GPU may provide near real-time, sub-pixel registration capability. © 2012 American Association of Physicists in Medicine.
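A dependency-free sketch of the Fourier-space idea, in 1-D with a naive DFT for clarity (real implementations use 2-D FFTs, optionally GPU-accelerated): phase correlation recovers a circular shift as the peak of the inverse transform of the normalized cross-power spectrum. Illustrative only; this is not the IDL program used in the study:

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (educational only)."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N)
               for n in range(N)) for k in range(N)]
    return [v / N for v in out] if inverse else out

def find_shift(f, g):
    """Recover s where g[n] == f[(n - s) % N], via phase correlation."""
    F, G = dft(f), dft(g)
    R = []
    for Fk, Gk in zip(F, G):
        p = Gk * Fk.conjugate()                 # cross-power spectrum
        R.append(p / abs(p) if abs(p) > 1e-12 else 0j)
    corr = dft(R, inverse=True)                 # delta peak at the shift
    return max(range(len(corr)), key=lambda n: corr[n].real)
```

Zero-padding (enlarging) the inputs before the transforms refines the peak location, which is the sub-pixel property the abstract describes.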
General phase regularized reconstruction using phase cycling.
Ong, Frank; Cheng, Joseph Y; Lustig, Michael
2018-07-01
To develop a general phase regularized image reconstruction method, with applications to partial Fourier imaging, water-fat imaging and flow imaging. The problem of enforcing phase constraints in reconstruction was studied under a regularized inverse problem framework. A general phase regularized reconstruction algorithm was proposed to enable various joint reconstructions of partial Fourier imaging, water-fat imaging and flow imaging, along with parallel imaging (PI) and compressed sensing (CS). Since phase regularized reconstruction is inherently non-convex and sensitive to phase wraps in the initial solution, a reconstruction technique, named phase cycling, was proposed to render the overall algorithm invariant to phase wraps. The proposed method was applied to retrospectively under-sampled in vivo datasets and compared with state-of-the-art reconstruction methods. Phase cycling reconstructions showed a reduction of artifacts compared to reconstructions without phase cycling and achieved performance similar to state-of-the-art results in partial Fourier, water-fat and divergence-free regularized flow reconstruction. Joint reconstruction of partial Fourier + water-fat imaging + PI + CS, and of partial Fourier + divergence-free regularized flow imaging + PI + CS, were demonstrated. The proposed phase cycling reconstruction provides an alternative way to perform phase regularized reconstruction without the need to perform phase unwrapping. It is robust to the choice of initial solutions and encourages the joint reconstruction of phase imaging applications. Magn Reson Med 80:112-125, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Goerner, Frank L.; Duong, Timothy; Stafford, R. Jason; Clarke, Geoffrey D.
2013-01-01
Purpose: To investigate the utility of five different standard measurement methods for determining image uniformity for partially parallel imaging (PPI) acquisitions, in terms of consistency across a variety of pulse sequences and reconstruction strategies. Methods: Images were produced with a phantom using a 12-channel head matrix coil in a 3T MRI system (TIM TRIO, Siemens Medical Solutions, Erlangen, Germany). Images produced using echo-planar, fast spin echo, gradient echo, and balanced steady-state free precession pulse sequences were evaluated. Two different PPI reconstruction methods were investigated, the generalized autocalibrating partially parallel acquisition (GRAPPA) algorithm and modified sensitivity encoding (mSENSE), with acceleration factors (R) of 2, 3, and 4. Additionally, images were acquired with conventional two-dimensional Fourier imaging methods (R = 1). Five measurement methods of uniformity, recommended by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA), were considered. The methods investigated were (1) an ACR method and (2) a NEMA method for calculating the peak deviation nonuniformity, (3) a modification of a NEMA method used to produce a gray scale uniformity map, (4) determining the normalized absolute average deviation uniformity, and (5) a NEMA method that focused on 17 areas of the image to measure uniformity. Changes in uniformity as a function of reconstruction method at the same R-value were also investigated. Two-way analysis of variance (ANOVA) was used to determine whether R-value or reconstruction method had a greater influence on signal intensity uniformity measurements for partially parallel MRI. Results: Two of the methods studied had consistently negative slopes when signal intensity uniformity was plotted against R-value. Comparisons of mSENSE against GRAPPA found no consistent difference between the two reconstruction methods with regard to signal intensity uniformity.
The results of the two-way ANOVA analysis suggest that R-value and pulse sequence type have the largest influence on uniformity, whereas the PPI reconstruction method had relatively little effect. Conclusions: Two of the methods of measuring signal intensity uniformity, described by the NEMA MRI standards, consistently indicated a decrease in uniformity with an increase in R-value. The other methods investigated did not demonstrate consistent results for evaluating signal uniformity in MR images obtained by partially parallel methods. However, because the spatial distribution of noise affects uniformity, it is recommended that additional uniformity quality metrics be investigated for partially parallel MR images. PMID:23927345
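For reference, the peak-deviation style of uniformity measure named in methods (1) and (2) reduces to a one-line formula. A sketch of the ACR-style percent integral uniformity follows; the exact ROI placement and low-pass filtering prescriptions of the standards are omitted:

```python
import numpy as np

def percent_integral_uniformity(roi):
    """Peak-deviation uniformity in the style of the ACR/NEMA
    definitions: PIU = 100 * (1 - (Smax - Smin) / (Smax + Smin)),
    computed over a region of interest. 100% means perfectly flat."""
    s_max, s_min = float(np.max(roi)), float(np.min(roi))
    return 100.0 * (1.0 - (s_max - s_min) / (s_max + s_min))
```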
New Imaging Strategies Using a Motion-Resistant Liver Sequence in Uncooperative Patients
Kim, Bong Soo; Lee, Kyung Ryeol; Goh, Myeng Ju
2014-01-01
MR imaging has unique benefits for evaluating the liver because of its high-resolution capability and ability to permit detailed assessment of anatomic lesions. In uncooperative patients, motion artifacts can impair the image quality and lead to the loss of diagnostic information. In this setting, recent advances in motion-resistant liver MR techniques, including faster imaging protocols (e.g., dual-echo magnetization-prepared rapid-acquisition gradient echo (MP-RAGE), the view-sharing technique), data under-sampling (e.g., gradient recalled echo (GRE) with controlled aliasing in parallel imaging results in higher acceleration (CAIPIRINHA), single-shot echo-train spin-echo (SS-ETSE)), and motion-artifact minimization methods (e.g., radial GRE with/without k-space-weighted image contrast (KWIC)), can provide consistent, artifact-free images with adequate image quality and can lead to promising diagnostic performance. An understanding of the different motion-resistant options allows radiologists to adopt the most appropriate technique for their clinical practice and thereby significantly improve patient care. PMID:25243115
Nakada, Tsutomu; Matsuzawa, Hitoshi; Fujii, Yukihiko; Takahashi, Hitoshi; Nishizawa, Masatoyo; Kwee, Ingrid L
2006-07-01
Clinical magnetic resonance imaging (MRI) has recently entered the "high-field" era, and systems equipped with 3.0-4.0T superconductive magnets are becoming the gold standard for diagnostic imaging. While a higher signal-to-noise ratio (S/N) is a definite advantage of higher-field systems, the higher susceptibility effect remains a significant trade-off. To take advantage of a higher-field system in performing routine clinical imaging at higher anatomical resolution, we implemented a vector contrast imaging technique, three-dimensional anisotropy contrast (3DAC), on a 3.0T system with a PROPELLER (Periodically Rotated Overlapping Parallel Lines with Enhanced Reconstruction) sequence, a method capable of effectively eliminating undesired artifacts in rapid diffusion imaging sequences. One hundred subjects (20 normal volunteers and 80 volunteers with various central nervous system diseases) participated in the study. Anisotropic diffusion-weighted PROPELLER images were obtained on a General Electric (Waukesha, WI, USA) Signa 3.0T for each axis, with a b-value of 1100 sec/mm2. Subsequently, 3DAC images were constructed using in-house software written in MATLAB (MathWorks, Natick, MA, USA). The vector contrast provides exquisite anatomical detail, illustrated by the clear identification of all major tracts through the entire brain. 3DAC images provide better anatomical resolution for brainstem glioma than higher-resolution T2 reversed images. Degenerative processes of disease-specific tracts were clearly identified, as illustrated in cases of multiple system atrophy and Machado-Joseph disease. Anatomical images of significantly higher resolution than the best current standard, T2 reversed images, were successfully obtained. As a technique readily applicable in a routine clinical setting, 3DAC PROPELLER on a 3.0T system will be a powerful addition to diagnostic imaging.
Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P
2017-03-01
Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely used calibrationless uniformly undersampled trajectories. Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high acceleration rates relative to existing state-of-the-art reconstruction approaches. The SENSE-LORAKS framework provides promising new opportunities for highly accelerated MRI. Magn Reson Med 77:1021-1035, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee
2008-01-01
High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next-generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Doppler-sensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to a 2-D discrete Fourier transform (DFT) kernel for image processing and frequency-domain processing. The third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of a high degree of parallelism that can be translated into very high performance, low-power-dissipation computing. The EnLight 256 is a small-form-factor signal processing chip (5×5 cm²) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to precursor EnLight 64 Alpha hardware for a preliminary assessment of its capabilities, in terms of large Fourier transforms for matched filter banks and applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today.
The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125 MHz. At each clock cycle, 128K multiply-and-add operations are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the product of five years of sustained, intensive R&D collaboration (involving over $400M of investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and eight synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit-wide single-instruction, multiple-data (SIMD) streams. An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation 3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate the clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
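The quoted peak rate follows directly from the stated figures, counting each multiply-and-add as one operation (a back-of-the-envelope check, not a benchmark):

```python
# Back-of-the-envelope check of the quoted EnLight peak rate:
# 128K multiply-and-add units firing at the 125 MHz system clock.
macs_per_cycle = 128 * 1024            # 128K MACs per clock cycle
clock_hz = 125e6                       # 125 MHz system clock
peak_ops = macs_per_cycle * clock_hz   # operations per second
print(peak_ops / 1e12)                 # -> 16.384 (TeraOPS)
```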
An information theory of image gathering
NASA Technical Reports Server (NTRS)
Fales, Carl L.; Huck, Friedrich O.
1991-01-01
Shannon's mathematical theory of communication is extended to image gathering. Expressions are obtained for the total information that is received with a single image-gathering channel and with parallel channels. It is concluded that the aliased signal components carry information even though these components interfere with the within-passband components in conventional image gathering and restoration, thereby degrading the fidelity and visual quality of the restored image. An examination of the expression for minimum mean-square-error, or Wiener-matrix, restoration from parallel image-gathering channels reveals a method for unscrambling the within-passband and aliased signal components to restore spatial frequencies beyond the sampling passband out to the spatial frequency response cutoff of the optical aperture.
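For the single-channel case, minimum mean-square-error restoration is the classical Wiener filter. The following is a sketch under the assumption of a known channel transfer function and noise-to-signal power ratio; the parallel-channel Wiener-matrix unscrambling of aliased components described above is not reproduced here:

```python
import numpy as np

def wiener_restore(blurred, otf, nsr=0.01):
    """Minimum mean-square-error (Wiener) restoration of a single
    image-gathering channel. `otf` is the channel's optical transfer
    function sampled on the FFT grid and `nsr` the noise-to-signal
    power ratio (both assumed known a priori)."""
    G = np.fft.fft2(blurred)
    W = np.conj(otf) / (np.abs(otf) ** 2 + nsr)   # Wiener filter
    return np.real(np.fft.ifft2(W * G))
```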
A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms
Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein
2017-01-01
Real-time image processing is used in a wide variety of applications, such as those in medical care and industrial processes. In medical care, this technique can display important patient information graphically, which can supplement and aid the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to recent research, graphics processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one way to achieve real-time image processing. Edge detection is an early stage in most image processing methods for the extraction of features and object segments from a raw image. The Canny method, Sobel and Prewitt filters, and the Roberts' Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for the Canny approach has been modified to run in a fully parallel manner by replacing the breadth-first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2–100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms. PMID:28487831
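Of the edge detectors listed, the Sobel operator is the simplest to sketch: each output pixel depends only on its 3×3 neighborhood, which is exactly why one GPU thread per pixel parallelizes it well. A serial NumPy reference, not the paper's CUDA code:

```python
import numpy as np

def sobel_edges(img):
    """Sobel gradient magnitude. Each output pixel is a 3x3
    neighborhood operation (the kernel a GPU thread would evaluate
    independently); borders are left at zero."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    H, W = img.shape
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for dy in range(3):
        for dx in range(3):
            patch = img[dy:H-2+dy, dx:W-2+dx]   # shifted interior view
            gx[1:-1, 1:-1] += kx[dy, dx] * patch
            gy[1:-1, 1:-1] += ky[dy, dx] * patch
    return np.hypot(gx, gy)
```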
NASA Astrophysics Data System (ADS)
Sourbier, F.; Operto, S.; Virieux, J.
2006-12-01
We present a distributed-memory parallel algorithm for 2D visco-acoustic full-waveform inversion of wide-angle seismic data. Our code is written in Fortran 90 and uses MPI for parallelism. The algorithm was applied to a real wide-angle data set recorded by 100 OBSs with a 1-km spacing in the eastern Nankai trough (Japan) to image the deep structure of the subduction zone. Full-waveform inversion is applied sequentially to discrete frequencies, proceeding from the low to the high frequencies. The inverse problem is solved with a classic gradient method. Full-waveform modeling is performed with a frequency-domain finite-difference method. In the frequency domain, solving the wave equation requires the resolution of a large unsymmetric system of linear equations. We use the massively parallel direct solver MUMPS (http://www.enseeiht.fr/irit/apo/MUMPS) for distributed-memory computers to solve this system. The MUMPS solver is based on a multifrontal method for the parallel factorization. The MUMPS algorithm is subdivided into three main steps. First, a symbolic analysis step performs re-ordering of the matrix coefficients to minimize the fill-in of the matrix during the subsequent factorization, and estimates the assembly tree of the matrix. Second, the factorization is performed with dynamic scheduling to accommodate numerical pivoting, and provides the LU factors distributed over all the processors. Third, the resolution is performed for multiple sources. To compute the gradient of the cost function, two simulations per shot are required (one to compute the forward wavefield and one to back-propagate residuals). The multi-source resolutions can be performed in parallel with MUMPS. In the end, each processor stores in core a sub-domain of all the solutions. These distributed solutions can be exploited to compute the gradient of the cost function in parallel.
Since the gradient of the cost function is a weighted stack of the shot and residual solutions of MUMPS, each processor computes the corresponding sub-domain of the gradient. In the end, the gradient is centralized on the master processor using a collective communication. The gradient is scaled by the diagonal elements of the Hessian matrix. This scaling is computed only once per frequency, before the first iteration of the inversion. Estimation of the diagonal terms of the Hessian requires performing one simulation per non-redundant shot and receiver position. The same strategy as the one used for the gradient is used to compute the diagonal Hessian in parallel. This algorithm was applied to a dense wide-angle data set recorded by 100 OBSs in the eastern Nankai trough, offshore Japan. Thirteen frequencies ranging from 3 to 15 Hz were inverted. Twenty iterations per frequency were computed, leading to 260 tomographic velocity models of increasing resolution. The velocity model dimensions are 105 km x 25 km, corresponding to a finite-difference grid of 4201 x 1001 points with a 25-m grid interval. The number of shots was 1005 and the number of inverted OBS gathers was 93. The inversion requires 20 days on six 32-bit bi-processor nodes with 4 GB of RAM per node when only the LU factorization is performed in parallel. Preliminary estimates of the time required to perform the inversion with the fully parallelized code are 6 and 4 days using 20 and 50 processors, respectively.
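The per-shot independence that makes the gradient computation parallel can be sketched as a map-reduce over shots: each shot's forward wavefield is correlated with its back-propagated residual wavefield, and the contributions are stacked. The omega-squared weighting shown is a common form for a slowness-squared parametrization and is an assumption, not taken from the paper:

```python
import numpy as np

def fwi_gradient(forward, adjoint, omega):
    """Frequency-domain FWI gradient as a stack over shots: each shot
    contributes Re(omega^2 * u_s * conj(lambda_s)) at every grid
    point, where u_s is the forward wavefield and lambda_s the
    back-propagated residual wavefield. The per-shot terms are
    independent, which is what lets each MPI rank stack its own
    sub-domain; here the "map" and "reduce" are a serial loop."""
    g = np.zeros(forward[0].shape)
    for u_s, lam_s in zip(forward, adjoint):           # map over shots
        g += np.real(omega**2 * u_s * np.conj(lam_s))  # reduce: stack
    return g
```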
Parallelization of a blind deconvolution algorithm
NASA Astrophysics Data System (ADS)
Matson, Charles L.; Borelli, Kathy J.
2006-09-01
Often it is of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function - information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative in nature, they can take minutes to days to deblur an image, depending on how many frames of data are used for the deblurring and on the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.
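A minimal sketch of the alternating structure of such algorithms, in the style of Ayers-Dainty/Richardson-Lucy blind deconvolution (single frame, periodic boundaries assumed; this is a generic textbook scheme, not the authors' parallelized code):

```python
import numpy as np

def conv2(a, b):
    # circular convolution via FFT (assumption: periodic boundaries)
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

def blind_deconv(g, n_iter=50):
    """Alternating Richardson-Lucy-style blind deconvolution sketch:
    jointly estimate the image f and blur h from the blurred frame g
    by fixing one while multiplicatively updating the other. Every
    update is pixelwise, the natural grain for sub-frame
    parallelization."""
    f = np.full_like(g, g.mean())          # flat image initial guess
    h = np.full_like(g, 1.0 / g.size)      # uniform blur initial guess
    for _ in range(n_iter):
        # update the PSF with the image fixed
        h *= conv2(g / (conv2(f, h) + 1e-12), f[::-1, ::-1])
        h /= h.sum()                       # keep PSF normalized
        # update the image with the PSF fixed
        f *= conv2(g / (conv2(f, h) + 1e-12), h[::-1, ::-1])
    return f, h
```

The iterative, multiplicative character of the updates is what makes run times long and, equally, what makes per-pixel parallelization attractive.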
NASA Astrophysics Data System (ADS)
Denz, Cornelia; Dellwig, Thilo; Lembcke, Jan; Tschudi, Theo
1996-02-01
We propose and demonstrate experimentally a method for utilizing a dynamic phase-encoded photorefractive memory to realize parallel optical addition, subtraction, and inversion operations on stored images. The phase-encoded holographic memory is realized in photorefractive BaTiO3, storing eight images using Walsh-Hadamard binary phase codes and an incremental recording procedure. By subsampling the set of reference beams during the recall operation, the selectivity of the phase address is decreased, allowing one to combine images in such a way that different linear combinations of the images can be realized at the output of the memory.
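The arithmetic behind the recall operation can be sketched numerically: orthogonal Walsh-Hadamard codes let a full phase address isolate one stored image, while combining code vectors yields sums and differences of images. This is a toy linear model of the memory, not a simulation of the photorefractive crystal:

```python
import numpy as np

def sylvester_hadamard(n):
    """Walsh-Hadamard (+1/-1) codes via the Sylvester construction;
    rows are mutually orthogonal. n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

# Toy model: 8 reference-beam components, each holding the
# code-weighted superposition of the stored images.
rng = np.random.default_rng(0)
images = rng.random((8, 16, 16))             # eight stored images
codes = sylvester_hadamard(8)                # one +/-1 phase code per image
memory = np.einsum('ib,ihw->bhw', codes, images)

def recall(address):
    """Read out with a phase address; code orthogonality isolates or
    linearly combines the stored images."""
    return np.einsum('b,bhw->hw', address, memory) / len(address)

img3 = recall(codes[3])                      # exact recall of image 3
diff = recall(codes[1] - codes[2])           # parallel subtraction
```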
Lens-based wavefront sensorless adaptive optics swept source OCT
NASA Astrophysics Data System (ADS)
Jian, Yifan; Lee, Sujin; Ju, Myeong Jin; Heisler, Morgan; Ding, Weiguang; Zawadzki, Robert J.; Bonora, Stefano; Sarunic, Marinko V.
2016-06-01
Optical coherence tomography (OCT) has revolutionized modern ophthalmology, providing depth-resolved images of the retinal layers in a system that is suited to a clinical environment. Although the axial resolution of an OCT system, which is a function of the light source bandwidth, is sufficient to resolve retinal features at a micrometer scale, the lateral resolution is dependent on the delivery optics and is limited by ocular aberrations. Through the combination of wavefront sensorless adaptive optics and the use of dual deformable transmissive optical elements, we present a compact lens-based OCT system at an imaging wavelength of 1060 nm for high-resolution retinal imaging. We utilized a commercially available variable focal length lens to correct for the wide range of defocus commonly found in patients' eyes, and a novel multi-actuator adaptive lens for aberration correction, to achieve near diffraction-limited imaging performance at the retina. With a parallel processing computational platform, high-resolution cross-sectional and en face retinal image acquisition and display was performed in real time. In order to demonstrate the system functionality and clinical utility, we present images of the photoreceptor cone mosaic and other retinal layers acquired in vivo from research subjects.
Restoration for Noise Removal in Quantum Images
NASA Astrophysics Data System (ADS)
Liu, Kai; Zhang, Yi; Lu, Kai; Wang, Xiaoping
2017-09-01
Quantum computation has become increasingly attractive in the past few decades due to its extraordinary performance. As a result, some studies focusing on image representation and processing via quantum mechanics have been done. However, few of them have considered quantum operations for image restoration. To address this problem, three noise removal algorithms are proposed in this paper based on the novel enhanced quantum representation model, oriented to two kinds of noise pollution (Salt-and-Pepper noise and Gaussian noise). The first algorithm, Q-Mean, is designed to remove Salt-and-Pepper noise: the noise points are extracted through comparisons with the adjacent pixel values, after which the restoration operation is finished by mean filtering. In the second method, Q-Gauss, a special mask is applied to weaken the Gaussian noise pollution. The third algorithm, Q-Adapt, is effective for a source image containing unknown noise: the type of noise can be judged through quantum statistical operations on the color values of the whole image, and then the corresponding noise removal algorithm is used to conduct the image restoration. Performance analysis reveals that our methods can offer high restoration quality and achieve significant speedup through the inherent parallelism of quantum computation.
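A classical (non-quantum) analogue of the Q-Mean step, detecting outliers against the adjacent pixel values and then restoring by mean filtering, can be sketched as follows; the detection threshold is an assumption, not from the paper:

```python
import numpy as np

def qmean_like(img, thresh=0.8):
    """Classical analogue of the Q-Mean idea: flag pixels whose value
    deviates strongly from all four neighbors (salt-and-pepper
    candidates), then replace them with the neighborhood mean.
    Borders are left untouched; `thresh` is an assumed parameter."""
    out = img.astype(float).copy()
    H, W = img.shape
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            nb = [img[y-1, x], img[y+1, x], img[y, x-1], img[y, x+1]]
            if all(abs(float(img[y, x]) - float(v)) > thresh for v in nb):
                out[y, x] = np.mean(nb)    # restore by mean filtering
    return out
```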
Thalhammer, Christof; Renz, Wolfgang; Winter, Lukas; Hezel, Fabian; Rieger, Jan; Pfeiffer, Harald; Graessl, Andreas; Seifert, Frank; Hoffmann, Werner; von Knobelsdorff-Brenkenhoff, Florian; Tkachenko, Valeriy; Schulz-Menger, Jeanette; Kellman, Peter; Niendorf, Thoralf
2012-01-01
Purpose To design, evaluate and apply a two-dimensional 16-channel transmit/receive coil array tailored for cardiac MRI at 7.0 Tesla. Material and Methods The cardiac coil array consists of 2 sections, each using 8 elements arranged in a 2 × 4 array. RF safety was validated by SAR simulations. Cardiac imaging was performed using 2D CINE FLASH imaging, T2* mapping and fat-water separation imaging. The characteristics of the coil array were analyzed, including parallel imaging performance, left ventricular chamber quantification and overall image quality. Results RF characteristics were found to be appropriate for all subjects included in the study. The SAR values derived from the simulations fall well within the limits of the legal guidelines. The baseline SNR advantage at 7.0 T was put to use to acquire 2D CINE images of the heart with a very high spatial resolution of (1 × 1 × 4) mm3. The proposed coil array supports 1D acceleration factors of up to R=4 without significantly impairing image quality. Conclusions The 16-channel TX/RX coil has the capability to acquire high-contrast and high-spatial-resolution images of the heart at 7.0 Tesla. PMID:22706727
NASA Astrophysics Data System (ADS)
Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie
2018-01-01
The wearing degree of the wheel set tread is one of the main factors that influence the safety and stability of a running train. The geometrical parameters mainly include flange thickness and flange height. Line-structured laser light was projected onto the wheel tread surface, and the geometrical parameters can be deduced from the profile image. An online image acquisition system was designed based on asynchronous CCD reset and a CUDA parallel processing unit. Image acquisition is triggered by hardware interrupt. A high-efficiency parallel segmentation algorithm based on CUDA was proposed. The algorithm first divides the image into smaller squares, then extracts the squares belonging to the target by fusing the k_means and STING clustering image segmentation algorithms. Segmentation time is less than 0.97 ms. A considerable acceleration ratio compared with serial CPU calculation was obtained, which greatly improves the real-time image processing capacity. When a wheel set is running at limited speed, the system, placed alongside the railway line, can measure the geometrical parameters automatically. The maximum measuring speed is 120 km/h.
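The clustering kernel at the heart of the segmentation can be sketched as plain k-means on pixel intensities (a serial NumPy reference; the fused k_means/STING scheme and its CUDA mapping are not reproduced here):

```python
import numpy as np

def kmeans_1d(values, k=2, n_iter=20, seed=0):
    """Lloyd's k-means on scalar intensities, the kind of clustering
    a CUDA implementation would evaluate per image square. Serial
    sketch: assign each value to its nearest center, then recompute
    the centers as cluster means."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(values, size=k, replace=False).astype(float)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]),
                           axis=1)
        for j in range(k):                 # recompute cluster means
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return labels, centers
```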
Implementation of a high-speed face recognition system that uses an optical parallel correlator.
Watanabe, Eriko; Kodate, Kashiko
2005-02-10
We implement a fully automatic fast face recognition system by using a 1000 frame/s optical parallel correlator designed and assembled by us. The operational speed for the 1:N (i.e., matching one image against N, where N refers to the number of images in the database) identification experiment (4000 face images) amounts to less than 1.5 s, including the preprocessing and postprocessing times. The binary real-only matched filter is devised for the sake of face recognition, and the system is optimized by the false-rejection rate (FRR) and the false-acceptance rate (FAR), according to 300 samples selected by the biometrics guideline. From trial 1:N identification experiments with the optical parallel correlator, we acquired low error rates of 2.6% FRR and 1.3% FAR. Facial images of people wearing thin glasses or heavy makeup that rendered identification difficult were identified with this system.
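The binary real-only matched filter can be sketched numerically: only the sign of the real part of the reference spectrum's conjugate is kept, and recognition scores are correlation-peak heights. This is a conceptual sketch, not the authors' optimized filter design:

```python
import numpy as np

def brmf(ref):
    """Binary real-only matched filter: keep only the sign (+1/-1) of
    the real part of the reference spectrum's conjugate, a filter an
    optical correlator can realize as a binary mask."""
    return np.sign(np.real(np.conj(np.fft.fft2(ref))))

def match_score(probe, filt):
    """Correlator model: probe spectrum times filter, inverse
    transformed; the correlation peak height scores the match."""
    return float(np.max(np.abs(np.fft.ifft2(np.fft.fft2(probe) * filt))))
```

In a 1:N search, the probe is scored against each stored filter and the largest peak above the FRR/FAR operating threshold is declared the identity.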
GPU real-time processing in NA62 trigger system
NASA Astrophysics Data System (ADS)
Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cretaro, P.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Piccini, M.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.
2017-01-01
A commercial Graphics Processing Unit (GPU) is used to build a fast Level 0 (L0) trigger system tested parasitically with the TDAQ (Trigger and Data Acquisition systems) of the NA62 experiment at CERN. In particular, the parallel computing power of the GPU is exploited to perform real-time fitting in the Ring Imaging CHerenkov (RICH) detector. Direct GPU communication using an FPGA-based board has been used to reduce the data transmission latency. The performance of the system for multi-ring reconstruction obtained during the NA62 physics run will be presented.
Novel approach for image skeleton and distance transformation parallel algorithms
NASA Astrophysics Data System (ADS)
Qing, Kent P.; Means, Robert W.
1994-05-01
Image Understanding is more important in medical imaging than ever, particularly where real-time automatic inspection, screening and classification systems are installed. Skeleton and distance transformations are among the common operations that extract useful information from binary images and aid in Image Understanding. The distance transformation describes the objects in an image by labeling every pixel in each object with the distance to its nearest boundary. The skeleton algorithm starts from the distance transformation and finds the set of pixels that have a locally maximum label. The distance algorithm has to scan the entire image several times, depending on the object width. For each pixel, the algorithm must access the neighboring pixels and find the maximum distance from the nearest boundary. It is a computation- and memory-access-intensive procedure. In this paper, we propose a novel parallel approach to the distance transform and skeleton algorithms using the latest high-speed VLSI convolutional chips such as HNC's ViP. The algorithm speed depends on the object's width and takes (k + [(k-1)/3]) * 7 milliseconds for a 512 × 512 image, with k being the maximum distance of the largest object. All objects in the image are skeletonized at the same time, in parallel.
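The two transforms can be sketched serially as a reference for the parallel convolutional scheme described above: a city-block distance transform needs only two raster scans, and the skeleton is then the set of locally maximal labels:

```python
import numpy as np

def cityblock_distance(binary):
    """Two-pass (forward/backward raster scan) city-block distance
    transform: every object pixel gets the distance to the nearest
    background pixel, with the image border treated as background."""
    H, W = binary.shape
    big = H + W
    d = np.where(binary > 0, big, 0)
    for y in range(H):                              # forward pass
        for x in range(W):
            if d[y, x]:
                up = d[y-1, x] if y > 0 else 0
                lf = d[y, x-1] if x > 0 else 0
                d[y, x] = min(d[y, x], up + 1, lf + 1)
    for y in range(H - 1, -1, -1):                  # backward pass
        for x in range(W - 1, -1, -1):
            if d[y, x]:
                dn = d[y+1, x] if y < H - 1 else 0
                rt = d[y, x+1] if x < W - 1 else 0
                d[y, x] = min(d[y, x], dn + 1, rt + 1)
    return d

def skeleton(d):
    """Object pixels whose distance label is a local maximum over the
    8-neighborhood."""
    H, W = d.shape
    pad = np.zeros((H + 2, W + 2), dtype=int)
    pad[1:-1, 1:-1] = d
    neigh = np.stack([pad[1+dy:H+1+dy, 1+dx:W+1+dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dy, dx) != (0, 0)])
    return (d > 0) & (d >= neigh.max(axis=0))
```

On a convolutional chip, each pass becomes a neighborhood (convolution-style) operation applied to all pixels at once, which is where the parallel speedup comes from.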
Design and Development of a Megavoltage CT Scanner for Radiation Therapy.
NASA Astrophysics Data System (ADS)
Chen, Ching-Tai
A Varian 4 MeV isocentric therapy accelerator has been modified to also function as a CT scanner. The goal is to provide low-cost computed tomography capability for use in radiotherapy. The system will have three principal uses. These are (i) to provide 2- and 3-dimensional maps of the electron density distribution for CT-assisted therapy planning, (ii) to aid in patient set-up by providing sectional views of the treatment volume and high-contrast scout-mode verification images and (iii) to provide a means for periodically checking the patient's anatomical conformation against what was used to generate the original therapy plan. The treatment machine was modified by mounting an array of detectors on a frame bolted to the counterweight end of the gantry in such a manner as to define a 'third generation' CT scanner geometry. The data gathering is controlled by a Z-80-based microcomputer system which transfers the x-ray transmission data to a general-purpose PDP 11/34 for processing. There a series of calibration processes and a logarithmic conversion are performed to obtain projection data. After reordering the projection data to an equivalent parallel-beam sinogram format, a convolution algorithm is employed to construct the image from the equivalent parallel projection data. Results of phantom studies have shown a spatial resolution of 2.6 mm and an electron density discrimination of less than 1%, which are sufficiently good for accurate therapy planning. Results also show that the system is linear to within the precision of our measurement (±0.75%) over a wide range of electron densities corresponding to those found in body tissues. Animal and human images are also presented to demonstrate that the system's imaging capability is sufficient to allow the necessary visualization of anatomy.
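The convolution step on the reordered parallel-beam sinogram is classic filtered backprojection. The sketch below is a generic Ram-Lak implementation in NumPy, offered only to illustrate the reconstruction stage the abstract describes; the filter choice and discretization details are assumptions, not the dissertation's actual code:

```python
import numpy as np

def fbp(sinogram, angles):
    """Filtered backprojection of a parallel-beam sinogram
    (shape: n_angles x n_det) using a frequency-domain Ram-Lak
    filter and linear interpolation during backprojection."""
    n_ang, n_det = sinogram.shape
    ramp = np.abs(np.fft.fftfreq(n_det))              # |omega| filter
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp,
                                   axis=1))
    mid = (n_det - 1) / 2.0
    xs = np.arange(n_det) - mid
    X, Y = np.meshgrid(xs, -xs)                       # image-plane coords
    recon = np.zeros((n_det, n_det))
    for theta, proj in zip(angles, filtered):
        # detector coordinate of each image pixel for this view
        t = np.clip(X * np.cos(theta) + Y * np.sin(theta) + mid,
                    0, n_det - 1)
        i0 = np.floor(t).astype(int)
        i1 = np.minimum(i0 + 1, n_det - 1)
        frac = t - i0
        recon += (1 - frac) * proj[i0] + frac * proj[i1]
    return recon * np.pi / n_ang                      # integral over [0, pi)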
Vergara, Victor M; Ulloa, Alvaro; Calhoun, Vince D; Boutte, David; Chen, Jiayu; Liu, Jingyu
2014-09-01
Multi-modal data analysis techniques, such as the Parallel Independent Component Analysis (pICA), are essential in neuroscience, medical imaging and genetic studies. The pICA algorithm allows the simultaneous decomposition of up to two data modalities achieving better performance than separate ICA decompositions and enabling the discovery of links between modalities. However, advances in data acquisition techniques facilitate the collection of more than two data modalities from each subject. Examples of commonly measured modalities include genetic information, structural magnetic resonance imaging (MRI) and functional MRI. In order to take full advantage of the available data, this work extends the pICA approach to incorporate three modalities in one comprehensive analysis. Simulations demonstrate the three-way pICA performance in identifying pairwise links between modalities and estimating independent components which more closely resemble the true sources than components found by pICA or separate ICA analyses. In addition, the three-way pICA algorithm is applied to real experimental data obtained from a study that investigated genetic effects on alcohol dependence. Considered data modalities include functional MRI (contrast images during an alcohol exposure paradigm), gray matter concentration images from structural MRI and genetic single nucleotide polymorphism (SNP) data. The three-way pICA approach identified links between a SNP component (pointing to brain function and mental disorder associated genes, including BDNF, GRIN2B and NRG1), a functional component related to increased activation in the precuneus area, and a gray matter component comprising part of the default mode network and the caudate. Although such findings need further verification, the simulation and in-vivo results validate the three-way pICA algorithm presented here as a useful tool in biomedical data fusion applications. Copyright © 2014 Elsevier Inc. All rights reserved.
Lossless data compression for improving the performance of a GPU-based beamformer.
Lok, U-Wai; Fan, Gang-Wei; Li, Pai-Chi
2015-04-01
The powerful parallel computation ability of a graphics processing unit (GPU) makes it feasible to perform dynamic receive beamforming. However, a real-time GPU-based beamformer requires a high data rate to transfer radio-frequency (RF) data from hardware to software memory, as well as from central processing unit (CPU) to GPU memory. There are data compression methods (e.g., Joint Photographic Experts Group (JPEG)) available for the hardware front end to reduce data size, alleviating the data transfer requirement of the hardware interface. Nevertheless, the required decoding time may even be larger than the transmission time of the original data, in turn degrading the overall performance of the GPU-based beamformer. This article proposes and implements a lossless compression-decompression algorithm, which enables compression and decompression of data in parallel. By this means, the data transfer requirement of the hardware interface and the transmission time of CPU-to-GPU data transfers are reduced, without sacrificing image quality. In simulation results, the compression ratio reached around 1.7. The encoder design of our lossless compression approach requires low hardware resources and reasonable latency in a field programmable gate array. In addition, the transmission time of transferring data from CPU to GPU with the parallel decoding process improved threefold, compared with transferring the original uncompressed data. These results show that our proposed lossless compression plus parallel decoder approach not only mitigates the transmission bandwidth requirement for transferring data from the hardware front end to the software system but also reduces the transmission time for CPU-to-GPU data transfer. © The Author(s) 2014.
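The article's encoder is hardware-specific, but the enabling idea (splitting the RF stream into independently coded blocks so the receiving side can decode them concurrently) can be sketched in a few lines. The delta-plus-deflate coding below is a stand-in for illustration, not the authors' scheme:

```python
import zlib
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(rf, n_blocks=8):
    """Delta-encode then deflate each block independently, so a
    table of compressed blocks replaces one serial stream and the
    blocks can later be decoded in parallel. Lossless: int16
    wraparound in the deltas is undone exactly by the cumsum."""
    chunks = []
    for b in np.array_split(rf.astype(np.int16), n_blocks):
        delta = b.copy()
        delta[1:] -= b[:-1]                 # small residuals deflate well
        chunks.append(zlib.compress(delta.tobytes()))
    return chunks

def decompress_blocks(chunks):
    def decode(c):
        delta = np.frombuffer(zlib.decompress(c), dtype=np.int16)
        return np.cumsum(delta, dtype=np.int16)   # undo the delta coding
    with ThreadPoolExecutor() as ex:              # blocks are independent
        return np.concatenate(list(ex.map(decode, chunks)))
```

Because no block depends on another, decode throughput scales with the number of workers, which is the property the article exploits on the GPU side.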
Spacecraft on-board SAR image generation for EOS-type missions
NASA Technical Reports Server (NTRS)
Liu, K. Y.; Arens, W. E.; Assal, H. M.; Vesecky, J. F.
1987-01-01
Spacecraft on-board synthetic aperture radar (SAR) image generation is an extremely difficult problem because of the requirements for high computational rates (usually on the order of Giga-operations per second), high reliability (some missions last up to 10 years), and low power dissipation and mass (typically less than 500 watts and 100 Kilograms). Recently, a JPL study was performed to assess the feasibility of on-board SAR image generation for EOS-type missions. This paper summarizes the results of that study. Specifically, it proposes a processor architecture using a VLSI time-domain parallel array for azimuth correlation. Using available space qualifiable technology to implement the proposed architecture, an on-board SAR processor having acceptable power and mass characteristics appears feasible for EOS-type applications.
SAND: an automated VLBI imaging and analysing pipeline - I. Stripping component trajectories
NASA Astrophysics Data System (ADS)
Zhang, M.; Collioud, A.; Charlot, P.
2018-02-01
We present our implementation of an automated very long baseline interferometry (VLBI) data-reduction pipeline that is dedicated to interferometric data imaging and analysis. The pipeline can handle massive VLBI data efficiently, which makes it an appropriate tool to investigate multi-epoch multiband VLBI data. Compared to traditional manual data reduction, our pipeline provides more objective results as less human interference is involved. The source extraction is carried out in the image plane, while deconvolution and model fitting are performed in both the image plane and the uv plane for parallel comparison. The output from the pipeline includes catalogues of CLEANed images and reconstructed models, polarization maps, proper motion estimates, core light curves and multiband spectra. We have developed a regression STRIP algorithm to automatically detect linear or non-linear patterns in the jet component trajectories. This algorithm offers an objective method to match jet components at different epochs and to determine their proper motions.
Improved image reconstruction of low-resolution multichannel phase contrast angiography
P. Krishnan, Akshara; Joy, Ajin; Paul, Joseph Suresh
2016-01-01
Abstract. In low-resolution phase contrast magnetic resonance angiography, the maximum-intensity-projected channel images will be blurred, with consequent loss of vascular details. The channel images are enhanced using a stabilized deblurring filter applied to each channel prior to combining the individual channel images. The stabilized deblurring is obtained by the addition of a nonlocal regularization term to the reverse heat equation, referred to as a nonlocally stabilized reverse diffusion filter. Unlike the reverse diffusion filter, which is highly unstable and blows up noise, nonlocal stabilization enhances the intensity-projected parallel images uniformly. Application to multichannel vessel enhancement is illustrated using both volunteer data and simulated multichannel angiograms. Robustness of the filter applied to volunteer datasets is shown using a statistically validated improvement in flow quantification. Improved performance in terms of preserving vascular structures and phased-array reconstruction in both simulated and real data is demonstrated using a structureness measure and contrast ratio. PMID:26835501
NASA Astrophysics Data System (ADS)
Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.
1986-06-01
In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing, and aims at real-time response. Parallel processing and real-time response are facilitated in part by a dual-bus system: a VME control bus and a high-speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling a combined rate of up to 144 MBytes/sec. The presented image processor is versatile enough to meet the video-rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic workstation. Several hardware modules are described in detail. For illustrating the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing operations are presented.
NASA Technical Reports Server (NTRS)
Otaguro, W. S.; Kesler, L. O.; Land, K. C.; Rhoades, D. E.
1987-01-01
An intelligent tracker capable of robotic applications requiring guidance and control of platforms, robotic arms, and end effectors has been developed. This packaged system, capable of supervised autonomous robotic functions, is partitioned into a multiple-processor/parallel-processing configuration. The system currently interfaces to cameras but also has the capability to use three-dimensional inputs from scanning laser rangers. The inputs are fed into an image processing and tracking section where the camera inputs are conditioned for the multiple tracker algorithms. An executive section monitors the image processing and tracker outputs and performs all the control and decision processes. The present architecture of the system is presented with a discussion of its evolutionary growth for space applications. An autonomous rendezvous demonstration of this system was performed last year. More realistic demonstrations now being planned are discussed.
Upper canine inclination influences the aesthetics of a smile.
Bothung, C; Fischer, K; Schiffer, H; Springer, I; Wolfart, S
2015-02-01
This study investigated which angle of canine inclination (the angle between the canine tooth axis (CA-line) and the line between the lateral canthus and the ipsilateral labial angle (EM-line)) is perceived to be most attractive in a smile. The second objective was to determine whether laymen and dental experts share the same opinion. A Q-sort assessment was performed with 48 posed smile photographs to obtain two models of neutral facial attractiveness. Two sets of images (1 male model set, 1 female model set), each containing seven images with incrementally altered canine and posterior teeth inclinations, were generated. The images were ranked for attractiveness by three groups (61 laymen, 59 orthodontists, 60 dentists). The images with 0° inclination, that is, the CA-line parallel to the EM-line (male model set: 54·4%; female model set: 38·9%), or -5° (inward) inclination (male model set: 20%; female model set: 29·4%) were perceived to be most attractive within each set. Images showing inward canine inclinations were regarded as more attractive than those with outward inclinations. Dental experts and laymen were in agreement in their aesthetic judgements. Smiles were perceived to be most attractive when the upper canine tooth axis was parallel to the EM-line. In reconstructive or orthodontic therapy, it is thus important to incline canines more inwardly than outwardly. © 2014 John Wiley & Sons Ltd.
Gudino, Natalia; Duan, Qi; de Zwart, Jacco A; Murphy-Boesch, Joe; Dodd, Stephen J; Merkle, Hellmut; van Gelderen, Peter; Duyn, Jeff H
2015-01-01
Purpose We tested the feasibility of implementing parallel transmission (pTX) for high field MRI using a radiofrequency (RF) amplifier designed to be located on, or in the immediate vicinity of, an RF transmit coil. Method We designed a current-source switch-mode amplifier based on miniaturized, non-magnetic electronics. Optical RF carrier and envelope signals to control the amplifier were derived, through a custom-built interface, from the RF source accessible in the scanner control. Amplifier performance was tested by benchtop measurements as well as with imaging at 7 T (300 MHz) and 11.7 T (500 MHz). The ability to perform pTX was evaluated by measuring inter-channel coupling and phase adjustment in a 2-channel setup. Results The amplifier delivered in excess of 44 W RF power and caused minimal interference with MRI. The interface derived accurate optical control signals with carrier frequencies ranging from 64 to 750 MHz. Decoupling better than 14 dB was obtained between 2 coil loops separated by only 1 cm. Application to MRI was demonstrated by acquiring artifact-free images at 7 T and 11.7 T. Conclusion An optically controlled miniaturized RF amplifier for on-coil implementation at high field is demonstrated that should facilitate implementation of high-density pTX arrays. PMID:26256671
Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong
2017-01-01
Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization. PMID:28786986
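Generative topographic mapping is not part of common numerical libraries, so the minimal NumPy k-means below illustrates only the baseline reference-free classification that the paper's GTM approach is compared against: cluster flattened noisy images and return per-class averages. All names are illustrative, and the deterministic farthest-point initialization is an assumption for reproducibility:

```python
import numpy as np

def kmeans_class_averages(images, k, n_iter=30):
    """Reference-free 2D classification baseline: k-means on
    flattened images, returning labels and per-class averages."""
    X = images.reshape(len(images), -1).astype(float)
    # farthest-point initialization (deterministic, k-means++-like)
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)                    # assign to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)   # update averages
    return labels, centers.reshape(k, *images.shape[1:])
```

As the abstract notes, this kind of Euclidean clustering degrades quickly at low SNR, which is precisely the regime the manifold-learning approach targets.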
Calibrationless parallel magnetic resonance imaging: a joint sparsity model.
Majumdar, Angshul; Chaudhury, Kunal Narayan; Ward, Rabab
2013-12-05
State-of-the-art parallel MRI techniques either explicitly or implicitly require certain parameters to be estimated, e.g., the sensitivity maps for SENSE and SMASH, and the interpolation weights for GRAPPA and SPIRiT. Thus all these techniques are sensitive to the calibration (parameter estimation) stage. In this work, we have proposed a parallel MRI technique that does not require any calibration but yields reconstruction results that are on par with (or even better than) state-of-the-art methods in parallel MRI. Our proposed method requires solving non-convex analysis and synthesis prior joint-sparsity problems. This work also derives the algorithms for solving them. Experimental validation was carried out on two datasets: an eight-channel brain and an eight-channel Shepp-Logan phantom. Two sampling methods were used: Variable Density Random sampling and non-Cartesian Radial sampling. For the brain data an acceleration factor of 4 was used, and for the other an acceleration factor of 6 was used. The reconstruction results were quantitatively evaluated based on the Normalised Mean Squared Error between the reconstructed images and the originals. The qualitative evaluation was based on the actual reconstructed images. We compared our work with four state-of-the-art parallel imaging techniques: two calibrated methods (CS SENSE and l1SPIRiT) and two calibration-free techniques (Distributed CS and SAKE). Our method yields better reconstruction results than all of them.
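The core of such joint-sparsity formulations is an l2,1-type penalty whose proximal operator shrinks each transform coefficient jointly across all receiver channels. The sketch below shows that row-wise group soft-thresholding step, a generic building block of joint-sparse solvers rather than the authors' full non-convex algorithm:

```python
import numpy as np

def joint_soft_threshold(W, lam):
    """Proximal operator of the l2,1 norm. Rows of W hold one
    transform coefficient per row and one coil channel per column;
    each row is shrunk by its joint l2 norm, so a coefficient is
    kept or killed across all channels at once."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1 - lam / np.maximum(norms, 1e-12), 0)
    return W * scale
```

Coupling the channels this way is what lets the reconstruction exploit the shared support of the coil images without ever estimating sensitivity maps.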
Multitasking TORT under UNICOS: Parallel performance models and measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barnett, A.; Azmy, Y.Y.
1999-09-27
The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared against measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F
2002-02-01
This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
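The paper's parallelization pattern (equal photon partitioning plus uncorrelated per-processor random streams) is easy to sketch. The toy below uses Python threads and NumPy's SeedSequence spawning in place of MPI and SPRNG, and the slab-transmission "physics" is a hypothetical stand-in, not the SIMIND transport model:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def run_batch(seed_seq, n_photons):
    """One worker's share of the photon histories, drawn from its
    own independent stream (the role SPRNG plays in the paper's
    MPI implementation)."""
    rng = np.random.default_rng(seed_seq)
    # toy transport: count photons crossing a slab one mean free
    # path thick without interacting (stand-in physics)
    path = rng.exponential(1.0, n_photons)
    return int((path > 1.0).sum())

def parallel_mc(total_photons, n_workers=4, seed=42):
    """Equally partition photons among workers; SeedSequence.spawn
    guarantees uncorrelated streams, so results are additive."""
    seeds = np.random.SeedSequence(seed).spawn(n_workers)
    share = total_photons // n_workers
    with ThreadPoolExecutor(n_workers) as ex:
        hits = ex.map(run_batch, seeds, [share] * n_workers)
    return sum(hits) / (share * n_workers)
```

Because histories are independent, the tallies simply sum across workers, which is why the paper observes a linear speed-up out to 32 processors.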
Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2012-07-01
Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which make them attractive for real-time image processing.
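For readers without a dedicated DHT routine, the 2-D discrete Hartley transform follows directly from the FFT via the cas-kernel identity (cas t = cos t + sin t, so H = Re F - Im F). The sketch below is only that identity, not the authors' multirate convolver-bank structure:

```python
import numpy as np

def dht2(x):
    """2-D discrete Hartley transform via the 2-D FFT:
    H(u, v) = sum x[m, n] * cas(2*pi*(u*m/M + v*n/N))
            = Re(F(u, v)) - Im(F(u, v))."""
    F = np.fft.fft2(x)
    return F.real - F.imag
```

The DHT is real-valued and, up to a factor 1/(MN), its own inverse, which is what makes it a convenient real-arithmetic substitute for the FFT in real-valued Gabor analysis/synthesis pairs.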
Slit-Slat Collimator Equipped Gamma Camera for Whole-Mouse SPECT-CT Imaging
NASA Astrophysics Data System (ADS)
Cao, Liji; Peter, Jörg
2012-06-01
A slit-slat collimator is developed for a gamma camera intended for small-animal imaging (mice). The tungsten housing of a roof-shaped collimator forms a slit opening, and the slats are made of lead foils separated by sparse polyurethane material. Alignment of the collimator with the camera's pixelated crystal is performed by adjusting a micrometer screw while monitoring a Co-57 point source for maximum signal intensity. For SPECT, the collimator forms a cylindrical field-of-view enabling whole-mouse imaging with transaxial magnification and constant on-axis sensitivity over the entire axial direction. As the gamma camera is part of a multimodal imaging system also incorporating x-ray CT, five parameters corresponding to the geometric displacements of the collimator as well as to the mechanical co-alignment between the gamma camera and the CT subsystem are estimated by means of bimodal calibration sources. To illustrate the performance of the slit-slat collimator and to compare it to a single-pinhole collimator, a Derenzo phantom study is performed. Transaxial resolution along the entire long axis is comparable to that of a pinhole collimator of the same pinhole diameter. Axial resolution of the slit-slat collimator is comparable to that of a parallel-beam collimator. Additionally, data from an in-vivo mouse study are presented.
Concurrent computation of attribute filters on shared memory parallel machines.
Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold
2008-10-01
Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-trees and Min-trees. The image or volume is first partitioned into multiple slices. We then compute the Max-tree of each slice using any sequential Max-tree algorithm. Subsequently, the Max-trees of the slices can be merged to obtain the Max-tree of the image. A C implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
Chahl, J S
2014-01-20
This paper describes an application for arrays of narrow-field-of-view sensors with parallel optical axes. These devices exhibit some complementary characteristics with respect to conventional perspective projection or angular projection imaging devices. Conventional imaging devices measure rotational egomotion directly by measuring the angular velocity of the projected image. Translational egomotion cannot be measured directly by these devices because the induced image motion depends on the unknown range of the viewed object. On the other hand, a known translational motion generates image velocities which can be used to recover the ranges of objects and hence the three-dimensional (3D) structure of the environment. A new method is presented for computing egomotion and range using the properties of linear arrays of independent narrow-field-of-view optical sensors. An approximate parallel projection can be used to measure translational egomotion in terms of the velocity of the image. On the other hand, a known rotational motion of the paraxial sensor array generates image velocities, which can be used to recover the 3D structure of the environment. Results of tests of an experimental array confirm these properties.
NASA Astrophysics Data System (ADS)
Bogdanov, Valery L.; Boyce-Jacino, Michael
1999-05-01
Confined arrays of biochemical probes deposited on a solid support surface (an analytical microarray or 'chip') provide an opportunity to analyze multiple reactions simultaneously. Microarrays are increasingly used in genetics, medicine and environmental scanning as research and analytical instruments. The power of microarray technology comes from its parallelism, which grows with array miniaturization, minimization of reagent volume per reaction site and reaction multiplexing. An optical detector of microarray signals should combine high sensitivity with spatial and spectral resolution. Additionally, low cost and a high processing rate are needed to transfer microarray technology into biomedical practice. We designed an imager that provides confocal and complete-spectrum detection of an entire fluorescently-labeled microarray in parallel. The imager uses a microlens array, a non-slit spectral decomposer, and a high-sensitivity detector (cooled CCD). Two imaging channels provide simultaneous detection of localization, integrated and spectral intensities for each reaction site in the microarray. A dimensional matching between the microarray and the imager's optics eliminates all moving parts in the instrumentation, enabling highly informative, fast and low-cost microarray detection. We report the theory of confocal hyperspectral imaging with a microlens array and experimental data for the implementation of the developed imager to detect fluorescently labeled microarrays with a density of approximately 10³ sites per cm².
Harrison, Thomas C; Sigler, Albrecht; Murphy, Timothy H
2009-09-15
We describe a simple and low-cost system for intrinsic optical signal (IOS) imaging using stable LED light sources, basic microscopes, and commonly available CCD cameras. IOS imaging measures activity-dependent changes in the light reflectance of brain tissue, and can be performed with a minimum of specialized equipment. Our system uses LED ring lights that can be mounted on standard microscope objectives or video lenses to provide a homogeneous and stable light source, with less than 0.003% fluctuation across images averaged from 40 trials. We describe the equipment and surgical techniques necessary for both acute and chronic mouse preparations, and provide software that can create maps of sensory representations from images captured by inexpensive 8-bit cameras or by 12-bit cameras. The IOS imaging system can be adapted to commercial upright microscopes or custom macroscopes, eliminating the need for dedicated equipment or complex optical paths. This method can be combined with parallel high resolution imaging techniques such as two-photon microscopy.
Choi, Heejin; Tzeranis, Dimitrios S.; Cha, Jae Won; Clémenceau, Philippe; de Jong, Sander J. G.; van Geest, Lambertus K.; Moon, Joong Ho; Yannas, Ioannis V.; So, Peter T. C.
2012-01-01
Fluorescence and phosphorescence lifetime imaging are powerful techniques for studying intracellular protein interactions and for diagnosing tissue pathophysiology. While lifetime-resolved microscopy has long been in the repertoire of the biophotonics community, current implementations fall short in terms of simultaneously providing 3D resolution, high throughput, and good tissue penetration. This report describes a new highly efficient lifetime-resolved imaging method that combines temporal focusing wide-field multiphoton excitation and simultaneous acquisition of lifetime information in frequency domain using a nanosecond gated imager from a 3D-resolved plane. This approach is scalable allowing fast volumetric imaging limited only by the available laser peak power. The accuracy and performance of the proposed method is demonstrated in several imaging studies important for understanding peripheral nerve regeneration processes. Most importantly, the parallelism of this approach may enhance the imaging speed of long lifetime processes such as phosphorescence by several orders of magnitude. PMID:23187477
Denoising Sparse Images from GRAPPA using the Nullspace Method (DESIGN)
Weller, Daniel S.; Polimeni, Jonathan R.; Grady, Leo; Wald, Lawrence L.; Adalsteinsson, Elfar; Goyal, Vivek K
2011-01-01
To accelerate magnetic resonance imaging using uniformly undersampled (nonrandom) parallel imaging beyond what is achievable with GRAPPA alone, the Denoising of Sparse Images from GRAPPA using the Nullspace method (DESIGN) is developed. The trade-off between denoising and smoothing the GRAPPA solution is studied for different levels of acceleration. Several brain images reconstructed from uniformly undersampled k-space data using DESIGN are compared against reconstructions using existing methods in terms of difference images (a qualitative measure), PSNR, and noise amplification (g-factors) as measured using the pseudo-multiple replica method. Effects of smoothing, including contrast loss, are studied in synthetic phantom data. In the experiments presented, the contrast loss and spatial resolution are competitive with existing methods. Results for several brain images demonstrate significant improvements over GRAPPA at high acceleration factors in denoising performance with limited blurring or smoothing artifacts. In addition, the measured g-factors suggest that DESIGN mitigates noise amplification better than both GRAPPA and L1 SPIR-iT (the latter limited here by uniform undersampling). PMID:22213069
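The sparsity-promoting denoising step at the heart of approaches like DESIGN can be illustrated with a minimal sketch: soft-thresholding a noise-amplified reconstruction of a signal that is sparse in some domain. This is not the DESIGN algorithm itself (which operates on multi-coil data in the GRAPPA nullspace); the 1-D signal, noise level, and threshold below are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm: shrink magnitudes toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(0)
truth = np.zeros(256)                           # image sparse in its own domain
truth[rng.choice(256, 10, replace=False)] = rng.uniform(1.0, 2.0, 10)
noisy = truth + 0.1 * rng.standard_normal(256)  # stand-in for a noise-amplified GRAPPA output

denoised = soft_threshold(noisy, 0.25)
err_noisy = np.linalg.norm(noisy - truth)
err_denoised = np.linalg.norm(denoised - truth)
print(err_denoised < err_noisy)                 # prints True
```

In practice the threshold must be balanced against acceleration-dependent noise amplification, which is exactly the denoising-versus-smoothing trade-off the abstract studies.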
A review of GPU-based medical image reconstruction.
Després, Philippe; Jia, Xun
2017-10-01
Tomographic image reconstruction is a computationally demanding task, even more so when advanced models are used to describe a more complete and accurate picture of the image formation process. Such advanced modeling and reconstruction algorithms can lead to better images, often with less dose, but at the price of long calculation times that are hardly compatible with clinical workflows. Fortunately, reconstruction tasks can often be executed advantageously on Graphics Processing Units (GPUs), which are exploited as massively parallel computational engines. This review paper focuses on recent developments made in GPU-based medical image reconstruction, from a CT, PET, SPECT, MRI and US perspective. Strategies and approaches to get the most out of GPUs in image reconstruction are presented as well as innovative applications arising from an increased computing capacity. The future of GPU-based image reconstruction is also envisioned, based on current trends in high-performance computing. Copyright © 2017 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
Method for positron emission mammography image reconstruction
Smith, Mark Frederick
2004-10-12
An image reconstruction method comprising accepting coincidence data from either a data file or in real time from a pair of detector heads, culling event data that is outside a desired energy range, optionally saving the desired data for each detector position or for each pair of detector pixels on the two detector heads, and then reconstructing the image either by backprojection image reconstruction or by iterative image reconstruction. In the backprojection image reconstruction mode, rays are traced between centers of lines of response (LORs); counts are then either allocated by nearest-pixel interpolation or allocated by an overlap method, corrected for geometric effects and attenuation, and the data file updated. If the iterative image reconstruction option is selected, one implementation is to perform Siddon ray tracing on a grid and to perform maximum likelihood expectation maximization (MLEM) computed by either: a) tracing parallel rays between subpixels on opposite detector heads; or b) tracing rays between randomized endpoint locations on opposite detector heads.
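The iterative branch of the method rests on maximum likelihood expectation maximization. A minimal, generic MLEM sketch follows, with a toy system matrix standing in for ray tracing between detector heads; the sizes and values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0.0, 1.0, (40, 16))       # toy system matrix: 40 LORs x 16 image pixels
x_true = rng.uniform(0.5, 2.0, 16)        # "activity" in each pixel
y = A @ x_true                            # noiseless coincidence counts per LOR

x = np.ones(16)                           # uniform initial estimate
sensitivity = A.sum(axis=0)               # backprojection of all-ones data
for _ in range(500):
    ratio = y / (A @ x + 1e-12)           # measured vs. predicted counts
    x *= (A.T @ ratio) / sensitivity      # multiplicative MLEM update

print(float(np.linalg.norm(A @ x - y)))   # residual shrinks toward zero
```

Each update multiplies the current estimate by the backprojected ratio of measured to predicted counts, normalized by the sensitivity image; in the patent's setting, ray tracing (e.g., Siddon's algorithm) would supply the entries of A.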
Iris unwrapping using the Bresenham circle algorithm for real-time iris recognition
NASA Astrophysics Data System (ADS)
Carothers, Matthew T.; Ngo, Hau T.; Rakvic, Ryan N.; Broussard, Randy P.
2015-02-01
An efficient parallel architecture design for the iris unwrapping process in a real-time iris recognition system using the Bresenham Circle Algorithm is presented in this paper. Based on the characteristics of the model parameters, this algorithm was chosen over the widely used polar conversion technique as the iris unwrapping model. The architecture design is parallelized to increase the throughput of the system and is suitable for processing an input image of 320 × 240 pixels in real time using Field Programmable Gate Array (FPGA) technology. Quartus software is used to implement, verify, and analyze the design's performance using the VHSIC Hardware Description Language. The system's predicted processing time is faster than that of the modern iris unwrapping techniques used today.
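The midpoint/Bresenham circle algorithm the paper builds on generates integer pixel coordinates on a circle using only integer additions, which is what makes it attractive for FPGA parallelization. A plain software sketch (the center, radius, and sampling use below are illustrative):

```python
def bresenham_circle(cx, cy, r):
    """Integer points on the circle of radius r about (cx, cy), midpoint/Bresenham style."""
    pts = set()
    x, y, d = 0, r, 3 - 2 * r
    while x <= y:
        # Mirror the first-octant point (x, y) into all eight octants.
        for sx, sy in ((x, y), (y, x)):
            pts |= {(cx + sx, cy + sy), (cx - sx, cy + sy),
                    (cx + sx, cy - sy), (cx - sx, cy - sy)}
        if d < 0:
            d += 4 * x + 6
        else:
            d += 4 * (x - y) + 10
            y -= 1
        x += 1
    return pts

# Iris unwrapping samples image intensities along such rings of increasing radius
# between pupil and limbus, stacking each ring as one row of the unwrapped strip.
ring = bresenham_circle(120, 160, 40)
```

Because each ring (and each point on a ring) is computed independently, the per-radius loops map naturally onto parallel hardware pipelines.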
Ferré, Jean-Christophe; Petr, Jan; Bannier, Elise; Barillot, Christian; Gauvrit, Jean-Yves
2012-05-01
To compare 12-channel and 32-channel phased-array coils and to determine the optimal parallel imaging (PI) technique and factor for brain perfusion imaging using pulsed arterial spin labeling (PASL) at 3 Tesla (T). Twenty-seven healthy volunteers underwent 10 different PASL perfusion PICORE Q2TIPS scans at 3T using 12-channel and 32-channel coils without PI and with GRAPPA or mSENSE using factor 2. PI factors 3 and 4 were used only with the 32-channel coil. Visual quality was assessed using four parameters. Quantitative analyses were performed using temporal noise, contrast-to-noise and signal-to-noise ratios (CNR, SNR). Compared with 12-channel acquisition, the scores for 32-channel acquisition were significantly higher for overall visual quality, lower for noise, and higher for SNR and CNR. With the 32-channel coil, the artifact compromise achieved the best score with PI factor 2. Noise increased, and SNR and CNR decreased, with increasing PI factor. However, mSENSE 2 scores were not always significantly different from acquisition without PI. For PASL at 3T, the 32-channel coil provided better quality than the 12-channel coil. With the 32-channel coil, mSENSE 2 seemed to offer the best compromise for decreasing artifacts without significantly reducing SNR and CNR. Copyright © 2012 Wiley Periodicals, Inc.
A novel coil array for combined TMS/fMRI experiments at 3 T.
Navarro de Lara, Lucia I; Windischberger, Christian; Kuehne, Andre; Woletz, Michael; Sieg, Jürgen; Bestmann, Sven; Weiskopf, Nikolaus; Strasser, Bernhard; Moser, Ewald; Laistler, Elmar
2015-11-01
To overcome current limitations in combined transcranial magnetic stimulation (TMS) and functional magnetic resonance imaging (fMRI) studies by employing a dedicated coil array design for 3 Tesla. The state-of-the-art setup for concurrent TMS/fMRI is to use a large birdcage head coil, with the TMS between the subject's head and the MR coil. This setup has drawbacks in sensitivity, positioning, and available imaging techniques. In this study, an ultraslim 7-channel receive-only coil array for 3 T, which can be placed between the subject's head and the TMS, is presented. Interactions between the devices are investigated and the performance of the new setup is evaluated in comparison to the state-of-the-art setup. MR sensitivity obtained at the depth of the TMS stimulation is increased by a factor of five. Parallel imaging with an acceleration factor of two is feasible with low g-factors. Possible interactions between TMS and the novel hardware were investigated and were found negligible. The novel coil array is safe, strongly improves signal-to-noise ratio in concurrent TMS/fMRI experiments, enables parallel imaging, and allows for flexible positioning of the TMS on the head while ensuring efficient TMS stimulation due to its ultraslim design. © 2014 The Authors. Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine.
NASA Astrophysics Data System (ADS)
Ding, Xuemei; Wang, Bingyuan; Liu, Dongyuan; Zhang, Yao; He, Jie; Zhao, Huijuan; Gao, Feng
2018-02-01
During the past two decades there has been a dramatic rise in the use of functional near-infrared spectroscopy (fNIRS) as a neuroimaging technique in cognitive neuroscience research. Diffuse optical tomography (DOT) and optical topography (OT) can be employed as the optical imaging techniques for brain activity investigation. However, most current imagers with analogue detection are limited in sensitivity and dynamic range. Although photon-counting detection can significantly improve detection sensitivity, the intrinsic nature of sequential excitations reduces temporal resolution. To improve temporal resolution, sensitivity, and dynamic range, we develop a multi-channel continuous-wave (CW) system for brain functional imaging based on a novel lock-in photon-counting technique. The system consists of 60 light-emitting diode (LED) sources at three wavelengths of 660 nm, 780 nm, and 830 nm, which are modulated by current-stabilized square-wave signals at different frequencies, and 12 photomultiplier tubes (PMTs) operating with the lock-in photon-counting technique. This design combines the ultra-high sensitivity of the photon-counting technique with the parallelism of the digital lock-in technique. We can therefore acquire the diffused light intensity for all the source-detector pairs (SD-pairs) in parallel. The performance assessments of the system are conducted using phantom experiments, and demonstrate its excellent measurement linearity, negligible inter-channel crosstalk, strong noise robustness, and high temporal resolution.
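The parallel separation of frequency-tagged sources can be sketched as digital lock-in demodulation of binned photon counts: each source is recovered by correlating the one count stream against that source's reference square wave. The bin rate, modulation frequencies, amplitudes, and background below are illustrative, not the instrument's parameters.

```python
import numpy as np

fs = 1000                                    # counting-bin rate in Hz (assumed)
t = (np.arange(1000) + 0.5) / fs             # 1 s of bins, sampled mid-bin
f1, f2 = 10, 25                              # distinct modulation frequencies, Hz

ref1 = np.sign(np.sin(2 * np.pi * f1 * t))   # ±1 square-wave references
ref2 = np.sign(np.sin(2 * np.pi * f2 * t))

# Binned photon counts: two on/off-keyed sources plus a steady background.
counts = 40 * (ref1 + 1) / 2 + 25 * (ref2 + 1) / 2 + 10

amp1 = 2 * np.mean(counts * ref1)            # lock-in demodulation per source
amp2 = 2 * np.mean(counts * ref2)
print(amp1, amp2)                            # prints 40.0 25.0
```

Because the two square-wave references share no harmonics, the background and the other source average to zero in each correlation, so all source-detector pairs can be demodulated from the same record in parallel.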
Development of an add-on kit for scanning confocal microscopy (Conference Presentation)
NASA Astrophysics Data System (ADS)
Guo, Kaikai; Zheng, Guoan
2017-03-01
Scanning confocal microscopy is a standard choice for many fluorescence imaging applications in basic biomedical research. It is able to produce optically sectioned images and provides acquisition versatility to address many samples and application demands. However, scanning a focused point across the specimen limits the speed of image acquisition. As a result, scanning confocal microscopes only work well with stationary samples. Researchers have performed parallel confocal scanning using a digital micromirror device (DMD), which is used to project a scanning multi-point pattern across the sample. DMD-based parallel confocal systems increase the imaging speed while maintaining the optical sectioning ability. In this paper, we report the development of an add-on kit for high-speed and low-cost confocal microscopy. By adapting this add-on kit to an existing regular microscope, one can convert it into a confocal microscope without significant hardware modifications. Compared with current DMD-based implementations, the reported approach is able to recover multiple layers along the z axis simultaneously. It may find applications in wafer inspection and 3D metrology of semiconductor circuits. The dissemination of the proposed add-on kit under a $1000 budget could also lead to new types of experimental designs for biological research labs, e.g., cytology analysis in cell culture experiments, genetic studies on multicellular organisms, pharmaceutical drug profiling, RNA interference studies, and investigation of microbial communities in environmental systems.
Moving-Article X-Ray Imaging System and Method for 3-D Image Generation
NASA Technical Reports Server (NTRS)
Fernandez, Kenneth R. (Inventor)
2012-01-01
An x-ray imaging system and method for a moving article are provided for an article moved along a linear direction of travel while the article is exposed to non-overlapping x-ray beams. A plurality of parallel linear sensor arrays are disposed in the x-ray beams after they pass through the article. More specifically, a first half of the plurality are disposed in a first of the x-ray beams while a second half of the plurality are disposed in a second of the x-ray beams. Each of the parallel linear sensor arrays is oriented perpendicular to the linear direction of travel. Each of the parallel linear sensor arrays in the first half is matched to a corresponding one of the parallel linear sensor arrays in the second half in terms of an angular position in the first of the x-ray beams and the second of the x-ray beams, respectively.
Parallel and serial grouping of image elements in visual perception.
Houtkamp, Roos; Roelfsema, Pieter R
2010-12-01
The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some situations, but we demonstrate that there are also situations where Gestalt grouping becomes serial. We observe substantial time delays when image elements have to be grouped indirectly through a chain of local groupings. We call this chaining process incremental grouping and demonstrate that it can occur for only a single object at a time. We suggest that incremental grouping requires the gradual spread of object-based attention so that eventually all the object's parts become grouped explicitly by an attentional labeling process. Our findings inspire a new incremental grouping theory that relates the parallel, local grouping process to feedforward processing and the serial, incremental grouping process to recurrent processing in the visual cortex.
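The serial, incremental grouping the authors describe can be caricatured as a label that spreads step by step through local links, so that grouping time grows with distance along the chain of local groupings. A toy sketch (representing image elements as a graph of local links is an illustrative assumption, not the authors' model):

```python
from collections import deque

def incremental_group(adjacency, seed):
    """Spread a group label from seed via local links; map element -> step reached."""
    reached = {seed: 0}
    frontier = deque([seed])
    while frontier:
        element = frontier.popleft()
        for neighbor in adjacency[element]:
            if neighbor not in reached:
                reached[neighbor] = reached[element] + 1
                frontier.append(neighbor)
    return reached

# A curve of 6 collinear elements: time to group grows with distance along the chain.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(incremental_group(chain, 0)[5])        # prints 5: reached after 5 local steps
```

The linear growth of steps with chain length mirrors the time delays the experiments report when elements must be grouped indirectly through a chain of local groupings.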
Parallel design patterns for a low-power, software-defined compressed video encoder
NASA Astrophysics Data System (ADS)
Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar
2011-06-01
Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High-quality compression features needed for some applications, such as 10-bit sample depth or 4:2:2 chroma format, often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real-time clocks, GPS data, mission/ESD/user data, or software-defined radio in a low-power, field-upgradable implementation. Low-power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes, including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data-parallel and task-parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memory-network processor device.
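The dataflow model described here, a graph of tasks exchanging messages with no global synchronization, can be sketched with threads and queues. The three stage functions are toy stand-ins for motion compensation, transform/quantization, and entropy coding, not real codec operations:

```python
import queue
import threading

def task(fn, inbox, outbox):
    """A dataflow node: consume messages, apply fn, forward results; None terminates."""
    while (item := inbox.get()) is not None:
        outbox.put(fn(item))
    outbox.put(None)

# Toy stand-ins for motion compensation, transform/quantization, entropy coding.
pipeline = [lambda f: f - 1, lambda f: f // 2, lambda f: str(f)]
queues = [queue.Queue() for _ in range(len(pipeline) + 1)]
workers = [threading.Thread(target=task, args=(fn, queues[i], queues[i + 1]))
           for i, fn in enumerate(pipeline)]
for w in workers:
    w.start()
for frame in (9, 11, 13):                 # "frames" stream through the graph
    queues[0].put(frame)
queues[0].put(None)

encoded = []
while (result := queues[-1].get()) is not None:
    encoded.append(result)
for w in workers:
    w.join()
print(encoded)                             # prints ['4', '5', '6']
```

Each node runs as soon as input messages arrive, so successive frames occupy different stages concurrently; this is the task-parallel pattern the encoder exploits, with data parallelism available by replicating nodes.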
Riffel, Philipp; Zoellner, Frank G; Budjan, Johannes; Grimm, Robert; Block, Tobias K; Schoenberg, Stefan O; Hausmann, Daniel
2016-11-01
The purpose of the present study was to evaluate a recently introduced technique for free-breathing dynamic contrast-enhanced renal magnetic resonance imaging (MRI) applying a combination of radial k-space sampling, parallel imaging, and compressed sensing. The technique allows retrospective reconstruction of 2 motion-suppressed sets of images from the same acquisition: one with lower temporal resolution but improved image quality for subjective image analysis, and one with high temporal resolution for quantitative perfusion analysis. In this study, 25 patients underwent a kidney examination, including a prototypical fat-suppressed, golden-angle radial stack-of-stars T1-weighted 3-dimensional spoiled gradient-echo examination (GRASP) performed after contrast agent administration during free breathing. Images were reconstructed at temporal resolutions of 55 spokes per frame (6.2 seconds) and 13 spokes per frame (1.5 seconds). The GRASP images were evaluated by 2 blinded radiologists. First, the reconstructions with low temporal resolution underwent subjective image analysis: the radiologists assessed the best arterial phase and the best renal phase and rated image quality score for each patient on a 5-point Likert-type scale. In addition, the diagnostic confidence was rated according to a 3-point Likert-type scale. Similarly, respiratory motion artifacts and streak artifacts were rated according to a 3-point Likert-type scale. Then, the reconstructions with high temporal resolution were analyzed with a voxel-by-voxel deconvolution approach to determine the renal plasma flow, and the results were compared with values reported in previous literature. Reader 1 and reader 2 rated the overall image quality score for the best arterial phase and the best renal phase with a median image quality score of 4 (good image quality) for both phases, respectively. A high diagnostic confidence (median score of 3) was observed.
There were no respiratory motion artifacts in any of the patients. Streak artifacts were present in all of the patients, but did not compromise diagnostic image quality. The estimated renal plasma flow was slightly higher (295 ± 78 mL/100 mL per minute) than reported in previous MRI-based studies, but also closer to the physiologically expected value. Dynamic, motion-suppressed contrast-enhanced renal MRI can be performed in high diagnostic quality during free breathing using a combination of golden-angle radial sampling, parallel imaging, and compressed sensing. Both morphologic and quantitative functional information can be acquired within a single acquisition.
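The key property exploited above is that golden-angle radial acquisition lets a single stream of spokes be regrouped retrospectively into frames of any temporal width. A sketch (the 55- and 13-spoke groupings match the study; the total spoke count is illustrative):

```python
import numpy as np

GOLDEN_ANGLE = 111.246117975                 # degrees between successive spokes

def spoke_angles(n):
    """Spoke angles of a golden-angle radial acquisition."""
    return np.arange(n) * GOLDEN_ANGLE % 360.0

def frames(angles, spokes_per_frame):
    """Retrospectively regroup one acquisition into frames of the chosen width."""
    n = len(angles) // spokes_per_frame * spokes_per_frame
    return angles[:n].reshape(-1, spokes_per_frame)

acq = spoke_angles(715)                      # one scan's worth of spokes (13 * 55)
print(frames(acq, 55).shape, frames(acq, 13).shape)  # prints (13, 55) (55, 13)
```

Because consecutive golden-angle spokes are always near-uniformly distributed over 360°, both regroupings of the same data yield roughly even angular coverage per frame, which is what makes the dual-resolution reconstruction possible.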
Toward imaging the body at 10.5 tesla.
Ertürk, M Arcan; Wu, Xiaoping; Eryaman, Yiğitcan; Van de Moortele, Pierre-François; Auerbach, Edward J; Lagore, Russell L; DelaBarre, Lance; Vaughan, J Thomas; Uğurbil, Kâmil; Adriany, Gregor; Metzger, Gregory J
2017-01-01
To explore the potential of performing body imaging at 10.5 Tesla (T) compared with 7.0T through evaluating the transmit/receive performance of similarly configured dipole antenna arrays. Fractionated dipole antenna elements for 10.5T body imaging were designed and evaluated using numerical simulations. Transmit performance of antenna arrays inside the prostate, kidneys, and heart was investigated and compared with that at 7.0T using both phase-only radiofrequency (RF) shimming and multi-spoke pulses. Signal-to-noise ratio (SNR) comparisons were also performed. A 10-channel antenna array was constructed to image the abdomen of a swine at 10.5T. Numerical methods were validated with phantom studies at both field strengths. Similar power efficiencies were observed inside target organs with phase-only shimming, but RF nonuniformity was significantly higher at 10.5T. Spokes RF pulses allowed similar transmit performance with accompanying local specific absorption rate increases of 25-90% compared with 7.0T. Relative SNR gains inside the target anatomies were calculated to be more than two-fold higher at 10.5T, and a 2.2-fold SNR gain was measured in a phantom. Gradient echo and fast spin echo imaging demonstrated the feasibility of body imaging at 10.5T with the designed array. While comparable power efficiencies can be achieved using dipole antenna arrays with static shimming at 10.5T, increasing RF nonuniformities underscore the need for efficient, robust, and safe parallel transmission methods. Magn Reson Med 77:434-443, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
10-channel fiber array fabrication technique for parallel optical coherence tomography system
NASA Astrophysics Data System (ADS)
Arauz, Lina J.; Luo, Yuan; Castillo, Jose E.; Kostuk, Raymond K.; Barton, Jennifer
2007-02-01
Optical Coherence Tomography (OCT) shows great promise for minimally invasive biomedical imaging applications. A parallel OCT system is a novel technique that replaces mechanical transverse scanning with electronic scanning. This will reduce the time required to acquire image data. In this system an array of small-diameter fibers is required to obtain an image in the transverse direction. Each fiber in the array is configured in an interferometer and is used to image one pixel in the transverse direction. In this paper we describe a technique to package 15μm diameter fibers on a silicon-silica substrate to be used in a 2mm endoscopic probe tip. Single-mode fibers are etched to reduce the cladding diameter from 125μm to 15μm. Etched fibers are placed into a 4mm by 150μm trench in a silicon-silica substrate and secured with UV glue. Active alignment was used to simplify the layout of the fibers and minimize unwanted horizontal displacement of the fibers. A 10-channel fiber array was built, tested, and later incorporated into a parallel OCT system. This paper describes the packaging, testing, and operation of the array in a parallel OCT system.
Synthetic Foveal Imaging Technology
NASA Technical Reports Server (NTRS)
Nikzad, Shouleh (Inventor); Monacos, Steve P. (Inventor); Hoenk, Michael E. (Inventor)
2013-01-01
Apparatuses and methods are disclosed that create a synthetic fovea in order to identify and highlight interesting portions of an image for further processing and rapid response. Synthetic foveal imaging implements a parallel processing architecture that uses reprogrammable logic to implement embedded, distributed, real-time foveal image processing from different sensor types while simultaneously allowing for lossless storage and retrieval of raw image data. Real-time, distributed, adaptive processing of multi-tap image sensors with coordinated processing hardware used for each output tap is enabled. In mosaic focal planes, a parallel-processing network can be implemented that treats the mosaic focal plane as a single ensemble rather than a set of isolated sensors. Various applications are enabled for imaging and robotic vision where processing and responding to enormous amounts of data quickly and efficiently is important.
Image Harvest: an open-source platform for high-throughput plant image processing and analysis.
Knecht, Avi C; Campbell, Malachy T; Caprez, Adam; Swanson, David R; Walia, Harkamal
2016-05-01
High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable for processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is an open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyze phenomics datasets. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.
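The grid-parallel pattern IH uses, independent images mapped to independent workers, can be sketched with a worker pool and a toy digital trait (foreground pixel count). The trait, threshold, and random stand-in "images" are illustrative, not IH's actual feature set:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def plant_area(img, threshold=128):
    """A toy digital trait: number of foreground ("plant") pixels in one image."""
    return int((img > threshold).sum())

rng = np.random.default_rng(2)
images = [rng.integers(0, 256, (64, 64)) for _ in range(8)]  # stand-in plant images

# Map independent images to independent workers, one task per image.
with ThreadPoolExecutor(max_workers=4) as pool:
    traits = list(pool.map(plant_area, images))
print(len(traits))                           # prints 8: one trait value per image
```

Because each image is processed without reference to the others, the same map scales from a local pool to thousands of grid jobs, which is what removes the desktop bottleneck the abstract describes.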
Optimal exposure techniques for iodinated contrast enhanced breast CT
NASA Astrophysics Data System (ADS)
Glick, Stephen J.; Makeev, Andrey
2016-03-01
Screening for breast cancer using mammography has been very successful in the effort to reduce breast cancer mortality, and its use has largely resulted in the 30% reduction in breast cancer mortality observed since 1990 [1]. However, diagnostic mammography remains an area of breast imaging that is in great need for improvement. One imaging modality proposed for improving the accuracy of diagnostic workup is iodinated contrast-enhanced breast CT [2]. In this study, a mathematical framework is used to evaluate optimal exposure techniques for contrast-enhanced breast CT. The ideal observer signal-to-noise ratio (i.e., d') figure-of-merit is used to provide a task performance based assessment of optimal acquisition parameters under the assumptions of a linear, shift-invariant imaging system. A parallel-cascade model was used to estimate signal and noise propagation through the detector, and a realistic lesion model with iodine uptake was embedded into a structured breast background. Ideal observer performance was investigated across kVp settings, filter materials, and filter thickness. Results indicated many kVp spectra/filter combinations can improve performance over currently used x-ray spectra.
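For a known signal in stationary Gaussian noise, the ideal-observer figure of merit is d'^2 = Σ_f |S(f)|^2 / NPS(f), summed over spatial frequencies. A sketch comparing two hypothetical exposure techniques that differ only in noise power (the lesion shape and NPS values are illustrative, and normalization conventions vary, so only ratios are meaningful here):

```python
import numpy as np

def dprime(signal, nps):
    """Ideal-observer SNR for a known signal in stationary Gaussian noise:
    d'^2 = sum over spatial frequencies of |S(f)|^2 / NPS(f)."""
    S = np.fft.fft2(signal)
    return float(np.sqrt(np.sum(np.abs(S) ** 2 / nps)))

x = np.linspace(-1.0, 1.0, 64)
lesion = np.exp(-(x[:, None] ** 2 + x[None, :] ** 2) / 0.05)  # toy iodine lesion
flat_nps = np.ones((64, 64))                # white noise power spectrum (assumed)

d_good = dprime(lesion, flat_nps)           # one candidate spectrum/filter choice
d_bad = dprime(lesion, 4.0 * flat_nps)      # a noisier technique, 4x the NPS
print(round(d_good / d_bad, 3))             # prints 2.0: d' scales as NPS^(-1/2)
```

In the paper's framework, changing kVp or filtration alters both the signal (iodine contrast) and the NPS through the cascaded detector model, and the technique maximizing d' is the optimum.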
Recent advances in automatic alignment system for the National Ignition Facility
NASA Astrophysics Data System (ADS)
Wilhelmsen, Karl; Awwal, Abdul A. S.; Kalantar, Dan; Leach, Richard; Lowe-Webb, Roger; McGuigan, David; Miller Kamm, Vicki
2011-03-01
The automatic alignment system for the National Ignition Facility (NIF) is a large-scale parallel system that directs all 192 laser beams along the 300-m optical path to a 50-micron focus at the target chamber in less than 50 minutes. The system automatically commands 9,000 stepping motors to adjust mirrors and other optics based upon images acquired from high-resolution digital cameras viewing beams at various locations. Forty-five control loops per beamline request image-processing services running on a Linux cluster to analyze these images of the beams and references, and automatically steer the beams toward the target. This paper discusses the upgrades to the NIF automatic alignment system to handle new alignment needs and evolving requirements as related to the various types of experiments performed. As NIF becomes a continuously operated system and more experiments are performed, performance monitoring is increasingly important for maintenance and commissioning work. Data collected during operations are analyzed for tuning of the laser and for targeting maintenance work. Handling evolving alignment and maintenance needs is expected for the planned 30-year operational life of NIF.
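Each alignment control loop is, at its core, a measure-centroid-then-step operation: locate the beam on the camera and command motor steps proportional to the error. A toy sketch (the unit gain, one-pixel-per-step motors, and 9×9 image are illustrative assumptions, not NIF's calibration):

```python
import numpy as np

def centroid(img):
    """Intensity-weighted beam position on the camera image."""
    ys, xs = np.indices(img.shape)
    total = img.sum()
    return (ys * img).sum() / total, (xs * img).sum() / total

def steps_to_target(img, target, gain=1.0):
    """One control-loop pass: motor steps proportional to the centroid error."""
    cy, cx = centroid(img)
    return int(round(gain * (target[0] - cy))), int(round(gain * (target[1] - cx)))

img = np.zeros((9, 9))
img[2, 6] = 1.0                              # beam spot sits off-center
print(steps_to_target(img, (4, 4)))          # prints (2, -2)
```

Running 45 such loops per beamline across 192 beams is what makes the system "large-scale parallel": the loops are independent, so the image-processing requests fan out across the cluster.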
Chavhan, Govind B; Babyn, Paul S; Vasanawala, Shreyas S
2013-05-01
Familiarity with basic sequence properties and their trade-offs is necessary for radiologists performing abdominal magnetic resonance (MR) imaging. Acquiring diagnostic-quality MR images in the pediatric abdomen is challenging due to motion, inability to hold the breath, varying patient size, and artifacts. Motion-compensation techniques (eg, respiratory gating, signal averaging, suppression of signal from moving tissue, swapping phase- and frequency-encoding directions, use of faster sequences with breath holding, parallel imaging, and radial k-space filling) can improve image quality. Each of these techniques is more suitable for use with certain sequences and acquisition planes and in specific situations and age groups. Different T1- and T2-weighted sequences work better in different age groups and with differing acquisition planes and have specific advantages and disadvantages. Dynamic imaging should be performed differently in younger children than in older children. In younger children, the sequence and the timing of dynamic phases need to be adjusted. Different sequences work better in smaller children and in older children because of differing breath-holding ability, breathing patterns, field of view, and use of sedation. Hence, specific protocols should be maintained for younger children and older children. Combining longer, higher-resolution sequences and faster, lower-resolution sequences helps acquire diagnostic-quality images in a reasonable time. © RSNA, 2013.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Chuan, E-mail: chuan.huang@stonybrookmedicine.edu; Department of Radiology, Harvard Medical School, Boston, Massachusetts 02115; Departments of Radiology, Psychiatry, Stony Brook Medicine, Stony Brook, New York 11794
2015-02-15
Purpose: Degradation of image quality caused by cardiac and respiratory motions hampers the diagnostic quality of cardiac PET. It has been shown that improved diagnostic accuracy of myocardial defect can be achieved by tagged MR (tMR) based PET motion correction using simultaneous PET-MR. However, one major hurdle for the adoption of tMR-based PET motion correction in the PET-MR routine is the long acquisition time needed for the collection of fully sampled tMR data. In this work, the authors propose an accelerated tMR acquisition strategy using parallel imaging and/or compressed sensing and assess the impact on the tMR-based motion corrected PET using phantom and patient data. Methods: Fully sampled tMR data were acquired simultaneously with PET list-mode data on two simultaneous PET-MR scanners for a cardiac phantom and a patient. Parallel imaging and compressed sensing were retrospectively performed by GRAPPA and kt-FOCUSS algorithms with various acceleration factors. Motion fields were estimated using nonrigid B-spline image registration from both the accelerated and fully sampled tMR images. The motion fields were incorporated into a motion corrected ordered subset expectation maximization reconstruction algorithm with motion-dependent attenuation correction. Results: Although tMR acceleration introduced image artifacts into the tMR images for both phantom and patient data, motion corrected PET images yielded similar image quality as those obtained using the fully sampled tMR images for low to moderate acceleration factors (<4). Quantitative analysis of myocardial defect contrast over ten independent noise realizations showed similar results. It was further observed that although the image quality of the motion corrected PET images deteriorates for high acceleration factors, the images were still superior to the images reconstructed without motion correction.
Conclusions: Accelerated tMR images obtained with more than 4 times acceleration can still provide relatively accurate motion fields and yield tMR-based motion corrected PET images with similar image quality as those reconstructed using fully sampled tMR data. The reduction of tMR acquisition time makes it more compatible with routine clinical cardiac PET-MR studies.
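Retrospective acceleration of the kind applied to the tMR data can be sketched by masking phase-encode lines of fully sampled k-space. This uses a plain regular mask for illustration; GRAPPA would additionally retain autocalibration lines, and kt-FOCUSS would use a different k-t pattern:

```python
import numpy as np

def undersample(kspace, R):
    """Keep every R-th phase-encode line; return masked k-space and net acceleration."""
    mask = np.zeros(kspace.shape[0], dtype=bool)
    mask[::R] = True
    return kspace * mask[:, None], kspace.shape[0] / mask.sum()

full_k = np.fft.fft2(np.random.default_rng(3).standard_normal((128, 128)))
k4, accel = undersample(full_k, 4)
print(accel)                                 # prints 4.0
```

Applying such masks to already-acquired fully sampled data is what lets the authors compare motion fields at several acceleration factors against the same ground truth.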
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.
2003-01-01
Parallel programming paradigms include process-level parallelism, thread-level parallelism, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architectures (SMA). The analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
Theory, Image Simulation, and Data Analysis of Chemical Release Experiments
NASA Technical Reports Server (NTRS)
Wescott, Eugene M.
1994-01-01
The final phase of Grant NAG6-1 involved analysis of physics of chemical releases in the upper atmosphere and analysis of data obtained on previous NASA sponsored chemical release rocket experiments. Several lines of investigation of past chemical release experiments and computer simulations have been proceeding in parallel. This report summarizes the work performed and the resulting publications. The following topics are addressed: analysis of the 1987 Greenland rocket experiments; calculation of emission rates for barium, strontium, and calcium; the CRIT 1 and 2 experiments (Collisional Ionization Cross Section experiments); image calibration using background stars; rapid ray motions in ionospheric plasma clouds; and the NOONCUSP rocket experiments.
The architecture of a video image processor for the space station
NASA Technical Reports Server (NTRS)
Yalamanchili, S.; Lee, D.; Fritze, K.; Carpenter, T.; Hoyme, K.; Murray, N.
1987-01-01
The architecture of a video image processor for space station applications is described. The architecture was derived from a study of the requirements of algorithms needed to provide the desired functionality for many of these applications. Architectural options were selected based on a simulation of the execution of these algorithms on various architectural organizations. A great deal of emphasis was placed on the ability of the system to evolve and grow over the lifetime of the space station. The result is a hierarchical parallel architecture that is characterized by high-level language programmability, modularity, and extensibility, and that can meet the required performance goals.
NASA Technical Reports Server (NTRS)
Hasler, A. F.; Strong, J.; Woodward, R. H.; Pierce, H.
1991-01-01
Results are presented on an automatic stereo analysis of cloud-top heights from nearly simultaneous satellite image pairs from the GOES and NOAA satellites, using a massively parallel processor computer. Comparisons of computer-derived height fields with manually analyzed fields indicate that the automatic technique shows promise for performing routine stereo analysis in a real-time environment, providing a useful forecasting tool by augmenting observational data sets of severe thunderstorms and hurricanes. Simulations using synthetic stereo data show that it is possible to automatically resolve small-scale features such as 4000-m-diam clouds to about 1500 m in the vertical.
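The height retrieval rests on parallax geometry. As a hedged illustration (not the authors' exact algorithm): if two satellites view the same cloud from zenith angles θ1 and θ2 on opposite sides, the cloud's image is displaced on the ground by a parallax p, and a flat-Earth estimate of its height is h = p / (tan θ1 + tan θ2).

```python
import math

def cloud_top_height(parallax_m, zenith1_deg, zenith2_deg):
    """Flat-Earth stereo-height estimate from the parallax (in meters)
    measured between two satellite views of the same cloud feature.
    Illustrative formula only; operational retrievals account for
    satellite geometry and image navigation.
    """
    return parallax_m / (math.tan(math.radians(zenith1_deg)) +
                         math.tan(math.radians(zenith2_deg)))
```

For example, a 10 km parallax seen from two satellites each at 45 degrees zenith angle implies a cloud top near 5 km. The massively parallel step in the paper is the per-pixel matching that produces the parallax field, which is independent across image regions.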
Zhong, Jidan; Rifkin-Graboi, Anne; Ta, Anh Tuan; Yap, Kar Lai; Chuang, Kai-Hsiang; Meaney, Michael J; Qiu, Anqi
2014-07-01
Children begin performing similarly to adults on tasks requiring executive functions in late childhood, a transition that is probably due to neuroanatomical fine-tuning processes, including myelination and synaptic pruning. In parallel to such structural changes in neuroanatomical organization, development of functional organization may also be associated with cognitive behaviors in children. We examined 6- to 10-year-old children's cortical thickness, functional organization, and cognitive performance. We used structural magnetic resonance imaging (MRI) to identify areas with cortical thinning, resting-state fMRI to identify functional organization in parallel to cortical development, and working memory/response inhibition tasks to assess executive functioning. We found that neuroanatomical changes in the form of cortical thinning spread over bilateral frontal, parietal, and occipital regions. These regions were engaged in 3 functional networks: sensorimotor and auditory, executive control, and default mode network. Furthermore, we found that working memory and response inhibition were associated only with regional functional connectivity, not with topological organization (i.e., local and global efficiency of information transfer) of these functional networks. Interestingly, functional connections associated with "bottom-up" as opposed to "top-down" processing were more clearly related to children's performance on working memory and response inhibition, implying an important role for these brain systems in late childhood.
Optical recognition of statistical patterns
NASA Astrophysics Data System (ADS)
Lee, S. H.
1981-12-01
Optical implementation of the Fukunaga-Koontz transform (FKT) and the Least-Squares Linear Mapping Technique (LSLMT) is described. The FKT is a linear transformation which performs image feature extraction for a two-class image classification problem. The LSLMT performs a transform from a large-dimensional feature space to a small-dimensional decision space for separating multiple image classes by maximizing the interclass differences while minimizing the intraclass variations. The FKT and the LSLMT were optically implemented by utilizing a coded phase optical processor. The transform was used for classifying birds and fish. After the F-K basis functions were calculated, those most useful for classification were incorporated into a computer generated hologram. The output of the optical processor, consisting of the squared magnitude of the F-K coefficients, was detected by a TV camera, digitized, and fed into a microcomputer for classification. A simple linear classifier based on only two F-K coefficients was able to separate the images into two classes, indicating that the F-K transform had chosen good features. Two advantages of optically implementing the FKT and LSLMT are parallel and real-time processing.
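The FKT itself can be sketched numerically (a digital illustration, not the coded-phase optical processor described above): the two class autocorrelation matrices are jointly whitened, after which both classes share a single eigenvector basis whose eigenvalue pairs sum to 1, so the basis vectors that best represent one class are the worst for the other. The toy data below stand in for the bird/fish image features.

```python
import numpy as np

def fkt_basis(X1, X2):
    """Fukunaga-Koontz transform, numerical sketch.

    X1, X2: (n_samples, n_features) arrays for the two image classes.
    Returns (B, lam): a basis B (rows are F-K basis vectors) and the
    class-1 eigenvalues lam, sorted descending. In this basis the
    class-2 eigenvalues are exactly 1 - lam.
    """
    S1 = X1.T @ X1 / len(X1)            # class autocorrelation matrices
    S2 = X2.T @ X2 / len(X2)
    w, V = np.linalg.eigh(S1 + S2)
    P = V @ np.diag(1.0 / np.sqrt(w)) @ V.T   # whitening: P (S1+S2) P = I
    lam, U = np.linalg.eigh(P @ S1 @ P)       # shared eigenbasis
    order = np.argsort(lam)[::-1]
    return U[:, order].T @ P, lam[order]

# Toy two-class example
rng = np.random.default_rng(1)
X1 = rng.normal(size=(200, 5))
X2 = rng.normal(size=(200, 5)) + 1.0
B, lam = fkt_basis(X1, X2)
```

The complementary-eigenvalue property is what makes two coefficients sufficient for a simple linear classifier: coefficients with lam near 1 carry mostly class-1 energy, those near 0 mostly class-2 energy.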