Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations
NASA Astrophysics Data System (ADS)
Detrixhe, Miles; Gibou, Frédéric
2016-10-01
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
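As a point of reference for the method being parallelized, the serial fast sweeping update for the simplest static Hamilton-Jacobi problem, the eikonal equation |grad u| = f with f = 1, can be sketched as follows (a minimal illustration of Gauss-Seidel sweeps with the Godunov upwind update, not the authors' hybrid parallel scheme; grid size and sweep count are chosen for clarity):

```python
import math

def fast_sweep_eikonal(u, h=1.0, f=1.0, n_sweeps=4):
    """Alternating-direction Gauss-Seidel sweeps for |grad u| = f on a
    uniform 2-D grid, using the Godunov upwind update. Illustrative
    serial sketch; the paper's contribution is parallelizing sweeps
    like these across heterogeneous architectures."""
    n, m = len(u), len(u[0])
    orderings = [(range(n), range(m)),
                 (range(n - 1, -1, -1), range(m)),
                 (range(n), range(m - 1, -1, -1)),
                 (range(n - 1, -1, -1), range(m - 1, -1, -1))]
    for _ in range(n_sweeps):
        for rows, cols in orderings:
            for i in rows:
                for j in cols:
                    # upwind neighbor values in each direction
                    a = min(u[i - 1][j] if i > 0 else math.inf,
                            u[i + 1][j] if i < n - 1 else math.inf)
                    b = min(u[i][j - 1] if j > 0 else math.inf,
                            u[i][j + 1] if j < m - 1 else math.inf)
                    if abs(a - b) >= f * h:
                        cand = min(a, b) + f * h
                    else:
                        cand = (a + b + math.sqrt(2 * f * f * h * h
                                                  - (a - b) ** 2)) / 2
                    u[i][j] = min(u[i][j], cand)
    return u
```

With a single source node fixed at zero, the converged values approximate the distance function from that source.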
Fast parallel approach for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2009-12-01
Two-dimensional fast Gabor transform algorithms are useful for real-time applications due to the high computational complexity of the traditional 2-D complex-valued discrete Gabor transform (CDGT). This paper presents two block time-recursive algorithms for the 2-D DHT-based real-valued discrete Gabor transform (RDGT) and its inverse transform and develops a fast parallel approach for the implementation of the two algorithms. The computational complexity of the proposed parallel approach is analyzed and compared with that of the existing 2-D CDGT algorithms. The results indicate that the proposed parallel approach is attractive for real-time image processing.
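For context on the DHT underlying the RDGT, a direct-sum discrete Hartley transform can be sketched as below (an O(N^2) reference implementation using the cas kernel; the paper's block time-recursive parallel algorithms are not attempted here):

```python
import math

def dht(x):
    """Discrete Hartley transform via the cas kernel,
    cas(t) = cos(t) + sin(t). Direct O(N^2) reference sketch; a
    production RDGT implementation would use a fast O(N log N)
    Hartley transform instead."""
    n = len(x)
    return [sum(x[m] * (math.cos(2 * math.pi * k * m / n)
                        + math.sin(2 * math.pi * k * m / n))
                for m in range(n))
            for k in range(n)]
```

The DHT is real-valued by construction (it equals Re(F) - Im(F) of the DFT F for real input) and is an involution up to a factor of N: applying it twice returns N times the original sequence.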
Fast, Massively Parallel Data Processors
NASA Technical Reports Server (NTRS)
Heaton, Robert A.; Blevins, Donald W.; Davis, ED
1994-01-01
Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Analysis techniques for diagnosing runaway ion distributions in the reversed field pinch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, J., E-mail: jkim536@wisc.edu; Anderson, J. K.; Capecchi, W.
2016-11-15
An advanced neutral particle analyzer (ANPA) on the Madison Symmetric Torus measures deuterium ions of energy ranges 8-45 keV with an energy resolution of 2-4 keV and time resolution of 10 μs. Three different experimental configurations measure distinct portions of the naturally occurring fast ion distributions: fast ions moving parallel, anti-parallel, or perpendicular to the plasma current. On a radial-facing port, fast ions moving perpendicular to the current have the necessary pitch to be measured by the ANPA. With the diagnostic positioned on a tangent line through the plasma core, a chord integration over fast ion density, background neutral density, and the locally appropriate pitch defines the measured sample. The plasma current can be reversed to measure anti-parallel fast ions in the same configuration. Comparisons of energy distributions for the three configurations show an anisotropic fast ion distribution favoring high pitch ions.
Fast I/O for Massively Parallel Applications
NASA Technical Reports Server (NTRS)
O'Keefe, Matthew T.
1996-01-01
The two primary goals for this report were the design, construction, and modeling of parallel disk arrays for scientific visualization and animation, and a study of the I/O requirements of highly parallel applications. In addition, further work was required in parallel display systems to project and animate the very high-resolution frames resulting from our supercomputing simulations in ocean circulation and compressible gas dynamics.
Multitasking domain decomposition fast Poisson solvers on the Cray Y-MP
NASA Technical Reports Server (NTRS)
Chan, Tony F.; Fatoohi, Rod A.
1990-01-01
The results of multitasking implementation of a domain decomposition fast Poisson solver on eight processors of the Cray Y-MP are presented. The object of this research is to study the performance of domain decomposition methods on a Cray supercomputer and to analyze the performance of different multitasking techniques using highly parallel algorithms. Two implementations of multitasking are considered: macrotasking (parallelism at the subroutine level) and microtasking (parallelism at the do-loop level). A conventional FFT-based fast Poisson solver is also multitasked. The results of different implementations are compared and analyzed. A speedup of over 7.4 on the Cray Y-MP running in a dedicated environment is achieved for all cases.
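The transform-based idea behind a fast Poisson solver can be illustrated in one dimension: expand the right-hand side in eigenvectors of the discrete Laplacian, divide by the eigenvalues, and transform back. Below is a minimal sketch with zero Dirichlet ends; direct O(n^2) sine transforms stand in for the FFT-based ones the paper multitasks, and all names are illustrative:

```python
import math

def poisson_dirichlet_1d(f, h):
    """Solve u'' = f on a uniform grid with zero Dirichlet ends via a
    discrete sine transform: each sine mode is an eigenvector of the
    second-difference operator, with eigenvalue
    (2*cos(k*pi/(n+1)) - 2)/h^2. Direct O(n^2) transform for clarity."""
    n = len(f)

    def dst(v):
        return [sum(v[j] * math.sin(math.pi * (j + 1) * (k + 1) / (n + 1))
                    for j in range(n)) for k in range(n)]

    fhat = dst(f)
    lam = [(2 * math.cos(math.pi * (k + 1) / (n + 1)) - 2) / h ** 2
           for k in range(n)]
    uhat = [fh / l for fh, l in zip(fhat, lam)]
    # the sine transform is its own inverse up to a factor 2/(n+1)
    return [2 / (n + 1) * s for s in dst(uhat)]
```

The returned vector satisfies the second-difference equation exactly (up to rounding) at every interior node, with implicit zero boundary values.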
The multigrid preconditioned conjugate gradient method
NASA Technical Reports Server (NTRS)
Tatebe, Osamu
1993-01-01
A multigrid preconditioned conjugate gradient method (MGCG method), which uses the multigrid method as a preconditioner for the PCG method, is proposed. The multigrid method has inherent high parallelism and improves convergence of long-wavelength components, which is important in iterative methods. By using this method as a preconditioner of the PCG method, an efficient method with high parallelism and fast convergence is obtained. First, the conditions the multigrid method must meet in order to serve as a preconditioner of the PCG method are considered. Next, numerical experiments show the behavior of the MGCG method and that the MGCG method is superior to both the ICCG method and the multigrid method in terms of fast convergence and high parallelism. This fast convergence is understood in terms of the eigenvalue analysis of the preconditioned matrix. From this observation of the multigrid preconditioner, it is realized that the MGCG method converges in very few iterations and the multigrid preconditioner is a desirable preconditioner for the conjugate gradient method.
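The PCG skeleton into which a multigrid preconditioner is plugged can be sketched as follows; `precond(r)` returns an approximation to M^{-1} r, which in the MGCG method would be one multigrid cycle (a simple Jacobi stand-in is used in the usage example, purely for illustration):

```python
def pcg(A, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients on dense lists-of-lists.
    `precond(r)` applies the preconditioner M^{-1} to the residual;
    in the MGCG method this would be one multigrid cycle."""
    n = len(b)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    x = [0.0] * n
    r = b[:]
    z = precond(r)
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = precond(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x
```

Usage with a Jacobi (diagonal) preconditioner on a 2x2 SPD system: `pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], lambda r: [r[0] / 4.0, r[1] / 3.0])`.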
NASA Technical Reports Server (NTRS)
Farhat, Charbel
1998-01-01
In this grant, we have proposed a three-year research effort focused on developing High Performance Computation and Communication (HPCC) methodologies for structural analysis on parallel processors and clusters of workstations, with emphasis on reducing the structural design cycle time. Besides consolidating and further improving the FETI solver technology to address plate and shell structures, we have proposed to tackle the following design related issues: (a) parallel coupling and assembly of independently designed and analyzed three-dimensional substructures with non-matching interfaces, (b) fast and smart parallel re-analysis of a given structure after it has undergone design modifications, (c) parallel evaluation of sensitivity operators (derivatives) for design optimization, and (d) fast parallel analysis of mildly nonlinear structures. While our proposal was accepted, support was provided only for one year.
Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fijany, A.; Milman, M.; Redding, D.
1994-12-31
In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optics system (SELENE) are presented. The authors have already shown that the computation of a near-optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although this represents an optimal computation, due to the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem, since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm, designated the Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other fast Poisson solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.
A Domain Decomposition Parallelization of the Fast Marching Method
NASA Technical Reports Server (NTRS)
Herrmann, M.
2003-01-01
In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets is presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition cases. The parallel performance of the proposed method depends strongly on separately load balancing the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher-order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G0-based parallelization will be investigated.
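The serial Fast Marching loop that the paper decomposes across domains can be sketched with a binary heap (illustrative only; the rollback and inter-domain communication machinery of the parallel method is not shown):

```python
import heapq
import math

def fast_marching(shape, sources, h=1.0):
    """Serial Fast Marching Method for |grad T| = 1 on a 2-D grid.
    Nodes are accepted in order of increasing arrival time via a heap;
    the paper parallelizes this inherently sequential loop by domain
    decomposition, rolling back when inter-domain updates conflict."""
    n, m = shape
    T = [[math.inf] * m for _ in range(n)]
    heap = []
    for (i, j) in sources:
        T[i][j] = 0.0
        heapq.heappush(heap, (0.0, i, j))
    accepted = set()
    while heap:
        t, i, j = heapq.heappop(heap)
        if (i, j) in accepted:
            continue  # stale heap entry
        accepted.add((i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            x, y = i + di, j + dj
            if 0 <= x < n and 0 <= y < m and (x, y) not in accepted:
                a = min(T[x - 1][y] if x > 0 else math.inf,
                        T[x + 1][y] if x < n - 1 else math.inf)
                b = min(T[x][y - 1] if y > 0 else math.inf,
                        T[x][y + 1] if y < m - 1 else math.inf)
                if abs(a - b) >= h:
                    cand = min(a, b) + h
                else:
                    cand = (a + b + math.sqrt(2 * h * h - (a - b) ** 2)) / 2
                if cand < T[x][y]:
                    T[x][y] = cand
                    heapq.heappush(heap, (cand, x, y))
    return T
```

The update formula is the same Godunov upwind stencil used by fast sweeping; the two methods differ in the order in which nodes are visited.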
Crosetto, D.B.
1996-12-31
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
NASA Astrophysics Data System (ADS)
Kim, Stephan D.; Luo, Jiajun; Buchholz, D. Bruce; Chang, R. P. H.; Grayson, M.
2016-09-01
A modular time division multiplexer (MTDM) device is introduced to enable parallel measurement of multiple samples with both fast and slow decay transients spanning from millisecond to month-long time scales. This is achieved by dedicating a single high-speed measurement instrument for rapid data collection at the start of a transient, and by multiplexing a second low-speed measurement instrument for slow data collection of several samples in parallel for the later transients. The MTDM is a high-level design concept that can in principle measure an arbitrary number of samples, and the low-cost implementation here allows up to 16 samples to be measured in parallel over several months, reducing the total ensemble measurement duration and equipment usage by as much as an order of magnitude without sacrificing fidelity. The MTDM was successfully demonstrated by simultaneously measuring the photoconductivity of three amorphous indium-gallium-zinc-oxide thin films with 20 ms data resolution for fast transients and an uninterrupted parallel run time of over 20 days. The MTDM has potential applications in many areas of research that manifest response times spanning many orders of magnitude, such as photovoltaics, rechargeable batteries, and amorphous semiconductors such as silicon and indium-gallium-zinc-oxide.
NASA Astrophysics Data System (ADS)
Palmesi, P.; Abert, C.; Bruckner, F.; Suess, D.
2018-05-01
Fast stray field calculation is commonly considered of great importance for micromagnetic simulations, since it is the most time-consuming part of the simulation. The Fast Multipole Method (FMM) has displayed linear O(N) parallelization behavior on many cores. This article investigates the error of a recent FMM approach that approximates sources using linear, instead of constant, finite elements in the singular integral for calculating the stray field and the corresponding potential. Performance was measured in an earlier manuscript; here, the convergence of the relative L2 error is investigated for several FMM simulation parameters. Various scenarios, calculating the stray field either directly or via the potential, are discussed.
Research on the Application of Fast-steering Mirror in Stellar Interferometer
NASA Astrophysics Data System (ADS)
Mei, R.; Hu, Z. W.; Xu, T.; Sun, C. S.
2017-07-01
For a stellar interferometer, the fast-steering mirror (FSM) is widely utilized to correct wavefront tilt caused by atmospheric turbulence and internal instrumental vibration, owing to its high resolution and fast response frequency. In this study, the non-coplanar error between the FSM and the actuator deflection axis introduced by manufacturing, assembly, and adjustment is analyzed. Via a numerical method, the additional optical path difference (OPD) caused by the above factors is studied, and its effects on the tracking accuracy of the stellar interferometer are also discussed. On the other hand, the starlight parallelism between the beams of the two arms is one of the main factors in the loss of fringe visibility. By analyzing the influence of wavefront tilt caused by atmospheric turbulence on fringe visibility, a simple and efficient real-time correction scheme for starlight parallelism is proposed based on a single array detector. The feasibility of this scheme is demonstrated by laboratory experiment. The results show that, after correction by the fast-steering mirror, the starlight parallelism preliminarily meets the requirements of the stellar interferometer for wavefront tilt.
Parallel Fast Multipole Method For Molecular Dynamics
2007-06-01
Parallel Fast Multipole Method For Molecular Dynamics. THESIS. Reid G. Ormseth, Captain, USAF. AFIT/GAP/ENP/07-J02, Department of the Air Force, Air Force Institute of Technology. ... the United States Government. ... has also been provided by 'The Art of Molecular Dynamics Simulation' by Dennis Rapaport. This work is the clearest treatment of the Fast Multipole ...
Crustal origin of trench-parallel shear-wave fast polarizations in the Central Andes
NASA Astrophysics Data System (ADS)
Wölbern, I.; Löbl, U.; Rümpker, G.
2014-04-01
In this study, SKS and local S phases are analyzed to investigate variations of shear-wave splitting parameters along two dense seismic profiles across the central Andean Altiplano and Puna plateaus. In contrast to previous observations, the vast majority of the measurements reveal fast polarizations sub-parallel to the subduction direction of the Nazca plate with delay times between 0.3 and 1.2 s. Local phases show larger variations of fast polarizations and exhibit delay times ranging between 0.1 and 1.1 s. Two 70 km and 100 km wide sections along the Altiplano profile exhibit larger delay times and are characterized by fast polarizations oriented sub-parallel to major fault zones. Based on finite-difference wavefield calculations for anisotropic subduction zone models we demonstrate that the observations are best explained by fossil slab anisotropy with fast symmetry axes oriented sub-parallel to the slab movement in combination with a significant component of crustal anisotropy of nearly trench-parallel fast-axis orientation. From the modeling we exclude a sub-lithospheric origin of the observed strong anomalies due to the short-scale variations of the fast polarizations. Instead, our results indicate that anisotropy in the Central Andes generally reflects the direction of plate motion while the observed trench-parallel fast polarizations likely originate in the continental crust above the subducting slab.
Liu, Peilu; Li, Xinghua; Li, Haopeng; Su, Zhikun; Zhang, Hongxu
2017-01-01
In order to improve the accuracy of ultrasonic phased array focusing time delay, and by analyzing the original interpolation Cascade-Integrator-Comb (CIC) filter, an 8× interpolation CIC filter parallel algorithm is proposed, so that interpolation and multichannel decomposition can be processed simultaneously. Moreover, we summarize the general formula for an arbitrary-multiple interpolation CIC filter parallel algorithm and establish an ultrasonic phased array focusing time delay system based on the 8× interpolation CIC filter parallel algorithm. By improving the algorithmic structure, additions are reduced by 12.5% and multiplications by 29.2%, while computation remains very fast. Considering the existing problems of the CIC filter, we compensate the CIC filter: the compensated CIC filter's pass band is flatter, its transition band becomes steeper, and its stop band attenuation increases. Finally, we verify the feasibility of this algorithm on a Field Programmable Gate Array (FPGA). With a 125 MHz system clock, after 8× interpolation filtering and decomposition, the time delay accuracy of the defect echo becomes 1 ns. Simulation and experimental results both show that the proposed algorithm is highly feasible. Because of its fast calculation, small computational load, and high resolution, this algorithm is especially suitable for applications requiring high time delay accuracy and fast detection. PMID:29023385
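The basic serial CIC interpolation structure that the paper starts from can be sketched as comb (differentiator) stages at the low rate, zero-stuffing by the interpolation factor, and integrator stages at the high rate (a sketch of the textbook uncompensated filter, not the authors' 8× parallel decomposition or compensation):

```python
def cic_interpolate(x, r, n_stages=2):
    """Textbook CIC interpolator: n_stages comb stages at the input
    rate, zero-stuffing by factor r, then n_stages integrator stages
    at the output rate. Equivalent to convolving the zero-stuffed
    signal with an n_stages-fold cascade of length-r boxcars."""
    # comb stages: y[k] = x[k] - x[k-1]
    s = list(x)
    for _ in range(n_stages):
        prev, out = 0, []
        for v in s:
            out.append(v - prev)
            prev = v
        s = out
    # zero-stuff: insert r-1 zeros after each sample
    up = []
    for v in s:
        up.append(v)
        up.extend([0] * (r - 1))
    # integrator stages: running sums at the high rate
    for _ in range(n_stages):
        acc, out = 0, []
        for v in up:
            acc += v
            out.append(acc)
        up = out
    return up
```

With one stage the structure reduces to a zero-order hold (each input sample repeated r times); with two stages the impulse response is the triangular kernel of linear interpolation.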
Parallel and pipeline computation of fast unitary transforms
NASA Technical Reports Server (NTRS)
Fino, B. J.; Algazi, V. R.
1975-01-01
The letter discusses the parallel and pipeline organization of fast-unitary-transform algorithms such as the fast Fourier transform, and points out the efficiency of a combined parallel-pipeline processor of a transform such as the Haar transform, in which 2^n - 1 hardware 'butterflies' generate a transform of order 2^n every computation cycle.
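The butterfly structure referred to can be illustrated with a serial fast Haar transform: each of the log2(N) stages applies independent sum/difference pairs, which is what makes the transform amenable to parallel and pipeline organization (a minimal sketch):

```python
import math

def fast_haar(x):
    """Normalized fast Haar transform. Each stage replaces pairs
    (a, b) with ((a+b)/sqrt(2), (a-b)/sqrt(2)); averages continue to
    the next stage, details are stored. All pair operations within a
    stage are independent, hence the butterfly parallelism."""
    v = list(x)
    n = len(v)
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        avg = [(v[2 * i] + v[2 * i + 1]) / s for i in range(half)]
        det = [(v[2 * i] - v[2 * i + 1]) / s for i in range(half)]
        v[:n] = avg + det
        n = half
    return v
```

Because the transform is orthonormal, it preserves the energy (sum of squares) of the input.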
Algorithm for fast event parameters estimation on GEM acquired data
NASA Astrophysics Data System (ADS)
Linczuk, Paweł; Krawczyk, Rafał D.; Poźniak, Krzysztof T.; Kasprowicz, Grzegorz; Wojeński, Andrzej; Chernyshova, Maryna; Czarski, Tomasz
2016-09-01
We present a study of a software-hardware environment for developing fast computation methods with high throughput and low latency, which can be used as the back-end in High Energy Physics (HEP) and other High Performance Computing (HPC) systems driven by a high volume of input from an electronic-sensor-based front-end. Parallelization possibilities are discussed and tested on Intel HPC solutions, with consideration of applications to Gas Electron Multiplier (GEM) measurement systems.
A note on parallel and pipeline computation of fast unitary transforms
NASA Technical Reports Server (NTRS)
Fino, B. J.; Algazi, V. R.
1974-01-01
The parallel and pipeline organization of fast unitary transform algorithms such as the Fast Fourier Transform is discussed. The efficiency of a combined parallel-pipeline processor of a transform such as the Haar transform is pointed out, in which 2^n - 1 hardware butterflies generate a transform of order 2^n every computation cycle.
Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter
2015-01-20
While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
Zhu, Xiang; Zhang, Dianwen
2013-01-01
We present a fast, accurate, and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, implemented on a graphics processing unit for high-performance, scalable parallel model fitting. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for applications in super-resolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
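A minimal serial Levenberg-Marquardt iteration of the kind such an optimizer executes per pixel can be sketched as follows (forward-difference Jacobian and the standard multiplicative damping update; all names are illustrative and this is not GPU-LMFit's implementation):

```python
def _solve(A, b):
    """Tiny dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k]
                              for k in range(r + 1, n))) / M[r][r]
    return x

def lm_fit(resid, p0, n_iter=100):
    """Serial Levenberg-Marquardt sketch: damped normal equations
    (J^T J)(1 + lam on the diagonal) dp = -J^T r, with lam relaxed on
    accepted steps and inflated on rejected ones. `resid(p)` returns
    the residual vector."""
    p, lam = list(p0), 1e-3
    m = len(p)
    for _ in range(n_iter):
        r = resid(p)
        cost = sum(v * v for v in r)
        J = []  # J[j][i] = d r_i / d p_j (forward differences)
        for j in range(m):
            q = list(p)
            eps = 1e-7 * max(1.0, abs(q[j]))
            q[j] += eps
            J.append([(a - b) / eps for a, b in zip(resid(q), r)])
        A = [[sum(J[a][i] * J[b][i] for i in range(len(r)))
              for b in range(m)] for a in range(m)]
        g = [-sum(J[a][i] * r[i] for i in range(len(r))) for a in range(m)]
        for a in range(m):
            A[a][a] *= 1 + lam
        try:
            dp = _solve(A, g)
        except ZeroDivisionError:
            break
        q = [pi + di for pi, di in zip(p, dp)]
        if sum(v * v for v in resid(q)) < cost:
            p, lam = q, max(lam / 3, 1e-12)  # accept, relax damping
        else:
            lam *= 3                          # reject, increase damping
    return p
```

In a GPU setting, thousands of such small independent fits (one per pixel) run in parallel, which is where the reported speed-up comes from.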
A distributed parallel storage architecture and its potential application within EOSDIS
NASA Technical Reports Server (NTRS)
Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony
1994-01-01
We describe the architecture, implementation, and use of a scalable, high-performance, distributed-parallel data storage system developed in the ARPA-funded MAGIC gigabit testbed. A collection of wide-area distributed disk servers operates in parallel to provide logical block-level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.
Fast Face-Recognition Optical Parallel Correlator Using High Accuracy Correlation Filter
NASA Astrophysics Data System (ADS)
Watanabe, Eriko; Kodate, Kashiko
2005-11-01
We designed and fabricated a fully automatic fast face recognition optical parallel correlator [E. Watanabe and K. Kodate: Appl. Opt. 44 (2005) 5666] based on the VanderLugt principle. The implementation of an as-yet unattained ultra high-speed system was aided by reconfiguring the system to make it suitable for easier parallel processing, as well as by composing a higher accuracy correlation filter and a high-speed ferroelectric liquid crystal spatial light modulator (FLC-SLM). In running trial experiments using this system (dubbed FARCO), we succeeded in acquiring remarkably low error rates of 1.3% for false match rate (FMR) and 2.6% for false non-match rate (FNMR). Given the results of our experiments, the aim of this paper is to examine methods of designing correlation filters and arranging database image arrays for even faster parallel correlation, underlining the issues of calculation technique, quantization bit rate, pixel size, and shift from the optical axis. The correlation filter has proved its excellent performance and higher precision than classical correlation and the joint transform correlator (JTC). Moreover, arrangement of multi-object reference images leads to 10-channel correlation signals, as sharply marked as those of a single channel. This experimental result demonstrates great potential for achieving a processing speed of 10,000 faces/s.
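The digital analogue of the VanderLugt correlation performed optically in such a system is frequency-domain matched filtering: multiply the scene spectrum by the conjugate of the reference spectrum and transform back. A 1-D direct-DFT sketch (O(N^2) for brevity; the optical system effectively performs the Fourier transforms in parallel with lenses):

```python
import cmath

def correlate_matched_filter(scene, ref):
    """Circular cross-correlation via the frequency domain, the
    digital analogue of a VanderLugt correlator: the correlation peak
    location gives the shift at which the reference best matches the
    scene. Direct O(N^2) DFTs for clarity."""
    n = len(scene)

    def dft(x, sign):
        return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                    for m in range(n)) for k in range(n)]

    S = dft(scene, -1)
    R = dft(ref, -1)
    prod = [s * r.conjugate() for s, r in zip(S, R)]
    return [v.real / n for v in dft(prod, +1)]
```

For a scene that contains the reference pattern at some circular shift, the output attains its maximum at that shift, with peak height equal to the reference's energy.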
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-06-01
We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
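The iterative soft-thresholding structure at the core of the reconstruction described above can be sketched on a simplified problem, min ||Ax - y||^2 + lam*||x||_1 with a plain dense A (the real reconstruction uses a wavelet-domain joint-sparsity penalty and the SPIRiT consistency operator instead; this is only the prox-gradient skeleton):

```python
import math

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrink toward zero by t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def ista(A, y, lam, step, n_iter=500):
    """Iterative soft-thresholding (ISTA) for
    min ||Ax - y||^2 + lam*||x||_1: a gradient step on the quadratic
    term followed by the l1 prox. `step` must be below 1/L, where L is
    the Lipschitz constant of the gradient (2 * largest eigenvalue of
    A^T A)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [2 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = soft_threshold([xj - step * gj for xj, gj in zip(x, grad)],
                           step * lam)
    return x
```

With A equal to the identity the iteration converges to elementwise soft-thresholding of y, which makes the fixed point easy to check by hand.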
Adaptive multiple super fast simulated annealing for stochastic microstructure reconstruction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ryu, Seun; Lin, Guang; Sun, Xin
2013-01-01
Fast image reconstruction from statistical information is critical in image fusion from multimodality chemical imaging instrumentation to create high-resolution images over large domains. Stochastic methods have been used widely in image reconstruction from the two-point correlation function. The main challenge is to increase the efficiency of reconstruction. A novel simulated annealing method is proposed for fast image reconstruction. Combining the advantages of very fast cooling schedules, dynamic adaptation, and parallelization, the new simulated annealing algorithm increases efficiency by several orders of magnitude, making large-domain image fusion feasible.
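A generic simulated-annealing loop with a very fast (1/k) cooling schedule, of the kind the abstract builds on, can be sketched as follows; for microstructure reconstruction the energy would be the two-point-correlation mismatch and the move a pixel swap (this toy sketch is not the authors' adaptive multi-chain scheme, and all names are illustrative):

```python
import math
import random

def very_fast_annealing(energy, state, neighbor, t0=1.0,
                        n_steps=20000, seed=0):
    """Metropolis simulated annealing with the fast cooling schedule
    T_k = t0 / k. `neighbor(state, rng)` must return a NEW candidate
    state (copy mutable states rather than editing in place)."""
    rng = random.Random(seed)
    e = energy(state)
    best, best_e = state, e
    for k in range(1, n_steps + 1):
        t = t0 / k
        cand = neighbor(state, rng)
        ce = energy(cand)
        # accept downhill moves always; uphill with Boltzmann probability
        if ce <= e or rng.random() < math.exp(-(ce - e) / t):
            state, e = cand, ce
            if e < best_e:
                best, best_e = state, e
    return best, best_e
```

Very fast cooling trades the theoretical convergence guarantee of logarithmic schedules for speed, which is why the paper combines it with dynamic adaptation and multiple parallel chains.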
[Metabolic study of the initial period of fasting in the king penguin chick].
Cherel, Y; Le Maho, Y
1985-01-01
There is an 80% decrease in the specific daily change in body mass (dm/m dt) during the first 5-6 days of fasting in king penguin chicks, which characterizes period I of fasting. Parallel decreases in plasma alanine and uric acid concentrations suggest an important reduction in protein degradation. Plasma concentrations of beta-hydroxybutyrate and glucose are high, at 1.3 and 12.5 mmol l⁻¹ respectively, and do not change significantly.
An accurate, fast, and scalable solver for high-frequency wave propagation
NASA Astrophysics Data System (ADS)
Zepeda-Núñez, L.; Taus, M.; Hewett, R.; Demanet, L.
2017-12-01
In many science and engineering applications, solving time-harmonic high-frequency wave propagation problems quickly and accurately is of paramount importance. For example, in geophysics, particularly in oil exploration, such problems can be the forward problem in an iterative process for solving the inverse problem of subsurface inversion. It is important to solve these wave propagation problems accurately in order to efficiently obtain meaningful solutions of the inverse problems: low-order forward modeling can hinder convergence. Additionally, due to the volume of data and the iterative nature of most optimization algorithms, the forward problem must be solved many times. Therefore, a fast solver is necessary to make solving the inverse problem feasible. For time-harmonic high-frequency wave propagation, obtaining both speed and accuracy is historically challenging. Recently, there have been many advances in the development of fast solvers for such problems, including methods which have linear complexity with respect to the number of degrees of freedom. While most methods scale optimally only in the context of low-order discretizations and smooth wave speed distributions, the method of polarized traces has been shown to retain optimal scaling for high-order discretizations, such as hybridizable discontinuous Galerkin methods, and for highly heterogeneous (and even discontinuous) wave speeds. The resulting fast and accurate solver is consequently highly attractive for geophysical applications. To date, this method has relied on a layered domain decomposition together with a preconditioner applied in a sweeping fashion, which limits straightforward parallelization. In this work, we introduce a new version of the method of polarized traces that reveals more parallel structure than previous versions while preserving all of its other advantages. We achieve this by further decomposing each layer and applying the preconditioner to these new components separately and in parallel. We demonstrate that this produces an even more effective and parallelizable preconditioner for a single right-hand side. As before, additional speed can be gained by pipelining several right-hand sides.
Fast disk array for image storage
NASA Astrophysics Data System (ADS)
Feng, Dan; Zhu, Zhichun; Jin, Hai; Zhang, Jiangling
1997-01-01
A fast disk array is designed for large-scale continuous image storage. It includes a high-speed data architecture and technology for data striping and organization on the disk array. The high-speed data path, constructed from two dual-port RAMs and some control circuitry, is configured to transfer data between a host system and a plurality of disk drives. The bandwidth can exceed 100 MB/s if the data path is based on PCI (peripheral component interconnect). The organization of data stored on the disk array is similar to RAID 4. Data are striped across a plurality of disks, with each striping unit equal to a track, and I/O instructions are performed in parallel on the disk drives. An independent disk is used to store the parity information in the fast disk array architecture. By placing the parity generation circuit directly on the SCSI (or SCSI 2) bus, the parity information can be generated on the fly, with little effect on the parallel data writes to the other disks. The fast disk array architecture designed in the paper can meet the demands of image storage.
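The RAID-4-style parity scheme described above reduces to XOR arithmetic: the dedicated parity disk holds the bytewise XOR of the data stripes, so any single lost stripe can be rebuilt from the survivors. A minimal sketch (the stripe contents are made up for illustration):

```python
import numpy as np

# Three data stripes, one per data disk, all the same length.
d0 = np.frombuffer(b"imagedata", dtype=np.uint8)
d1 = np.frombuffer(b"more-data", dtype=np.uint8)
d2 = np.frombuffer(b"even-more", dtype=np.uint8)

# The parity disk stores the XOR of all data stripes.
parity = d0 ^ d1 ^ d2

# Simulate losing disk 1: XOR of the surviving stripes and the parity
# reconstructs the lost stripe exactly.
rebuilt = d0 ^ d2 ^ parity
```

Because XOR is associative and each write touches only one data stripe plus the parity stripe, a parity engine sitting on the bus can update parity incrementally without stalling writes to the other disks.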
Fast Fourier Transform algorithm design and tradeoffs
NASA Technical Reports Server (NTRS)
Kamin, Ray A., III; Adams, George B., III
1988-01-01
The Fast Fourier Transform (FFT) is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target for an investigation into the design of multidimensional Single Instruction Stream/Multiple Data (SIMD) parallel FFT algorithms for high performance. Critical algorithm design issues are discussed, necessary machine performance measurements are identified and made, and the performance of the developed FFT programs is measured. The FFT programs are compared to the currently best Cray-2 FFT program.
Fast Whole-Engine Stirling Analysis
NASA Technical Reports Server (NTRS)
Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako
2006-01-01
This presentation discusses the whole-engine simulation approach: physical consistency, REV regenerator modeling, grid layering for smoothness and quality, adjustment of the conjugate heat transfer method, a high-speed low-cost parallel cluster, and debugging.
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs
NASA Astrophysics Data System (ADS)
Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji
2013-03-01
This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096³ computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU
NASA Astrophysics Data System (ADS)
Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang
2017-10-01
An imaging spectrometer acquires a two-dimensional spatial image and a one-dimensional spectrum at the same time, which makes it highly useful in color and spectral measurements, true-color image synthesis, military reconnaissance, and so on. In order to realize fast reconstruction of Fourier transform imaging spectrometer data, this paper designs an optimized reconstruction algorithm using OpenMP parallel computing, which was further applied to the optimization process for the HyperSpectral Imager of the `HJ-1' Chinese satellite. The results show that the method based on multi-core parallel computing can fully exploit the multi-core CPU hardware resources and significantly enhance the efficiency of the spectrum reconstruction processing. If the technology is applied to parallel computing on workstations with more cores, it will be possible to complete real-time data processing for a Fourier transform imaging spectrometer with a single computer.
Potential Application of a Graphical Processing Unit to Parallel Computations in the NUBEAM Code
NASA Astrophysics Data System (ADS)
Payne, J.; McCune, D.; Prater, R.
2010-11-01
NUBEAM is a comprehensive computational Monte Carlo based model for neutral beam injection (NBI) in tokamaks. NUBEAM computes NBI-relevant profiles in tokamak plasmas by tracking the deposition and the slowing of fast ions. At the core of NUBEAM are vector calculations used to track fast ions. These calculations have recently been parallelized to run on MPI clusters. However, cost and interlink bandwidth limit the ability to fully parallelize NUBEAM on an MPI cluster. Recent implementation of double precision capabilities for Graphical Processing Units (GPUs) presents a cost effective and high performance alternative or complement to MPI computation. Commercially available graphics cards can achieve up to 672 GFLOPS double precision and can handle hundreds of thousands of threads. The ability to execute at least one thread per particle simultaneously could significantly reduce the execution time and the statistical noise of NUBEAM. Progress on implementation on a GPU will be presented.
A High-Order Direct Solver for Helmholtz Equations with Neumann Boundary Conditions
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Zhuang, Yu
1997-01-01
In this study, a compact finite-difference discretization is first developed for Helmholtz equations on rectangular domains. Special treatments are then introduced for Neumann and Neumann-Dirichlet boundary conditions to achieve accuracy and separability. Finally, a Fast Fourier Transform (FFT) based technique is used to yield a fast direct solver. Analytical and experimental results show this newly proposed solver is comparable to the conventional second-order elliptic solver when accuracy is not a primary concern, and is significantly faster than the conventional solver if a highly accurate solution is required. In addition, the newly proposed fourth-order Helmholtz solver is parallel in nature and readily suited to parallel and distributed computers. The compact scheme introduced in this study is likely extendible to sixth-order accurate algorithms and to more general elliptic equations.
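The core of any FFT-based direct solver is diagonalization: in the transform basis each mode decouples, so the solve reduces to a pointwise division in spectral space. The toy below shows the idea for a modified Helmholtz problem with periodic boundary conditions; this is only a minimal illustration of the mechanism, not the paper's fourth-order compact scheme with Neumann treatment.

```python
import numpy as np

def solve_modified_helmholtz_periodic(f, sigma, L=2 * np.pi):
    """Solve (Laplacian - sigma) u = f on a periodic square via FFT
    diagonalization: every Fourier mode decouples, so the 'solve' is a
    pointwise division by the symbol -(kx^2 + ky^2) - sigma."""
    n = f.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    denom = -(kx ** 2 + ky ** 2) - sigma   # nonzero for sigma > 0
    return np.real(np.fft.ifft2(np.fft.fft2(f) / denom))

# Manufactured solution: u = sin(x)cos(y), so (Lap - sigma) u = -(2 + sigma) u.
n, sigma = 64, 1.0
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
u_exact = np.sin(X) * np.cos(Y)
f = -(2 + sigma) * u_exact
u = solve_modified_helmholtz_periodic(f, sigma)
```

Because the right-hand side here is an exact Fourier mode, the spectral solve recovers the solution to machine precision; the sign sigma > 0 keeps the denominator away from zero at every mode.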
Procacci, Piero
2016-06-27
We present a new release (6.0β) of the ORAC program [Marsili et al. J. Comput. Chem. 2010, 31, 1106-1116] with a hybrid OpenMP/MPI (open multiprocessing message passing interface) multilevel parallelism tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology, aimed at evaluating the binding free energy in drug-receptor systems on high performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak scaling parallel approach on the MPI level only, while a strong scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherent parallel nature of the ORAC implementation of the FS-DAM algorithm project the code as a possible effective tool for second-generation high-throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac .
Parallel heuristics for scalable community detection
Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth
2015-08-14
Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high-modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real-world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in a comparable number of iterations or fewer, while providing real speedups of up to 16x using 32 threads.
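The quantity the Louvain heuristic greedily maximizes, one vertex move at a time, is Newman modularity Q. A minimal sketch of computing Q for an undirected, unweighted graph (illustrative only; the parallel heuristics in the paper operate on this objective, not this code):

```python
def modularity(edges, community):
    """Newman modularity: Q = sum over communities c of
    e_c/m - (d_c / 2m)^2, where e_c is the number of intra-community
    edges, d_c the total degree of c, and m the total edge count."""
    m = len(edges)
    deg, intra = {}, {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
        if community[u] == community[v]:
            intra[community[u]] = intra.get(community[u], 0) + 1
    q = 0.0
    for c in set(community.values()):
        dc = sum(d for node, d in deg.items() if community[node] == c)
        q += intra.get(c, 0) / m - (dc / (2 * m)) ** 2
    return q

# Two triangles joined by a bridge edge; the natural split scores well.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
comm = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, comm)
```

The sequential bottleneck the paper targets is visible here: each candidate vertex move changes `intra` and `dc` for two communities, so naive concurrent moves race on the same state.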
Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.
Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou
2016-01-01
For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP, and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4, and musk datasets, the comparative results show that the average time consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79%, and 93.06%, respectively, in contrast to centralized GEP, and by 12.5%, 8.42%, 9.62%, and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.
Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A
2010-01-01
Especially in the life-science and health-care sectors, IT requirements are immense due to the large and complex systems to be analysed and simulated. Grid infrastructures play a rapidly increasing role for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for heavy number crunching of trivially parallelizable problems, parallel high-performance computing is increasingly required. Here, we show for the prime example of molecular dynamics simulations how the presence of large grid clusters with very fast network interconnects within grid infrastructures now enables efficient parallel high-performance grid computing, and thus combines the benefits of dedicated supercomputing centres and grid infrastructures. The demands of this service class are the highest, since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects are all needed in different combinations and must be considered in a highly dedicated manner to reach the highest performance efficiency. Beyond this, advanced and dedicated i) interaction with users, ii) management of jobs, iii) accounting, and iv) billing not only combine classic with parallel high-performance grid usage, but more importantly can also increase the efficiency of IT resource providers. Consequently, the mere "yes-we-can" becomes a huge opportunity for, e.g., the life-science and health-care sectors as well as grid infrastructures, by reaching a higher level of resource efficiency.
Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2012-07-01
Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which make them attractive for real-time image processing.
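A useful identity behind DHT-based real-valued transforms: the discrete Hartley transform kernel is cas(t) = cos(t) + sin(t), so a DHT can be read off an ordinary FFT as Re(F) - Im(F). The sketch below illustrates only this identity, not the paper's multirate convolver-bank algorithm.

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform via the FFT.  With F[k] = FFT(x)[k],
    H[k] = Re(F[k]) - Im(F[k]), because cas(t) = cos(t) + sin(t) and the
    FFT kernel is cos(t) - i sin(t)."""
    f = np.fft.fft(x)
    return f.real - f.imag

x = np.random.default_rng(0).standard_normal(8)
h = dht(x)
```

The DHT is real-valued for real input and, up to a factor of 1/N, is its own inverse, which is one reason it is attractive for real-valued Gabor analysis/synthesis pairs.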
NASA Astrophysics Data System (ADS)
Wang, Yue; Yu, Jingjun; Pei, Xu
2018-06-01
A new forward kinematics algorithm for the mechanism of 3-RPS (R: Revolute; P: Prismatic; S: Spherical) parallel manipulators is proposed in this study. This algorithm is primarily based on the special geometric conditions of the 3-RPS parallel mechanism, and it eliminates the errors produced by parasitic motions to improve and ensure accuracy. Specifically, the errors can be less than 10⁻⁶. In this method, only the group of solutions that is consistent with the actual situation of the platform is obtained rapidly. This algorithm substantially improves calculation efficiency because the selected initial values are reasonable, and all the formulas in the calculation are analytical. This novel forward kinematics algorithm is well suited for real-time and high-precision control of the 3-RPS parallel mechanism.
Effect of parallel electric fields on the ponderomotive stabilization of MHD instabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Litwin, C.; Hershkowitz, N.
The contribution of the wave electric field component E∥, parallel to the magnetic field, to the ponderomotive stabilization of curvature-driven instabilities is evaluated and compared to the contribution of the transverse component. For the experimental density range, in which the stability is primarily determined by the m = 1 magnetosonic wave, this contribution is found to be dominant and stabilizing when the electron temperature is neglected. For sufficiently high electron temperatures the dominant fast wave is found to be axially evanescent. In the same limit, E∥ becomes radially oscillating. It is concluded that the increased electron temperature near the plasma surface reduces the magnitude of ponderomotive effects.
Efficient implementation of parallel three-dimensional FFT on clusters of PCs
NASA Astrophysics Data System (ADS)
Takahashi, Daisuke
2003-05-01
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
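The blocking idea can be sketched as follows: a 3-D FFT factors into 1-D FFTs along each axis, and processing each pass in slabs keeps the working set cache-sized. This is a hedged, coarse-grained illustration of the principle, not the paper's implementation (which blocks at a much finer granularity and tunes the block size to the cache).

```python
import numpy as np

def fft3d_blocked(a, block=16):
    """3-D FFT as three passes of 1-D FFTs, each pass processed in
    blocks of slabs so that a pass works on one cache-sized chunk of
    the array at a time."""
    a = a.astype(complex)           # work on a complex copy
    for axis in range(3):
        other = (axis + 1) % 3      # slab perpendicular to another axis
        n = a.shape[other]
        for start in range(0, n, block):
            sl = [slice(None)] * 3
            sl[other] = slice(start, start + block)
            a[tuple(sl)] = np.fft.fft(a[tuple(sl)], axis=axis)
    return a

rng = np.random.default_rng(1)
v = rng.standard_normal((32, 32, 32))
out = fft3d_blocked(v)
```

Because the 3-D transform is separable, the blocked result is identical to a monolithic `np.fft.fftn`; only the memory access pattern changes, which is where the reported speedup comes from.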
Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations
NASA Technical Reports Server (NTRS)
Fijany, Amir
1993-01-01
In this paper, fast time- and space-parallel algorithms for the solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solution of the problem can be completely decoupled.
Current drive with combined electron cyclotron wave and high harmonic fast wave in tokamak plasmas
NASA Astrophysics Data System (ADS)
Li, J. C.; Gong, X. Y.; Dong, J. Q.; Wang, J.; Zhang, N.; Zheng, P. W.; Yin, C. Y.
2016-12-01
The current driven by combined electron cyclotron wave (ECW) and high harmonic fast wave is investigated using the GENRAY/CQL3D package. It is shown that no significant synergetic current is found in a range of cases with a combined ECW and fast wave (FW). This result is consistent with a previous study [Harvey et al., in Proceedings of IAEA TCM on Fast Wave Current Drive in Reactor Scale Tokamaks (Synergy and Complimentarily with LHCD and ECRH), Arles, France, IAEA, Vienna, 1991]. However, a positive synergy effect does appear with the FW in the lower hybrid range of frequencies. This positive synergy effect can be explained using a picture of the electron distribution function induced by the ECW and a very high harmonic fast wave (helicon). The dependence of the synergy effect on the radial position of the power deposition, the wave power, the wave frequency, and the parallel refractive index is also analyzed, both numerically and physically.
Some fast elliptic solvers on parallel architectures and their complexities
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Y.
1989-01-01
The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. In this paper, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
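The scalar version of cyclic reduction conveys why BCR parallelizes well: each level eliminates all odd-indexed unknowns independently, halving the system, and the back-substitutions within a level are likewise independent. A minimal sketch (scalar, not the block variant; assumes n = 2^p - 1 unknowns with a[0] = c[-1] = 0):

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system (sub-diagonal a, diagonal b,
    super-diagonal c, right-hand side d) by cyclic reduction.
    Each level's eliminations are mutually independent, hence parallel."""
    n = len(b)
    if n == 1:
        return np.array([d[0] / b[0]])
    odd = np.arange(1, n, 2)
    alpha = -a[odd] / b[odd - 1]        # eliminate x[i-1] using eq i-1
    beta = -c[odd] / b[odd + 1]         # eliminate x[i+1] using eq i+1
    a2 = alpha * a[odd - 1]
    b2 = b[odd] + alpha * c[odd - 1] + beta * a[odd + 1]
    c2 = beta * c[odd + 1]
    d2 = d[odd] + alpha * d[odd - 1] + beta * d[odd + 1]
    x = np.zeros(n)
    x[odd] = cyclic_reduction(a2, b2, c2, d2)   # half-size system
    xpad = np.concatenate(([0.0], x, [0.0]))    # zero "ghost" unknowns
    even = np.arange(0, n, 2)
    x[even] = (d[even] - a[even] * xpad[even] - c[even] * xpad[even + 2]) / b[even]
    return x

# 1-D Poisson-like test: -1, 2, -1 on 7 interior points.
n = 7
a = np.full(n, -1.0); a[0] = 0.0
b_diag = np.full(n, 2.0)
c = np.full(n, -1.0); c[-1] = 0.0
d = np.arange(1.0, n + 1)
x = cyclic_reduction(a, b_diag, c, d)
```

The log₂(n) levels with shrinking parallel width are exactly what makes the mapping of BCR onto distributed-memory machines nontrivial: the later levels have too little work to keep all processors busy.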
Parallel processing in the honeybee olfactory pathway: structure, function, and evolution.
Rössler, Wolfgang; Brill, Martin F
2013-11-01
Animals face highly complex and dynamic olfactory stimuli in their natural environments, which require fast and reliable olfactory processing. Parallel processing is a common principle of sensory systems supporting this task, for example in visual and auditory systems, but its role in olfaction has remained unclear. Studies in the honeybee have focused on a dual olfactory pathway. Two sets of projection neurons connect glomeruli in two antennal-lobe hemilobes via lateral and medial tracts in opposite sequence with the mushroom bodies and lateral horn. Comparative studies suggest that this dual-tract circuit represents a unique adaptation in Hymenoptera. Imaging studies indicate that glomeruli in both hemilobes receive redundant sensory input. Recent simultaneous multi-unit recordings from projection neurons of both tracts revealed widely overlapping response profiles, strongly indicating parallel olfactory processing. Whereas lateral-tract neurons respond fast with broad (generalistic) profiles, medial-tract neurons are odorant specific and respond more slowly. In analogy to the "what" and "where" subsystems in visual pathways, this suggests two parallel olfactory subsystems providing "what" (quality) and "when" (temporal) information. Temporal response properties may support across-tract coincidence coding in higher centers. Parallel olfactory processing likely enhances perception of complex odorant mixtures to decode the diverse and dynamic olfactory world of a social insect.
Fast adaptive composite grid methods on distributed parallel architectures
NASA Technical Reports Server (NTRS)
Lemke, Max; Quinlan, Daniel
1992-01-01
The fast adaptive composite grid method (FAC) is compared with its asynchronous variant (AFAC) under a variety of conditions, including vectorization and parallelization. Results are given for distributed memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment is a property of the algorithm and not dependent on peculiarities of any machine.
Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.
Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter
2015-01-01
To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
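The Smith-Waterman recurrence that PaSWAS parallelizes can be sketched compactly. The toy below scores local alignments with linear gap penalties (illustrative scoring parameters; real tools use substitution matrices and affine gaps); GPU implementations exploit the fact that every cell on an anti-diagonal of the matrix can be computed in parallel.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Classic Smith-Waterman local alignment score.  h[i][j] is the best
    score of any local alignment ending at a[i-1], b[j-1]; negative
    running scores are reset to 0, which is what makes it 'local'."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,   # (mis)match
                          h[i - 1][j] + gap,     # gap in b
                          h[i][j - 1] + gap)     # gap in a
            best = max(best, h[i][j])
    return best

score = smith_waterman("AAATTTGGG", "TTT")
```

Traceback from the best-scoring cell yields the alignment details (gaps, mismatches) that GPU ports often omit and that PaSWAS is designed to report.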
A parallel input composite transimpedance amplifier.
Kim, D J; Kim, C
2018-01-01
A new approach to high-performance current-to-voltage preamplifier design is presented. The design, using multiple operational amplifiers (op-amps), has a parasitic capacitance compensation network and a composite amplifier topology for fast, precise, and low-noise performance. The input stage, consisting of parallel-linked JFET op-amps, and a high-speed bipolar junction transistor (BJT) gain stage driving the output in the composite amplifier topology, cooperating with the capacitance compensation feedback network, ensure wide bandwidth stability in the presence of input capacitance above 40 nF. The design is ideal for any two-probe measurement, including high-impedance transport and scanning tunneling microscopy measurements.
Fast parallel tandem mass spectral library searching using GPU hardware acceleration.
Baumgardner, Lydia Ashleigh; Shanmugam, Avinash Kumar; Lam, Henry; Eng, Jimmy K; Martin, Daniel B
2011-06-03
Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment.
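The "arithmetically intense multiplication of two vectors" at the heart of spectral library searching can be sketched as one matrix-vector product: score a query spectrum against every library entry at once via dot products of normalized intensity vectors. This is a hedged, CPU-side illustration of the kernel FastPaSS maps onto the GPU; the spectra and binning are made up.

```python
import numpy as np

def best_library_match(query, library):
    """Score a binned query spectrum against all library spectra at once.

    query   : intensity vector of shape (n_bins,)
    library : matrix of shape (n_spectra, n_bins)

    Returns (index, score) of the best match, where score is the cosine
    similarity (dot product of L2-normalized vectors)."""
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    scores = lib @ q                # one dot product per library entry
    i = int(np.argmax(scores))
    return i, float(scores[i])

library = np.array([[0.0, 1.0, 0.0, 2.0],
                    [1.0, 0.0, 3.0, 0.0],
                    [0.5, 0.5, 0.5, 0.5]])
query = np.array([1.0, 0.1, 2.9, 0.0])
idx, score = best_library_match(query, library)
```

Because each library row is scored independently, the comparison distributes naturally across thousands of GPU threads, which is precisely what makes the step "highly parallelizable".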
Dynamic grid refinement for partial differential equations on parallel computers
NASA Technical Reports Server (NTRS)
Mccormick, S.; Quinlan, D.
1989-01-01
The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems.
Massively Parallel Solution of Poisson Equation on Coarse Grain MIMD Architectures
NASA Technical Reports Server (NTRS)
Fijany, A.; Weinberger, D.; Roosta, R.; Gulati, S.
1998-01-01
In this paper a new algorithm, designated the Fast Invariant Imbedding algorithm, for the solution of the Poisson equation on vector and massively parallel MIMD architectures is presented. This algorithm achieves the same optimal computational efficiency as other Fast Poisson solvers while offering a much better structure for vector and parallel implementation. Our implementation on the Intel Delta and Paragon shows that a speedup of over two orders of magnitude can be achieved even for moderate-size problems.
Glover, William A; Atienza, Ederlyn E; Nesbitt, Shannon; Kim, Woo J; Castor, Jared; Cook, Linda; Jerome, Keith R
2016-01-01
Quantitative DNA detection of cytomegalovirus (CMV) and BK virus (BKV) is critical in the management of transplant patients. Quantitative laboratory-developed procedures for CMV and BKV have been described in which much of the processing is automated, resulting in rapid, reproducible, and high-throughput testing of transplant patients. To increase the efficiency of such assays, the performance and stability of four commercial preassembled frozen fast qPCR master mixes (Roche FastStart Universal Probe Master Mix with Rox, Bio-Rad SsoFast Probes Supermix with Rox, Life Technologies TaqMan FastAdvanced Master Mix, and Life Technologies Fast Universal PCR Master Mix), in combination with in-house designed primers and probes, was evaluated using controls and standards from standard CMV and BK assays. A subsequent parallel evaluation using patient samples was performed comparing the performance of freshly prepared assay mixes versus aliquoted frozen master mixes made with two of the fast qPCR mixes (Life Technologies TaqMan FastAdvanced Master Mix, and Bio-Rad SsoFast Probes Supermix with Rox), chosen based on their performance and compatibility with existing PCR cycling conditions. The data demonstrate that the frozen master mixes retain excellent performance over a period of at least 10 weeks. During the parallel testing using clinical specimens, no difference in quantitative results was observed between the preassembled frozen master mixes and freshly prepared master mixes. Preassembled fast real-time qPCR frozen master mixes perform well and represent an additional strategy laboratories can implement to reduce assay preparation times, and to minimize technical errors and effort necessary to perform clinical PCR. © 2015 Wiley Periodicals, Inc.
Are Fast Radio Bursts the Birthmark of Magnetars?
NASA Astrophysics Data System (ADS)
Lieu, Richard
2017-01-01
A model of fast radio bursts, which enlists young, short-period extragalactic magnetars satisfying B/P > 2 × 10¹⁶ G s⁻¹ (1 G = 1 statvolt cm⁻¹) as the source, is proposed. When the parallel component E_∥ of the surface electric field (under the scenario of a vacuum magnetosphere) of such pulsars approaches 5% of the critical field E_c = m_e²c³/(eħ) in strength, the field can readily decay via the Schwinger mechanism into electron-positron pairs, the back reaction of which causes E_∥ to oscillate on a characteristic timescale smaller than the development of a spark gap. Thus, under this scenario, the open field line region of the pulsar magnetosphere is controlled by Schwinger pairs, and their large creation and acceleration rates enable the escaping pairs to coherently emit radio waves directly from the polar cap. The majority of the energy is emitted at frequencies ≲ 1 GHz, where the coherent radiation has the highest yield, at a rate large enough to cause the magnetar to lose spin significantly over a timescale of a few × 10⁻³ s, the duration of a fast radio burst. Owing to the circumstellar environment of a young magnetar, however, the ≲ 1 GHz radiation is likely to be absorbed or reflected by the overlying matter. It is shown that the brightness of the remaining (observable) frequencies of ≈ 1 GHz and above is on a par with that of a typical fast radio burst. Unless some spin-up mechanism is available to recover the original high rotation rate that triggered the Schwinger mechanism, the fast radio burst will not be repeated in the same magnetar.
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
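The kernel described above, y = Ax distributed across processors, can be illustrated with a minimal block-row simulation. This is a sketch of the general decomposition idea only, not the paper's hypercube algorithm with its two-dimensional layout and O(n/√p + log(p)) communication; the function name and the contiguous row-block layout are illustrative assumptions.

```python
import numpy as np

def block_row_matvec(A, x, p):
    """Simulate a block-row parallel matrix-vector product.

    Each of the p 'processors' owns a contiguous block of rows of A
    and computes its slice of y = A @ x independently; the only
    communication then needed is gathering the slices at the end.
    """
    n = A.shape[0]
    bounds = np.linspace(0, n, p + 1, dtype=int)  # row-block boundaries
    slices = [A[bounds[i]:bounds[i + 1]] @ x for i in range(p)]  # local work
    return np.concatenate(slices)  # the "gather" step

rng = np.random.default_rng(0)
A = rng.standard_normal((12, 12))
x = rng.standard_normal(12)
assert np.allclose(block_row_matvec(A, x, 4), A @ x)
```

The sparsity-independent cost the paper claims comes from its 2D decomposition; the 1D block-row version here communicates O(n) per processor in the gather, which is exactly the overhead the hypercube algorithm improves on.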
S-HARP: A parallel dynamic spectral partitioner
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sohn, A.; Simon, H.
1998-01-01
Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP: fine grain loop-level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Message Passing Interface on the Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64-processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15-fold speedup on 64 processors while ParaMeTiS 1.0 gives a few-fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.
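The "fast feature of inertial partitioning" mentioned above has a compact core: project each vertex's coordinates onto the principal (inertial) axis of the point cloud and split at the median projection. The sketch below shows only that geometric step, under assumed 2D coordinates; it is not S-HARP's parallel implementation or its spectral refinement.

```python
import numpy as np

def inertial_bisection(coords):
    """Split vertices into two balanced halves by their projection onto
    the principal (inertial) axis of the coordinate cloud."""
    centered = coords - coords.mean(axis=0)
    # principal axis = eigenvector for the largest eigenvalue of the scatter matrix
    _, vecs = np.linalg.eigh(centered.T @ centered)
    axis = vecs[:, -1]
    proj = centered @ axis
    return proj <= np.median(proj)  # boolean mask: True = first partition

# Six points stretched along x: the split should cut across the long axis.
coords = np.array([[0., 0.], [1., 0.], [2., 0.], [3., 0.], [0., 1.], [3., 1.]])
mask = inertial_bisection(coords)
assert mask.sum() == 3  # balanced halves
```

Inertial bisection is cheap (one eigenvector of a d×d scatter matrix plus a median), which is why it serves as the speed component paired with spectral quality.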
A general purpose subroutine for fast fourier transform on a distributed memory parallel machine
NASA Technical Reports Server (NTRS)
Dubey, A.; Zubair, M.; Grosch, C. E.
1992-01-01
One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.
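The row-FFT/transpose/row-FFT pattern that underlies most distributed-memory multidimensional FFTs can be checked in a few lines. The serial sketch below, with the transpose standing in for the all-to-all redistribution step, is an illustration of that general pattern rather than the specific iPSC/860 implementation described above.

```python
import numpy as np

def fft2_rows_transpose_rows(a):
    """2D FFT computed as: FFT along rows, transpose, FFT along rows again.

    On a distributed-memory machine each FFT phase touches only locally
    stored rows; the transpose is the single all-to-all communication
    step that redistributes the data between the two phases.
    """
    step1 = np.fft.fft(a, axis=1)        # row FFTs on the original layout
    step2 = np.fft.fft(step1.T, axis=1)  # "redistribute", then row FFTs again
    return step2.T                       # restore the original orientation

a = np.random.default_rng(1).standard_normal((8, 8))
assert np.allclose(fft2_rows_transpose_rows(a), np.fft.fft2(a))
```

Supporting multiple input data distributions, as the abstract describes, amounts to choosing where in this pipeline the user's layout enters and whether the final transpose is performed or left implicit.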
NASA Technical Reports Server (NTRS)
Chew, W. C.; Song, J. M.; Lu, C. C.; Weedon, W. H.
1995-01-01
In the first phase of our work, we have concentrated on laying the foundation to develop fast algorithms, including the use of recursive structure like the recursive aggregate interaction matrix algorithm (RAIMA), the nested equivalence principle algorithm (NEPAL), the ray-propagation fast multipole algorithm (RPFMA), and the multi-level fast multipole algorithm (MLFMA). We have also investigated the use of curvilinear patches to build a basic method of moments code where these acceleration techniques can be used later. In the second phase, which is mainly reported on here, we have concentrated on implementing three-dimensional NEPAL on a massively parallel machine, the Connection Machine CM-5, and have been able to obtain some 3D scattering results. In order to understand the parallelization of codes on the Connection Machine, we have also studied the parallelization of 3D finite-difference time-domain (FDTD) code with PML material absorbing boundary condition (ABC). We found that simple algorithms like the FDTD with material ABC can be parallelized very well allowing us to solve within a minute a problem of over a million nodes. In addition, we have studied the use of the fast multipole method and the ray-propagation fast multipole algorithm to expedite matrix-vector multiplication in a conjugate-gradient solution to integral equations of scattering. We find that these methods are faster than LU decomposition for one incident angle, but are slower than LU decomposition when many incident angles are needed as in the monostatic RCS calculations.
NASA Astrophysics Data System (ADS)
Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry
2003-08-01
Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining high performance on a large number of processors is non-trivial on the latest generation of parallel computers, whose nodes are made up of shared memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3D FFTs in a combined MPI/OpenMP programming paradigm is presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
Carrera, Mónica; Gallardo, José M; Pascual, Santiago; González, Ángel F; Medina, Isabel
2016-06-16
Anisakids are fish-borne parasites that are responsible for a large number of human infections and allergic reactions around the world. World health organizations and food safety authorities aim to control and prevent this emerging health problem. In the present work, a new method for the fast monitoring of these parasites is described. The strategy is divided into three steps: (i) purification of thermostable proteins from fish-borne parasites (Anisakids), (ii) in-solution HIFU trypsin digestion, and (iii) monitoring of several peptide markers by parallel reaction monitoring (PRM) mass spectrometry. This methodology allows the fast detection of Anisakids in <2 h. An affordable assay utilizing this methodology will facilitate testing for regulatory and safety applications. The work describes for the first time protein biomarker discovery and fast monitoring for the identification and detection of Anisakids in fishery products. The strategy is based on the purification of thermostable proteins, the use of accelerated in-solution trypsin digestions under an ultrasonic field provided by high-intensity focused ultrasound (HIFU), and the monitoring of several peptide biomarkers by parallel reaction monitoring (PRM) mass spectrometry in a linear ion trap mass spectrometer. The workflow allows the unequivocal detection of Anisakids in <2 h. The present strategy constitutes the fastest method for Anisakids detection; its application in food quality control could provide authorities with an effective and rapid method to guarantee consumer safety. Copyright © 2016 Elsevier B.V. All rights reserved.
High-speed spectral domain optical coherence tomography using non-uniform fast Fourier transform
Chan, Kenny K. H.; Tang, Shuo
2010-01-01
The useful imaging range in spectral domain optical coherence tomography (SD-OCT) is often limited by the depth-dependent sensitivity fall-off. Processing SD-OCT data with the non-uniform fast Fourier transform (NFFT) can improve the sensitivity fall-off at maximum depth by greater than 5 dB, concurrently with a 30-fold decrease in processing time, compared to the fast Fourier transform with cubic spline interpolation method. NFFT can also improve local signal-to-noise ratio (SNR) and reduce image artifacts introduced in post-processing. Combined with parallel processing, NFFT is shown to have the ability to process up to 90k A-lines per second. High-speed SD-OCT imaging is demonstrated at a camera-limited 100 frames per second on an ex-vivo squid eye. PMID:21258551
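The transform that NFFT approximates is the direct nonuniform discrete Fourier transform, which evaluates the spectrum at samples taken at arbitrary positions (in SD-OCT, spectra uniform in wavelength are nonuniform in wavenumber). Below is a minimal O(N²) reference for that transform, with positions scaled to [0, 1) as an assumed convention; a real NFFT approximates this sum in O(N log N), and the sanity check shows it reduces to the ordinary FFT on uniform samples.

```python
import numpy as np

def ndft(samples, positions):
    """Direct nonuniform DFT: the O(N^2) reference computation that an
    NFFT approximates in O(N log N). `positions` lie in [0, 1)."""
    N = len(samples)
    k = np.arange(N)
    # Explicit evaluation of X_k = sum_n x_n * exp(-2*pi*i * k * t_n)
    return np.exp(-2j * np.pi * np.outer(k, positions)) @ samples

# Sanity check: on uniformly spaced positions the NDFT reduces to the FFT.
x = np.random.default_rng(2).standard_normal(16)
uniform = np.arange(16) / 16.0
assert np.allclose(ndft(x, uniform), np.fft.fft(x))
```

Evaluating the nonuniform sum directly (or via NFFT) avoids the spline-interpolation resampling step, which is the source of both the sensitivity fall-off and the processing-time overhead the abstract quantifies.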
FastMag: Fast micromagnetic simulator for complex magnetic structures (invited)
NASA Astrophysics Data System (ADS)
Chang, R.; Li, S.; Lubarda, M. V.; Livshitz, B.; Lomakin, V.
2011-04-01
A fast micromagnetic simulator (FastMag) for general problems is presented. FastMag solves the Landau-Lifshitz-Gilbert equation and can handle multiscale problems with a high computational efficiency. The simulator derives its high performance from efficient methods for evaluating the effective field and from implementations on massively parallel graphics processing unit (GPU) architectures. FastMag discretizes the computational domain into tetrahedral elements and therefore is highly flexible for general problems. The magnetostatic field is computed via the superposition principle for both volume and surface parts of the computational domain. This is accomplished by implementing efficient quadrature rules and analytical integration for overlapping elements in which the integral kernel is singular. Thus, discretized superposition integrals are computed using a nonuniform grid interpolation method, which evaluates the field from N sources at N collocated observers in O(N) operations. This approach allows handling objects of arbitrary shape, allows easy calculation of the field outside the magnetized domains, does not require solving a linear system of equations, and requires little memory. FastMag is implemented on GPUs, with GPU-central processing unit speed-ups of two orders of magnitude. Simulations are shown of a large array of magnetic dots and a recording head fully discretized down to the exchange length, with over a hundred million tetrahedral elements, on an inexpensive desktop computer.
Fast parallel tandem mass spectral library searching using GPU hardware acceleration
Baumgardner, Lydia Ashleigh; Shanmugam, Avinash Kumar; Lam, Henry; Eng, Jimmy K.; Martin, Daniel B.
2011-01-01
Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment. PMID:21545112
Bit error rate tester using fast parallel generation of linear recurring sequences
Pierson, Lyndon G.; Witzke, Edward L.; Maestas, Joseph H.
2003-05-06
A fast method for generating linear recurring sequences by parallel linear recurring sequence generators (LRSGs) with a feedback circuit optimized to balance minimum propagation delay against maximal sequence period. Parallel generation of linear recurring sequences requires decimating the sequence (creating small contiguous sections of the sequence in each LRSG). A companion matrix form is selected depending on whether the LFSR is right-shifting or left-shifting. The companion matrix is completed by selecting a primitive irreducible polynomial with 1's most closely grouped in a corner of the companion matrix. A decimation matrix is created by raising the companion matrix to the (n*k)th power, where k is the number of parallel LRSGs and n is the number of bits to be generated at a time by each LRSG. Companion matrices with 1's closely grouped in a corner will yield sparse decimation matrices. A feedback circuit comprised of XOR logic gates implements the decimation matrix in hardware. Sparse decimation matrices can be implemented with a minimum number of XOR gates, and therefore a minimum propagation delay through the feedback circuit. The LRSG of the invention is particularly well suited to use as a bit error rate tester on high-speed communication lines because it permits the receiver to synchronize to the transmitted pattern within 2n bits.
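The decimation-matrix construction can be sketched over GF(2): form the companion matrix C of a primitive polynomial, then raise it to the (n*k)th power so that one application of the result advances the LFSR state n*k steps. The polynomial x⁴ + x + 1 and the step count 6 (e.g., k = 3 generators × n = 2 bits each) below are illustrative choices, not values taken from the patent.

```python
import numpy as np

def gf2_matpow(M, e):
    """Raise a binary matrix to the e-th power over GF(2)
    by square-and-multiply, reducing mod 2 at each step."""
    R = np.eye(M.shape[0], dtype=np.uint8)
    while e:
        if e & 1:
            R = (R @ M) & 1
        M = (M @ M) & 1
        e >>= 1
    return R

# Companion matrix of the primitive polynomial x^4 + x + 1
# (subdiagonal ones; last column holds the low-order coefficients 1,1,0,0).
C = np.array([[0, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=np.uint8)

# Decimation: one application of D = C^6 advances the state 6 single steps.
D = gf2_matpow(C.copy(), 6)
state = np.array([1, 0, 0, 0], dtype=np.uint8)
stepped = state.copy()
for _ in range(6):
    stepped = (C @ stepped) & 1
assert np.array_equal((D @ state) & 1, stepped)

# A primitive degree-4 polynomial gives the maximal period 2^4 - 1 = 15.
assert np.array_equal(gf2_matpow(C.copy(), 15), np.eye(4, dtype=np.uint8))
```

In hardware, each row of D becomes one XOR tree, so the sparsity of D directly sets the gate count and propagation delay, which is the optimization the patent describes.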
Upper Mantle Responses to India-Eurasia Collision in Indochina, Malaysia, and the South China Sea
NASA Astrophysics Data System (ADS)
Hongsresawat, S.; Russo, R. M.
2016-12-01
We present new shear wave splitting and splitting intensity measurements from SK(K)S phases recorded at seismic stations of the Malaysian National Seismic Network. These results, in conjunction with results from Tibet and Yunnan provide a basis for testing the degree to which Indochina and South China Sea upper mantle fabrics are responses to India-Eurasia collision. Upper mantle fabrics derived from shear wave splitting measurements in Yunnan and eastern Tibet parallel geodetic surface motions north of 26°N, requiring transmission of tractions from upper mantle depths to surface, or consistent deformation boundary conditions throughout the upper 200 km of crust and mantle. Shear wave splitting fast trends and surface velocities diverge in eastern Yunnan and south of 26°N, indicating development of an asthenospheric layer that decouples crust and upper mantle, or corner flow above the subducted Indo-Burma slab. E-W fast shear wave splitting trends southwest of 26°N/104°E indicate strong gradients in any asthenospheric infiltration. Possible upper mantle flow regimes beneath Indochina include development of olivine b-axis anisotropic symmetry due to high strain and hydrous conditions in the syntaxis/Indo-Burma mantle wedge (i.e., southward flow), development of strong upper mantle corner flow in the Indo-Burma wedge with olivine a-axis anisotropic symmetry (i.e., westward flow), and simple asthenospheric flow due to eastward motion of Sundaland shearing underlying asthenosphere. Further south, shear-wave splitting delay times at Malaysian stations vary from 0.5 seconds on the Malay Peninsula to over 2 seconds at stations on Borneo. Splitting fast trends at Borneo stations and Singapore trend NE-SW, but in northern Peninsular Malaysia, the splitting fast polarization direction is NW-SE, parallel to the trend of the Peninsula. 
Thus, there is a sharp transition from low delay time and NW-SE fast polarization to high delay times and fast polarization directions that parallel the strike of the now-inoperative spreading center in the South China Sea. This transition appears to occur in the central portion of Peninsular Malaysia and may mark the boundary between Tethyan upper mantle extruded from the India-Asia collision zone and supra-subduction upper mantle of the Indonesian arc.
Fast Mapping Across Time: Memory Processes Support Children's Retention of Learned Words.
Vlach, Haley A; Sandhofer, Catherine M
2012-01-01
Children's remarkable ability to map linguistic labels to referents in the world is commonly called fast mapping. The current study examined children's (N = 216) and adults' (N = 54) retention of fast-mapped words over time (immediately, after a 1-week delay, and after a 1-month delay). The fast mapping literature often characterizes children's retention of words as consistently high across timescales. However, the current study demonstrates that learners forget word mappings at a rapid rate. Moreover, these patterns of forgetting parallel the forgetting functions of domain-general memory processes. Memory processes are critical to children's word learning, and the role of one such process, forgetting, is discussed in detail: forgetting supports extended mapping by promoting the memory and generalization of words and categories.
Very fast motion planning for highly dexterous-articulated robots
NASA Technical Reports Server (NTRS)
Challou, Daniel J.; Gini, Maria; Kumar, Vipin
1994-01-01
Due to the inherent danger of space exploration, the need for greater use of teleoperated and autonomous robotic systems in space-based applications has long been apparent. Autonomous and semi-autonomous robotic devices have been proposed for carrying out routine functions associated with scientific experiments aboard the shuttle and space station. In addition, research into the use of such devices for planetary exploration continues. To accomplish their assigned tasks, all such autonomous and semi-autonomous devices will require the ability to move themselves through space without hitting themselves or the objects which surround them. In space it is important to execute the necessary motions correctly when they are first attempted, because repositioning is expensive in terms of both time and resources (e.g., fuel). Such devices will also have to function in a variety of different environments. Given these constraints, a means for fast motion planning to ensure the correct movement of robotic devices would be ideal. Unfortunately, motion planning algorithms are rarely used in practice because of their computational complexity. Fast methods have been developed for detecting imminent collisions, but the more general problem of motion planning remains computationally intractable. However, in this paper we show how the use of multicomputers and appropriate parallel algorithms can substantially reduce the time required to synthesize paths for dexterous articulated robots with a large number of joints. We have developed a parallel formulation of the Randomized Path Planner proposed by Barraquand and Latombe. We have shown that our parallel formulation is capable of formulating plans in a few seconds or less on various parallel architectures, including the nCUBE2 multicomputer with up to 1024 processors (nCUBE2 is a registered trademark of the nCUBE corporation) and a network of workstations.
Zhu, Lei; Yin, Qiuyuan; Irwin, David M; Zhang, Shuyi
2015-01-01
Bats are an ideal mammalian group for exploring adaptations to fasting due to their large variety of diets and because fasting is a regular part of their life cycle. Mammals fed on a carbohydrate-rich diet experience a rapid decrease in blood glucose levels during a fast, thus, the development of mechanisms to resist the consequences of regular fasts, experienced on a daily basis, must have been crucial in the evolution of frugivorous bats. Phosphoenolpyruvate carboxykinase 1 (PEPCK1, encoded by the Pck1 gene) is the rate-limiting enzyme in gluconeogenesis and is largely responsible for the maintenance of glucose homeostasis during fasting in fruit-eating bats. To test whether Pck1 has experienced adaptive evolution in frugivorous bats, we obtained Pck1 coding sequence from 20 species of bats, including five Old World fruit bats (OWFBs) (Pteropodidae) and two New World fruit bats (NWFBs) (Phyllostomidae). Our molecular evolutionary analyses of these sequences revealed that Pck1 was under purifying selection in both Old World and New World fruit bats with no evidence of positive selection detected in either ancestral branch leading to fruit bats. Interestingly, however, six specific amino acid substitutions were detected on the ancestral lineage of OWFBs. In addition, we found considerable evidence for parallel evolution, at the amino acid level, between the PEPCK1 sequences of Old World fruit bats and New World fruit bats. Test for parallel evolution showed that four parallel substitutions (Q276R, R503H, I558V and Q593R) were driven by natural selection. Our study provides evidence that Pck1 underwent parallel evolution between Old World and New World fruit bats, two lineages of mammals that feed on a carbohydrate-rich diet and experience regular periods of fasting as part of their life cycle.
Irwin, David M.; Zhang, Shuyi
2015-01-01
Bats are an ideal mammalian group for exploring adaptations to fasting due to their large variety of diets and because fasting is a regular part of their life cycle. Mammals fed on a carbohydrate-rich diet experience a rapid decrease in blood glucose levels during a fast, thus, the development of mechanisms to resist the consequences of regular fasts, experienced on a daily basis, must have been crucial in the evolution of frugivorous bats. Phosphoenolpyruvate carboxykinase 1 (PEPCK1, encoded by the Pck1 gene) is the rate-limiting enzyme in gluconeogenesis and is largely responsible for the maintenance of glucose homeostasis during fasting in fruit-eating bats. To test whether Pck1 has experienced adaptive evolution in frugivorous bats, we obtained Pck1 coding sequence from 20 species of bats, including five Old World fruit bats (OWFBs) (Pteropodidae) and two New World fruit bats (NWFBs) (Phyllostomidae). Our molecular evolutionary analyses of these sequences revealed that Pck1 was under purifying selection in both Old World and New World fruit bats with no evidence of positive selection detected in either ancestral branch leading to fruit bats. Interestingly, however, six specific amino acid substitutions were detected on the ancestral lineage of OWFBs. In addition, we found considerable evidence for parallel evolution, at the amino acid level, between the PEPCK1 sequences of Old World fruit bats and New World fruit bats. Test for parallel evolution showed that four parallel substitutions (Q276R, R503H, I558V and Q593R) were driven by natural selection. Our study provides evidence that Pck1 underwent parallel evolution between Old World and New World fruit bats, two lineages of mammals that feed on a carbohydrate-rich diet and experience regular periods of fasting as part of their life cycle. PMID:25807515
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1989-01-01
The data parallel implementation of a particle simulation for hypersonic rarefied flow described by Dagum associates a single parallel data element with each particle in the simulation. The simulated space is divided into discrete regions called cells containing a variable and constantly changing number of particles. The implementation requires a global sort of the parallel data elements so as to arrange them in an order that allows immediate access to the information associated with cells in the simulation. Described here is a very fast algorithm for performing the necessary ranking of the parallel data elements. The performance of the new algorithm is compared with that of the microcoded instruction for ranking on the Connection Machine.
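The ranking step described above, ordering particles so that each cell's particles are contiguous and their cell information is immediately addressable, can be sketched with a stable sort. This is an illustration of the ranking operation itself, not the Connection Machine algorithm or its microcoded rank instruction.

```python
import numpy as np

def rank_by_cell(cell_ids):
    """Rank particles so that particles in the same cell get contiguous
    ranks, preserving original order within a cell (a stable rank)."""
    order = np.argsort(cell_ids, kind="stable")  # particle indices sorted by cell
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))         # invert the permutation
    return ranks

# Five particles in cells 2, 0, 2, 1, 0.
cells = np.array([2, 0, 2, 1, 0])
r = rank_by_cell(cells)
# Cell 0's particles (original indices 1 and 4) get ranks 0 and 1,
# then cell 1's particle, then cell 2's particles.
assert list(r) == [3, 0, 4, 2, 1]
```

Once ranked, the start offset of each cell's block follows from a cumulative sum of the per-cell counts (e.g., `np.cumsum(np.bincount(cells))`), giving the immediate cell-wise access the simulation needs.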
Wang, Zihao; Chen, Yu; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Liu, Zhiyong; Sun, Fei; Zhang, Fa
2018-03-01
Electron tomography (ET) is an important technique for studying the three-dimensional structures of biological ultrastructure. Recently, ET has reached sub-nanometer resolution for investigating the native and conformational dynamics of macromolecular complexes by combining with the sub-tomogram averaging approach. Due to the limited sampling angles, ET reconstruction typically suffers from the "missing wedge" problem. Using a validation procedure, iterative compressed-sensing optimized nonuniform fast Fourier transform (NUFFT) reconstruction (ICON) demonstrates its power in restoring validated missing information for low-signal-to-noise-ratio biological ET datasets. However, the huge computational demand has become a bottleneck for the application of ICON. In this work, we implemented a parallel acceleration of ICON on many integrated core (MIC) Xeon Phi cards, named ICON-MIC, to address the huge computational demand of ICON. During this step, we parallelized the element-wise matrix operations and used efficient matrix summation to reduce the cost of matrix computation. We also developed parallel versions of NUFFT on MIC to achieve a high acceleration of ICON by using more efficient fast Fourier transform (FFT) calculation. We then proposed a hybrid task allocation strategy (two-level load balancing) to improve the overall performance of ICON-MIC by making full use of the idle resources on the Tianhe-2 supercomputer. Experimental results using two different datasets show that ICON-MIC has high accuracy in biological specimens under different noise levels and a significant acceleration, up to 13.3×, compared with the CPU version. Further, ICON-MIC has good scalability efficiency and overall performance on the Tianhe-2 supercomputer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Malony, Allen D; Shende, Sameer
This is the final progress report for the FastOS (Phase 2) (FastOS-2) project with Argonne National Laboratory and the University of Oregon (UO). The project started at UO on July 1, 2008 and ran until April 30, 2010, at which time a six-month no-cost extension began. The FastOS-2 work at UO delivered excellent results in all research work areas: * scalable parallel monitoring * kernel-level performance measurement * parallel I/O system measurement * large-scale and hybrid application performance measurement * online scalable performance data reduction and analysis * binary instrumentation
A Low-Power High-Speed Smart Sensor Design for Space Exploration Missions
NASA Technical Reports Server (NTRS)
Fang, Wai-Chi
1997-01-01
A low-power high-speed smart sensor system based on a large format active pixel sensor (APS) integrated with a programmable neural processor for space exploration missions is presented. The concept of building an advanced smart sensing system is demonstrated by a system-level microchip design that is composed of an APS sensor, a programmable neural processor, and an embedded microprocessor in a SOI CMOS technology. This ultra-fast smart sensor system-on-a-chip design mimics what is inherent in biological vision systems. Moreover, it is programmable and capable of performing ultra-fast machine vision processing at all levels, such as image acquisition, image fusion, image analysis, scene interpretation, and control functions. The system provides about one tera-operation-per-second of computing power, which is a two order-of-magnitude increase over that of state-of-the-art microcomputers. Its high performance is due to massively parallel computing structures, high data throughput rates, fast learning capabilities, and advanced VLSI system-on-a-chip implementation.
Analysis of fast and slow responses in AC conductance curves for p-type SiC MOS capacitors
NASA Astrophysics Data System (ADS)
Karamoto, Yuki; Zhang, Xufang; Okamoto, Dai; Sometani, Mitsuru; Hatakeyama, Tetsuo; Harada, Shinsuke; Iwamuro, Noriyuki; Yano, Hiroshi
2018-06-01
We used a conductance method to investigate the interface characteristics of a SiO2/p-type 4H-SiC MOS structure fabricated by dry oxidation. It was found that the measured equivalent parallel conductance–frequency (G_p/ω–f) curves were not symmetric, showing that there existed both high- and low-frequency signals. We attributed high-frequency responses to fast interface states and low-frequency responses to near-interface oxide traps. To analyze the fast interface states, Nicollian's standard conductance method was applied in the high-frequency range. By extracting the high-frequency responses from the measured G_p/ω–f curves, the characteristics of the low-frequency responses were reproduced by Cooper's model, which considers the effect of near-interface traps on the G_p/ω–f curves. The corresponding density distribution of slow traps as a function of energy level was estimated.
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has quickly introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. In several applications, however, the desired information must be calculated quickly enough for practical use. High computing performance is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques spanning four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts select parallel hyperspectral algorithms for specific applications.
A fast mass spring model solver for high-resolution elastic objects
NASA Astrophysics Data System (ADS)
Zheng, Mianlun; Yuan, Zhiyong; Zhu, Weixu; Zhang, Guian
2017-03-01
Real-time simulation of elastic objects is of great importance for computer graphics and virtual reality applications. The fast mass spring model solver can achieve visually realistic simulation in an efficient way. Unfortunately, this method suffers from resolution limitations and a lack of mechanical realism for surface geometry models, which greatly restricts its application. To tackle these problems, in this paper we propose a fast mass spring model solver for high-resolution elastic objects. First, we project the complex surface geometry model into a set of uniform grid cells serving as cages, using the mean value coordinates method, to reflect its internal structure and mechanical properties. Then, we replace the original Cholesky decomposition method in the fast mass spring model solver with a conjugate gradient method, which makes the solver more efficient for detailed surface geometry models. Finally, we propose a graphics processing unit accelerated parallel algorithm for the conjugate gradient method. Experimental results show that our method can realize efficient deformation simulation of 3D elastic objects with visual realism and physical fidelity, and has great potential for applications in computer animation.
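The Cholesky-to-conjugate-gradient replacement described above can be illustrated with a textbook CG solver. This is the generic algorithm, not the authors' GPU implementation; the 2x2 system standing in for the implicit mass-spring matrix is illustrative only.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A (textbook CG)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy SPD system standing in for the implicit mass-spring system matrix
# (of the form M + h^2 K); values here are illustrative only.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

The appeal for high-resolution models is that CG needs only matrix-vector products, which map naturally onto a GPU, whereas a Cholesky factor must be recomputed or stored whenever the system changes.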
NASA Astrophysics Data System (ADS)
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke
2018-01-01
Hydrological model calibration has been an active research topic for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly as the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for accelerating hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that the heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.
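The coarse-grained structure that makes SCE-UA parallelizable can be sketched as follows: the population is split into complexes that evolve independently between shuffles. The complex-evolution step below is a heavily simplified stand-in for competitive complex evolution, the Rastrigin function stands in for a real rainfall-runoff objective, and Python threads stand in for the paper's OpenMP/CUDA workers; everything here is an illustrative assumption, not the study's implementation.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def rastrigin(x):
    """Stand-in objective; a real application would run the Xinanjiang model."""
    return 10 * len(x) + float(np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))

def evolve_complex(pts, n_steps=20):
    """Crude downhill evolution within one complex (simplified sketch)."""
    rng = np.random.default_rng(0)
    pts = pts.copy()
    for _ in range(n_steps):
        f = np.array([rastrigin(p) for p in pts])
        worst = int(np.argmax(f))
        centroid = np.delete(pts, worst, axis=0).mean(axis=0)
        trial = centroid + rng.normal(scale=0.1, size=pts.shape[1])
        if rastrigin(trial) < f[worst]:
            pts[worst] = trial       # replace the worst point only if improved
    return pts

rng = np.random.default_rng(42)
population = rng.uniform(-5, 5, size=(32, 2))
best_before = min(rastrigin(p) for p in population)

# One SCE-UA outer iteration: evolve the complexes in parallel, then shuffle.
complexes = np.array_split(population, 4)        # 4 complexes of 8 points
with ThreadPoolExecutor(max_workers=4) as pool:
    evolved = list(pool.map(evolve_complex, complexes))
population = np.concatenate(evolved)
best_after = min(rastrigin(p) for p in population)
```

Because the complexes do not communicate until the shuffle, each one can be handed to a separate CPU thread or GPU block, which is the source of the speedup reported in the abstract.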
A Parallel Fast Sweeping Method for the Eikonal Equation
NASA Astrophysics Data System (ADS)
Baker, B.
2017-12-01
Recently, there has been an exciting emergence of probabilistic methods for travel time tomography. Unlike gradient-based optimization strategies, probabilistic tomographic methods resist becoming trapped in a local minimum and provide a much better quantification of parameter resolution than, say, appealing to ray density or performing checkerboard reconstruction tests. The benefits associated with random sampling methods, however, are only realized through successive computation of predicted travel times in potentially strongly heterogeneous media. To this end, this abstract is concerned with expediting the solution of the Eikonal equation. While many Eikonal solvers use a fast marching method, the proposed solver uses the iterative fast sweeping method because the eight fixed sweep orderings in each iteration are natural targets for parallelization. To reduce the number of iterations and grid points required, the high-accuracy finite difference stencil of Nobel et al., 2014 is implemented. A directed acyclic graph (DAG) is created with a priori knowledge of the sweep ordering and finite difference stencil. By performing a topological sort of the DAG, sets of independent nodes are identified as candidates for concurrent updating. Additionally, the proposed solver addresses scalability during earthquake relocation, a necessary step in local and regional earthquake tomography and a barrier to extending probabilistic methods from active source to passive source applications, by introducing an asynchronous parallel forward solve phase for all receivers in the network. Synthetic examples using the SEG over-thrust model will be presented.
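The serial building block being parallelized can be sketched in 2D, where there are four sweep orderings rather than the eight used in 3D. This is the standard first-order Godunov fast sweeping update for |∇T| = s, not the high-accuracy stencil cited in the abstract; grid size and source location are illustrative.

```python
import numpy as np

def fast_sweep_eikonal(slowness, h, source, n_sweeps=8):
    """Solve |grad T| = s on a 2D grid with Gauss-Seidel fast sweeping,
    alternating the four sweep orderings (first-order Godunov upwinding)."""
    n, m = slowness.shape
    T = np.full((n, m), np.inf)
    T[source] = 0.0
    orders = [(range(n), range(m)),
              (range(n - 1, -1, -1), range(m)),
              (range(n), range(m - 1, -1, -1)),
              (range(n - 1, -1, -1), range(m - 1, -1, -1))]
    for sweep in range(n_sweeps):
        rows, cols = orders[sweep % 4]
        for i in rows:
            for j in cols:
                if (i, j) == source:
                    continue
                a = min(T[i - 1, j] if i > 0 else np.inf,
                        T[i + 1, j] if i < n - 1 else np.inf)
                b = min(T[i, j - 1] if j > 0 else np.inf,
                        T[i, j + 1] if j < m - 1 else np.inf)
                if np.isinf(a) and np.isinf(b):
                    continue          # no upwind information yet
                f = slowness[i, j]
                if abs(a - b) >= f * h:       # one-sided update
                    t_new = min(a, b) + f * h
                else:                         # two-sided quadratic update
                    t_new = 0.5 * (a + b + np.sqrt(2 * f * f * h * h
                                                   - (a - b) ** 2))
                T[i, j] = min(T[i, j], t_new)
    return T

# Unit slowness: T approximates Euclidean distance from the source point.
n = 41
h = 1.0 / (n - 1)
T = fast_sweep_eikonal(np.ones((n, n)), h, source=(n // 2, n // 2))
```

The DAG/topological-sort idea in the abstract exploits the fact that, within one sweep ordering, grid points on the same anti-diagonal depend only on already-updated points and can therefore be updated concurrently.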
Fasching, George E.
1977-03-08
An improved high-voltage pulse generator has been provided which is especially useful in ultrasonic testing of rock core samples. N capacitors are charged in parallel to V volts and, at the proper instant, are coupled in series to produce a high-voltage pulse of N times V volts. Rapid switching of the capacitors from the paralleled charging configuration to the series discharging configuration is accomplished using silicon-controlled rectifiers (SCRs), which are chain self-triggered following the initial triggering of the first rectifier connected between the first and second of the charging capacitors. A timing and triggering circuit is provided to synchronize triggering pulses to the first SCR at a time when the charging voltage is not being applied to the parallel-connected charging capacitors. Alternate circuits control the application of the charging voltage from the charging circuit to the parallel capacitors, providing a selection of at least two different intervals in which the charging voltage is turned "off" to allow the SCRs connecting the capacitors in series to turn "off" before recharging begins. The pulse-generating circuit, including the N capacitors and the corresponding SCRs that connect them in series when triggered "on", further includes diodes and series-connected inductors between the parallel-connected charging capacitors, which allow sufficiently fast charging of the capacitors for a high pulse repetition rate while permitting considerable control of the decay time of the high-voltage pulses.
The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kargaran, Hamed, E-mail: h-kargaran@sbu.ac.ir; Minuchehr, Abdolhamid; Zolfaghari, Ahmad
The implementation of Monte Carlo simulation in CUDA Fortran requires fast random number generation with good statistical properties on the GPU. In this study, a GPU-based parallel pseudo random number generator (GPPRNG) has been proposed for use in high performance computing systems. According to the type of GPU memory usage, the GPU scheme is divided into two work modes: GLOBAL-MODE and SHARED-MODE. To generate parallel random numbers based on the independent sequence method, a combination of the middle-square method and a chaotic map, along with the Xorshift PRNG, has been employed. Implementation of the developed GPPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of the PRNG on a single CPU core) for GLOBAL-MODE and SHARED-MODE, respectively. To evaluate the accuracy of the developed GPPRNG, its performance was compared to that of other available PRNGs, such as those of MATLAB, FORTRAN, and the Miller-Park algorithm, using standard statistical tests. The results of this comparison showed that the GPPRNG developed in this study can be used as a fast and accurate tool for computational science applications.
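Of the three generators combined in the abstract above, the Xorshift recurrence is the simplest to sketch. This is Marsaglia's generic 32-bit xorshift with the common 13/17/5 shift triple; it is not necessarily the exact variant, seeding, or mixing used in the study.

```python
def xorshift32(state):
    """One step of Marsaglia's 32-bit xorshift recurrence (triple 13/17/5)."""
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state & 0xFFFFFFFF

def uniform_stream(seed, count):
    """Map successive states to floats in [0, 1); on a GPU each thread
    would run its own independently seeded copy of this recurrence."""
    state = (seed & 0xFFFFFFFF) or 1     # the state must be nonzero
    out = []
    for _ in range(count):
        state = xorshift32(state)
        out.append(state / 2**32)
    return out

u = uniform_stream(123456789, 10000)
```

The independent-sequence approach mentioned in the abstract amounts to giving each parallel thread a distinct, well-separated seed so the per-thread streams do not overlap.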
Parallel algorithms for placement and routing in VLSI design. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Brouwer, Randall Jay
1991-01-01
The computational requirements for high quality synthesis, analysis, and verification of very large scale integration (VLSI) designs have rapidly increased with the fast-growing complexity of these designs. Research in the past has focused on the development of heuristic algorithms, special purpose hardware accelerators, or parallel algorithms for the numerous design tasks to decrease the time required for solution. Two new parallel algorithms are proposed for two VLSI synthesis tasks: standard cell placement and global routing. The first algorithm, a parallel algorithm for global routing, uses hierarchical techniques to decompose the routing problem into independent routing subproblems that are solved in parallel. Results are then presented which compare the routing quality to the results of other published global routers and which evaluate the speedups attained. The second algorithm, a parallel algorithm for cell placement and global routing, hierarchically integrates a quadrisection placement algorithm, a bisection placement algorithm, and the previous global routing algorithm. Unique partitioning techniques are used to decompose the various stages of the algorithm into independent tasks which can be evaluated in parallel. Finally, results are presented which evaluate the various algorithm alternatives and compare the algorithm performance to other placement programs. Measurements of the attainable parallel speedups are presented.
YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste
Modern embedded systems include hundreds of cores. Because of the difficulty of providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets that are difficult to partition, unpredictable memory accesses, unbalanced control flow, and fine-grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex, highly parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework based on LLVM for the automatic parallelization of irregular applications on modern MPSoCs. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.
NASA Astrophysics Data System (ADS)
Newman, Gregory A.
2014-01-01
Many geoscientific applications exploit electrostatic and electromagnetic fields to interrogate and map subsurface electrical resistivity, an important geophysical attribute for characterizing mineral, energy, and water resources. In complex three-dimensional geologies, where many of these resources remain to be found, resistivity mapping requires large-scale modeling and imaging capabilities, as well as the ability to treat significant data volumes, which can easily overwhelm single-core and modest multicore computing hardware. Treating such problems requires large-scale parallel computational resources in order to reduce the time to solution to a time frame acceptable to the exploration process. The recognition that significant parallel computing processes must be brought to bear on these problems gives rise to choices that must be made in parallel computing hardware and software. In this review, some of these choices are presented, along with the resulting trade-offs. We also discuss future trends in high-performance computing and the anticipated impact on electromagnetic (EM) geophysics. Topics discussed include a survey of parallel computing platforms, ranging from graphics processing units to multicore CPUs with a fast interconnect, along with parallel solvers and associated solver libraries effective for inductive EM modeling and imaging.
A fast pulse design for parallel excitation with gridding conjugate gradient.
Feng, Shuo; Ji, Jim
2013-01-01
Parallel excitation (pTx) is recognized as a crucial technique in high field MRI to address the transmit field inhomogeneity problem. However, designing pTx pulses can be undesirably time consuming. In this work, we propose a pulse design with gridding conjugate gradient (CG) based on the small-tip-angle approximation. The two major time-consuming matrix-vector multiplications are replaced by two operators that involve only FFT and gridding. Simulation results show that the proposed method is 3 times faster than the conventional method, and the memory cost is reduced by a factor of 1000.
NASA Technical Reports Server (NTRS)
Schriver, D.; Ashour-Abdalla, M.; Strangeway, R. J.; Richard, R. L.; Klezting, C.; Dotan, Y.; Wygant, J.
2003-01-01
The discrete aurora results when energized electrons bombard the Earth's atmosphere at high latitudes. This paper examines the physical processes that can cause field-aligned acceleration of plasma particles in the auroral region. A data and theoretical study has been carried out to examine the acceleration mechanisms that operate in the auroral zone and to identify the magnetospheric drivers of these acceleration mechanisms. The observations used in the study were collected by the Fast Auroral Snapshot (FAST) and Polar satellites when the two satellites were in approximate magnetic conjunction in the auroral region. During these events FAST was in the middle of the auroral zone and Polar was above the auroral zone in the near-Earth plasma sheet. Polar data were used to determine the conditions in the magnetotail at the time field-aligned acceleration was measured by FAST in the auroral zone. For each of the magnetotail drivers identified in the data study, the physics of field-aligned acceleration in the auroral region was examined using existing theoretical efforts and/or a long-system particle-in-cell simulation to model the magnetically connected region between the two satellites. Results from the study indicate that there are three main drivers of auroral acceleration: (1) field-aligned currents that lead to quasistatic parallel potential drops (parallel electric fields), (2) earthward flow of high-energy plasma beams from the magnetotail into the auroral zone that leads to quasistatic parallel potential drops, and (3) large-amplitude Alfven waves that propagate into the auroral region from the magnetotail. The events examined thus far confirm the previously established invariant latitudinal dependence of the drivers and show a strong dependence on magnetic activity. Alfven waves tend to occur primarily at the poleward edge of the auroral region during more magnetically active times and are correlated with intense electron precipitation.
At lower latitudes, away from the poleward edge of the auroral zone, is the primary field-aligned current region, which results in the classical field-aligned acceleration associated with the auroral zone (electrons earthward and ion beams tailward). During times of high magnetic activity, high-energy ion beams originating from the magnetotail are observed within, and overlapping, the regions of primary and return field-aligned current. Along the field lines where the high-energy magnetotail ion beams are located, field-aligned acceleration can occur in the auroral zone, leading to precipitating electrons and upwelling ionospheric ion beams. Field-aligned currents are present during both quiet and active times, while the Alfven waves and magnetotail ion beams were observed only during more magnetically active events.
Symplectic molecular dynamics simulations on specially designed parallel computers.
Borstnik, Urban; Janezic, Dusanka
2005-01-01
We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which treats high-frequency vibrational motion analytically and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that each step takes less time and fewer steps are needed, enabling fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers increases the speed of MD simulations up to 16-fold over a single PC processor.
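The split-integration idea (advance the stiff harmonic part analytically as an exact phase-space rotation, apply the soft force as numerical half-step kicks) can be sketched on a single anharmonic oscillator. The potential, constants, and step size below are illustrative assumptions, not the SISM implementation itself.

```python
import numpy as np

m, k, eps = 1.0, 100.0, 0.1            # mass, stiff spring, weak coupling
w = np.sqrt(k / m)                     # fast angular frequency

def slow_force(x):
    return -eps * x**3                 # from the soft potential eps*x^4/4

def split_step(x, p, dt):
    """Kick-rotate-kick: the harmonic flow is exact at any step size."""
    p += 0.5 * dt * slow_force(x)                            # half kick
    c, s = np.cos(w * dt), np.sin(w * dt)
    x, p = x * c + p * s / (m * w), p * c - m * w * x * s    # exact rotation
    p += 0.5 * dt * slow_force(x)                            # half kick
    return x, p

def energy(x, p):
    return p**2 / (2 * m) + 0.5 * k * x**2 + 0.25 * eps * x**4

x, p = 1.0, 0.0
e0 = energy(x, p)
dt = 0.05          # step limited only by the soft force, not the stiff one
for _ in range(10000):
    x, p = split_step(x, p, dt)
rel_drift = abs(energy(x, p) - e0) / e0
```

Because the composition is symplectic, the energy error stays bounded over long runs instead of drifting, which is what permits the longer time steps the abstract refers to.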
Generating unstructured nuclear reactor core meshes in parallel
Jain, Rajeev; Tautges, Timothy J.
2014-10-24
Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulation problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate on standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during the reactor assembly and reactor core mesh generation processes. We highlight several reactor core examples, including a very high temperature reactor, a full-core model of the Japanese MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with a XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with the speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.
Ergül, Özgür
2011-11-01
Fast and accurate solutions of large-scale electromagnetics problems involving homogeneous dielectric objects are considered. Problems are formulated with the electric and magnetic current combined-field integral equation and discretized with the Rao-Wilton-Glisson functions. Solutions are performed iteratively by using the multilevel fast multipole algorithm (MLFMA). For the solution of large-scale problems discretized with millions of unknowns, MLFMA is parallelized on distributed-memory architectures using a rigorous technique, namely, the hierarchical partitioning strategy. Efficiency and accuracy of the developed implementation are demonstrated on very large problems involving as many as 100 million unknowns.
Petrović, Z Lj; Phelps, A V
2009-12-01
Absolute spectral emissivities for Doppler-broadened H(alpha) profiles are measured and compared with predictions of energetic hydrogen ion, atom, and molecule behavior in low-current electrical discharges in H2 at very high ratios E/N of electric field E to gas density N and at low values of Nd, where d is the parallel-plate electrode separation. These observations reflect the energy and angular distributions of the excited atoms and quantitatively test features of multiple-scattering kinetic models in weakly ionized hydrogen in the presence of an electric field that are not tested by the spatial distributions of H(alpha) emission. Absolute spectral intensities agree well with predictions. Asymmetries in Doppler profiles observed parallel to the electric field at 4
NASA Technical Reports Server (NTRS)
Sanyal, Soumya; Jain, Amit; Das, Sajal K.; Biswas, Rupak
2003-01-01
In this paper, we propose a distributed approach for mapping a single large application to a heterogeneous grid environment. To minimize the execution time of the parallel application, we distribute the mapping overhead to the available nodes of the grid. This approach not only provides a fast mapping of tasks to resources but is also scalable. We adopt a hierarchical grid model and accomplish the job of mapping tasks to this topology using a scheduler tree. Results show that our three-phase algorithm provides high quality mappings, and is fast and scalable.
High-Dose Neutron Detector Development Using 10B Coated Cells
DOE Office of Scientific and Technical Information (OSTI.GOV)
Menlove, Howard Olsen; Henzlova, Daniela
2016-11-08
During FY16 the boron-lined parallel-plate technology was optimized to fully benefit from its fast timing characteristics and enhance its high count rate capability. To facilitate high count rates, a novel fast amplifier with timing and operating properties matched to the detector characteristics was developed and implemented in the 8” boron plate detector purchased from PDT. Each of the 6 sealed cells was connected to a fast amplifier with corresponding List mode readout from each amplifier. The FY16 work focused on improvements in the boron-10 coating materials and procedures at PDT to significantly improve the neutron detection efficiency. An improvement in efficiency by a factor of 1.5 was achieved without increasing the metal backing area for the boron coating. This improvement has allowed us to operate the detector in gamma-ray backgrounds four orders of magnitude higher than was previously possible while maintaining a relatively high counting efficiency for neutrons. This improvement in gamma-ray rejection is a key factor in the development of the high-dose neutron detector.
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computers to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
2D-RBUC for efficient parallel compression of residuals
NASA Astrophysics Data System (ADS)
Đurđević, Đorđe M.; Tartalja, Igor I.
2018-02-01
In this paper, we present a method for lossless compression of residuals with efficient SIMD-parallel decompression. The residuals originate from lossy or near-lossless compression of height fields, which are commonly used to represent terrain models. The algorithm is founded on the existing RBUC method for compression of non-uniform data sources. We have adapted the method to capture the 2D spatial locality of height fields and developed the decompression algorithm for modern GPU architectures already present even in home computers. In combination with the point-level SIMD-parallel lossless/lossy height field compression method HFPaC, characterized by fast progressive decompression and a seamlessly reconstructed surface, the newly proposed method trades a small efficiency degradation for a non-negligible compression ratio benefit (measured up to 91%).
Modularized Parallel Neutron Instrument Simulation on the TeraGrid
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Meili; Cobb, John W; Hagen, Mark E
2007-01-01
In order to build a bridge between the TeraGrid (TG), a national scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One successful effort of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. Parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulations at sufficient statistical levels for instrument design. Upon successful SNS commissioning at the end of 2007, three of the five commissioned instruments in the SNS target station will be available for initial users. Advanced instrument study, proposal feasibility evaluation, and experiment planning are on the immediate schedule of SNS, which poses further requirements, such as flexibility and high runtime efficiency, on fast instrument simulation. PSoNI has been redesigned to meet these new challenges, and a preliminary version has been developed on TeraGrid. This paper explores the motivation and goals of the new design and the improved software structure. Further, it describes the realized new features, as seen from MPI-parallelized McStas running high resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion of future work, which targets fast simulation for automated experiment adjustment and comparison of models to data in analysis, is also presented.
Holographic memory for high-density data storage and high-speed pattern recognition
NASA Astrophysics Data System (ADS)
Gu, Claire
2002-09-01
As computers and the internet become faster and faster, more and more information is transmitted, received, and stored everyday. The demand for high density and fast access time data storage is pushing scientists and engineers to explore all possible approaches including magnetic, mechanical, optical, etc. Optical data storage has already demonstrated its potential in the competition against other storage technologies. CD and DVD are showing their advantages in the computer and entertainment market. What motivated the use of optical waves to store and access information is the same as the motivation for optical communication. Light or an optical wave has an enormous capacity (or bandwidth) to carry information because of its short wavelength and parallel nature. In optical storage, there are two types of mechanism, namely localized and holographic memories. What gives the holographic data storage an advantage over localized bit storage is the natural ability to read the stored information in parallel, therefore, meeting the demand for fast access. Another unique feature that makes the holographic data storage attractive is that it is capable of performing associative recall at an incomparable speed. Therefore, volume holographic memory is particularly suitable for high-density data storage and high-speed pattern recognition. In this paper, we review previous works on volume holographic memories and discuss the challenges for this technology to become a reality.
Wideband aperture array using RF channelizers and massively parallel digital 2D IIR filterbank
NASA Astrophysics Data System (ADS)
Sengupta, Arindam; Madanayake, Arjuna; Gómez-García, Roberto; Engeberg, Erik D.
2014-05-01
Wideband receive-mode beamforming applications in wireless location, electronically-scanned antennas for radar, RF sensing, microwave imaging and wireless communications require digital aperture arrays that offer a relatively constant far-field beam over several octaves of bandwidth. Several beamforming schemes, including the well-known true time-delay and phased array beamformers, have been realized using either finite impulse response (FIR) or fast Fourier transform (FFT) digital filter-sum based techniques. These beamforming algorithms offer the desired selectivity at the cost of high computational complexity and frequency-dependent far-field array patterns. A novel approach to receiver beamforming is the use of massively parallel 2-D infinite impulse response (IIR) fan filterbanks for the synthesis of relatively frequency-independent RF beams at an order of magnitude lower multiplier complexity compared to FFT or FIR filter based conventional algorithms. The 2-D IIR filterbanks demand fast digital processing that can support several octaves of RF bandwidth and fast analog-to-digital converters (ADCs) for RF-to-bits direct conversion of wideband antenna element signals. Fast digital implementation platforms that can realize the high-precision recursive filter structures necessary for real-time beamforming at RF radio bandwidths are also desired. We propose a novel technique that combines a passive RF channelizer, multichannel ADC technology, and single-phase massively parallel 2-D IIR digital fan filterbanks, realized at low complexity using FPGA and/or ASIC technology. There is native support for a larger bandwidth than the maximum clock frequency of the digital implementation technology. We also strive to achieve More-than-Moore throughput by processing a wideband RF signal having content with N-fold bandwidth (B = N Fclk/2) compared to the maximum clock frequency Fclk Hz of the digital VLSI platform under consideration.
Such an increase in bandwidth is achieved without the use of polyphase signal processing or time-interleaved ADC methods. That is, all digital processors operate at the same clock frequency Fclk without phasing, while wideband operation is achieved by sub-sampling of narrower sub-bands at the RF channelizer outputs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seal, Sudip K; Perumalla, Kalyan S; Hirshman, Steven Paul
2013-01-01
Simulations that require solutions of block tridiagonal systems of equations rely on fast parallel solvers for runtime efficiency. Leading parallel solvers that are highly effective for general systems of equations, dense or sparse, are limited in scalability when applied to block tridiagonal systems. This paper presents scalability results as well as detailed analyses of two parallel solvers that exploit the special structure of block tridiagonal matrices to deliver superior performance, often by orders of magnitude. A rigorous analysis of their relative parallel runtimes is shown to reveal the existence of a critical block size that separates the parameter space spanned by the number of block rows, the block size and the processor count into distinct regions that favor one or the other of the two solvers. Dependence of this critical block size on the above parameters as well as on machine-specific constants is established. These formal insights are supported by empirical results on up to 2,048 cores of a Cray XT4 system. To the best of our knowledge, this is the highest reported scalability for parallel block tridiagonal solvers to date.
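The serial baseline such parallel solvers compete against is the block Thomas algorithm (block LU forward elimination plus back substitution). A minimal dense-block sketch, not taken from the paper:

```python
import numpy as np

def block_thomas(A, B, C, d):
    """Solve a block tridiagonal system by block forward elimination and
    back substitution.  A: sub-diagonal blocks (n-1), B: diagonal blocks (n),
    C: super-diagonal blocks (n-1), d: list of n right-hand-side blocks.
    Row i reads: A[i-1] x[i-1] + B[i] x[i] + C[i] x[i+1] = d[i]."""
    n = len(B)
    B = [b.copy() for b in B]
    d = [x.copy() for x in d]
    for i in range(1, n):
        m = A[i - 1] @ np.linalg.inv(B[i - 1])   # elimination multiplier
        B[i] = B[i] - m @ C[i - 1]
        d[i] = d[i] - m @ d[i - 1]
    x = [None] * n
    x[n - 1] = np.linalg.solve(B[n - 1], d[n - 1])
    for i in range(n - 2, -1, -1):
        x[i] = np.linalg.solve(B[i], d[i] - C[i] @ x[i + 1])
    return x
```

The recurrence is inherently sequential in the block-row index, which is exactly why specialized parallel reformulations such as those analyzed above are needed.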
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities of current multi-core hardware to distribute tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms, based on the design principles of transactional memory, for clustering gene expression microarray type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis, employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy, compared to single-core implementations and a recently published network-based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
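The shared-memory parallelization described above can be sketched in Python (rather than the authors' Java, and with a thread pool instead of transactional memory) by splitting the assignment step, which dominates k-means runtime, across cores; NumPy releases the GIL inside its vectorized kernels, so chunked assignment scales on multi-core machines. All names here are illustrative:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def kmeans_parallel(X, k, n_iter=20, n_workers=4, seed=0):
    """Lloyd's k-means with the point-to-center assignment step
    distributed over a thread pool; the cheap update step stays serial."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    chunks = np.array_split(np.arange(len(X)), n_workers)

    def assign(idx):
        # squared distances from each point in the chunk to each center
        d = ((X[idx, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return idx, d.argmin(axis=1)

    labels = np.empty(len(X), dtype=int)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_iter):
            for idx, lab in pool.map(assign, chunks):
                labels[idx] = lab
            for j in range(k):                      # center update
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```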
Development of radiation tolerant monolithic active pixel sensors with fast column parallel read-out
NASA Astrophysics Data System (ADS)
Koziel, M.; Dorokhov, A.; Fontaine, J.-C.; De Masi, R.; Winter, M.
2010-12-01
Monolithic active pixel sensors (MAPS) [1] (Turchetta et al., 2001) are being developed at IPHC—Strasbourg to equip the EUDET telescope [2] (Haas, 2006) and vertex detectors for future high energy physics experiments, including the STAR upgrade at RHIC [3] (T.S. Collaboration, 2005) and the CBM experiment at FAIR/GSI [4] (Heuser, 2006). High granularity, low material budget and high read-out speed are systematically required for most applications, complemented, for some of them, by high radiation tolerance. A specific column-parallel architecture, implemented in the MIMOSA-22 sensor, was developed to achieve fast read-out in MAPS. Previous studies of the front-end architecture integrated in this sensor, which includes in-pixel amplification, have shown that the fixed-pattern noise increase caused by ionizing radiation can be controlled by means of a negative feedback [5] (Hu-Guo et al., 2008). However, an unexpected rise of the temporal noise was observed. A second version of this chip (MIMOSA-22bis) was produced in order to search for possible improvements of the radiation tolerance with respect to this type of noise. In this prototype, the feedback transistor was tuned in order to mitigate the sensitivity of the pixel to ionizing radiation. The performance of the pixels after irradiation was investigated for two types of feedback transistors: the enclosed layout transistor (ELT) [6] (Snoeys et al., 2000) and the "standard" transistor with either large or small transconductance. The noise performance of all test structures was studied in various conditions (expected in future experiments) regarding temperature, integration time and ionizing radiation dose. Test results are presented in this paper. Based on these observations, ideas for further improvement of the radiation tolerance of column-parallel MAPS are derived.
Spectral Anisotropy of Magnetic Field Fluctuations around Ion Scales in the Fast Solar Wind
NASA Astrophysics Data System (ADS)
Wang, X.; Tu, C.; He, J.; Marsch, E.; Wang, L.
2016-12-01
The power spectra of the magnetic field at ion scales are significantly influenced by waves and structures. In this work, we study how the contribution of waves to the spectral index of the magnetic field depends on the angle ΘRB. A wavelet technique is applied to high time-resolution magnetic field data from WIND spacecraft measurements in the fast solar wind. It is found that around ion scales, the parallel spectrum originally has a slope of -4.6±0.1. When we remove the waves, which correspond to the data points with relatively large values of magnetic helicity, the parallel spectrum gradually becomes shallower, reaching -3.2±0.2. However, the perpendicular spectrum does not change significantly during the wave-removal process, and its slope remains -3.1±0.1. This means that when the waves are removed from the original data, the spectral anisotropy weakens. This result may help us understand the physical nature of the spectral anisotropy around ion scales.
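The spectral slopes quoted above (-4.6, -3.2, -3.1) are log-log least-squares fits to power spectral densities. A minimal sketch of that measurement on a single time series (a plain FFT periodogram rather than the wavelet spectra used in the study; names are illustrative):

```python
import numpy as np

def spectral_index(signal, dt):
    """Least-squares slope of the power spectrum in log-log space,
    the quantity reported as a 'spectral index' (e.g. -5/3 or -2)."""
    f = np.fft.rfftfreq(len(signal), dt)[1:]       # drop the DC bin
    psd = np.abs(np.fft.rfft(signal))[1:] ** 2     # raw periodogram
    slope, _ = np.polyfit(np.log(f), np.log(psd), 1)
    return slope
```

In practice one would window the data and fit only over the inertial (or ion-scale) frequency range; this sketch fits the whole spectrum.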
Search for auroral belt E-parallel fields with high-velocity barium ion injections
NASA Technical Reports Server (NTRS)
Heppner, J. P.; Ledley, B. G.; Miller, M. L.; Marionni, P. A.; Pongratz, M. B.
1989-01-01
In April 1984, four high-velocity shaped-charge Ba(+) injections were conducted from two sounding rockets at 770-975 km over northern Alaska under conditions of active auroral and magnetic disturbance. Spatial ionization (brightness) profiles of high-velocity Ba(+) clouds from photometric scans following each release were found to be consistent with the 28-sec theoretical time constant for Ba photoionization determined by Carlsten (1975). These observations therefore revealed no evidence of anomalous fast ionization predicted by the Alfven critical velocity hypothesis.
Performance of parallel computation using CUDA for solving the one-dimensional elasticity equations
NASA Astrophysics Data System (ADS)
Darmawan, J. B. B.; Mungkasi, S.
2017-01-01
In this paper, we investigate the performance of parallel computation in solving the one-dimensional elasticity equations. Elasticity equations are widely used in engineering science, and solving them quickly and efficiently is desirable. Therefore, we propose the use of parallel computation. Our parallel computation uses NVIDIA's CUDA platform. Our research results show that parallel computation using CUDA offers a significant advantage for large-scale computations.
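The abstract does not specify the numerical scheme, so as an assumption the sketch below uses a Lax-Friedrichs update for the 1-D linear elasticity system (stress σ, velocity v). Each cell update is independent of the others, which is exactly the structure that maps one-thread-per-cell onto CUDA; here it is written with NumPy for clarity:

```python
import numpy as np

def lf_step(sigma, v, E, rho, dt, dx):
    """One Lax-Friedrichs step for the 1-D linear elasticity system
        sigma_t = E * v_x,    rho * v_t = sigma_x
    with periodic boundaries.  Every cell reads only its two neighbors,
    so the loop body is embarrassingly parallel (one CUDA thread per cell)."""
    sp, sm = np.roll(sigma, -1), np.roll(sigma, 1)   # i+1, i-1 neighbors
    vp, vm = np.roll(v, -1), np.roll(v, 1)
    sigma_new = 0.5 * (sp + sm) + 0.5 * dt / dx * E * (vp - vm)
    v_new = 0.5 * (vp + vm) + 0.5 * dt / dx * (sp - sm) / rho
    return sigma_new, v_new
```

Stability requires the CFL condition dt ≤ dx / c with wave speed c = sqrt(E/ρ).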
Anisotropic Behaviour of Magnetic Power Spectra in Solar Wind Turbulence.
NASA Astrophysics Data System (ADS)
Banerjee, S.; Saur, J.; Gerick, F.; von Papen, M.
2017-12-01
Introduction: High-altitude fast solar wind turbulence (SWT) shows different spectral properties as a function of the angle between the flow direction and the scale-dependent mean magnetic field (Horbury et al., PRL, 2008). The average magnetic power contained in the near-perpendicular direction (80º-90º) was found to be approximately 5 times larger than the average power in the parallel direction (0º-10º). In addition, the parallel power spectrum was found to follow a steeper (-2) power law than the perpendicular power spectral density (PSD), which followed a near-Kolmogorov slope (-5/3). Similar anisotropic behaviour has also been observed (Chen et al., MNRAS, 2011) for the slow solar wind (SSW), but using a different method exploiting multi-spacecraft data of Cluster. Purpose: In the current study, using Ulysses data, we investigate (i) the anisotropic behaviour of the near-ecliptic slow solar wind using the same methodology (described below) as that of Horbury et al. (2008) and (ii) the dependence of the anisotropic behaviour of SWT on the heliospheric latitude. Method: We apply the wavelet method to calculate the turbulent power spectra of the magnetic field fluctuations parallel and perpendicular to the local mean magnetic field (LMF). Following Horbury et al., the LMF for a given scale (or size) is obtained using an envelope of the envelope of that size. Results: (i) SSW intervals always show near -5/3 perpendicular spectra. Unlike the fast solar wind (FSW) intervals, for the SSW we often find intervals where power parallel to the mean field is not observed. For the few intervals with sufficient power in the parallel direction, slow wind turbulence also exhibits -2 parallel spectra similar to the FSW. (ii) The behaviour of the parallel and perpendicular power spectra is found to be independent of the heliospheric latitude.
Conclusion: In the current study we do not find a significant influence of the heliospheric latitude on the spectral slopes of the parallel and perpendicular magnetic spectra. This indicates that the spectral anisotropy in the parallel and perpendicular directions is governed by intrinsic properties of SWT.
Parallel MR imaging: a user's guide.
Glockner, James F; Hu, Houchun H; Stanley, David W; Angelos, Lisa; King, Kevin
2005-01-01
Parallel imaging is a recently developed family of techniques that take advantage of the spatial information inherent in phased-array radiofrequency coils to reduce acquisition times in magnetic resonance imaging. In parallel imaging, the number of sampled k-space lines is reduced, often by a factor of two or greater, thereby significantly shortening the acquisition time. Parallel imaging techniques have only recently become commercially available, and the wide range of clinical applications is just beginning to be explored. The potential clinical applications primarily involve reduction in acquisition time, improved spatial resolution, or a combination of the two. Improvements in image quality can be achieved by reducing the echo train lengths of fast spin-echo and single-shot fast spin-echo sequences. Parallel imaging is particularly attractive for cardiac and vascular applications and will likely prove valuable as 3-T body and cardiovascular imaging becomes part of standard clinical practice. Limitations of parallel imaging include reduced signal-to-noise ratio and reconstruction artifacts. It is important to consider these limitations when deciding when to use these techniques. (c) RSNA, 2005.
FastID: Extremely Fast Forensic DNA Comparisons
2017-05-19
FastID: Extremely Fast Forensic DNA Comparisons. Darrell O. Ricke, PhD, Bioengineering Systems & Technologies, Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA. Darrell.Ricke@ll.mit.edu. Abstract—Rapid analysis of DNA forensic samples can have a critical impact on time-sensitive investigations. Analysis of forensic DNA samples by massively parallel sequencing is creating the next gold standard for DNA
fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data.
Hung, Ling-Hong; Samudrala, Ram
2014-06-15
fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster) © The Author 2014. Published by Oxford University Press.
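The first metric the package supports, RMSD after optimal superposition, reduces to the Kabsch algorithm. A minimal NumPy sketch (unrelated to the package's optimized SIMD/GPU code; the function name is illustrative):

```python
import numpy as np

def rmsd_kabsch(P, Q):
    """RMSD between two (n, 3) coordinate sets after optimal superposition
    (translation removal plus the Kabsch optimal rotation via SVD)."""
    P = P - P.mean(axis=0)                   # remove translations
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)        # SVD of the covariance matrix
    d = np.sign(np.linalg.det(U @ Vt))       # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt      # optimal proper rotation
    return np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1)))
```

Clustering then only needs this function as a pairwise distance; the package's speedups come from vectorizing and batching exactly this computation.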
Novel Scalable 3-D MT Inverse Solver
NASA Astrophysics Data System (ADS)
Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.
2016-12-01
We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine a highly-scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits adjoint sources approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT (single-site and/or inter-site) responses, and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of available computational resources for a given problem set up. To parameterize an inverse domain a mask approach is implemented, which means that one can merge any subset of forward modelling cells in order to account for (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to high-performance clusters demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
NASA Astrophysics Data System (ADS)
Kim, Sun Ho; Hwang, Yong Seok; Jeong, Seung Ho; Wang, Son Jong; Kwak, Jong Gu
2017-10-01
An efficient current drive scheme in the central or off-axis region is required for the steady-state operation of tokamak fusion reactors. Current drive using the fast wave in the frequency range above twice the lower hybrid resonance frequency (ω > 2ω_LH) could be such a scheme in high-density, high-temperature reactor-grade tokamak plasmas. First, the fast wave in this range has a relatively large electric field component parallel to the magnetic field, favorable to current generation, compared to fast waves in other frequency ranges. Second, it can penetrate deeply into high-density plasmas compared to the slow wave in the same frequency range. Third, parasitic coupling to the slow wave can also contribute to the current drive, avoiding the parametric instability, thermal mode conversion and ion heating that occur in the frequency range ω < 2ω_LH. In this study, the propagation boundary, accessibility, and energy flow of the fast wave are given via the cold dispersion relation and group velocity. The power absorption and current drive efficiency are discussed qualitatively through the hot dispersion relation and the polarization. Finally, those characteristics are confirmed with the ray tracing code GENRAY for KSTAR plasmas.
A survey of GPU-based acceleration techniques in MRI reconstructions
Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou
2018-01-01
Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly complicated. However, diagnosis and treatment require very fast computational procedures. Modern competitive platforms of graphics processing units (GPUs) have been used to make high-performance parallel computation available, and attractive to common consumers, for computing massively parallel reconstruction problems at commodity prices. GPUs have also become more and more important for reconstruction computations, especially as deep learning starts to be applied to MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in the MRI community. PMID:29675361
A survey of GPU-based acceleration techniques in MRI reconstructions.
Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou; Liang, Dong
2018-03-01
Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly complicated. However, diagnosis and treatment require very fast computational procedures. Modern competitive platforms of graphics processing units (GPUs) have been used to make high-performance parallel computation available, and attractive to common consumers, for computing massively parallel reconstruction problems at commodity prices. GPUs have also become more and more important for reconstruction computations, especially as deep learning starts to be applied to MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in the MRI community.
Clock Agreement Among Parallel Supercomputer Nodes
Jones, Terry R.; Koenig, Gregory A.
2014-04-30
This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming application and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.
NASA Astrophysics Data System (ADS)
Bogdanov, Valery L.; Boyce-Jacino, Michael
1999-05-01
Confined arrays of biochemical probes deposited on a solid support surface (analytical microarray or 'chip') provide an opportunity to analyze multiple reactions simultaneously. Microarrays are increasingly used in genetics, medicine and environmental screening as research and analytical instruments. The power of microarray technology comes from its parallelism, which grows with array miniaturization, minimization of reagent volume per reaction site and reaction multiplexing. An optical detector of microarray signals should combine high sensitivity with spatial and spectral resolution. Additionally, low cost and a high processing rate are needed to transfer microarray technology into biomedical practice. We designed an imager that provides confocal and complete-spectrum detection of an entire fluorescently-labeled microarray in parallel. The imager uses a microlens array, a non-slit spectral decomposer, and a highly sensitive detector (cooled CCD). Two imaging channels provide simultaneous detection of the localization, integrated and spectral intensities for each reaction site in the microarray. A dimensional matching between the microarray and the imager's optics eliminates all moving parts in the instrumentation, enabling highly informative, fast and low-cost microarray detection. We report the theory of confocal hyperspectral imaging with a microlens array and experimental data for the implementation of the developed imager to detect a fluorescently labeled microarray with a density of approximately 10^3 sites per cm^2.
Cahyadi, Harsono; Iwatsuka, Junichi; Minamikawa, Takeo; Niioka, Hirohiko; Araki, Tsutomu; Hashimoto, Mamoru
2013-09-01
We develop a coherent anti-Stokes Raman scattering (CARS) microscopy system equipped with a tunable picosecond laser for high-speed wavelength scanning. An acousto-optic tunable filter (AOTF) is integrated in the laser cavity to enable wavelength scanning by varying the radio frequency waves applied to the AOTF crystal. An end mirror attached on a piezoelectric actuator and a pair of parallel plates driven by galvanometer motors are also introduced into the cavity to compensate for changes in the cavity length during wavelength scanning to allow synchronization with another picosecond laser. We demonstrate fast spectral imaging of 3T3-L1 adipocytes every 5 cm-1 in the Raman spectral region around 2850 cm-1 with an image acquisition time of 120 ms. We also demonstrate fast switching of Raman shifts between 2100 and 2850 cm-1, corresponding to CD2 symmetric stretching and CH2 symmetric stretching vibrations, respectively. The fast-switching CARS images reveal different locations of recrystallized deuterated and nondeuterated stearic acid.
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG
NASA Astrophysics Data System (ADS)
Griffiths, M. K.; Fedun, V.; Erdélyi, R.
2015-03-01
Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
Large-scale trench-normal mantle flow beneath central South America
NASA Astrophysics Data System (ADS)
Reiss, M. C.; Rümpker, G.; Wölbern, I.
2018-01-01
We investigate the anisotropic properties of the fore-arc region of the central Andean margin between 17-25°S by analyzing shear-wave splitting from teleseismic and local earthquakes from the Nazca slab. With recording times of over ten years at some stations, the data set is uniquely suited to address the long-standing debate about the mantle flow field at the South American margin, and in particular whether the flow field beneath the slab is parallel or perpendicular to the trench. Our measurements suggest two anisotropic layers, located within the crust and mantle beneath the stations, respectively. The teleseismic measurements show a moderate change of fast polarizations from north to south along the trench, ranging from parallel to subparallel to the absolute plate motion, and are oriented mostly perpendicular to the trench. Shear-wave splitting measurements from local earthquakes show fast polarizations roughly aligned trench-parallel but exhibit short-scale variations which are indicative of a relatively shallow origin. Comparisons between the fast polarization directions from local earthquakes and the strike of the local fault systems yield good agreement. To infer the parameters of the lower anisotropic layer we employ an inversion of the teleseismic waveforms based on two-layer models, where the anisotropy of the upper (crustal) layer is constrained by the results from the local splitting. The waveform inversion yields a mantle layer that is best characterized by a fast axis parallel to the absolute plate motion, which is more-or-less perpendicular to the trench. This orientation is likely caused by a combination of the fossil crystallographic preferred orientation of olivine within the slab and entrained mantle flow beneath the slab. The anisotropy within the crust of the overriding continental plate is explained by the shape-preferred orientation of micro-cracks in relation to local fault zones which are oriented parallel to the overall strike of the Andean range.
Our results do not provide any evidence for a significant contribution of trench-parallel mantle flow beneath the subducting slab.
Ting, Samuel T; Ahmad, Rizwan; Jin, Ning; Craft, Jason; Serafim da Silveira, Juliana; Xue, Hui; Simonetti, Orlando P
2017-04-01
Sparsity-promoting regularizers can enable stable recovery of highly undersampled magnetic resonance imaging (MRI) data, promising to improve the clinical utility of challenging applications. However, lengthy computation time limits the clinical use of these methods, especially for dynamic MRI with its large corpus of spatiotemporal data. Here, we present a holistic framework that utilizes the balanced sparse model for compressive sensing and parallel computing to reduce the computation time of cardiac MRI recovery methods. We propose a fast, iterative soft-thresholding method to solve the resulting ℓ1-regularized least squares problem. In addition, our approach utilizes a parallel computing environment that is fully integrated with the MRI acquisition software. The methodology is applied to two formulations of the multichannel MRI problem: image-based recovery and k-space-based recovery. Using measured MRI data, we show that, for a 224 × 144 image series with 48 frames, the proposed k-space-based approach achieves a mean reconstruction time of 2.35 min, a 24-fold improvement compared to a reconstruction time of 55.5 min for the nonlinear conjugate gradient method, and the proposed image-based approach achieves a mean reconstruction time of 13.8 s. Our approach can be utilized to achieve fast reconstruction of large MRI datasets, thereby increasing the clinical utility of reconstruction techniques based on compressed sensing. Magn Reson Med 77:1505-1515, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
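The core of an iterative soft-thresholding method for the ℓ1-regularized least-squares problem min_x ½‖Ax − y‖² + λ‖x‖₁ can be sketched as a generic ISTA loop (this is the textbook scheme, not the authors' balanced-model, scanner-integrated implementation):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||x||_1: shrink each entry toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=200):
    """ISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1.
    Step size 1/L with L = ||A||_2^2, a bound on the gradient Lipschitz
    constant of the smooth term."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                 # gradient step ...
        x = soft_threshold(x - grad / L, lam / L)  # ... then shrinkage
    return x
```

In MRI, A would be the (under)sampled Fourier/coil-sensitivity operator applied implicitly via FFTs rather than a dense matrix.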
A parallel approach of COFFEE objective function to multiple sequence alignment
NASA Astrophysics Data System (ADS)
Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.
2015-09-01
Computational tools to assist genomic analyses have become ever more necessary due to the fast growth in the amount of available data. Given the high computational cost of deterministic algorithms for sequence alignment, many works concentrate their efforts on the development of heuristic approaches to multiple sequence alignment. However, the selection of an approach that offers solutions with good biological significance and feasible execution time is a great challenge. Thus, this work aims to show the parallelization of the processing steps of the MSA-GA tool using the multithread paradigm in the execution of the COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sequence sets with low similarity are aligned. In previous studies we therefore implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of the COFFEE objective function increases the execution time, this approach contains steps that can be executed in parallel. With the improvements implemented in this work, the new approach is 24% faster than the sequential approach with COFFEE. Moreover, the COFFEE multithreaded approach is more efficient than WSP because, besides being slightly faster, it produces better biological results.
NASA Astrophysics Data System (ADS)
Gershman, D. J.; Figueroa-Vinas, A.; Dorelli, J.; Goldstein, M. L.; Shuster, J. R.; Avanov, L. A.; Boardsen, S. A.; Stawarz, J. E.; Schwartz, S. J.; Schiff, C.; Lavraud, B.; Saito, Y.; Paterson, W. R.; Giles, B. L.; Pollock, C. J.; Strangeway, R. J.; Russell, C. T.; Torbert, R. B.; Moore, T. E.; Burch, J. L.
2017-12-01
Measurements from the Fast Plasma Investigation (FPI) on NASA's Magnetospheric Multiscale (MMS) mission have enabled unprecedented analyses of kinetic-scale plasma physics. FPI regularly provides estimates of current density and pressure gradients of sufficient accuracy to evaluate the relative contribution of terms in plasma equations of motion. In addition, high-resolution three-dimensional velocity distribution functions of both ions and electrons provide new insights into kinetic-scale processes. As an example, for a monochromatic kinetic Alfven wave (KAW) we find non-zero, but out-of-phase parallel current density and electric field fluctuations, providing direct confirmation of the conservative energy exchange between the wave field and particles. In addition, we use fluctuations in current density and magnetic field to calculate the perpendicular and parallel wavelengths of the KAW. Furthermore, examination of the electron velocity distribution inside the KAW reveals a population of electrons non-linearly trapped in the kinetic-scale magnetic mirror formed between successive wave peaks. These electrons not only contribute to the wave's parallel electric field but also account for over half of the density fluctuations within the wave, supplying an unexpected mechanism for maintaining quasi-neutrality in a KAW. Finally, we demonstrate that the employed wave vector determination technique is also applicable to broadband fluctuations found in Earth's turbulent magnetosheath.
NASA Astrophysics Data System (ADS)
Wang, Tai-Han; Huang, Da-Nian; Ma, Guo-Qing; Meng, Zhao-Hai; Li, Ye
2017-06-01
With the continuous development of full tensor gradiometer (FTG) measurement techniques, three-dimensional (3D) inversion of FTG data is becoming increasingly used in oil and gas exploration. For the fast processing and interpretation of large-scale high-precision data, the use of the graphics processing unit (GPU) and preconditioning methods is very important in data inversion. In this paper, an improved preconditioned conjugate gradient algorithm is proposed by combining the symmetric successive over-relaxation (SSOR) technique and the incomplete Cholesky decomposition conjugate gradient algorithm (ICCG). Since preparing the preconditioner requires extra time, a parallel implementation based on the GPU is proposed. The improved method is then applied in the inversion of noise-contaminated synthetic data to prove its adaptability to the inversion of 3D FTG data. Results show that the parallel SSOR-ICCG algorithm based on an NVIDIA Tesla C2050 GPU achieves a speedup of approximately 25 times that of a serial program using a 2.0 GHz central processing unit (CPU). Real airborne gravity-gradiometry data from the Vinton salt dome (southwest Louisiana, USA) are also considered. Good results are obtained, which verifies the efficiency and feasibility of the proposed parallel method for fast inversion of 3D FTG data.
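The preconditioned conjugate gradient skeleton that the SSOR-ICCG variant builds on is shown below with a simple Jacobi (diagonal) preconditioner for brevity; the paper's preconditioner combines SSOR with incomplete Cholesky, and the names here are illustrative:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for a symmetric positive
    definite matrix A.  M_inv(r) applies the preconditioner inverse
    to a residual vector r."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)                 # preconditioned residual
        rz_new = r @ z
        p = z + (rz_new / rz) * p    # new search direction
        rz = rz_new
    return x
```

Swapping `M_inv` for an SSOR or incomplete-Cholesky solve changes only this one callback, which is why preconditioner setup (the part the paper offloads to the GPU) can be developed independently of the CG loop.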
Oke, Olaleke O; Magony, Andor; Anver, Himashi; Ward, Peter D; Jiruska, Premysl; Jefferys, John G R; Vreugdenhil, Martin
2010-04-01
Synchronization of neuronal activity in the visual cortex at low (30-70 Hz) and high gamma band frequencies (> 70 Hz) has been associated with distinct visual processes, but mechanisms underlying high-frequency gamma oscillations remain unknown. In rat visual cortex slices, kainate and carbachol induce high-frequency gamma oscillations (fast-gamma; peak frequency approximately 80 Hz at 37 degrees C) that can coexist with low-frequency gamma oscillations (slow-gamma; peak frequency approximately 50 Hz at 37 degrees C) in the same column. Current-source density analysis showed that fast-gamma was associated with rhythmic current sink-source sequences in layer III and slow-gamma with rhythmic current sink-source sequences in layer V. Fast-gamma and slow-gamma were not phase-locked. Slow-gamma power fluctuations were unrelated to fast-gamma power fluctuations, but were modulated by the phase of theta (3-8 Hz) oscillations generated in the deep layers. Fast-gamma was spatially less coherent than slow-gamma. Fast-gamma and slow-gamma were dependent on gamma-aminobutyric acid type A (GABA-A) receptors, alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors and gap junctions, their frequencies were reduced by thiopental and were weakly dependent on cycle amplitude. Fast-gamma and slow-gamma power were differentially modulated by thiopental and adenosine A1 receptor blockade, and their frequencies were differentially modulated by N-methyl-D-aspartate (NMDA) receptors, GluK1 subunit-containing receptors and persistent sodium currents. Our data indicate that fast-gamma and slow-gamma both depend on and are paced by recurrent inhibition, but have distinct pharmacological modulation profiles. The independent co-existence of fast-gamma and slow-gamma allows parallel processing of distinct aspects of vision and visual perception. The visual cortex slice provides a novel in vitro model to study cortical high-frequency gamma oscillations.
Biomechanical Comparison of Parallel and Crossed Suture Repair for Longitudinal Meniscus Tears.
Milchteim, Charles; Branch, Eric A; Maughon, Ty; Hughey, Jay; Anz, Adam W
2016-04-01
Longitudinal meniscus tears are commonly encountered in clinical practice. Meniscus repair devices have been previously tested and presented; however, prior studies have not evaluated repair construct designs head to head. This study compared a new-generation meniscus repair device, the SpeedCinch, with a similar established device, the Fast-Fix 360, and compared a parallel repair construct with a crossed construct. Both devices utilize self-adjusting No. 2-0 ultra-high molecular weight polyethylene (UHMWPE) suture and 2 polyether ether ketone (PEEK) anchors. It was hypothesized that crossed suture repair constructs would have higher failure loads and stiffness than simple parallel constructs and that the newer repair device would exhibit performance similar to that of the established device. Controlled laboratory study. Sutures were placed in an open fashion into the body and posterior horn regions of the medial and lateral menisci in 16 cadaveric knees. Evaluation of 2 repair devices and 2 repair constructs created 4 groups: 2 parallel vertical sutures created with the Fast-Fix 360 (2PFF), 2 crossed vertical sutures created with the Fast-Fix 360 (2XFF), 2 parallel vertical sutures created with the SpeedCinch (2PSC), and 2 crossed vertical sutures created with the SpeedCinch (2XSC). After open placement of the repair construct, each meniscus was explanted and tested to failure on a uniaxial material testing machine. All data were checked for normality of distribution, and 1-way analysis of variance by ranks was chosen to evaluate the statistical significance of differences in maximum failure load and stiffness between groups. Statistical significance was defined as P < .05. The mean maximum failure loads ± 95% CI (range) were 89.6 ± 16.3 N (125.7-47.8 N) (2PFF), 72.1 ± 11.7 N (103.4-47.6 N) (2XFF), 71.9 ± 15.5 N (109.4-41.3 N) (2PSC), and 79.5 ± 25.4 N (119.1-30.9 N) (2XSC). Interconstruct comparison revealed no statistical difference between the 4 constructs in maximum failure load (P = .49).
Stiffness values were also similar, with no statistical difference on comparison (P = .28). Both devices in the current study had similar failure load and stiffness when 2 vertical or 2 crossed sutures were tested in cadaveric human menisci. Simple parallel vertical sutures perform similarly to crossed suture patterns at the time of implantation.
NASA Astrophysics Data System (ADS)
Lu, Dong-dong; Gu, Jin-liang; Luo, Hong-e.; Xia, Yan
2017-10-01
To meet the specific requirements of an X-ray machine system for measuring the velocity of an outfield projectile, a DC high-voltage power supply delivering high voltage at small current is designed. The system comprises a full-bridge inverter built around a series resonant circuit; high-frequency zero-current soft switching of the high-voltage supply, realized through PWM signals generated by an STM32; a nanocrystalline-alloy transformer serving as the high-frequency step-up transformer; and LCC series-parallel resonant parameters determined from the preset parameters of the transformer. The design procedure is as follows: the LCC series-parallel resonant circuit and the voltage-doubling circuit are simulated using MULTISIM and MATLAB; an optimal topology and optimal parameters for each part are selected from the simulation analysis; and the parameter choices are finally verified by a simulation of the whole system. The simulation analysis shows that the output voltage of the series-parallel resonant circuit reaches 10 kV in 28 s and, after the voltage-doubling circuit, reaches 120 kV in one hour. The ripple of the output voltage is small enough to provide a stable X-ray supply for the X-ray machine used to measure the velocity of an outfield projectile, and the supply charges quickly and operates at high efficiency.
Pteros 2.0: Evolution of the fast parallel molecular analysis library for C++ and python.
Yesylevskyy, Semen O
2015-07-15
Pteros is a high-performance open-source library for molecular modeling and analysis of molecular dynamics trajectories. Starting from version 2.0, Pteros is available for the C++ and Python programming languages with very similar interfaces. This makes it suitable for writing complex reusable programs in C++ and simple interactive scripts in Python alike. The new version improves the facilities for asynchronous trajectory reading and parallel execution of analysis tasks by introducing analysis plugins, which can be written in either C++ or Python in a completely uniform way. The high level of abstraction provided by analysis plugins greatly simplifies the prototyping and implementation of complex analysis algorithms. Pteros is available for free under the Artistic License from http://sourceforge.net/projects/pteros/. © 2015 Wiley Periodicals, Inc.
MMS Observations of Parallel Electric Fields During a Quasi-Perpendicular Bow Shock Crossing
NASA Astrophysics Data System (ADS)
Goodrich, K.; Schwartz, S. J.; Ergun, R.; Wilder, F. D.; Holmes, J.; Burch, J. L.; Gershman, D. J.; Giles, B. L.; Khotyaintsev, Y. V.; Le Contel, O.; Lindqvist, P. A.; Strangeway, R. J.; Russell, C.; Torbert, R. B.
2016-12-01
Previous observations of the terrestrial bow shock have frequently shown large-amplitude fluctuations in the parallel electric field. These parallel electric fields are seen as both nonlinear solitary structures, such as double layers and electron phase-space holes, and short-wavelength waves, which can reach amplitudes greater than 100 mV/m. The Magnetospheric Multiscale (MMS) mission has crossed the Earth's bow shock more than 200 times. The parallel electric field signatures observed in these crossings are seen in very discrete packets and evolve over time scales of less than a second, indicating the presence of a wealth of kinetic-scale activity. The high time resolution of the Fast Plasma Investigation (FPI) available on MMS offers greater detail of the kinetic-scale physics that occurs at bow shocks than ever before, allowing greater insight into the overall effect of these observed electric fields. We present a characterization of the parallel electric fields found in a single bow shock event and of how they reflect the kinetic-scale activity that can occur at the terrestrial bow shock.
Fast parallel algorithm for slicing STL based on pipeline
NASA Astrophysics Data System (ADS)
Ma, Xulong; Lin, Feng; Yao, Bo
2016-05-01
In the field of additive manufacturing, current research on data processing mainly focuses on the slicing of large STL files or complicated CAD models. To improve efficiency and reduce slicing time, a parallel algorithm has great advantages; however, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm based on a pipeline mode is presented to speed up data processing, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of the number of threads and the number of layers are investigated in a series of experiments. The experimental results show that both factors significantly affect the speedup ratio: speedup versus thread count follows the positive trend predicted by Amdahl's law, and speedup versus layer count follows the positive trend predicted by Gustafson's law. The new algorithm uses topological information to compute contours in parallel. A comparison with another parallel algorithm based on data parallelism shows that the pipeline mode is more efficient, and a concluding case study demonstrates the strong performance of the new parallel algorithm. Compared with a serial slicing algorithm, the new pipeline-parallel algorithm makes full use of multi-core CPU hardware and accelerates the slicing process; compared with a data-parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
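The pipeline mode described above can be illustrated with a minimal two-stage sketch: one thread computes triangle/plane intersection segments for a layer while a second thread consumes them. The cube geometry, queue layout, and stage split are our own simplifications, not the authors' implementation.

```python
# Two-stage pipeline sketch of STL slicing: a geometry stage intersects
# triangles with a z-plane and a measuring stage consumes the segments.
import threading, queue

def slice_triangle(tri, z):
    """Return the 2D segment where triangle `tri` crosses the plane at height z."""
    pts = []
    for a, b in ((0, 1), (1, 2), (2, 0)):
        (x1, y1, z1), (x2, y2, z2) = tri[a], tri[b]
        if (z1 - z) * (z2 - z) < 0:            # edge strictly crosses the plane
            t = (z - z1) / (z2 - z1)
            pts.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return tuple(pts) if len(pts) == 2 else None

def geometry_stage(tris, z, out_q):
    for tri in tris:
        seg = slice_triangle(tri, z)
        if seg is not None:
            out_q.put(seg)
    out_q.put(None)                             # sentinel: layer finished

def measure_stage(in_q, result):
    total = 0.0
    while (seg := in_q.get()) is not None:
        (xa, ya), (xb, yb) = seg
        total += ((xb - xa) ** 2 + (yb - ya) ** 2) ** 0.5
    result.append(total)

def sliced_perimeter(tris, z):
    q, result = queue.Queue(), []
    t1 = threading.Thread(target=geometry_stage, args=(tris, z, q))
    t2 = threading.Thread(target=measure_stage, args=(q, result))
    t1.start(); t2.start(); t1.join(); t2.join()
    return result[0]

def unit_cube_triangles():
    """Twelve triangles covering the faces of the unit cube."""
    v = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
    quads = [(0, 1, 3, 2), (4, 5, 7, 6), (0, 1, 5, 4),
             (2, 3, 7, 6), (0, 2, 6, 4), (1, 3, 7, 5)]
    tris = []
    for a, b, c, d in quads:
        tris.append((v[a], v[b], v[c]))
        tris.append((v[a], v[c], v[d]))
    return tris
```

Slicing the unit cube at any interior height yields a square cross-section, so the measured contour length is 4; in a full slicer, the geometry stage would already be working on the next layer while the downstream stage links segments of the current one.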
Fox, W.; Sciortino, F.; v. Stechow, A.; ...
2017-03-21
We report detailed laboratory observations of the structure of a reconnection current sheet in a two-fluid plasma regime with a guide magnetic field. We observe and quantitatively analyze the quadrupolar electron pressure variation in the ion-diffusion region, as originally predicted by extended magnetohydrodynamics simulations. The projection of the electron pressure gradient parallel to the magnetic field contributes significantly to balancing the parallel electric field, and the resulting cross-field electron jets in the reconnection layer are diamagnetic in origin. Furthermore, these results demonstrate how parallel and perpendicular force balance are coupled in guide field reconnection and confirm basic theoretical models of the importance of electron pressure gradients for obtaining fast magnetic reconnection.
NASA Astrophysics Data System (ADS)
Czermak, A.; Zalewska, A.; Dulny, B.; Sowicki, B.; Jastrząb, M.; Nowak, L.
2004-07-01
The need for real-time monitoring of hadrontherapy beam intensity and profile, as well as the requirements of fast dosimetry using Monolithic Active Pixel Sensors (MAPS), led the SUCIMA collaboration to design a dedicated data acquisition system (DAQ SUCIMA Imager). The DAQ system has been developed on one of the most advanced Xilinx field-programmable gate array chips, the Virtex-II. A dedicated multifunctional electronic board for capturing the detector's analogue signals, processing them digitally in parallel, compressing the final data, and transmitting them through a high-speed USB 2.0 port has been prototyped and tested.
Automatic recognition of vector and parallel operations in a higher level language
NASA Technical Reports Server (NTRS)
Schneck, P. B.
1971-01-01
A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
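As a toy illustration of what such a recognizer must decide, the following sketch classifies a single array assignment as vectorizable or not by checking for a loop-carried dependence on the target array. It is a deliberately simplified, regex-based stand-in for the interval-analysis approach in the report, and the FORTRAN-like statement syntax is our own.

```python
# Toy dependence check: "A(I) = A(I-1) + B(I)" carries a value across loop
# iterations and cannot run as a single vector operation, while
# "A(I) = B(I) + C(I)" can.  Real recognizers use data-flow analysis,
# not pattern matching; this is only an illustration of the decision.
import re

def vectorizable(stmt):
    """Return True if a FORTRAN-like array assignment over loop index I
    has no loop-carried dependence on its own target array."""
    lhs, rhs = (s.strip() for s in stmt.split("=", 1))
    m = re.fullmatch(r"(\w+)\(I\)", lhs)
    if not m:
        return False                  # only plain A(I) targets handled here
    target = m.group(1)
    # A reference to the target array at a shifted index (I-1, I+2, ...)
    # on the right-hand side implies a loop-carried dependence.
    shifted = re.findall(r"(\w+)\(I[+-]\d+\)", rhs)
    return target not in shifted
```

A reference such as A(I) on both sides is still vectorizable, since each iteration touches only its own element; only shifted references to the target block vectorization in this toy model.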
NASA Astrophysics Data System (ADS)
Wang, Yonggang; Tong, Liqing; Liu, Kefu
2017-06-01
The purpose of impedance matching for a Marx generator and a DBD lamp is to limit the output current of the Marx generator, provide a large discharge current at ignition, and obtain fast voltage rising/falling edges and large overshoot. In this paper, different impedance matching circuits (series inductor, parallel capacitor, and series inductor combined with parallel capacitor) are analyzed. The analysis demonstrates that a series inductor can limit the Marx current, but the discharge current is also limited; a parallel capacitor can provide a large discharge current, but the Marx current is also enlarged. A series inductor combined with a parallel capacitor takes full advantage of the inductor and capacitor while avoiding their shortcomings, and is therefore a good solution. Experimental results match the theoretical analysis well and show that both the series inductor and the parallel capacitor improve the performance of the system; however, the series inductor combined with the parallel capacitor performs best. Compared with driving the DBD lamp with a Marx generator directly, an increase of 97.3% in radiant power and an increase of 59.3% in system efficiency are achieved using this matching circuit.
NASA Technical Reports Server (NTRS)
Barnard, Stephen T.; Simon, Horst; Lasinski, T. A. (Technical Monitor)
1994-01-01
The design of a parallel implementation of multilevel recursive spectral bisection is described. The goal is to implement a code that is fast enough to enable dynamic repartitioning of adaptive meshes.
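A single serial spectral bisection step can be sketched as follows; the power-iteration approach and the example graph are our own illustrative choices, whereas the paper's subject is the multilevel, parallel version of this kernel.

```python
# One spectral bisection step in pure Python: approximate the Fiedler
# vector of the graph Laplacian by power iteration on a spectral shift,
# then split the vertices by sign.  Illustrative serial kernel only.

def fiedler_bisect(adj, iters=2000):
    n = len(adj)
    deg = [sum(row) for row in adj]
    # B = c*I - L (with L = D - A) has the Fiedler vector as its dominant
    # eigenvector once the constant vector is projected out, provided
    # c exceeds the largest Laplacian eigenvalue (<= 2 * max degree).
    c = 2 * max(deg) + 1.0
    B = [[(c - deg[i] if i == j else float(adj[i][j])) for j in range(n)]
         for i in range(n)]
    v = [float(i + 1) for i in range(n)]      # arbitrary start vector
    for _ in range(iters):
        mean = sum(v) / n                     # deflate the all-ones vector
        v = [x - mean for x in v]
        v = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return ([i for i in range(n) if v[i] < 0],
            [i for i in range(n) if v[i] >= 0])
```

For two triangles joined by a single edge, the sign of the approximate Fiedler vector splits the vertices into the two triangles, cutting only the joining edge; recursive application of this step, with multilevel coarsening to keep the eigenvector computation cheap, yields the partitioner described above.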
NASA Astrophysics Data System (ADS)
Cao, L.; Kao, H.; Wang, K.; Wang, Z.
2016-12-01
Haida Gwaii is located along the transpressive Queen Charlotte margin between the Pacific (PA) and North America (NA) plates. The highly oblique relative plate motion is partitioned, with the strike-slip component accommodated by the Queen Charlotte Fault (QCF) and the convergent component by a thrust fault offshore. To understand how the presence of an obliquely subducting slab influences shear deformation of the plate boundary, we investigate mantle anisotropy by analyzing shear-wave splitting of teleseismic SKS phases recorded at 17 seismic stations in and around Haida Gwaii. We used the MFAST program to determine the polarization direction of the fast wave (φ) and the delay time (δt) between the fast and slow phases. The fast directions derived from stations on Haida Gwaii and two stations to the north on the Alaska Panhandle are predominantly margin-parallel (NNW). However, away from the plate boundary, the fast direction transitions to WSW-trending, very oblique or perpendicular to the plate boundary. Because the delay times of 0.6-2.45 s are much larger than values based on an associated local S phase splitting analysis in the same study area, it is reasonable to infer that most of the anisotropy from our SKS analysis originates in the upper mantle and is associated with lattice-preferred orientation of anisotropic minerals. The margin-parallel fast direction within about 100 km of the QCF (average φ = -40º and δt = 1.2 s) is likely induced by the PA-NA shear motion. The roughly margin-normal fast directions farther away, although more scattered, are consistent with those previously observed in the NA continent and are attributed to the absolute motion of the NA plate. However, the transition between the two regimes based on our SKS analysis appears to be gradual, suggesting that the plate boundary shear influences a much broader region at mantle depths than would be inferred from the surface trace of the QCF.
We think this is due to the presence of a subducted portion of the Pacific plate. Because the slab travels mostly in the strike direction, it is expected to induce margin-parallel shear deformation of the mantle material. This result has important implications for the geodynamics of transpressive plate margins.
Parallelization and automatic data distribution for nuclear reactor simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liebrock, L.M.
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Motion streaks in fast motion rivalry cause orientation-selective suppression.
Apthorp, Deborah; Wenderoth, Peter; Alais, David
2009-05-14
We studied binocular rivalry between orthogonally translating arrays of random Gaussian blobs and measured the strength of rivalry suppression for static oriented probes. Suppression depth was quantified by expressing monocular probe thresholds during dominance relative to thresholds during suppression. Rivalry between two fast motions or two slow motions was compared in order to test the suggestion that fast-moving objects leave oriented "motion streaks" due to temporal integration (W. S. Geisler, 1999). If fast motions do produce motion streaks, then fast motion rivalry might also entail rivalry between the orthogonal streak orientations. We tested this using a static oriented probe that was aligned either parallel to the motion trajectory (hence collinear with the "streaks") or was orthogonal to the trajectory, predicting that rivalry suppression would be greater for parallel probes, and only for rivalry between fast motions. Results confirmed that suppression depth did depend on probe orientation for fast motion but not for slow motion. Further experiments showed that threshold elevations for the oriented probe during suppression exhibited clear orientation tuning. However, orientation-tuned elevations were also present during dominance, suggesting within-channel masking as the basis of the extra-deep suppression. In sum, the presence of orientation-dependent suppression in fast motion rivalry is consistent with the "motion streaks" hypothesis.
Novel Optical Processor for Phased Array Antenna.
1992-10-20
…parallel glass slide into the signal beam optical loop. The parallel glass acts as a variable phase shifter for the signal beam, simulating phase drift. [A table of candidate acousto-optic designs, listing acoustic velocity, attenuation, bandwidth limit, and wavelength for each material, is garbled in the source.] Subject to the achievable acoustic frequency, the preferred materials are the slow shear wave in TeO2, the fast shear wave in TeO2, or the shear waves in…
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines.
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCPs running in parallel provide high bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.
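The space-division multiplexing idea in conclusion (1) can be sketched as round-robin striping of a message across n channels; the channel count and chunk size below are illustrative assumptions of ours, not parameters from the study.

```python
# Sketch of striping one message across parallel channels and reassembling
# it at the receiver, the core data movement behind space-division
# multiplexing over several physical links.

def stripe(data: bytes, n: int, chunk: int = 4):
    """Split `data` into n per-channel byte streams, round-robin by chunk."""
    channels = [bytearray() for _ in range(n)]
    for k in range(0, len(data), chunk):
        channels[(k // chunk) % n] += data[k:k + chunk]
    return [bytes(ch) for ch in channels]

def reassemble(channels, chunk: int = 4):
    """Inverse of stripe(): interleave chunks back into one message."""
    out = bytearray()
    offsets = [0] * len(channels)
    ch = 0
    while any(off < len(c) for off, c in zip(offsets, channels)):
        piece = channels[ch][offsets[ch]:offsets[ch] + chunk]
        out += piece                  # empty once this channel is drained
        offsets[ch] += chunk
        ch = (ch + 1) % len(channels)
    return bytes(out)
```

In a real protocol stack each channel would carry its own sequence numbers so the receiver tolerates reordering and loss; here the fixed round-robin order stands in for that bookkeeping.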
Bit-parallel arithmetic in a massively-parallel associative processor
NASA Technical Reports Server (NTRS)
Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.
1992-01-01
A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m^2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
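To give a flavor of word-parallel operation across many operands at once, the following bit-slice sketch adds corresponding m-bit words of two lists one bit position per step, with a Python integer standing in for each bit-slice. This is a simplified illustration of the associative, word-parallel style, not the paper's bit-parallel microarchitecture.

```python
# Word-parallel, bit-serial addition over bit-slices: slice i is an integer
# whose j-th bit is bit i of word j, so one XOR/AND step per bit position
# updates all n words simultaneously (toy model; the reassembly loop at
# the end is bookkeeping that hardware would not need).

def bitslice_add(a_words, b_words, m):
    """Add corresponding m-bit words of two lists in m slice steps."""
    n = len(a_words)
    def slices(words):
        return [sum(((w >> i) & 1) << j for j, w in enumerate(words))
                for i in range(m)]
    A, B = slices(a_words), slices(b_words)
    carry, out = 0, [0] * n
    for i in range(m):                       # one "cycle" per bit position
        s = A[i] ^ B[i] ^ carry              # per-word sum bits
        carry = (A[i] & B[i]) | (A[i] & carry) | (B[i] & carry)
        for j in range(n):
            out[j] |= ((s >> j) & 1) << i
    return out
```

The step count depends only on the operand width m, never on the number of words, which is the essence of the word-parallel claim; the paper's contribution is making the per-bit work parallel as well.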
Feng, Shuo
2014-01-01
Parallel excitation (pTx) techniques with multiple transmit channels have been widely used in high-field MRI to shorten the RF pulse duration and/or reduce the specific absorption rate (SAR). However, the efficiency of pulse design still needs substantial improvement for practical real-time applications. In this paper, we present a detailed description of a fast pulse design method with Fourier domain gridding and a conjugate gradient method. Simulation results show that the proposed method can design pTx pulses at an efficiency 10 times higher than that of the conventional conjugate-gradient based method, without reducing the accuracy of the desirable excitation patterns.
Taki, Hirofumi; Nagatani, Yoshiki; Matsukawa, Mami; Kanai, Hiroshi; Izumi, Shin-Ichi
2017-10-01
Ultrasound signals that pass through cancellous bone may be considered to consist of two longitudinal waves, called fast and slow waves. Accurate decomposition of these fast and slow waves is considered to be highly beneficial in determining the characteristics of cancellous bone. In the present study, a fast decomposition method using a wave transfer function with a phase rotation parameter was applied to signals that had passed through bovine bone specimens with various bone volume to total volume (BV/TV) ratios in a simulation study, in which the elastic finite-difference time-domain method was used and the ultrasound wave propagated parallel to the bone axes. The proposed method succeeded in decomposing both fast and slow waves accurately; the normalized residual intensity was less than -19.5 dB when the specimen thickness ranged from 4 to 7 mm and the BV/TV value ranged from 0.144 to 0.226. There was a strong relationship between the phase rotation value and the BV/TV value. The ratio of the peak envelope amplitude of the decomposed fast wave to that of the slow wave increased monotonically with increasing BV/TV ratio, indicating the high performance of the proposed method in estimating the BV/TV value of cancellous bone.
Parallel processing via a dual olfactory pathway in the honeybee.
Brill, Martin F; Rosenbaum, Tobias; Reus, Isabelle; Kleineidam, Christoph J; Nawrot, Martin P; Rössler, Wolfgang
2013-02-06
In their natural environment, animals face complex and highly dynamic olfactory input. Thus vertebrates as well as invertebrates require fast and reliable processing of olfactory information. Parallel processing has been shown to improve processing speed and power in other sensory systems and is characterized by extraction of different stimulus parameters along parallel sensory information streams. Honeybees possess an elaborate olfactory system with unique neuronal architecture: a dual olfactory pathway comprising a medial projection-neuron (PN) antennal lobe (AL) protocerebral output tract (m-APT) and a lateral PN AL output tract (l-APT) connecting the olfactory lobes with higher-order brain centers. We asked whether this neuronal architecture serves parallel processing and employed a novel technique for simultaneous multiunit recordings from both tracts. The results revealed response profiles from a high number of PNs of both tracts to floral, pheromonal, and biologically relevant odor mixtures tested over multiple trials. PNs from both tracts responded to all tested odors, but with different characteristics indicating parallel processing of similar odors. Both PN tracts were activated by widely overlapping response profiles, which is a requirement for parallel processing. The l-APT PNs had broad response profiles suggesting generalized coding properties, whereas the responses of m-APT PNs were comparatively weaker and less frequent, indicating higher odor specificity. Comparison of response latencies within and across tracts revealed odor-dependent latencies. We suggest that parallel processing via the honeybee dual olfactory pathway provides enhanced odor processing capabilities serving sophisticated odor perception and olfactory demands associated with a complex olfactory world of this social insect.
Implementation of a high-speed face recognition system that uses an optical parallel correlator.
Watanabe, Eriko; Kodate, Kashiko
2005-02-10
We implement a fully automatic fast face recognition system by using a 1000 frame/s optical parallel correlator designed and assembled by us. The operational speed for the 1:N (i.e., matching one image against N, where N refers to the number of images in the database) identification experiment (4000 face images) amounts to less than 1.5 s, including the preprocessing and postprocessing times. The binary real-only matched filter is devised for the sake of face recognition, and the system is optimized by the false-rejection rate (FRR) and the false-acceptance rate (FAR), according to 300 samples selected by the biometrics guideline. From trial 1:N identification experiments with the optical parallel correlator, we acquired low error rates of 2.6% FRR and 1.3% FAR. Facial images of people wearing thin glasses or heavy makeup that rendered identification difficult were identified with this system.
Using AberOWL for fast and scalable reasoning over BioPortal ontologies.
Slater, Luke; Gkoutos, Georgios V; Schofield, Paul N; Hoehndorf, Robert
2016-08-08
Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided. We apply the AberOWL infrastructure to provide automated reasoning-based access to all accessible and consistent ontologies in BioPortal (368 ontologies). We perform an extensive performance evaluation to determine query times, both for queries of different complexity and for queries that are performed in parallel over the ontologies. We demonstrate that, with the exception of a few ontologies, even complex and parallel queries can now be answered in milliseconds, therefore allowing automated reasoning to be used on a large scale, to run in parallel, and with rapid response times.
A data distributed parallel algorithm for ray-traced volume rendering
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu; Painter, James S.; Hansen, Charles D.; Krogh, Michael F.
1993-01-01
This paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along with their implementation and performance on the Connection Machine CM-5, and networked workstations. This algorithm distributes both the data and the computations to individual processing units to achieve fast, high-quality rendering of high-resolution data. The volume data, once distributed, is left intact. The processing nodes perform local ray tracing of their subvolume concurrently. No communication between processing units is needed during this local ray-tracing process. A subimage is generated by each processing unit and the final image is obtained by compositing subimages in the proper order, which can be determined a priori. Test results on both the CM-5 and a group of networked workstations demonstrate the practicality of our rendering algorithm and compositing method.
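The a-priori compositing order mentioned above relies on the "over" operator on premultiplied-alpha pixels, whose associativity is what lets adjacent subimages be merged pairwise in parallel before a final combine. A minimal per-pixel sketch follows; the pixel values and function names are illustrative, not from the paper.

```python
# Ordered subimage compositing with the "over" operator on premultiplied
# RGBA pixels.  Because "over" is associative, any parenthesization of a
# depth-ordered sequence gives the same result, enabling tree-structured
# parallel compositing of per-node subimages.

def over(front, back):
    """Composite one premultiplied-alpha pixel over another."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    k = 1.0 - fa
    return (fr + k * br, fg + k * bg, fb + k * bb, fa + k * ba)

def composite(pixels):
    """Fold pixels front to back; the list order must match depth order."""
    result = pixels[0]
    for pixel in pixels[1:]:
        result = over(result, pixel)
    return result
```

In the full algorithm this operation runs per pixel over whole subimages; associativity means processors can combine neighboring subimages concurrently, leaving only a logarithmic-depth merge.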
Parallel Computation of Flow in Heterogeneous Media Modelled by Mixed Finite Elements
NASA Astrophysics Data System (ADS)
Cliffe, K. A.; Graham, I. G.; Scheichl, R.; Stals, L.
2000-11-01
In this paper we describe a fast parallel method for solving highly ill-conditioned saddle-point systems arising from mixed finite element simulations of stochastic partial differential equations (PDEs) modelling flow in heterogeneous media. Each realisation of these stochastic PDEs requires the solution of the linear first-order velocity-pressure system comprising Darcy's law coupled with an incompressibility constraint. The chief difficulty is that the permeability may be highly variable, especially when the statistical model has a large variance and a small correlation length. For reasonable accuracy, the discretisation has to be extremely fine. We solve these problems by first reducing the saddle-point formulation to a symmetric positive definite (SPD) problem using a suitable basis for the space of divergence-free velocities. The reduced problem is solved using parallel conjugate gradients preconditioned with an algebraically determined additive Schwarz domain decomposition preconditioner. The result is a solver which exhibits a good degree of robustness with respect to the mesh size as well as to the variance and to physically relevant values of the correlation length of the underlying permeability field. Numerical experiments exhibit almost optimal levels of parallel efficiency. The domain decomposition solver (DOUG, http://www.maths.bath.ac.uk/~parsoft) used here not only is applicable to this problem but can be used to solve general unstructured finite element systems on a wide range of parallel architectures.
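The reduced SPD solve described above follows the standard preconditioned conjugate gradient iteration. Below is a minimal serial sketch in which a diagonal (Jacobi) preconditioner stands in for the additive Schwarz domain decomposition preconditioner; the iteration structure is the same. The `pcg` function and the 2x2 example system are illustrative assumptions, not the DOUG code.

```python
# Minimal preconditioned conjugate gradient sketch for an SPD system,
# the kind of reduced problem described above. A diagonal (Jacobi)
# preconditioner stands in here for the additive Schwarz domain
# decomposition preconditioner used in the paper.

def pcg(A, b, precond, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]                       # residual r = b - A x for x = 0
    z = precond(r)
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if max(abs(ri) for ri in r) < tol:
            break
        z = precond(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
jacobi = lambda r: [r[i] / A[i][i] for i in range(len(r))]
x = pcg(A, b, jacobi)   # A x should reproduce b to within tol
```

In the parallel setting, the matrix-vector product and the preconditioner application are the distributed operations; the dot products require global reductions.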
NASA Technical Reports Server (NTRS)
Norris, Andrew
2003-01-01
The goal was to perform a 3D simulation of the GE90 combustor as part of a full turbofan engine simulation. The requirements of high fidelity and fast turn-around time call for a massively parallel code. The National Combustion Code (NCC) was chosen for this task because it supports up to 999 processors and includes state-of-the-art combustion models. Also required is the ability to take inlet conditions from the compressor code and to provide exit conditions to the turbine code.
NASA Astrophysics Data System (ADS)
Jiang, Xikai; Li, Jiyuan; Zhao, Xujun; Qin, Jian; Karpeev, Dmitry; Hernandez-Ortiz, Juan; de Pablo, Juan J.; Heinonen, Olle
2016-08-01
Large classes of materials systems in physics and engineering are governed by magnetic and electrostatic interactions. Continuum or mesoscale descriptions of such systems can be cast in terms of integral equations, whose direct computational evaluation requires O(N²) operations, where N is the number of unknowns. Such a scaling, which arises from the many-body nature of the relevant Green's function, has precluded wide-spread adoption of integral methods for solution of large-scale scientific and engineering problems. In this work, a parallel computational approach is presented that relies on using scalable open source libraries and utilizes a kernel-independent Fast Multipole Method (FMM) to evaluate the integrals in O(N) operations, with O(N) memory cost, thereby substantially improving the scalability and efficiency of computational integral methods. We demonstrate the accuracy, efficiency, and scalability of our approach in the context of two examples. In the first, we solve a boundary value problem for a ferroelectric/ferromagnetic volume in free space. In the second, we solve an electrostatic problem involving polarizable dielectric bodies in an unbounded dielectric medium. The results from these test cases show that our proposed parallel approach, which is built on a kernel-independent FMM, can enable highly efficient and accurate simulations and allow for considerable flexibility in a broad range of applications.
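The O(N²) direct evaluation that the FMM replaces can be made concrete with a small sketch: for a Coulomb-like kernel, each of the N targets sums contributions from all N sources. The function name and toy inputs are assumptions for illustration; the paper's kernel-independent FMM reduces this cost to O(N) work and memory by a hierarchical approximation.

```python
# Direct evaluation of the Green's-function sum that the paper
# accelerates: computing phi_i = sum_{j != i} q_j / |x_i - x_j|
# for all i costs O(N^2) operations. A kernel-independent FMM
# replaces this double loop with an O(N) hierarchical scheme.
import math

def direct_sum(points, charges):
    n = len(points)
    phi = [0.0] * n
    for i in range(n):
        xi, yi, zi = points[i]
        for j in range(n):
            if i == j:
                continue
            xj, yj, zj = points[j]
            r = math.sqrt((xi - xj)**2 + (yi - yj)**2 + (zi - zj)**2)
            phi[i] += charges[j] / r
    return phi

pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
q = [1.0, 1.0, 1.0]
print(direct_sum(pts, q))   # phi[0] = 1/1 + 1/1 = 2.0
```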
HeinzelCluster: accelerated reconstruction for FORE and OSEM3D.
Vollmar, S; Michel, C; Treffert, J T; Newport, D F; Casey, M; Knöss, C; Wienhard, K; Liu, X; Defrise, M; Heiss, W D
2002-08-07
Using iterative three-dimensional (3D) reconstruction techniques for reconstruction of positron emission tomography (PET) is not feasible on most single-processor machines due to the excessive computing time needed, especially so for the large sinogram sizes of our high-resolution research tomograph (HRRT). In our first approach to speed up reconstruction time we transform the 3D scan into the format of a two-dimensional (2D) scan with sinograms that can be reconstructed independently using Fourier rebinning (FORE) and a fast 2D reconstruction method. On our dedicated reconstruction cluster (seven four-processor systems, Intel PIII@700 MHz, switched fast ethernet and Myrinet, Windows NT Server), we process these 2D sinograms in parallel. We have achieved a speedup > 23 using 26 processors and also compared results for different communication methods (RPC, Syngo, Myrinet GM). The other approach is to parallelize OSEM3D (implementation of C Michel), which has produced the best results for HRRT data so far and is more suitable for an adequate treatment of the sinogram gaps that result from the detector geometry of the HRRT. We have implemented two levels of parallelization on our dedicated cluster (a shared-memory fine-grain level on each node utilizing all four processors, and a coarse-grain level allowing for 15 nodes), reducing the time for one core iteration from over 7 h to about 35 min.
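The FORE-based approach is embarrassingly parallel: after rebinning, each 2D sinogram can be reconstructed independently, so the set of sinograms is farmed out to workers. A hedged sketch of this task-farm structure follows; `reconstruct_2d` is a hypothetical placeholder for the actual fast 2D reconstruction, not the cluster code described above.

```python
# Sketch of the embarrassingly parallel structure after Fourier
# rebinning: each rebinned 2D sinogram can be reconstructed
# independently, so the slices are distributed across workers.
# `reconstruct_2d` is a stand-in placeholder (here it just sums
# the sinogram) for the real fast 2D reconstruction.
from concurrent.futures import ThreadPoolExecutor

def reconstruct_2d(sinogram):
    return sum(sinogram)            # placeholder for 2D FBP/OSEM

def reconstruct_all(sinograms, workers=4):
    # pool.map preserves the order of the input slices.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconstruct_2d, sinograms))

slices = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(reconstruct_all(slices))      # → [3.0, 7.0, 11.0]
```

On a cluster, the worker pool would span nodes (the paper compares RPC and Myrinet GM for this communication), but the independence of the 2D slices is what makes the near-linear speedup possible.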
Jiang, Xikai; Li, Jiyuan; Zhao, Xujun; ...
2016-08-10
Large classes of materials systems in physics and engineering are governed by magnetic and electrostatic interactions. Continuum or mesoscale descriptions of such systems can be cast in terms of integral equations, whose direct computational evaluation requires O(N²) operations, where N is the number of unknowns. Such a scaling, which arises from the many-body nature of the relevant Green's function, has precluded wide-spread adoption of integral methods for solution of large-scale scientific and engineering problems. In this work, a parallel computational approach is presented that relies on using scalable open source libraries and utilizes a kernel-independent Fast Multipole Method (FMM) to evaluate the integrals in O(N) operations, with O(N) memory cost, thereby substantially improving the scalability and efficiency of computational integral methods. We demonstrate the accuracy, efficiency, and scalability of our approach in the context of two examples. In the first, we solve a boundary value problem for a ferroelectric/ferromagnetic volume in free space. In the second, we solve an electrostatic problem involving polarizable dielectric bodies in an unbounded dielectric medium. Lastly, the results from these test cases show that our proposed parallel approach, which is built on a kernel-independent FMM, can enable highly efficient and accurate simulations and allow for considerable flexibility in a broad range of applications.
Large-scale trench-perpendicular mantle flow beneath northern Chile
NASA Astrophysics Data System (ADS)
Reiss, M. C.; Rumpker, G.; Woelbern, I.
2017-12-01
We investigate the anisotropic properties of the forearc region of the central Andean margin by analyzing shear-wave splitting from teleseismic and local earthquakes from the Nazca slab. The data stem from the Integrated Plate boundary Observatory Chile (IPOC) located in northern Chile, covering an approximately 120 km wide coastal strip between 17°-25° S with an average station spacing of 60 km. With, in part, over ten years of data, this data set is uniquely suited to address the long-standing debate about the mantle flow field at the South American margin, and in particular whether the flow field beneath the slab is parallel or perpendicular to the trench. Our measurements yield two distinct anisotropic layers. The teleseismic measurements show a change of fast polarization directions from North to South along the trench, ranging from parallel to subparallel to the absolute plate motion and, given the geometry of absolute plate motion and the strike of the trench, mostly perpendicular to the trench. Shear-wave splitting from local earthquakes shows fast polarizations roughly aligned trench-parallel, but these exhibit short-scale variations which are indicative of a relatively shallow source. Comparisons between fast polarization directions and the strike of the local fault systems yield a good agreement. We use forward modelling to test the influence of the upper layer on the teleseismic measurements. We show that the observed variations of teleseismic measurements along the trench are caused by the anisotropy in the upper layer. Accordingly, the mantle layer is best characterized by an anisotropic fast axis parallel to the absolute plate motion, which is roughly trench-perpendicular. This anisotropy is likely caused by a combination of crystallographic preferred orientation of the mantle mineral olivine, as fossilized anisotropy in the slab, and entrained flow beneath the slab.
We interpret the upper anisotropic layer to be confined to the crust of the overriding continental plate. This is explained by the shape-preferred orientation of micro-cracks in relation to local fault zones, which are oriented parallel to the overall strike of the Andean range. Our results do not provide any evidence for a significant contribution of trench-parallel mantle flow beneath the subducting slab to the measurements.
Li, Yiming; Ishitsuka, Yuji; Hedde, Per Niklas; Nienhaus, G Ulrich
2013-06-25
In localization-based super-resolution microscopy, individual fluorescent markers are stochastically photoactivated and subsequently localized within a series of camera frames, yielding a final image with a resolution far beyond the diffraction limit. Yet, before localization can be performed, the subregions within the frames where the individual molecules are present have to be identified, oftentimes in the presence of high background. In this work, we address the importance of reliable molecule identification for the quality of the final reconstructed super-resolution image. We present a fast and robust algorithm (a-livePALM) that vastly improves the molecule detection efficiency while minimizing false assignments that can lead to image artifacts.
NASA Astrophysics Data System (ADS)
Palmesi, P.; Exl, L.; Bruckner, F.; Abert, C.; Suess, D.
2017-11-01
The long-range magnetic field is the most time-consuming part of micromagnetic simulations. Computational improvements can relieve problems related to this bottleneck. This work presents an efficient implementation of the Fast Multipole Method (FMM) for the magnetic scalar potential as used in micromagnetics. The novelty lies in extending the FMM to linearly magnetized tetrahedral sources, making it interesting also for other areas of computational physics. We treat the near field directly and use (exact) numerical integration of the multipole expansion in the far field. This approach tackles important issues like the vectorial and continuous nature of the magnetic field. By using the FMM, the calculations scale linearly in time and memory.
Huang, Shouren; Bergström, Niklas; Yamakawa, Yuji; Senoo, Taku; Ishikawa, Masatoshi
2016-01-01
It is traditionally difficult to implement fast and accurate position regulation on an industrial robot in the presence of uncertainties. The uncertain factors can be attributed either to the industrial robot itself (e.g., a mismatch of dynamics, mechanical defects such as backlash, etc.) or to the external environment (e.g., calibration errors, misalignment or perturbations of a workpiece, etc.). This paper proposes a systematic approach to implement high-performance position regulation under uncertainties on a general industrial robot (referred to as the main robot) with minimal or no manual teaching. The method is based on a coarse-to-fine strategy that involves configuring an add-on module for the main robot’s end effector. The add-on module consists of a 1000 Hz vision sensor and a high-speed actuator to compensate for accumulated uncertainties. The main robot only focuses on fast and coarse motion, with its trajectories automatically planned by image information from a static low-cost camera. Fast and accurate peg-and-hole alignment in one dimension was implemented as an application scenario by using a commercial parallel-link robot and an add-on compensation module with one degree of freedom (DoF). Experimental results yielded an almost 100% success rate for fast peg-in-hole manipulation (with regulation accuracy at about 0.1 mm) when the workpiece was randomly placed. PMID:27483274
High-Frequency Replanning Under Uncertainty Using Parallel Sampling-Based Motion Planning
Sun, Wen; Patil, Sachin; Alterovitz, Ron
2015-01-01
As sampling-based motion planners become faster, they can be re-executed more frequently by a robot during task execution to react to uncertainty in robot motion, obstacle motion, sensing noise, and uncertainty in the robot’s kinematic model. We investigate and analyze high-frequency replanning (HFR), where, during each period, fast sampling-based motion planners are executed in parallel as the robot simultaneously executes the first action of the best motion plan from the previous period. We consider discrete-time systems with stochastic nonlinear (but linearizable) dynamics and observation models with noise drawn from zero mean Gaussian distributions. The objective is to maximize the probability of success (i.e., avoid collision with obstacles and reach the goal) or to minimize path length subject to a lower bound on the probability of success. We show that, as parallel computation power increases, HFR offers asymptotic optimality for these objectives during each period for goal-oriented problems. We then demonstrate the effectiveness of HFR for holonomic and nonholonomic robots including car-like vehicles and steerable medical needles. PMID:26279645
NASA Astrophysics Data System (ADS)
Pinsker, R. I.
2014-10-01
In hot magnetized plasmas, two types of linear collisionless absorption processes are used to heat and drive noninductive current: absorption at ion or electron cyclotron resonances and their harmonics, and absorption by Landau damping and the transit-time-magnetic-pumping (TTMP) interactions. This tutorial discusses the latter process, i.e., parallel interactions between rf waves and electrons in which cyclotron resonance is not involved. Electron damping by the parallel interactions can be important in the ICRF, particularly in the higher harmonic region where competing ion cyclotron damping is weak, as well as in the Lower Hybrid Range of Frequencies (LHRF), which is in the neighborhood of the geometric mean of the ion and electron cyclotron frequencies. On the other hand, absorption by parallel processes is not significant in conventional ECRF schemes. Parallel interactions are especially important for the realization of high current drive efficiency with rf waves, and an application of particular recent interest is current drive with the whistler or helicon wave at high to very high (i.e., the LHRF) ion cyclotron harmonics. The scaling of absorption by parallel interactions with wave frequency is examined and the advantages and disadvantages of fast (helicons/whistlers) and slow (lower hybrid) waves in the LHRF in the context of reactor-grade tokamak plasmas are compared. In this frequency range, both wave modes can propagate in a significant fraction of the discharge volume; the ways in which the two waves can interact with each other are considered. The use of parallel interactions to heat and drive current in practice will be illustrated with examples from past experiments; also looking forward, this tutorial will provide an overview of potential applications in tokamak reactors. Supported by the US Department of Energy under DE-FC02-04ER54698.
[CMACPAR: a modified parallel neuro-controller for control processes].
Ramos, E; Surós, R
1999-01-01
CMACPAR is a parallel neurocontroller oriented to real-time systems such as control processes. Its main characteristics are a fast learning algorithm, a reduced number of calculations, great generalization capacity, local learning, and intrinsic parallelism. This type of neurocontroller is used in real-time applications required by refineries, hydroelectric plants, factories, etc. In this work we present the analysis and the parallel implementation of a modified scheme of the Cerebellar Model CMAC for the n-dimensional space projection using a medium-granularity parallel neurocontroller. The proposed memory management allows for a significant reduction in training time and required memory size.
Wiens, Curtis N.; Artz, Nathan S.; Jang, Hyungseok; McMillan, Alan B.; Reeder, Scott B.
2017-01-01
Purpose: To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. Theory and Methods: A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Results: Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. Conclusion: A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. PMID:27403613
NASA Astrophysics Data System (ADS)
Teng, Y. C.; Kelly, D.; Li, Y.; Zhang, K.
2016-02-01
A new state-of-the-art model (the Fully Adaptive Storm Tide model, FAST) for the prediction of storm surges over complex landscapes is presented. The FAST model is based on the conservation form of the full non-linear depth-averaged long wave equations. The equations are solved via an explicit finite volume scheme, with interfacial fluxes computed via Osher's approximate Riemann solver. Geometric source terms are treated in a high order manner that is well-balanced. The numerical solution technique has been chosen to enable the accurate simulation of wetting and drying over complex topography. Another important feature of the FAST model is the use of a simple underlying Cartesian mesh with tree-based static and dynamic adaptive mesh refinement (AMR). This permits the simulation of unsteady flows over varying landscapes (including localized features such as canals) by locally increasing (or relaxing) grid resolution in a dynamic fashion. The use of (dynamic) AMR lowers the computational cost of the storm surge model whilst retaining high resolution (and thus accuracy) where and when it is required. In addition, the FAST model has been designed to execute in a parallel computational environment with localized time-stepping. The FAST model has already been carefully verified against a series of benchmark-type problems (Kelly et al. 2015). Here we present two simulations of the storm tide due to Hurricane Ike (2008) and Hurricane Sandy (2012). The model incorporates high resolution LIDAR data for the major portion of New York City. Results compare favorably with water elevations measured by NOAA tidal gauges and by deployed mobile sensors, and with high water marks collected by the USGS.
Guo, Ying; Hou, Yubin; Lu, Qingyou
2014-05-01
We present a completely practical TunaDrive piezo motor. It consists of a central piezo stack sandwiched by two arm piezo stacks and two leg piezo stacks, respectively, which is then sandwiched and spring-clamped by a pair of parallel polished sapphire rods. It works by alternately fast expanding and contracting the arm/leg stacks while slowly expanding/contracting the central stack simultaneously. The key point is that sufficiently fast expansion and contraction of a limb stack makes its two sliding friction forces largely cancel, so that the total sliding friction force is <10% of the total static friction force, which greatly increases the output force. The piezo motor's high compactness, precision, and output force make it well suited for building a high-quality, harsh-condition (vibration-resistant) atomic-resolution scanning probe microscope.
cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.
Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro
2016-01-01
Next-generation sequencing can determine DNA bases, and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format or its compressed binary version (BAM). SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and executes quickly. However, SAMtools requires additional implementation work to be used in parallel, for example with OpenMP (Open Multi-Processing) libraries. With the accumulation of next-generation sequencing data, a simple parallelization program that can support cloud and PC cluster environments is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines of code than other similar tools.
Toward an automated parallel computing environment for geosciences
NASA Astrophysics Data System (ADS)
Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping
2007-08-01
Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.
Deformation, crystal preferred orientations, and seismic anisotropy in the Earth's D″ layer
NASA Astrophysics Data System (ADS)
Tommasi, Andréa; Goryaeva, Alexandra; Carrez, Philippe; Cordier, Patrick; Mainprice, David
2018-06-01
We use a forward multiscale model that couples atomistic modeling of intracrystalline plasticity mechanisms (dislocation glide ± twinning) in MgSiO3 post-perovskite (PPv) and periclase (MgO) at lower mantle pressures and temperatures to polycrystal plasticity simulations to predict crystal preferred orientations (CPO) development and seismic anisotropy in D″. We model the CPO evolution in aggregates of 70% PPv and 30% MgO submitted to simple shear, axial shortening, and along corner-flow streamlines, which simulate changes in flow orientation similar to those expected at the transition between a downwelling and flow parallel to the core-mantle boundary (CMB) within D″ or between CMB-parallel flow and upwelling at the borders of the large low shear wave velocity provinces (LLSVP) in the lowermost mantle. Axial shortening results in alignment of PPv [010] axes with the shortening direction. Simple shear produces PPv CPO with a monoclinic symmetry that rapidly rotates towards parallelism between the dominant [100](010) slip system and the macroscopic shear. These predictions differ from MgSiO3 post-perovskite textures formed in diamond-anvil cell experiments, but agree with those obtained in simple shear and compression experiments using CaIrO3 post-perovskite. Development of CPO in PPv and MgO results in seismic anisotropy in D″. For shear parallel to the CMB, at low strain, the inclination of ScS, Sdiff, and SKKS fast polarizations and delay times vary depending on the propagation direction. At moderate and high shear strains, all S-waves are polarized nearly horizontally. Downwelling flow produces Sdiff, ScS, and SKKS fast polarization directions and birefringence that vary gradually as a function of the back-azimuth from nearly parallel to inclined by up to 70° to CMB and from null to ∼5%. 
Change in the flow to shear parallel to the CMB results in dispersion of the CPO, weakening of the anisotropy, and strong azimuthal variation of the S-wave splitting up to 250 km from the corner. Transition from horizontal shear to upwelling also produces weakening of the CPO and complex seismic anisotropy patterns, with dominantly inclined fast ScS and SKKS polarizations, over most of the upwelling path. Models that take into account twinning in PPv explain most observations of seismic anisotropy in D″, but heterogeneity of the flow at scales <1000 km is needed to comply with the seismological evidence for low apparent birefringence in D″.
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-01-01
Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on a massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Parallel fast multipole boundary element method applied to computational homogenization
NASA Astrophysics Data System (ADS)
Ptaszny, Jacek
2018-01-01
In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for a 3D elasticity problem are developed and applied to the computational homogenization of a solid containing spherical voids. The system of equations is solved by using the GMRES iterative solver. The boundary of the body is discretized by using quadrilateral serendipity elements with an adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized by using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of the number of threads is examined.
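The thread-assignment assumption above (tree nodes partitioned into disjoint sets of equal or approximately equal size, one set per thread) can be sketched as follows. The function and its inputs are illustrative assumptions, not the paper's OpenMP code.

```python
# Hedged sketch of the task-assignment scheme described above: the
# tree nodes at which moment transformations are initialized are
# partitioned into disjoint, near-equal chunks, one per thread, so
# each thread can run its transformations independently.

def partition(nodes, n_threads):
    """Split `nodes` into n_threads disjoint, near-equal chunks."""
    k, r = divmod(len(nodes), n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        size = k + (1 if t < r else 0)   # spread the remainder
        chunks.append(nodes[start:start + size])
        start += size
    return chunks

nodes = list(range(10))
print(partition(nodes, 4))  # → [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

In OpenMP terms, each chunk would be handled by one thread of a parallel region; the disjointness guarantees that no synchronization is needed during the upward and downward tree traversals within a chunk.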
NASA Astrophysics Data System (ADS)
Lu, San; Artemyev, A. V.; Angelopoulos, V.
2017-11-01
Magnetotail current sheet thinning is a distinctive feature of substorm growth phase, during which magnetic energy is stored in the magnetospheric lobes. Investigation of charged particle dynamics in such thinning current sheets is believed to be important for understanding the substorm energy storage and the current sheet destabilization responsible for substorm expansion phase onset. We use Time History of Events and Macroscale Interactions during Substorms (THEMIS) B and C observations in 2008 and 2009 at 18 - 25 RE to show that during magnetotail current sheet thinning, the electron temperature decreases (cooling), and the parallel temperature decreases faster than the perpendicular temperature, leading to a decrease of the initially strong electron temperature anisotropy (isotropization). This isotropization cannot be explained by pure adiabatic cooling or by pitch angle scattering. We use test particle simulations to explore the mechanism responsible for the cooling and isotropization. We find that during the thinning, a fast decrease of a parallel electric field (directed toward the Earth) can speed up the electron parallel cooling, causing it to exceed the rate of perpendicular cooling, and thus lead to isotropization, consistent with observation. If the parallel electric field is too small or does not change fast enough, the electron parallel cooling is slower than the perpendicular cooling, so the parallel electron anisotropy grows, contrary to observation. The same isotropization can also be accomplished by an increasing parallel electric field directed toward the equatorial plane. Our study reveals the existence of a large-scale parallel electric field, which plays an important role in magnetotail particle dynamics during the current sheet thinning process.
Further optimization of SeDDaRA blind image deconvolution algorithm and its DSP implementation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Qiheng; Zhang, Jianlin
2011-11-01
An efficient algorithm for blind image deconvolution and its high-speed implementation are of great value in practice. A further optimization of SeDDaRA is developed, from algorithm structure to numerical calculation methods. The main optimizations are: modularization of the structure for good implementation feasibility, reduction of the data computation and dependency of the 2D FFT/IFFT, and acceleration of the power operation by a segmented look-up table. The Fast SeDDaRA is then proposed, specialized for low complexity. As the final implementation, a hardware image-restoration system is built using multi-DSP parallel processing. Experimental results show that the processing time and memory demand of Fast SeDDaRA decrease by at least 50%, and the data throughput of the image restoration system is over 7.8 Msps. The optimization is proved efficient and feasible, and Fast SeDDaRA is able to support real-time applications.
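The segmented look-up table idea for accelerating the power operation can be sketched as follows, assuming linear interpolation between precomputed segment endpoints over a known input range. All names, ranges, and the interpolation choice are illustrative assumptions, not the paper's DSP implementation.

```python
# Hedged sketch of a segmented look-up table for a power operation
# x**a, as used to accelerate SeDDaRA-style processing on a DSP:
# precompute x**a at segment endpoints over the input range and
# linearly interpolate inside each segment, trading a small,
# bounded accuracy loss for much cheaper per-sample arithmetic.

def build_lut(a, lo, hi, segments):
    """Tabulate x**a at `segments`+1 evenly spaced points on [lo, hi]."""
    step = (hi - lo) / segments
    xs = [lo + i * step for i in range(segments + 1)]
    return xs, [x ** a for x in xs]

def pow_lut(x, xs, ys):
    """Approximate x**a by locating x's segment and interpolating."""
    step = xs[1] - xs[0]
    i = min(int((x - xs[0]) / step), len(xs) - 2)
    t = (x - xs[i]) / step
    return ys[i] + t * (ys[i + 1] - ys[i])

xs, ys = build_lut(0.5, 1.0, 4.0, 64)   # sqrt over [1, 4]
approx = pow_lut(2.0, xs, ys)
# approx is close to 2.0 ** 0.5 ≈ 1.41421
```

The per-sample cost is one index computation, one multiply, and one add, which is why such tables suit fixed-point DSP pipelines; accuracy is tuned by the segment count.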
Rotary fast tool servo system and methods
Montesanti, Richard C.; Trumper, David L.
2007-10-02
A high bandwidth rotary fast tool servo provides tool motion in a direction nominally parallel to the surface-normal of a workpiece at the point of contact between the cutting tool and workpiece. Three or more flexure blades having all ends fixed are used to form an axis of rotation for a swing arm that carries a cutting tool at a set radius from the axis of rotation. An actuator rotates a swing arm assembly such that a cutting tool is moved in and away from the lathe-mounted, rotating workpiece in a rapid and controlled manner in order to machine the workpiece. A pair of position sensors provides rotation and position information for a swing arm to a control system. A control system commands and coordinates motion of the fast tool servo with the motion of a spindle, rotating table, cross-feed slide, and in-feed slide of a precision lathe.
Rotary fast tool servo system and methods
Montesanti, Richard C [Cambridge, MA; Trumper, David L [Plaistow, NH; Kirtley, Jr., James L.
2009-08-18
A high bandwidth rotary fast tool servo provides tool motion in a direction nominally parallel to the surface-normal of a workpiece at the point of contact between the cutting tool and workpiece. Three or more flexure blades having all ends fixed are used to form an axis of rotation for a swing arm that carries a cutting tool at a set radius from the axis of rotation. An actuator rotates a swing arm assembly such that a cutting tool is moved in and away from the lathe-mounted, rotating workpiece in a rapid and controlled manner in order to machine the workpiece. One or more position sensors provides rotation and position information for a swing arm to a control system. A control system commands and coordinates motion of the fast tool servo with the motion of a spindle, rotating table, cross-feed slide, and in-feed slide of a precision lathe.
The Mercury System: Embedding Computation into Disk Drives
2004-08-20
enabling technologies to build extremely fast data search engines. We do this by moving the search closer to the data, and performing it in hardware...engine searches in parallel across a disk or disk surface 2. System Parallelism: Searching is off-loaded to search engines and main processor can
Scalable Static and Dynamic Community Detection Using Grappolo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman
Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency’s (DARPA) Hierarchical Identify Verify Exploit (HIVE) Program. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clustering on large scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. However, due to their irregular and inherently sequential nature, many of the current algorithms for community detection are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.
On Parallelizing Single Dynamic Simulation Using HPC Techniques and APIs of Commercial Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Diao, Ruisheng; Jin, Shuangshuang; Howell, Frederic
Time-domain simulations are heavily used in today’s planning and operation practices to assess power system transient stability and post-transient voltage/frequency profiles following severe contingencies to comply with industry standards. Because of the increased modeling complexity, it is several times slower than real time for state-of-the-art commercial packages to complete a dynamic simulation for a large-scale model. With the growing stochastic behavior introduced by emerging technologies, the power industry has seen a growing need for performing security assessment in real time. This paper presents a parallel implementation framework to speed up a single dynamic simulation by leveraging the existing stability model library in commercial tools through their application programming interfaces (APIs). Several high performance computing (HPC) techniques are explored, such as parallelizing the calculation of generator current injection, identifying fast linear solvers for network solution, and parallelizing data outputs when interacting with APIs in the commercial package, TSAT. The proposed method has been tested on a WECC planning base case with detailed synchronous generator models and exhibits outstanding scalable performance with sufficient accuracy.
NASA Astrophysics Data System (ADS)
Homburg, Oliver; Jarczynski, Manfred; Mitra, Thomas; Brüning, Stephan
2017-02-01
In the last decade, much progress has been made on ultra-short pulse lasers with high repetition rates. This laser technology has matured to the point that it has recently entered a manifold of industrial applications, in contrast to its mainly scientific use in the past. Compared to ns-pulse ablation, ultra-short pulses in the ps- or even fs-regime lead to still colder ablation and further reduced heat-affected zones. This is crucial for micro patterning as structure sizes shrink and requirements become more stringent at the same time. An additional advantage of ultra-fast processing is its applicability to a large variety of materials, e.g. metals and several high-bandgap materials like glass and ceramics. One challenge for ultra-fast micro machining is throughput. The operational capacity of these processes can be maximized by increasing the scan rate or the number of beams, i.e. parallel processing. This contribution focuses on process parallelism of ultra-short pulsed lasers with high repetition rate and individually addressable acousto-optical beam modulation. The core of the multi-beam generation is a smooth diffractive beam-splitter component with highly uniform spots and negligible loss, together with a prismatic array compressor to match beam size and pitch. The optical design and practical realization of an 8-beam processing head, in combination with a high-average-power single-mode ultra-short pulsed laser source, are presented, along with currently ongoing and promising laboratory research and micro machining results. Finally, an outlook on scaling the processing head to several tens of beams is given.
Prater, Ronald; Moeller, Charles P.; Pinsker, Robert I.; ...
2014-06-26
Fast waves at frequencies far above the ion cyclotron frequency and approaching the lower hybrid frequency (also called “helicons” or “whistlers”) have application to off-axis current drive in tokamaks with high electron beta. The high frequency causes the whistler-like behavior of the wave power nearly following field lines, but with a small radial component, so the waves spiral slowly toward the plasma center. The high frequency also contributes to strong damping. Modeling predicts robust off-axis current drive with good efficiency compared to alternatives in high performance discharges in DIII-D and the Fusion Nuclear Science Facility (FNSF) when the electron beta is above about 1.8%. Detailed analysis of ray behavior shows that ray trajectories and damping are deterministic (that is, not strongly affected by plasma profiles or initial ray conditions), unlike the chaotic ray behavior in lower frequency fast wave experiments. Current drive was found to not be sensitive to the launched value of the parallel index of refraction n||, so wave accessibility issues can be reduced. Finally, use of a traveling wave antenna provides a very narrow n|| spectrum, which also helps avoid accessibility problems.
2017-01-01
Graphitic carbon anodes have long been used in Li ion batteries due to their combination of attractive properties, such as low cost, high gravimetric energy density, and good rate capability. However, one significant challenge is controlling, and optimizing, the nature and formation of the solid electrolyte interphase (SEI). Here it is demonstrated that carbon coating via chemical vapor deposition (CVD) facilitates high electrochemical performance of carbon anodes. We examine and characterize the substrate/vertical graphene interface (multilayer graphene nanowalls coated onto carbon paper via plasma enhanced CVD), revealing that these low-tortuosity and high-selection graphene nanowalls act as fast Li ion transport channels. Moreover, we determine that the hitherto neglected parallel layer acts as a protective surface at the interface, enhancing the anode performance. In summary, these findings not only clarify the synergistic role of the parallel functional interface when combined with vertical graphene nanowalls but also have facilitated the development of design principles for future high rate, high performance batteries. PMID:29392179
Parallel simulation of tsunami inundation on a large-scale supercomputer
NASA Astrophysics Data System (ADS)
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation; the computational power of recent massively parallel supercomputers is therefore needed to enable faster-than-real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), so very fast parallel computers are expected to become more and more prevalent in the near future. It is therefore important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we target very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which uses a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using the CPUs allocated to each layer, 1-D domain decomposition is performed on each layer.
In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
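The load-balancing rule described above (CPUs per layer proportional to grid points, then a 1-D row decomposition within each layer) can be sketched as follows. The largest-remainder rounding and the at-least-one-CPU constraint are our assumptions, not details from the paper:

```python
def allocate_cpus(grid_points, total_cpus):
    """Allocate CPUs to nested layers in proportion to their grid counts,
    giving every layer at least one CPU (largest-remainder rounding).
    Assumes total_cpus >= number of layers."""
    total = sum(grid_points)
    shares = [g * total_cpus / total for g in grid_points]
    alloc = [max(1, int(s)) for s in shares]
    while sum(alloc) < total_cpus:
        # hand leftover CPUs to the layers with the largest remainders
        _, i = max((s - a, i) for i, (s, a) in enumerate(zip(shares, alloc)))
        alloc[i] += 1
    return alloc

def row_ranges(n_rows, n_cpus):
    """1-D domain decomposition: contiguous row blocks, one per CPU."""
    base, extra = divmod(n_rows, n_cpus)
    ranges, start = [], 0
    for i in range(n_cpus):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

# made-up layer sizes: 8 CPUs split proportionally, then rows split per layer
assert allocate_cpus([100, 300, 600], 8) == [1, 2, 5]
assert row_ranges(10, 3) == [(0, 4), (4, 7), (7, 10)]
```

A 2-D decomposition, as the authors note, would replace `row_ranges` with a block partition in both horizontal directions.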
On a model of three-dimensional bursting and its parallel implementation
NASA Astrophysics Data System (ADS)
Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.
2008-04-01
A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a second-order accurate (in both space and time), linearly-implicit finite difference method on equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared address space paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors affecting the performance of the parallel implementations is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
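A linearly-implicit step of the kind described above reduces to one sparse linear solve per time level. A minimal PCG sketch, using a 1-D Laplacian as a stand-in for the 3-D operator and a simple Jacobi preconditioner (both our assumptions, not the paper's setup), looks like:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

n = 50
h = 1.0 / (n + 1)
# 1-D Laplacian on an equally spaced grid (illustrative stand-in for 3-D)
L = sp.diags([1, -2, 1], [-1, 0, 1], shape=(n, n)) / h ** 2

dt = 1e-4
A = (sp.eye(n) - dt * L).tocsr()   # linearly-implicit system matrix (SPD)
rhs = np.ones(n)                   # stand-in for the explicit part of the step

# Jacobi (diagonal) preconditioner
M = LinearOperator((n, n), matvec=lambda x: x / A.diagonal())
u, info = cg(A, rhs, M=M)
assert info == 0                          # PCG converged
assert np.allclose(A @ u, rhs, atol=1e-3)
```

In the actual solver this solve would be repeated every time level, with the right-hand side rebuilt from the previous state of the four coupled fields.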
A fast, parallel algorithm for distant-dependent calculation of crystal properties
NASA Astrophysics Data System (ADS)
Stein, Matthew
2017-12-01
A fast, parallel algorithm for distant-dependent calculation and simulation of crystal properties is presented, along with speedup results and methods of application. An illustrative example is used to compute the Lennard-Jones lattice constants up to 32 significant figures for 4 ≤ p ≤ 30 in the simple cubic, face-centered cubic, body-centered cubic, hexagonal-close-pack, and diamond lattices. In most cases, the known precision of these constants is more than doubled, and in some cases previously published figures are corrected. The tools and strategies that make this computation possible are detailed, along with applications to other potentials, including those that model defects.
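For orientation, the lattice constants in question are sums of inverse powers of lattice-point distances. A brute-force truncated version, far from the paper's accelerated arbitrary-precision method, can be written directly:

```python
from itertools import product

def lattice_sum(p, radius=20):
    """Direct truncated lattice sum  sum_{n != 0} |n|^(-p)  over the simple
    cubic lattice. Converges for p > 3; the truncation radius is an
    illustrative choice, adequate only for double precision."""
    s = 0.0
    for i, j, k in product(range(-radius, radius + 1), repeat=3):
        r2 = i * i + j * j + k * k
        if r2:
            s += r2 ** (-p / 2.0)
    return s

# p = 6 and p = 12 are the two Lennard-Jones exponents; for the simple cubic
# lattice the tabulated constants are about 8.4019 and 6.2021 respectively.
assert abs(lattice_sum(6) - 8.402) < 0.01
assert abs(lattice_sum(12) - 6.202) < 0.01
```

Reaching 32 significant figures requires both arbitrary-precision arithmetic and a convergence-acceleration strategy; the naive sum above gains roughly one digit per doubling of the radius for p = 6.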
Effects of ATC automation on precision approaches to closely spaced parallel runways
NASA Technical Reports Server (NTRS)
Slattery, R.; Lee, K.; Sanford, B.
1995-01-01
Improved navigational technology (such as the Microwave Landing System and the Global Positioning System) installed in modern aircraft will enable air traffic controllers to better utilize available airspace. Consequently, arrival traffic can fly approaches to parallel runways separated by smaller distances than are currently allowed. Previous simulation studies of advanced navigation approaches have found that controller workload is increased when there is a combination of aircraft that are capable of following advanced navigation routes and aircraft that are not. Research into Air Traffic Control automation at Ames Research Center has led to the development of the Center-TRACON Automation System (CTAS). The Final Approach Spacing Tool (FAST) is the component of the CTAS used in the TRACON area. The work in this paper examines, via simulation, the effects of FAST used for aircraft landing on closely spaced parallel runways. The simulation contained various combinations of aircraft, equipped and unequipped with advanced navigation systems. A set of simulations was run both manually and with an augmented set of FAST advisories to sequence aircraft, assign runways, and avoid conflicts. The results of the simulations are analyzed, measuring the airport throughput, aircraft delay, loss of separation, and controller workload.
NASA Astrophysics Data System (ADS)
Qiang, Ji
2017-10-01
A three-dimensional (3D) Poisson solver with longitudinal periodic and transverse open boundary conditions can have important applications in beam physics of particle accelerators. In this paper, we present a fast, efficient method to solve the Poisson equation using a spectral finite-difference method. This method uses a computational domain that contains the charged particle beam only and has a computational complexity of O(Nu log Nmode), where Nu is the total number of unknowns and Nmode is the maximum number of longitudinal or azimuthal modes. This saves both the computational time and the memory usage of using an artificial boundary condition in a large extended computational domain. The new 3D Poisson solver is parallelized using a message passing interface (MPI) on multi-processor computers and shows a reasonable parallel performance up to hundreds of processor cores.
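To illustrate the spectral treatment of the periodic (longitudinal) direction, here is a minimal 1-D FFT-based Poisson solve. It shows only the periodic-direction idea, not the paper's combined spectral/finite-difference solver or its open transverse boundaries:

```python
import numpy as np

def poisson_periodic_1d(rho, length=2 * np.pi):
    """Spectral solve of  -u'' = rho  with periodic boundary conditions:
    divide each Fourier mode by k^2, leaving the zero mode (mean) at zero."""
    n = len(rho)
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)
    rho_hat = np.fft.fft(rho)
    u_hat = np.zeros_like(rho_hat)
    nonzero = k != 0
    u_hat[nonzero] = rho_hat[nonzero] / k[nonzero] ** 2
    return np.fft.ifft(u_hat).real

# -u'' = sin(x) has the exact periodic solution u = sin(x)
x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
u = poisson_periodic_1d(np.sin(x))
assert np.allclose(u, np.sin(x), atol=1e-10)
```

The O(Nu log Nmode) complexity comes from applying transforms like this along the periodic and azimuthal directions while the remaining direction is handled by a banded finite-difference solve.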
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, J.; Alpan, F. A.; Fischer, G.A.
2011-07-01
Traditional two-dimensional (2D)/one-dimensional (1D) SYNTHESIS methodology has been widely used to calculate fast neutron (>1.0 MeV) fluence exposure to the reactor pressure vessel in the belt-line region. However, this methodology cannot be expected to provide accurate fast neutron fluence calculations at elevations far above or below the active core region. A three-dimensional (3D) parallel discrete ordinates calculation for ex-vessel neutron dosimetry on a Westinghouse 4-Loop XL Pressurized Water Reactor has been performed. It shows good agreement between the calculated and measured results. Furthermore, the results show very different fast neutron flux values at some of the former plate locations and elevations above and below the active core than those calculated by a 2D/1D SYNTHESIS method. This indicates that for certain irregular reactor internal structures, where the fast neutron flux has a very strong local effect, a 3D transport method is required to calculate accurate fast neutron exposure. (authors)
Fast simulation of the NICER instrument
NASA Astrophysics Data System (ADS)
Doty, John P.; Wampler-Doty, Matthew P.; Prigozhin, Gregory Y.; Okajima, Takashi; Arzoumanian, Zaven; Gendreau, Keith
2016-07-01
The NICER mission uses a complicated physical system to collect information from objects that are, by x-ray timing science standards, rather faint. To get the most out of the data we will need a rigorous understanding of all instrumental effects. We are in the process of constructing a very fast, high fidelity simulator that will help us to assess instrument performance, support simulation-based data reduction, and improve our estimates of measurement error. We will combine and extend existing optics, detector, and electronics simulations. We will employ the Compute Unified Device Architecture (CUDA) to parallelize these calculations. The price of suitable CUDA-compatible multi-giga op cores is about $0.20/core, so this approach will be very cost-effective.
Wiens, Curtis N; Artz, Nathan S; Jang, Hyungseok; McMillan, Alan B; Reeder, Scott B
2017-06-01
To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. Magn Reson Med 77:2303-2309, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Unexpectedly Fast Phonon-Assisted Exciton Hopping between Carbon Nanotubes
Davoody, A. H.; Karimi, F.; Arnold, M. S.; ...
2017-06-05
Carbon-nanotube (CNT) aggregates are promising light-absorbing materials for photovoltaics. The hopping rate of excitons between CNTs directly affects the efficiency of these devices. We theoretically investigate phonon-assisted exciton hopping, where excitons scatter with phonons into a same-tube transition state, followed by intertube Coulomb scattering into the final state. Second-order hopping between bright excitonic states is as fast as the first-order process (~1 ps). For perpendicular CNTs, the high rate stems from the high density of phononic states; for parallel CNTs, the reason lies in relaxed selection rules. Moreover, second-order exciton transfer between dark and bright states, facilitated by phonons with large angular momentum, has rates comparable to bright-to-bright transfer, so dark excitons provide an additional pathway for energy transfer in CNT composites. Furthermore, as dark excitons are difficult to probe in experiment, predictive theory is critical for understanding exciton dynamics in CNT composites.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guo, Ying; Lu, Qingyou, E-mail: qxl@ustc.edu.cn; Hefei National Laboratory for Physical Sciences at Microscale, University of Science and Technology of China, Hefei, Anhui 230026
2014-05-15
We present a completely practical TunaDrive piezo motor. It consists of a central piezo stack sandwiched by two arm piezo stacks and two leg piezo stacks, respectively, which is then sandwiched and spring-clamped by a pair of parallel polished sapphire rods. It works by alternately fast-expanding and contracting the arm/leg stacks while slowly expanding/contracting the central stack simultaneously. The key point is that sufficiently fast expansion and contraction of a limb stack makes its two sliding friction forces largely cancel, so that the total sliding friction force is less than 10% of the total static friction force, which helps greatly increase the output force. The piezo motor's high compactness, precision, and output force make it well suited to building a high-quality, harsh-condition (vibration resistant), atomic resolution scanning probe microscope.
NASA Astrophysics Data System (ADS)
Goossens, Bart; Aelterman, Jan; Luong, Hiêp; Pižurica, Aleksandra; Philips, Wilfried
2011-09-01
The shearlet transform is a recent sibling in the family of geometric image representations that provides a traditional multiresolution analysis combined with a multidirectional analysis. In this paper, we present a fast DFT-based analysis and synthesis scheme for the 2D discrete shearlet transform. Our scheme conforms to the continuous shearlet theory to a high extent, provides perfect numerical reconstruction (up to floating point rounding errors) in a non-iterative scheme, and is highly suitable for parallel implementation (e.g. FPGA, GPU). We show that our discrete shearlet representation is also a tight frame and that the redundancy factor of the transform is around 2.6, independent of the number of analysis directions. Experimental denoising results indicate that the transform performs the same as or even better than several related multiresolution transforms, while having a significantly lower redundancy factor.
Development of fast cooling pulsed magnets at the Wuhan National High Magnetic Field Center.
Peng, Tao; Sun, Quqin; Zhao, Jianlong; Jiang, Fan; Li, Liang; Xu, Qiang; Herlach, Fritz
2013-12-01
Pulsed magnets with fast cooling channels have been developed at the Wuhan National High Magnetic Field Center. Between the inner and outer sections of a coil wound with a continuous length of CuNb wire, G10 rods with cross section 4 mm × 5 mm were inserted as spacers around the entire circumference, parallel to the coil axis. The free space between adjacent rods is 6 mm. The liquid nitrogen flows freely in the channels between these rods, and in the direction perpendicular to the rods through grooves provided in the rods. For a typical 60 T pulsed magnetic field with pulse duration of 40 ms, the cooling time between subsequent pulses is reduced from 160 min to 35 min. Subsequently, the same technology was applied to a 50 T magnet with 300 ms pulse duration. The cooling time of this magnet was reduced from 480 min to 65 min.
fastBMA: scalable network inference and transitive reduction.
Hung, Ling-Hong; Shi, Kaiyuan; Wu, Migao; Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee
2017-10-01
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/). © The Authors 2017. Published by Oxford University Press.
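The transitive-reduction idea, removing an edge whenever its endpoints remain connected through an indirect path, can be sketched for an unweighted graph. fastBMA itself works on weighted edges via a shortest-path formulation, so treat this as a simplified analogue:

```python
from collections import defaultdict, deque

def transitive_reduction(edges):
    """Drop edge (u, v) when v is still reachable from u without that edge.
    Plain BFS reachability stands in for the weighted shortest-path test
    that fastBMA actually uses."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    def reachable(src, dst, skip):
        seen, queue = {src}, deque([src])
        while queue:
            node = queue.popleft()
            for nxt in succ[node]:
                if (node, nxt) == skip or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                queue.append(nxt)
        return False

    return [(u, v) for u, v in edges if not reachable(u, v, skip=(u, v))]

# a -> b -> c plus the redundant shortcut a -> c; the shortcut is removed
kept = transitive_reduction([("a", "b"), ("b", "c"), ("a", "c")])
assert kept == [("a", "b"), ("b", "c")]
```

In a regulatory network this prunes indirect edges such as a direct link from a gene to a target that is already explained by an intermediate regulator.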
NASA Astrophysics Data System (ADS)
Olive, Jean-Arthur; Pearce, Frederick; Rondenay, Stéphane; Behn, Mark D.
2014-04-01
Many subduction zones exhibit significant retrograde motion of their arc and trench. The observation of fast shear-wave velocities parallel to the trench in such settings has been inferred to represent trench-parallel mantle flow beneath a retreating slab. Here, we investigate this process by measuring seismic anisotropy in the shallow Aegean mantle. We carry out shear-wave splitting analysis on a dense array of seismometers across the Western Hellenic Subduction Zone, and find a pronounced zonation of anisotropy at the scale of the subduction zone. Fast SKS splitting directions subparallel to the trench-retreat direction dominate the region nearest to the trench. Fast splitting directions abruptly transition to trench-parallel above the corner of the mantle wedge, and rotate back to trench-normal over the back-arc. We argue that the trench-normal anisotropy near the trench is explained by entrainment of an asthenospheric layer beneath the shallow-dipping portion of the slab. Toward the volcanic arc this signature is overprinted by trench-parallel anisotropy in the mantle wedge, likely caused by a layer of strained serpentine immediately above the slab. Arcward steepening of the slab and horizontal divergence of mantle flow due to rollback may generate an additional component of sub-slab trench-parallel anisotropy in this region. Poloidal flow above the retreating slab is likely the dominant source of back-arc trench-normal anisotropy. We hypothesize that trench-normal anisotropy associated with significant entrainment of the asthenospheric mantle near the trench may be widespread but only observable at shallow-dipping subduction zones where stations nearest the trench do not overlie the mantle wedge.
Fast growth may impair regeneration capacity in the branching coral Acropora muricata.
Denis, Vianney; Guillaume, Mireille M M; Goutx, Madeleine; de Palmas, Stéphane; Debreuil, Julien; Baker, Andrew C; Boonstra, Roxane K; Bruggemann, J Henrich
2013-01-01
Regeneration of artificially induced lesions was monitored in nubbins of the branching coral Acropora muricata at two reef-flat sites representing contrasting environments at Réunion Island (21°07'S, 55°32'E). Growth of these injured nubbins was examined in parallel, and compared to controls. Biochemical compositions of the holobiont and the zooxanthellae density were determined at the onset of the experiment, and the photosynthetic efficiency (Fv/Fm ) of zooxanthellae was monitored during the experiment. Acropora muricata rapidly regenerated small lesions, but regeneration rates significantly differed between sites. At the sheltered site characterized by high temperatures, temperature variations, and irradiance levels, regeneration took 192 days on average. At the exposed site, characterized by steadier temperatures and lower irradiation, nubbins demonstrated fast lesion repair (81 days), slower growth, lower zooxanthellae density, chlorophyll a concentration and lipid content than at the former site. A trade-off between growth and regeneration rates was evident here. High growth rates seem to impair regeneration capacity. We show that environmental conditions conducive to high zooxanthellae densities in corals are related to fast skeletal growth but also to reduced lesion regeneration rates. We hypothesize that a lowered regenerative capacity may be related to limited availability of energetic and cellular resources, consequences of coral holobionts operating at high levels of photosynthesis and associated growth.
Engine-start Control Strategy of P2 Parallel Hybrid Electric Vehicle
NASA Astrophysics Data System (ADS)
Xiangyang, Xu; Siqi, Zhao; Peng, Dong
2017-12-01
A smooth and fast engine-start process is important to parallel hybrid electric vehicles with an electric motor mounted in front of the transmission. However, the engine-start control presents some challenges. First, the electric motor must simultaneously provide a stable driving torque to ensure drivability and a compensating torque to drag the engine before ignition. Second, engine-start time is a trade-off control objective because both fast start and smooth start have to be considered. To solve these problems, this paper first analyzed the resistance of the engine-start process and established a physical model in MATLAB/Simulink. Then a model-based coordinated control strategy among the engine, motor and clutch was developed, and two basic control strategies, for the fast-start and smooth-start processes, were studied. Simulation results showed that the given control strategies realized the control objectives and can meet different requirements from the driver.
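The torque coordination described in the abstract can be illustrated with a minimal sketch. The simple additive model and all names here are assumptions for illustration, not the paper's actual controller:

```python
def motor_torque_command(t_driver_demand, t_engine_drag, clutch_engaged):
    """Coordinated motor torque during engine start (illustrative sketch).

    While the separating clutch drags the engine up to speed, the motor
    must deliver the driver's demanded torque plus a compensation equal
    to the torque transmitted through the clutch, so that wheel torque
    stays stable. The additive model is an assumption, not the paper's
    model-based controller.
    """
    compensation = t_engine_drag if clutch_engaged else 0.0
    return t_driver_demand + compensation

# During cranking: driver asks for 120 Nm, clutch transmits 35 Nm to the engine
print(motor_torque_command(120.0, 35.0, True))   # motor must supply 155.0 Nm
```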
Fast generation of computer-generated hologram by graphics processing unit
NASA Astrophysics Data System (ADS)
Matsuda, Sho; Fujii, Tomohiko; Yamaguchi, Takeshi; Yoshikawa, Hiroshi
2009-02-01
A cylindrical hologram is well known to be viewable over 360 deg, but it requires very high pixel resolution. Therefore, a computer-generated cylindrical hologram (CGCH) demands a huge amount of calculation. In our previous research, we used a look-up table method for fast calculation on an Intel Pentium 4 at 2.8 GHz; it took 480 hours to calculate a high-resolution CGCH (504,000 x 63,000 pixels, with an average of 27,000 object points). To improve the quality of the reconstructed image, the fringe pattern requires higher spatial frequency and resolution. Therefore, to increase the calculation speed, we have to change the calculation method. In this paper, to reduce the calculation time of a larger CGCH (912,000 x 108,000 pixels), we employ a graphics processing unit (GPU); the same calculation takes 4,406 hours on a Xeon at 3.4 GHz. Since a GPU has many streaming processors and a parallel processing structure, it works as a high-performance parallel processor, and it performs best on 2-dimensional and streaming data. Recently, GPUs have also become usable for general-purpose computation (GPGPU): NVIDIA's GeForce 7 series became a programmable processor with the Cg programming language, and the subsequent GeForce 8 series supports CUDA, a software development kit made by NVIDIA. The theoretical calculation ability of the GPU is stated as 500 GFLOPS. From the experimental results, we achieved a calculation 47 times faster than our previous CPU-based work. Therefore, the CGCH can be generated in 95 hours, and the total time to calculate and print the CGCH is 110 hours.
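The per-pixel fringe computation that dominates CGH generation can be sketched as follows. The on-axis unit reference wave and the Fresnel phase approximation are assumptions; the paper's look-up-table method and GPU kernels are not reproduced:

```python
import math

def fringe_intensity(points, x, y, wavelength=633e-9):
    """Interference intensity at hologram pixel (x, y) from object points.

    Each object point (xo, yo, zo, amplitude) contributes a spherical wave
    approximated by its Fresnel phase; summing complex amplitudes with a
    unit on-axis reference wave and taking the squared magnitude gives the
    fringe value. Illustrative only.
    """
    re, im = 1.0, 0.0                      # reference wave: amplitude 1, phase 0
    for (xo, yo, zo, amp) in points:
        phase = math.pi * ((x - xo) ** 2 + (y - yo) ** 2) / (wavelength * zo)
        re += amp * math.cos(phase)
        im += amp * math.sin(phase)
    return re * re + im * im

pts = [(0.0, 0.0, 0.1, 1.0), (1e-4, 0.0, 0.12, 0.5)]
print(fringe_intensity(pts, 5e-5, 0.0))
```

In a real CGH this evaluation is repeated for every pixel of the fringe, which is why the pixel counts quoted in the abstract translate into hundreds of hours on a CPU and why the per-pixel independence maps so well onto a GPU.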
Efficient Preconditioning for the p-Version Finite Element Method in Two Dimensions
1989-10-01
paper, we study fast parallel preconditioners for systems of equations arising from the p-version finite element method. The p-version finite element...computations and the solution of a relatively small global auxiliary problem. We study two different methods. In the first (Section 3), the global...20], will be studied in the next section. Problem (3.12) is obviously much more easily solved than the original problem, and the procedure is highly
Massively parallel support for a case-based planning system
NASA Technical Reports Server (NTRS)
Kettler, Brian P.; Hendler, James A.; Anderson, William A.
1993-01-01
Case-based planning (CBP), a kind of case-based reasoning, is a technique in which previously generated plans (cases) are stored in memory and can be reused to solve similar planning problems in the future. CBP can save considerable time over generative planning, in which a new plan is produced from scratch. CBP thus offers a potential (heuristic) mechanism for handling intractable problems. One drawback of CBP systems has been the need for a highly structured memory to reduce retrieval times. This approach requires significant domain engineering and complex memory indexing schemes to make these planners efficient. In contrast, our CBP system, CaPER, uses a massively parallel frame-based AI language (PARKA) and can do extremely fast retrieval of complex cases from a large, unindexed memory. The ability to do fast, frequent retrievals has many advantages: indexing is unnecessary; very large case bases can be used; memory can be probed in numerous alternate ways; and queries can be made at several levels, allowing more specific retrieval of stored plans that better fit the target problem with less adaptation. In this paper we describe CaPER's case retrieval techniques and some experimental results showing its good performance, even on large case bases.
Combined algorithmic and GPU acceleration for ultra-fast circular conebeam backprojection
NASA Astrophysics Data System (ADS)
Brokish, Jeffrey; Sack, Paul; Bresler, Yoram
2010-04-01
In this paper, we describe the first implementation and performance of a fast O(N^3 log N) hierarchical backprojection algorithm for cone-beam CT with a circular trajectory, developed on a modern graphics processing unit (GPU). The resulting tomographic backprojection system for 3D cone-beam geometry combines speedup through the algorithmic improvements provided by the hierarchical backprojection algorithm with speedup from a massively parallel hardware accelerator. For data parameters typical in diagnostic CT and using a mid-range GPU card, we report reconstruction speeds of up to 360 frames per second, and a relative speedup of almost 6x compared to conventional backprojection on the same hardware. The significance of these results is twofold. First, they demonstrate that the reduction in operation counts demonstrated previously for the FHBP algorithm can be translated to a comparable run-time improvement in a massively parallel hardware implementation, while preserving stringent diagnostic image quality. Second, the dramatic speedup and throughput numbers achieved indicate the feasibility of systems based on this technology that achieve real-time 3D reconstruction for state-of-the-art diagnostic CT scanners with a small footprint, high reliability, and affordable cost.
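For contrast with the hierarchical method, conventional backprojection can be sketched in simplified 2D parallel-beam form. This reduced geometry and the nearest-neighbour interpolation are assumptions; the paper's cone-beam FHBP decomposition is not shown:

```python
import math

def backproject(sinogram, angles, n):
    """Naive 2D parallel-beam backprojection onto an n x n image.

    sinogram[a][s] is detector sample s at angle angles[a]; each pixel
    accumulates the nearest detector sample along every view. This is the
    conventional method whose operation count the hierarchical (FHBP)
    algorithm reduces.
    """
    img = [[0.0] * n for _ in range(n)]
    c = (n - 1) / 2.0
    for a, theta in enumerate(angles):
        ct, st = math.cos(theta), math.sin(theta)
        for i in range(n):
            for j in range(n):
                s = int(round((j - c) * ct + (i - c) * st + c))
                if 0 <= s < len(sinogram[a]):
                    img[i][j] += sinogram[a][s]
    return img
```

The three nested loops (views x rows x columns) are fully independent per pixel, which is the property a GPU implementation exploits.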
DOE Office of Scientific and Technical Information (OSTI.GOV)
Satake, Shin-ichi; Kanamori, Hiroyuki; Kunugi, Tomoaki
2007-02-01
We have developed a parallel algorithm for micro digital-holographic particle-tracking velocimetry. The algorithm is used for (1) numerical reconstruction of a particle image from a digital hologram, and (2) searching for particles. The numerical reconstruction from the digital hologram makes use of the Fresnel diffraction equation and the FFT (fast Fourier transform), whereas the particle-search algorithm looks for local maxima of gradation in a reconstruction field represented by a 3D matrix. To achieve high-performance computing for both calculations (reconstruction and particle search), two memory partitions are allocated to the 3D matrix: the reconstruction part consists of horizontally placed 2D memory partitions on the x-y plane for the FFT, whereas the particle-search part consists of vertically placed 2D memory partitions set along the z axis. Consequently, scalability is obtained in proportion to the number of processor elements, where the benchmarks are carried out for parallel computation on an SGI Altix machine.
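The particle-search step can be illustrated with a serial local-maximum scan over the reconstructed volume. The 6-neighbour test and the threshold are assumptions; the paper's memory-partition layout for the parallel FFT and search is not modelled:

```python
def find_particles(vol, threshold):
    """Locate particle candidates as local intensity maxima in a volume.

    vol is a nested list vol[z][y][x]; a voxel counts as a particle if it
    exceeds `threshold` and is strictly greater than all six face
    neighbours. Boundary voxels are skipped for simplicity.
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    hits = []
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                v = vol[z][y][x]
                if v > threshold and all(
                    v > w for w in (vol[z-1][y][x], vol[z+1][y][x],
                                    vol[z][y-1][x], vol[z][y+1][x],
                                    vol[z][y][x-1], vol[z][y][x+1])):
                    hits.append((z, y, x))
    return hits
```

Because each voxel's test touches only its neighbours, the scan parallelizes naturally over vertical slabs of the volume, which is the motivation for the z-axis memory partitioning described in the abstract.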
O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...
1995-01-01
Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
Particle simulation of plasmas on the massively parallel processor
NASA Technical Reports Server (NTRS)
Gledhill, I. M. A.; Storey, L. R. O.
1987-01-01
Particle simulations, in which collective phenomena in plasmas are studied by following the self-consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two-dimensional simulation space is mapped directly onto the processor network; a fast Fourier transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
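The highly repetitive per-particle update the abstract refers to can be sketched serially. This minimal electrostatic, non-relativistic push with periodic wrapping is a deliberate simplification, not the paper's electromagnetic, relativistic code:

```python
def push_particles(particles, ex, ey, dt, lx, ly):
    """Advance particles one time step with periodic boundary conditions.

    particles: list of [x, y, vx, vy]; ex(x, y) and ey(x, y) return the
    local field acceleration (charge/mass folded in, an assumption). The
    same arithmetic is applied to every particle, which is what makes the
    update amenable to SIMD execution; the Eulerian migration of particle
    data between local memories is not modelled here.
    """
    for p in particles:
        p[2] += dt * ex(p[0], p[1])
        p[3] += dt * ey(p[0], p[1])
        p[0] = (p[0] + dt * p[2]) % lx   # periodic wrap in x
        p[1] = (p[1] + dt * p[3]) % ly   # periodic wrap in y
    return particles

ps = push_particles([[0.9, 0.5, 1.0, 0.0]],
                    lambda x, y: 0.0, lambda x, y: 0.0, 0.2, 1.0, 1.0)
print(ps)   # the particle crosses x = 1.0 and wraps to x ≈ 0.1
```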
Parallel Navier-Stokes computations on shared and distributed memory architectures
NASA Technical Reports Server (NTRS)
Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar
1995-01-01
We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SP1). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.
New generation of Cherenkov counters
NASA Astrophysics Data System (ADS)
Giomataris, Y.; Charpak, G.; Peskov, V.; Sauli, F.
1992-12-01
Experimental results with a parallel plate avalanche chamber (PPAC) having a CsI photocathode and pad array readout are reported. High gains in excess of 10^5 have been obtained with He gas at atmospheric pressure and traces of CH4 or CF4 quencher. Such light gas mixtures extend the transparency for the Cherenkov light to the extreme UV region and allow detector operation with very low sensitivity to the ionization produced by minimum ionizing particles. A hadron blind detector (HBD) is discussed which exploits the broad photon energy bandwidth (≈ 10 eV) and the high Cherenkov threshold (pπ = 15 GeV). This fast detector, since it has good spatial resolution, can be used at the future Large Hadron Collider (LHC) or the Superconducting Super Collider (SSC) either as an efficient electron tagger, rejecting hadrons faking electrons in the calorimeter, or as a pretracker giving fast electron and high-energy muon signatures and momentum estimation. Other potential applications in the domain of Cherenkov light detection are also discussed.
Fibrillar Collagen Organization Associated with Broiler Wooden Breast Fibrotic Myopathy.
Velleman, Sandra G; Clark, Daniel L; Tonniges, Jeffrey R
2017-12-01
Wooden breast (WB) is a fibrotic myopathy affecting the pectoralis major (p. major) muscle in fast-growing commercial broiler lines. Birds with WB are phenotypically detected by the palpation of a hard p. major muscle. A primary feature of WB is the fibrosis of muscle with the replacement of muscle fibers with extracellular matrix proteins, such as collagen. The ability of a tissue to be pliable and stretch is associated with the organization of collagen fibrils in the connective tissue areas surrounding muscle fiber bundles (perimysium) and around individual muscle fibers (endomysium). The objective of this study was to compare the structure and organization of fibrillar collagen by using transmission electron microscopy in two fast-growing broiler lines (Lines A and B) with incidence of WB to a slower growing broiler Line C with no phenotypically detectable WB. In Line A, the collagen fibrils were tightly packed in a parallel organization, whereas in Line B, the collagen fibrils were randomly aligned. Tightly packed collagen fibrils arranged in parallel are associated with nonpliable collagen that is highly cross-linked. This will lead to a phenotypically hard p. major muscle. In Line C, the fibrillar collagen was sparse in its distribution. Furthermore, the average collagen fibril diameter and banding D-period length were altered in Line A p. major muscles affected with WB. Taken together, these data are suggestive of different fibrotic myopathies beyond just what is classified as WB in fast-growing broiler lines.
Comparison of the different approaches to generate holograms from data acquired with a Kinect sensor
NASA Astrophysics Data System (ADS)
Kang, Ji-Hoon; Leportier, Thibault; Ju, Byeong-Kwon; Song, Jin Dong; Lee, Kwang-Hoon; Park, Min-Chul
2017-05-01
Data of real scenes acquired in real time with a Kinect sensor can be processed with different approaches to generate a hologram. 3D models can be generated from a point cloud or a mesh representation. The advantage of the point cloud approach is that the computation process is well established, since it involves only diffraction and propagation of point sources between parallel planes. On the other hand, the mesh representation makes it possible to reduce the number of elements necessary to represent the object. Then, even though the computation time for the contribution of a single element increases compared to that of a simple point, the total computation time can be reduced significantly. However, the algorithm is more complex, since propagation of elemental polygons between non-parallel planes must be implemented. Finally, since a depth map of the scene is acquired at the same time as the intensity image, a depth-layer approach can also be adopted. This technique is appropriate for fast computation, since propagation of an optical wavefront from one plane to another can be handled efficiently with the fast Fourier transform. Fast computation with the depth-layer approach is convenient for real-time applications, but the point cloud method is more appropriate when high resolution is needed. In this study, since the Kinect can be used to obtain both a point cloud and a depth map, we examine the different approaches that can be adopted for hologram computation and compare their performance.
Fast, Parallel and Secure Cryptography Algorithm Using Lorenz's Attractor
NASA Astrophysics Data System (ADS)
Marco, Anderson Gonçalves; Martinez, Alexandre Souto; Bruno, Odemir Martinez
A novel cryptography method based on the Lorenz attractor chaotic system is presented. The proposed algorithm is secure and fast, making it practical for general use. We introduce the chaotic operation mode, which provides an interaction among the password, the message and a chaotic system. It ensures that the algorithm yields a secure codification, even if the nature of the chaotic system is known. The algorithm has been implemented in two versions: one sequential and slow, and the other parallel and fast. Our algorithm assures the integrity of the ciphertext (we know if it has been altered, which is not assured by traditional algorithms) and consequently its authenticity. Numerical experiments are presented and discussed, showing the behavior of the method in terms of security and performance. The fast version of the algorithm has a performance comparable to AES, a popular cryptographic algorithm in widespread commercial use, but it is more secure, which makes it immediately suitable for general-purpose cryptography applications. An internet page has been set up, which enables readers to test the algorithm and also to try to break the cipher.
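The general idea of deriving a keystream from a chaotic trajectory can be sketched as follows. This toy cipher is not the authors' chaotic operation mode: the burn-in length, quantization rule and parameters are all assumptions, and the scheme as written is insecure and non-portable across floating-point implementations:

```python
def lorenz_keystream(key, n, sigma=10.0, rho=28.0, beta=8.0 / 3.0, dt=0.01):
    """Derive n key bytes from a Lorenz trajectory seeded by `key`.

    The three key floats set the initial condition; the system is Euler-
    integrated, and after a 1000-step burn-in (an assumed value) the
    x-coordinate is quantized to bytes. Illustrative only.
    """
    x, y, z = key
    stream = []
    for i in range(1000 + n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        if i >= 1000:
            stream.append(int(abs(x) * 1e6) % 256)
    return stream

def xor_cipher(data, key):
    """XOR the data bytes with the chaotic keystream (its own inverse)."""
    ks = lorenz_keystream(key, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

msg = b"attack at dawn"
key = (1.0, 1.0, 1.0)
assert xor_cipher(xor_cipher(msg, key), key) == msg   # decryption recovers msg
```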
Components of action potential repolarization in cerebellar parallel fibres.
Pekala, Dobromila; Baginskas, Armantas; Szkudlarek, Hanna J; Raastad, Morten
2014-11-15
Repolarization of the presynaptic action potential is essential for transmitter release, excitability and energy expenditure. Little is known about repolarization in thin, unmyelinated axons forming en passant synapses, which represent the most common type of axons in the mammalian brain's grey matter. We used rat cerebellar parallel fibres, an example of typical grey matter axons, to investigate the effects of K(+) channel blockers on repolarization. We show that repolarization is composed of a fast tetraethylammonium (TEA)-sensitive component, determining the width and amplitude of the spike, and a slow margatoxin (MgTX)-sensitive depolarized after-potential (DAP). These two components could be recorded at the granule cell soma as antidromic action potentials and from the axons with a newly developed miniaturized grease-gap method. A considerable proportion of fast repolarization remained in the presence of TEA, MgTX, or both. This residual was abolished by the addition of quinine. The importance of proper control of fast repolarization was demonstrated by somatic recordings of antidromic action potentials. In these experiments, the relatively broad K(+) channel blocker 4-aminopyridine reduced the fast repolarization, resulting in bursts of action potentials forming on top of the DAP. We conclude that repolarization of the action potential in parallel fibres is supported by at least three groups of K(+) channels. Differences in their temporal profiles allow relatively independent control of the spike and the DAP, whereas overlap of their temporal profiles provides robust control of axonal bursting properties.
Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data.
Li, Wenyuan; Gong, Ke; Li, Qingjiao; Alber, Frank; Zhou, Xianghong Jasmine
2015-03-15
Genome-wide proximity ligation assays, e.g. Hi-C and its variant TCC, have recently become important tools to study spatial genome organization. Removing biases from chromatin contact matrices generated by such techniques is a critical preprocessing step of subsequent analyses. The continuing decline of sequencing costs has led to an ever-improving resolution of Hi-C data, resulting in very large matrices of chromatin contacts. Such large matrices, however, pose a great challenge to the memory usage and speed of normalization. Therefore, there is an urgent need for fast and memory-efficient methods for normalization of Hi-C data. We developed Hi-Corrector, an easy-to-use, open-source implementation of the Hi-C data normalization algorithm. Its salient features are (i) scalability: the software is capable of normalizing Hi-C data of any size in reasonable time; (ii) memory efficiency: the sequential version can run on any single computer with very limited memory, no matter how little; (iii) fast speed: the parallel version can run very fast on multiple computing nodes with limited local memory. The sequential version is implemented in ANSI C and can be easily compiled on any system; the parallel version is implemented in ANSI C with the MPI library (a standardized and portable parallel environment designed for solving large-scale scientific problems). The package is freely available at http://zhoulab.usc.edu/Hi-Corrector/. © The Author 2014. Published by Oxford University Press.
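The normalization in question is, at its core, matrix balancing. A serial, ICE-style sketch of the idea (the specific update rule is an assumption, and Hi-Corrector's MPI data partitioning is not shown) looks like this:

```python
def balance(matrix, iters=50):
    """Iterative correction of a symmetric contact matrix (ICE-style sketch).

    Each pass estimates a bias factor per row from its current sum and
    divides every entry by the product of its row and column biases, so
    all rows converge toward equal coverage. Serial and dense-only, for
    illustration; real Hi-C matrices require out-of-core or distributed
    handling of exactly this loop.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]
    for _ in range(iters):
        s = [sum(row) for row in m]
        mean = sum(s) / n
        b = [si / mean if si else 1.0 for si in s]   # bias per row
        for i in range(n):
            for j in range(n):
                m[i][j] /= b[i] * b[j]
    return m

m = balance([[0.0, 4.0, 1.0], [4.0, 0.0, 2.0], [1.0, 2.0, 0.0]])
sums = [sum(r) for r in m]   # row sums are now (nearly) equal
```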
Massively Parallel Processing for Fast and Accurate Stamping Simulations
NASA Astrophysics Data System (ADS)
Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu
2005-08-01
The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead-time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDP). In the past ten years, stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis has become a critical business segment in GM's math-based die engineering process. As simulation has become one of the major production tools in the engineering factory, simulation speed and accuracy are two of the most important measures of stamping simulation technology. The speed and time-in-system of forming analysis become even more critical to supporting fast VDPs and tooling readiness. Since 1997, the General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of the simulation software for mass-production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed-memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DMP/MPP technology, as well as performance benchmarks, are discussed in this publication.
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Tong, Charles; Swarztrauber, Paul N.
1989-01-01
Design alternatives for ordered fast Fourier transform (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication, which is known to dominate the overall computing time. To this end, the ordering and computational phases of the FFT were combined, and sequence-to-processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely standard order and A-order, which can be implemented with equal ease on the Connection Machine, where orderings are determined by geometries and priorities. If the sequence has N = 2^r elements and the hypercube has P = 2^d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions, which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
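The interplay of ordering and computation can be seen in a serial radix-2 FFT that applies the bit-reversal permutation first and then log2(N) butterfly stages; on a hypercube, each stage would map to nearest-neighbour exchanges along one dimension. This is a generic textbook formulation, offered as background only: the paper's A-order variant and its trig-free coefficient generation are not reproduced:

```python
import cmath

def fft_ordered(a):
    """Iterative radix-2 FFT returning output in standard (natural) order.

    Requires len(a) to be a power of two. The bit-reversal pass handles
    the ordering phase; the subsequent butterfly stages handle the
    computation phase.
    """
    n = len(a)
    a = list(a)
    j = 0
    for i in range(1, n):            # in-place bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:               # log2(n) butterfly stages
        w_len = cmath.exp(-2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1.0 + 0j
            for k in range(length // 2):
                u = a[start + k]
                v = a[start + k + length // 2] * w
                a[start + k] = u + v
                a[start + k + length // 2] = u - v
                w *= w_len
        length <<= 1
    return a

print(fft_ordered([1, 1, 1, 1]))     # [4, 0, 0, 0] up to round-off
```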
Design of Belief Propagation Based on FPGA for the Multistereo CAFADIS Camera
Magdaleno, Eduardo; Lüke, Jonás Philipp; Rodríguez, Manuel; Rodríguez-Ramos, José Manuel
2010-01-01
In this paper we describe a fast, specialized hardware implementation of the belief propagation algorithm for the CAFADIS camera, a new plenoptic sensor patented by the University of La Laguna. This camera captures the lightfield of the scene and can be used to find out at which depth each pixel is in focus. The algorithm has been designed for FPGA devices using VHDL. We propose a parallel and pipeline architecture to implement the algorithm without external memory. Although the BRAM resources of the device increase considerably, we can maintain real-time restrictions by using extremely high-performance signal processing capability through parallelism and by accessing several memories simultaneously. The quantifying results with 16 bit precision have shown that performances are really close to the original Matlab programmed algorithm. PMID:22163404
Turbomachinery CFD on parallel computers
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.
1992-01-01
The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maneva, Y. G.; Poedts, Stefaan; Viñas, Adolfo F.
2015-11-20
We perform 2.5D hybrid simulations with massless fluid electrons and kinetic particle-in-cell ions to study the temporal evolution of ion temperatures, temperature anisotropies, and velocity distribution functions in relation to the dissipation and turbulent evolution of a broadband spectrum of parallel and obliquely propagating Alfvén-cyclotron waves. The purpose of this paper is to study the relative role of parallel versus oblique Alfvén-cyclotron waves in the observed heating and acceleration of alpha particles in the fast solar wind. We consider collisionless homogeneous multi-species plasma, consisting of isothermal electrons, isotropic protons, and a minor component of drifting α particles in a finite-β fast stream near the Earth. The kinetic ions are modeled by initially isotropic Maxwellian velocity distribution functions, which develop nonthermal features and temperature anisotropies when a broadband spectrum of low-frequency nonresonant, ω ≤ 0.34 Ωp, Alfvén-cyclotron waves is imposed at the beginning of the simulations. The initial plasma parameter values, such as ion density, temperatures, and relative drift speeds, are supplied by fast solar wind observations made by the Wind spacecraft at 1 AU. The imposed broadband wave spectra are left-hand polarized and resemble Wind measurements of Alfvénic turbulence in the solar wind. The imposed magnetic field fluctuations for all cases are within the inertial range of the solar wind turbulence and have a Kraichnan-type spectral slope α = −3/2. We vary the propagation angle from θ = 0° to θ = 30° and θ = 60°, and find that the heating of alpha particles is most efficient for the highly oblique waves propagating at 60°, whereas the protons exhibit perpendicular cooling at all propagation angles.
Density-based parallel skin lesion border detection with webCL
2015-01-01
Background Dermoscopy is a highly effective and noninvasive imaging technique used in the diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Methods A fast density-based skin lesion border detection method has been implemented in parallel with a new technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multi-core CPUs and GPUs. The developed WebCL-parallel density-based skin lesion border detection method runs efficiently in internet browsers. Results Previous research indicates that some of the highest accuracy rates can be achieved using density-based clustering techniques for skin lesion border detection. While these algorithms have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, the density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms. We describe WebCL and our parallel algorithm design.
In addition, we tested the parallel code on 100 dermoscopy images and report the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) and serial versions of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images: the mean border error is 6.94%, the mean recall is 76.66%, and the mean precision is 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages ~491.2. Conclusions Considering the large number of high-resolution dermoscopy images processed in a usual clinical setting, along with the critical importance of detecting and diagnosing melanoma before metastasis, the importance of fast processing of dermoscopy images becomes obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL, which takes advantage of GPU computing from a web browser. Therefore, the WebCL-parallel version of density-based skin lesion border detection introduced in this study can supplement expert dermatologists and aid them in early diagnosis of skin lesions. While WebCL is currently an emerging technology, full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources, including multi-core CPUs and GPUs on a local machine, and allows compiled code to run directly from the Web browser. PMID:26423836
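The per-pixel independence that lets this method map onto WebCL work-items can be illustrated with a border-extraction sketch on a binary lesion mask. The 4-neighbour rule is an assumption, and the density-based clustering stage that produces the mask is not reproduced:

```python
def border_pixels(mask):
    """Border of a binary lesion mask.

    A lesion pixel belongs to the border if any of its four face
    neighbours is background (or off the image). Every pixel's test is
    independent of the others, so in a WebCL/OpenCL setting each pixel
    would be handled by its own work-item.
    """
    h, w = len(mask), len(mask[0])
    border = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    border.append((y, x))
                    break
    return border

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(border_pixels(mask))   # here every lesion pixel lies on the border
```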
Density-based parallel skin lesion border detection with webCL.
Lemon, James; Kockara, Sinan; Halic, Tansel; Mete, Mutlu
2015-01-01
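The core of the density-based approach — an independent local-density test at every pixel — is what makes it map cleanly onto one-work-item-per-pixel WebCL kernels. A minimal Python sketch of that per-pixel operation (illustrative only; the window radius and density thresholds here are assumed values, not the paper's tuned parameters):

```python
import numpy as np

def density_border(mask, radius=1, lo=0.2, hi=0.8):
    """Mark lesion pixels whose local density is intermediate.

    Each pixel's density is the fraction of lesion pixels in its
    (2*radius+1)^2 neighborhood; interior pixels have density near 1,
    background near 0, and border pixels fall in between. Every pixel
    is computed independently, so the double loop parallelizes trivially.
    """
    ny, nx = mask.shape
    out = np.zeros_like(mask)
    for i in range(ny):
        for j in range(nx):
            i0, i1 = max(0, i - radius), min(ny, i + radius + 1)
            j0, j1 = max(0, j - radius), min(nx, j + radius + 1)
            win = mask[i0:i1, j0:j1]
            d = win.sum() / win.size
            out[i, j] = 1 if (mask[i, j] and lo <= d <= hi) else 0
    return out
```

In a WebCL port, the body of the inner loop becomes the kernel and each (i, j) becomes a work item.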
Seismic Anisotropy Beneath the Eastern Flank of the Rio Grande Rift
NASA Astrophysics Data System (ADS)
Benton, N. W.; Pulliam, J.
2015-12-01
Shear wave splitting was measured across the eastern flank of the Rio Grande Rift (RGR) to investigate mechanisms of upper mantle anisotropy. Earthquakes recorded at epicentral distances of 90°-130° from EarthScope Transportable Array (TA) and SIEDCAR (SC) broadband seismic stations were examined comprehensively, via the Matlab program "SplitLab", to determine whether SKS and SKKS phases indicated anisotropic properties. SplitLab allows waveforms to be rotated, filtered, and windowed interactively, and splitting measurements are made on a user-specified waveform segment via three independent methods simultaneously. To improve signal-to-noise ratios and reliability, we stacked the error surfaces that resulted from grid searches in the measurements for each station location. Fast polarization directions near the Rio Grande Rift tend to be sub-parallel to the RGR but change, to the east, to angles consistent with North America's average plate motion. The surface erosional depression of the Pecos Valley coincides with fast polarization directions that are aligned in a more northerly direction than their neighbors, whereas the topographic high to the east coincides with an easterly change of the fast axis. The area above a mantle high-velocity anomaly, discovered separately via seismic tomography and possibly indicating thickened lithosphere, corresponds to unusually large delay times and fast polarization directions that are more closely aligned to a north-south orientation. The area of southeastern New Mexico that falls between the mantle fast anomaly and the Great Plains craton displays dramatically smaller delay times, as well as changes in fast axis directions toward the northeast. Changes in fast axis directions may indicate flow around the mantle anomaly; small delay times could indicate vertical or attenuated flow.
NASA Astrophysics Data System (ADS)
Schultz, A.
2010-12-01
3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (GDDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. 
We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated GDDR3 GPU memory, and a second interleaved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated GDDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.
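The "direct solvers per subdomain, matched at the boundaries" strategy can be illustrated on a toy problem. The sketch below is an illustrative 1D Poisson analogue in NumPy (not the OpenEM f90 code; the subdomain extents and iteration count are arbitrary choices): each of two overlapping subdomains is solved with a direct solver, and interface values are exchanged until the pieces agree:

```python
import numpy as np

def schwarz_poisson(f, n=99, iters=50):
    """Overlapping alternating Schwarz for -u'' = f on (0,1), u(0)=u(1)=0.

    Two overlapping subdomains, each solved with a direct (dense, for
    clarity) solver; the interface values are exchanged every sweep.
    """
    h = 1.0 / (n + 1)
    x = h * np.arange(n + 2)                  # grid including boundary nodes
    u = np.zeros(n + 2)
    m1 = int(0.6 * (n + 1))                   # right interface of subdomain 1
    m2 = int(0.4 * (n + 1))                   # left interface of subdomain 2
    for _ in range(iters):
        for s, e in ((1, m1 - 1), (m2 + 1, n)):   # interiors of the subdomains
            k = e - s + 1
            A = (2 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)) / h**2
            rhs = f(x[s:e + 1]).astype(float)
            rhs[0] += u[s - 1] / h**2         # Dirichlet data taken from the
            rhs[-1] += u[e + 1] / h**2        # neighboring subdomain's solution
            u[s:e + 1] = np.linalg.solve(A, rhs)
    return x, u
```

For the 3D EM problem the per-subdomain solves become sparse direct factorizations assigned to GPU cores, but the solve-and-exchange structure is the same.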
AHPCRC (Army High Performance Computing Research Center) Bulletin. Volume 2, Issue 2, 2011
2011-01-01
...fixed (i.e., no flapping). The simulation was performed at sea level conditions with a pressure of 101 kPa and a density of 1.23 kg/m3. The air speed... Hardening Behavior in Au Nanopillar Microplasticity. IJMCE 5 (3&4) 287-294. (2007) 5. S. J. Plimpton. Fast Parallel Algorithms for Short-Range Molecular... such as crude oil underwater. Scattering is also used for sea floor mapping. For example, communications companies laying underwater fiber optic...
MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harrison, Robert J.; Beylkin, Gregory; Bischoff, Florian A.
2016-01-01
MADNESS (multiresolution adaptive numerical environment for scientific simulation) is a high-level software environment for solving integral and differential equations in many dimensions that uses adaptive and fast harmonic analysis methods with guaranteed precision based on multiresolution analysis and separated representations. Underpinning the numerical capabilities is a powerful petascale parallel programming environment that aims to increase both programmer productivity and code scalability. This paper describes the features and capabilities of MADNESS and briefly discusses some current applications in chemistry and several areas of physics.
Density-matrix-based algorithm for solving eigenvalue problems
NASA Astrophysics Data System (ADS)
Polizzi, Eric
2009-03-01
A fast and stable numerical algorithm for solving the symmetric eigenvalue problem is presented. The technique deviates fundamentally from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms) or other Davidson-Jacobi techniques and takes its inspiration from the contour integration and density-matrix representation in quantum mechanics. It will be shown that this algorithm—named FEAST—exhibits high efficiency, robustness, accuracy, and scalability on parallel architectures. Examples from electronic structure calculations of carbon nanotubes are presented, and numerical performances and capabilities are discussed.
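The contour-integration idea can be sketched in a few lines of NumPy (a toy dense sketch under assumed parameters — a circular contour, trapezoidal quadrature, and a random starting block — not the production FEAST library): integrate the resolvent around a contour enclosing the wanted eigenvalues to approximate the spectral projector, then apply a Rayleigh-Ritz step:

```python
import numpy as np

def feast_sketch(A, center, radius, m0=4, n_quad=8, seed=0):
    """Find eigenvalues of symmetric A inside the circle |z - center| < radius.

    Approximates the spectral projector P = (1/2*pi*i) * contour integral of
    (zI - A)^-1 by trapezoidal quadrature on the circle, applies it to a
    random block Y, then solves the small projected eigenproblem.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n, m0))
    Q = np.zeros((n, m0))
    for k in range(n_quad):
        theta = 2 * np.pi * (k + 0.5) / n_quad
        z = center + radius * np.exp(1j * theta)
        w = radius * np.exp(1j * theta) / n_quad   # dz/(2*pi*i) quadrature weight
        Q += np.real(w * np.linalg.solve(z * np.eye(n) - A, Y))
    Qo, _ = np.linalg.qr(Q)                         # orthonormal subspace basis
    lam, _ = np.linalg.eigh(Qo.T @ A @ Qo)          # Rayleigh-Ritz step
    return lam[np.abs(lam - center) < radius]       # keep values inside the contour
```

Each linear solve at a quadrature point is independent of the others, which is one source of the parallel scalability noted in the abstract.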
Notes on implementation of sparsely distributed memory
NASA Technical Reports Server (NTRS)
Keeler, J. D.; Denning, P. J.
1986-01-01
The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding; and Appendix B treats a systolic array for high throughput read and write operations.
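The read/write mechanics described above — parallel Hamming-distance address decoding followed by counter accumulation — can be sketched directly in NumPy (a toy model; the location count, word width, and activation radius are illustrative choices, not Kanerva's design parameters):

```python
import numpy as np

def make_sdm(n_locs=500, dim=64, seed=1):
    rng = np.random.default_rng(seed)
    hard = rng.integers(0, 2, size=(n_locs, dim))   # random hard-location addresses
    counters = np.zeros((n_locs, dim), dtype=int)    # one counter per location, per bit
    return hard, counters

def activated(hard, addr, radius):
    # "Address decoding": select every hard location within Hamming radius.
    # In hardware this comparison runs for all locations in parallel.
    return np.count_nonzero(hard != addr, axis=1) <= radius

def sdm_write(hard, counters, addr, data, radius=28):
    sel = activated(hard, addr, radius)
    counters[sel] += np.where(data == 1, 1, -1)      # increment/decrement counters

def sdm_read(hard, counters, addr, radius=28):
    sel = activated(hard, addr, radius)
    return (counters[sel].sum(axis=0) > 0).astype(int)  # per-bit majority vote
```

Reading after a single write recovers the stored word exactly via the majority vote; with many interfering writes, recall degrades gracefully, which is the memory's characteristic property.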
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
Parallel peak pruning for scalable SMP contour tree computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.
As data sets grow to exascale, automated data analysis and visualisation are increasingly important, both to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high performance computing systems necessitate analysis algorithms that make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared-memory SMP algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speedup in OpenMP and up to 50x speedup in NVIDIA Thrust.
FastQuery: A Parallel Indexing System for Scientific Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chou, Jerry; Wu, Kesheng; Prabhat,
2011-07-29
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit can significantly improve accesses to these datasets by augmenting the user data with indexes and other secondary information. However, a challenge is that the indexes assume the relational data model while scientific data generally follows the array data model. To match the two data models, we design a generic mapping mechanism and implement an efficient input and output interface for reading and writing the data and their corresponding indexes. To take advantage of the emerging many-core architectures, we also develop a parallel strategy for indexing using threading technology. This approach complements our on-going MPI-based parallelization efforts. We demonstrate the flexibility of our software by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using data from a particle accelerator model and a global climate model. We also conducted a detailed performance study using these scientific datasets. The results show that FastQuery speeds up the query time by a factor of 2.5x to 50x, and it reduces the indexing time by a factor of 16 on 24 cores.
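The kind of index FastBit provides can be illustrated with a binned-bitmap sketch (conceptual only — real FastBit additionally compresses the bitmaps with word-aligned hybrid encoding and uses richer binning, and this is not the FastQuery API):

```python
import numpy as np

def build_bitmap_index(values, bin_edges):
    # One bitmap per bin: bit i is set iff values[i] falls in that bin.
    # Kept as plain boolean arrays here for clarity.
    bin_of = np.digitize(values, bin_edges)
    return [bin_of == b for b in range(len(bin_edges) + 1)]

def range_query(bitmaps, lo_bin, hi_bin):
    # A range query ORs the bitmaps of the candidate bins -- no scan of
    # the raw array is needed once the index exists.
    hits = np.zeros_like(bitmaps[0])
    for b in range(lo_bin, hi_bin + 1):
        hits |= bitmaps[b]
    return hits
```

A query then touches only the bitmaps of the bins that overlap the requested range, which is where the reported query speedups come from.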
Plana-Ruiz, S; Portillo, J; Estradé, S; Peiró, F; Kolb, Ute; Nicolopoulos, S
2018-06-06
A general method to set illuminating conditions for selectable beam convergence and probe size is presented in this work for Transmission Electron Microscopes (TEM) fitted with µs/pixel fast beam scanning control, (S)TEM, and an annular dark field detector. The case of interest of beam convergence and probe size, which enables diffraction pattern indexation, is then used as a starting point in this work to add 100 Hz precession to the beam while imaging the specimen at a fast rate and keeping the projector system in diffraction mode. The described systematic alignment method for the adjustment of beam precession on the specimen plane while scanning at fast rates is mainly based on the sharpness of the precessed STEM image. The complete alignment method for parallel condition and precession, Quasi-Parallel PED-STEM, is presented in block diagram scheme, as it has been tested on a variety of instruments. The immediate application of this methodology is that it renders the TEM column ready for the acquisition of Precessed Electron Diffraction Tomographies (EDT) as well as for the acquisition of slow Precessed Scanning Nanometer Electron Diffraction (SNED). Examples of the quality of the Precessed Electron Diffraction (PED) patterns and PED-STEM alignment images are presented with corresponding probe sizes and convergence angles. Copyright © 2018. Published by Elsevier B.V.
Anisotropic Rayleigh-wave phase velocities beneath northern Vietnam
NASA Astrophysics Data System (ADS)
Legendre, Cédric P.; Zhao, Li; Huang, Win-Gee; Huang, Bor-Shouh
2015-02-01
We explore the Rayleigh-wave phase-velocity structure beneath northern Vietnam over a broad period range of 5 to 250 s. We use the two-station technique to derive dispersion curves from the waveforms of 798 teleseismic events recorded by a set of 23 broadband seismic stations deployed in northern Vietnam. These dispersion curves are then inverted for both isotropic and azimuthally anisotropic Rayleigh-wave phase-velocity maps in the period range of 10 to 50 s. Main findings include a crustal expression of the Red River Shear Zone and the Song Ma Fault. Northern Vietnam displays a northeast/southwest dichotomy in the lithosphere, with fast velocities beneath the South China Block and slow velocities beneath the Simao Block and between the Red River Fault and the Song Da Fault. The anisotropy in the region is relatively simple, with high amplitudes and fast directions parallel to the Red River Shear Zone in the western part. In the eastern part, the amplitudes are generally smaller and the fast axis displays more variation with period.
Fast-particle energy loss to a quasi-one dimensional electron gas
NASA Astrophysics Data System (ADS)
Kushwaha, Manvir S.; Zielinski, P.
2000-03-01
A theoretical investigation has been made of the fast-particle energy loss to a quasi-one-dimensional electron gas (Q1DEG) within the framework of the random-phase approximation (RPA). For this purpose, we use an exact analytical expression for the inverse dielectric function, which knows no bound as regards the subband occupancy, and a parabolic potential well to characterize the lateral confinement. Three geometries are considered: the fast particle moving parallel to, being specularly reflected from, and shooting through the Q1DEG. The illustrative numerical examples in all three geometries lead us to infer that the dominant contribution to the loss peaks comes from the intra- and inter-subband collective excitations [1]. We argue that high-resolution electron energy loss spectroscopy (HREELS) could prove to be a potential alternative to the existing optical (Raman or FIR) spectroscopies [2]. [1] M.S. Kushwaha and P. Zielinski, Solid State Commun. 112, 605 (1999). [2] M.S. Kushwaha and P. Zielinski, unpublished.
Complicated seismic anisotropy beneath south-central Mongolia and its geodynamic implications
NASA Astrophysics Data System (ADS)
Qiang, Zhengyang; Wu, Qingju; Li, Yonghua; Gao, Mengtan; Demberel, Sodnomsambuu; Ulzibat, Munkhuu; Sukhbaatar, Usnikh; Flesch, Lucy M.
2017-05-01
Two years of high-quality broadband seismic data from 69 temporary stations deployed in south-central Mongolia provide an opportunity to study the anisotropy-forming mechanisms in this area. The majority of shear wave splitting observations determined from the analysis of teleseismic SKS phases are characterized by NW-SE trending fast directions with large splitting delay times (greater than 2.0 s at six stations), which are inferred to be generated by active asthenospheric flow. The variation of the fast direction may be associated with deflection of the asthenosphere around the deep Siberian cratonic keel at the base of the lithosphere. Several of the NE-SW trending fast directions with relatively small delay times observed in the Gobi Desert are parallel to the strike of the main faults and sutures, which may represent lithospheric deformation. In addition, it is inferred that small-scale hot mantle upwelling is responsible for generating a cluster of null measurements observed to the south of the Hentiy Mountains.
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-07-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
Smart photodetector arrays for error control in page-oriented optical memory
NASA Astrophysics Data System (ADS)
Schaffer, Maureen Elizabeth
1998-12-01
Page-oriented optical memories (POMs) have been proposed to meet high speed, high capacity storage requirements for input/output intensive computer applications. This technology offers the capability for storage and retrieval of optical data in two-dimensional pages resulting in high throughput data rates. Since currently measured raw bit error rates for these systems fall several orders of magnitude short of industry requirements for binary data storage, powerful error control codes must be adopted. These codes must be designed to take advantage of the two-dimensional memory output. In addition, POMs require an optoelectronic interface to transfer the optical data pages to one or more electronic host systems. Conventional charge coupled device (CCD) arrays can receive optical data in parallel, but the relatively slow serial electronic output of these devices creates a system bottleneck thereby eliminating the POM advantage of high transfer rates. Also, CCD arrays are "unintelligent" interfaces in that they offer little data processing capabilities. The optical data page can be received by two-dimensional arrays of "smart" photo-detector elements that replace conventional CCD arrays. These smart photodetector arrays (SPAs) can perform fast parallel data decoding and error control, thereby providing an efficient optoelectronic interface between the memory and the electronic computer. This approach optimizes the computer memory system by combining the massive parallelism and high speed of optics with the diverse functionality, low cost, and local interconnection efficiency of electronics. In this dissertation we examine the design of smart photodetector arrays for use as the optoelectronic interface for page-oriented optical memory. We review options and technologies for SPA fabrication, develop SPA requirements, and determine SPA scalability constraints with respect to pixel complexity, electrical power dissipation, and optical power limits. 
Next, we examine data modulation and error correction coding for the purpose of error control in the POM system. These techniques are adapted, where possible, for 2D data and evaluated as to their suitability for a SPA implementation in terms of BER, code rate, decoder time, and pixel complexity. Our analysis shows that differential data modulation combined with relatively simple block codes known as array codes provides a powerful means to achieve the desired data transfer rates while reducing error rates to industry requirements. Finally, we demonstrate the first smart photodetector array designed to perform parallel error correction on an entire page of data and satisfy the sustained data rates of page-oriented optical memories. Our implementation integrates a monolithic PN photodiode array and differential input receiver for optoelectronic signal conversion with a cluster error correction code using 0.35-µm CMOS. This approach provides high sensitivity, low electrical power dissipation, and fast parallel correction of 2 × 2-bit cluster errors in an 8 × 8-bit code block to achieve corrected output data rates scalable to 102 Gbps in the current technology, increasing to 1.88 Tbps in 0.1-µm CMOS.
FleCSPH - a parallel and distributed SPH implementation based on the FleCSI framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Junghans, Christoph; Loiseau, Julien
2017-06-20
FleCSPH is a multi-physics compact application that exercises FleCSI parallel data structures for tree-based particle methods. In particular, FleCSPH implements a smoothed-particle hydrodynamics (SPH) solver for the solution of Lagrangian problems in astrophysics and cosmology. FleCSPH includes support for gravitational forces using the fast multipole method (FMM).
Solar Wind Proton Temperature Anisotropy: Linear Theory and WIND/SWE Observations
NASA Technical Reports Server (NTRS)
Hellinger, P.; Travnicek, P.; Kasper, J. C.; Lazarus, A. J.
2006-01-01
We present a comparison between WIND/SWE observations (Kasper et al., 2006) of beta parallel to p and T perpendicular to p/T parallel to p (where beta parallel to p is the proton parallel beta, and T perpendicular to p and T parallel to p are the perpendicular and parallel proton temperatures, respectively; here parallel and perpendicular indicate directions with respect to the ambient magnetic field) and predictions of the Vlasov linear theory. In the slow solar wind, the observed proton temperature anisotropy seems to be constrained by oblique instabilities, namely the mirror instability and the oblique fire hose, contrary to the results of the linear theory, which predicts a dominance of the proton cyclotron instability and the parallel fire hose. The fast solar wind core protons exhibit an anticorrelation between beta parallel to c and T perpendicular to c/T parallel to c (where beta parallel to c is the core proton parallel beta, and T perpendicular to c and T parallel to c are the perpendicular and parallel core proton temperatures, respectively) similar to that observed in the HELIOS data (Marsch et al., 2004).
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TRIANGULATED SURFACES*
Fu, Zhisong; Jeong, Won-Ki; Pan, Yongsheng; Kirby, Robert M.; Whitaker, Ross T.
2012-01-01
This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton–Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512–2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers. PMID:22641200
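The active-list mechanics of the FIM are easy to state on a regular grid. Below is a serial Python rendition of the regular-grid variant for illustration (the paper's triangulated and SIMD versions involve a more elaborate local solver and data layout):

```python
import numpy as np

def upwind_update(U, i, j, f, h):
    # Godunov upwind solve of |grad U| = 1/f at a single grid node.
    ny, nx = U.shape
    a = min(U[i - 1, j] if i > 0 else np.inf, U[i + 1, j] if i < ny - 1 else np.inf)
    b = min(U[i, j - 1] if j > 0 else np.inf, U[i, j + 1] if j < nx - 1 else np.inf)
    if abs(a - b) >= h / f:
        return min(a, b) + h / f
    return 0.5 * (a + b + np.sqrt(2 * (h / f) ** 2 - (a - b) ** 2))

def fim_eikonal(speed, source, h=1.0, tol=1e-12):
    # Fast Iterative Method: nodes on the active list are updated (in any
    # order -- this is what parallelizes); a converged node retires and
    # may activate neighbors whose values it can still improve.
    ny, nx = speed.shape
    U = np.full((ny, nx), np.inf)
    U[source] = 0.0

    def nbrs(i, j):
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= i + di < ny and 0 <= j + dj < nx:
                yield i + di, j + dj

    active = list(nbrs(*source))
    listed = set(active)
    while active:
        nxt = []
        for i, j in active:
            old = U[i, j]
            U[i, j] = min(old, upwind_update(U, i, j, speed[i, j], h))
            if abs(old - U[i, j]) < tol:           # converged: retire this node
                listed.discard((i, j))
                for p, q in nbrs(i, j):
                    if (p, q) not in listed and (p, q) != source:
                        cand = upwind_update(U, p, q, speed[p, q], h)
                        if cand < U[p, q] - tol:   # neighbor can still improve
                            nxt.append((p, q))
                            listed.add((p, q))
            else:
                nxt.append((i, j))
        active = nxt
    return U
```

Because every node on the active list can be updated independently within a pass, the inner loop maps directly onto SIMD lanes or GPU threads.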
A hardware fast tracker for the ATLAS trigger
NASA Astrophysics Data System (ADS)
Asbah, Nedaa
2016-09-01
The trigger system of the ATLAS experiment is designed to reduce the event rate from the LHC nominal bunch crossing rate of 40 MHz to about 1 kHz, at the design luminosity of 10^34 cm^-2 s^-1. After a successful period of data taking from 2010 to early 2013, the LHC has restarted at much higher instantaneous luminosity. This will increase the load on the High Level Trigger system, the second stage of the selection, which is based on software algorithms. More sophisticated algorithms will be needed to achieve higher background rejection while maintaining good efficiency for interesting physics signals. The Fast TracKer (FTK) is part of the ATLAS trigger upgrade project. It is a hardware processor that will provide, at every Level-1 accepted event (100 kHz) and within 100 microseconds, full tracking information for tracks with momentum as low as 1 GeV. By providing fast, extensive access to tracking information, with resolution comparable to the offline reconstruction, FTK will help in precise detection of the primary and secondary vertices to ensure robust selections and improve the trigger performance. FTK exploits hardware technologies with massive parallelism, combining Associative Memory ASICs, FPGAs, and high-speed communication links.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsunoda, Hirokazu; Sato, Osamu; Okajima, Shigeaki
2002-07-01
In order to achieve fully automated reactor operation of the RAPID-L reactor, innovative reactivity control systems LEM, LIM, and LRM are equipped with lithium-6 as a liquid poison. Because lithium-6 has not been used as a neutron absorbing material in conventional fast reactors, measurements of the reactivity worth of lithium-6 were performed at the Fast Critical Assembly (FCA) of the Japan Atomic Energy Research Institute (JAERI). The FCA core was composed of highly enriched uranium and stainless steel samples so as to simulate the core spectrum of RAPID-L. The samples of 95% enriched lithium-6 were inserted into the core parallel to the core axis for the measurement of the reactivity worth at each position. It was found that the measured reactivity worth in the core region agreed well with the value calculated by the method used for the core designs of RAPID-L. Bias factors for the core design method were obtained by comparing experimental and calculated results. The factors were used to determine the number of LEMs and LIMs equipped in the core to achieve fully automated operation of RAPID-L. (authors)
Multiplexed Oversampling Digitizer in 65 nm CMOS for Column-Parallel CCD Readout
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grace, Carl; Walder, Jean-Pierre; von der Lippe, Henrik
2012-04-10
A digitizer designed to read out column-parallel charge-coupled devices (CCDs) used for high-speed X-ray imaging is presented. The digitizer is included as part of the High-Speed Image Preprocessor with Oversampling (HIPPO) integrated circuit. The digitizer module comprises a multiplexed, oversampling, 12-bit, 80 MS/s pipelined Analog-to-Digital Converter (ADC) and a bank of four fast-settling sample-and-hold amplifiers to instrument four analog channels. The ADC multiplexes and oversamples to reduce its area and allow integration that is pitch-matched to the columns of the CCD. Novel design techniques are used to enable oversampling and multiplexing with a reduced power penalty. The ADC exhibits 188 μV rms noise, which is less than 1 LSB at the 12-bit level. The prototype is implemented in a commercially available 65 nm CMOS process. The digitizer will lead to a proof-of-principle 2D 10 gigapixel/s X-ray detector.
An Automated Parallel Image Registration Technique Based on the Correlation of Wavelet Features
NASA Technical Reports Server (NTRS)
LeMoigne, Jacqueline; Campbell, William J.; Cromp, Robert F.; Zukor, Dorothy (Technical Monitor)
2001-01-01
With the increasing importance of multiple-platform/multiple remote sensing missions, fast and automatic integration of digital data from disparate sources has become critical to the success of these endeavors. Our work utilizes maxima of wavelet coefficients to form the basic features of a correlation-based automatic registration algorithm. Our wavelet-based registration algorithm is tested successfully with data from the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) and the Landsat Thematic Mapper (TM), which differ by translation and/or rotation. By the choice of high-frequency wavelet features, this method is similar to an edge-based correlation method, but by exploiting the multi-resolution nature of a wavelet decomposition, our method achieves higher computational speeds for comparable accuracies. This algorithm has been implemented on a Single Instruction Multiple Data (SIMD) massively parallel computer, the MasPar MP-2, as well as on the Cray T3D, the Cray T3E, and a Beowulf cluster of Pentium workstations.
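The abstract's approach, correlating maxima of high-frequency wavelet coefficients to recover a translation, can be illustrated with a minimal sketch. This is not the authors' algorithm: the Haar details, the top-fraction thresholding, and the exhaustive shift search below are simplifying assumptions for illustration only.

```python
import numpy as np

def wavelet_feature_map(img, frac=0.1):
    """Keep only the strongest high-frequency wavelet responses (here Haar
    horizontal/vertical details) as binary features -- a stand-in for the
    maxima-of-wavelet-coefficients features described in the abstract."""
    h = (img[:, ::2] - img[:, 1::2]) / 2          # horizontal detail
    v = (img[::2, :] - img[1::2, :]) / 2          # vertical detail
    resp = np.abs(h[::2, :]) + np.abs(v[:, ::2])  # combined, at half resolution
    thresh = np.quantile(resp, 1 - frac)          # keep the top `frac` responses
    return (resp >= thresh).astype(float)

def best_shift(ref, moving, max_shift=3):
    """Exhaustive correlation of the two feature maps over small translations."""
    fa, fb = wavelet_feature_map(ref), wavelet_feature_map(moving)
    best, score = (0, 0), -1.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            s = np.sum(fa * np.roll(np.roll(fb, dy, axis=0), dx, axis=1))
            if s > score:
                best, score = (dy, dx), s
    return best
```

Because the features live at half resolution, an even image shift of (2, 4) appears as (1, 2) in the feature maps, and the search returns the shift that re-aligns them.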
NASA Astrophysics Data System (ADS)
Du, Xiaoping; Wang, Yang; Liu, Hao
2018-04-01
A space object in a highly elliptical orbit always appears as a point image on ground-based imaging equipment, so it is difficult to resolve and identify its shape and attitude directly. In this paper, a novel algorithm is presented for the estimation of spacecraft shape. An apparent-magnitude model suitable for the inversion of object information such as shape and attitude is established based on an analysis of photometric characteristics. A parallel adaptive shape-inversion algorithm based on the unscented Kalman filter (UKF) was designed after deriving the dynamic equations of the nonlinear Gaussian system, including the influence of various drag forces. The results of a simulation study demonstrate the viability, robustness, and fast convergence rate of the new filter. It achieves inversion of combined shapes with high accuracy, especially for cube and cylinder buses, and maintains a high success rate of inversion even with sparse photometric data.
UNFOLD-SENSE: a parallel MRI method with self-calibration and artifact suppression.
Madore, Bruno
2004-08-01
This work aims to improve the performance of parallel imaging by using it with our "unaliasing by Fourier-encoding the overlaps in the temporal dimension" (UNFOLD) temporal strategy. A self-calibration method called "self, hybrid referencing with UNFOLD and GRAPPA" (SHRUG) is presented. SHRUG combines the UNFOLD-based sensitivity mapping strategy introduced in the TSENSE method by Kellman et al. (5) with the strategy introduced in the GRAPPA method by Griswold et al. (10). SHRUG merges the two approaches to alleviate their respective limitations, and provides fast self-calibration at any given acceleration factor. UNFOLD-SENSE further includes an UNFOLD artifact suppression scheme to significantly suppress artifacts and amplified noise produced by parallel imaging. This suppression scheme, which was published previously (4), is related to another method that was presented independently as part of TSENSE. While the two are equivalent at accelerations ≤ 2.0, the present approach is shown here to be significantly superior at accelerations > 2.0, with up to double the artifact suppression at high accelerations. Furthermore, a slight modification of Cartesian SENSE is introduced, which allows departures from purely Cartesian sampling grids. This technique, termed variable-density SENSE (vdSENSE), allows the variable-density data required by SHRUG to be reconstructed with the simplicity and fast processing of Cartesian SENSE. UNFOLD-SENSE is given by the combination of SHRUG for sensitivity mapping, vdSENSE for reconstruction, and UNFOLD for artifact/amplified-noise suppression. The method was implemented, with online reconstruction, on both an SSFP and a myocardium-perfusion sequence. The results from six patients scanned with UNFOLD-SENSE are presented.
NASA Astrophysics Data System (ADS)
Breuillard, H.; Matteini, L.; Argall, M. R.; Sahraoui, F.; Andriopoulou, M.; Le Contel, O.; Retinò, A.; Mirioni, L.; Huang, S. Y.; Gershman, D. J.; Ergun, R. E.; Wilder, F. D.; Goodrich, K. A.; Ahmadi, N.; Yordanova, E.; Vaivads, A.; Turner, D. L.; Khotyaintsev, Yu. V.; Graham, D. B.; Lindqvist, P.-A.; Chasapis, A.; Burch, J. L.; Torbert, R. B.; Russell, C. T.; Magnes, W.; Strangeway, R. J.; Plaschke, F.; Moore, T. E.; Giles, B. L.; Paterson, W. R.; Pollock, C. J.; Lavraud, B.; Fuselier, S. A.; Cohen, I. J.
2018-06-01
The Earth’s magnetosheath, which is characterized by highly turbulent fluctuations, is usually divided into two regions of different properties as a function of the angle between the interplanetary magnetic field and the shock normal. In this study, we make use of high-time resolution instruments on board the Magnetospheric MultiScale spacecraft to determine and compare the properties of subsolar magnetosheath turbulence in both regions, i.e., downstream of the quasi-parallel and quasi-perpendicular bow shocks. In particular, we take advantage of the unprecedented temporal resolution of the Fast Plasma Investigation instrument to show the density fluctuations down to sub-ion scales for the first time. We show that the nature of turbulence is highly compressible down to electron scales, particularly in the quasi-parallel magnetosheath. In this region, the magnetic turbulence also shows an inertial (Kolmogorov-like) range, indicating that the fluctuations are not formed locally, in contrast with the quasi-perpendicular magnetosheath. We also show that the electromagnetic turbulence is dominated by electric fluctuations at sub-ion scales (f > 1 Hz) and that magnetic and electric spectra steepen at the largest-electron scale. The latter indicates a change in the nature of turbulence at electron scales. Finally, we show that the electric fluctuations around the electron gyrofrequency are mostly parallel in the quasi-perpendicular magnetosheath, where intense whistlers are observed. This result suggests that energy dissipation, plasma heating, and acceleration might be driven by intense electrostatic parallel structures/waves, which can be linked to whistler waves.
Fast Growth May Impair Regeneration Capacity in the Branching Coral Acropora muricata
Denis, Vianney; Guillaume, Mireille M. M.; Goutx, Madeleine; de Palmas, Stéphane; Debreuil, Julien; Baker, Andrew C.; Boonstra, Roxane K.; Bruggemann, J. Henrich
2013-01-01
Regeneration of artificially induced lesions was monitored in nubbins of the branching coral Acropora muricata at two reef-flat sites representing contrasting environments at Réunion Island (21°07′S, 55°32′E). Growth of these injured nubbins was examined in parallel, and compared to controls. Biochemical compositions of the holobiont and the zooxanthellae density were determined at the onset of the experiment, and the photosynthetic efficiency (Fv/Fm) of zooxanthellae was monitored during the experiment. Acropora muricata rapidly regenerated small lesions, but regeneration rates significantly differed between sites. At the sheltered site characterized by high temperatures, temperature variations, and irradiance levels, regeneration took 192 days on average. At the exposed site, characterized by steadier temperatures and lower irradiation, nubbins demonstrated fast lesion repair (81 days), slower growth, lower zooxanthellae density, chlorophyll a concentration and lipid content than at the former site. A trade-off between growth and regeneration rates was evident here. High growth rates seem to impair regeneration capacity. We show that environmental conditions conducive to high zooxanthellae densities in corals are related to fast skeletal growth but also to reduced lesion regeneration rates. We hypothesize that a lowered regenerative capacity may be related to limited availability of energetic and cellular resources, consequences of coral holobionts operating at high levels of photosynthesis and associated growth. PMID:24023627
High-throughput sequence alignment using Graphics Processing Units
Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh
2007-01-01
Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
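As a stand-in for the suffix-tree matching described above, the sketch below uses a hash-based k-mer index; the function names and the data structure are illustrative assumptions, not MUMmerGPU's actual implementation. What it does share with the abstract is the property that each query is processed independently, which is what makes the workload data-parallel on a GPU.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Hash-based stand-in for MUMmerGPU's suffix tree: map each length-k
    substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def exact_seed_matches(queries, reference, k):
    """For each query, list (query_pos, ref_pos) pairs where a length-k
    substring matches the reference exactly. Queries are independent of
    one another, so they could be dispatched to parallel threads."""
    index = build_kmer_index(reference, k)
    return [[(q, r) for q in range(len(query) - k + 1)
             for r in index.get(query[q:q + k], [])]
            for query in queries]
```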
A parallel-vector algorithm for rapid structural analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1990-01-01
A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers demonstrate the accuracy and speed of the method.
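The variable-band idea, skipping all computation with zeros above each column height, can be sketched in a few lines. This is an illustrative dense-matrix Python version, not the paper's parallel-vector Fortran; the `first_row` bookkeeping and loop structure are assumptions made for clarity.

```python
import numpy as np

def skyline_cholesky(A):
    """Cholesky factorization exploiting column heights (variable-band idea).

    first_row[j] is the row index of the first nonzero in column j of the
    symmetric matrix A; inner products start at the later of the two column
    heights involved, so no work is done with zeros outside the profile.
    Returns lower-triangular L with A = L @ L.T.
    """
    n = A.shape[0]
    # column height: first nonzero row index in each column's upper triangle
    first_row = [min(i for i in range(j + 1) if A[i, j] != 0.0)
                 for j in range(n)]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        for i in range(j, n):
            lo = max(first_row[i], first_row[j])   # skip zeros above the profile
            s = A[i, j] - sum(L[i, k] * L[j, k] for k in range(lo, j))
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
    return L
```

Entries of L outside the profile stay exactly zero, because for such (i, j) both A[i, j] and the truncated inner product vanish, which is what makes the skipped work safe.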
Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing
NASA Astrophysics Data System (ADS)
Duan, Ling-Yu; Sun, Wei; Zhang, Xinfeng; Wang, Shiqi; Chen, Jie; Yin, Jianxiong; See, Simon; Huang, Tiejun; Kot, Alex C.; Gao, Wen
2018-05-01
The compact descriptors for visual search (CDVS) standard from the ISO/IEC Moving Picture Experts Group (MPEG) has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of the CDVS encoder unfortunately hinders its wide deployment in industry for large-scale visual search. In this paper, we revisit the merits of the low-complexity design of the CDVS core techniques and present a very fast CDVS encoder by leveraging the massive parallel execution resources of the GPU. We shift the computation-intensive and parallel-friendly modules to state-of-the-art GPU platforms, on which thread-block allocation and memory access are jointly optimized to eliminate performance loss. In addition, operations with heavy data dependence are allocated to the CPU to relieve the GPU of extra, unnecessary computational burden. Furthermore, we demonstrate that the proposed fast CDVS encoder works well with convolutional neural network approaches, which have harmoniously leveraged the advantages of GPU platforms and yielded significant performance improvements. Comprehensive experimental results over benchmarks show that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.
Bin-Hash Indexing: A Parallel Method for Fast Query Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng
2008-06-27
This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for emerging commodity parallel processors, such as multi-cores, cell processors, and general-purpose graphics processing units (GPUs). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that cannot be resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records can be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism; over 12,000 records can be evaluated simultaneously at any one time. We show that our new query-processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.
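A toy version of the two-phase Bin-Hash lookup (answer what the bin boundaries can resolve, then consult per-bin exact values only for boundary bins) might look as follows. The class and method names are invented for illustration, and a Python dict stands in for the perfect spatial hash table the paper builds on the GPU.

```python
from bisect import bisect_right

class BinHashIndex:
    """Toy sketch of the Bin-Hash idea: bin boundaries plus per-bin values."""

    def __init__(self, data, bin_edges):
        self.edges = sorted(bin_edges)
        self.bins = [[] for _ in range(len(self.edges) + 1)]  # record ids per bin
        self.values = {}                                      # record id -> value
        for rid, v in enumerate(data):
            self.bins[bisect_right(self.edges, v)].append(rid)
            self.values[rid] = v

    def range_query(self, lo, hi):
        """Return ids of records with lo <= value < hi."""
        hits = []
        for b, rids in enumerate(self.bins):
            b_lo = self.edges[b - 1] if b > 0 else float("-inf")
            b_hi = self.edges[b] if b < len(self.edges) else float("inf")
            if lo <= b_lo and b_hi <= hi:      # bin fully inside: take all
                hits.extend(rids)
            elif b_hi <= lo or hi <= b_lo:     # bin fully outside: skip
                continue
            else:                              # boundary bin: check exact values
                hits.extend(r for r in rids if lo <= self.values[r] < hi)
        return sorted(hits)
```

Only the boundary bins touch the stored values, which mirrors the small data footprint the abstract emphasizes for memory-limited GPUs.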
Kawai, Makoto; Hara, Yukichi
2006-02-01
Western blotting of aminopeptidase N (APN) detects a high-molecular-mass isoform (260 kDa) [M. Kawai, Y. Otake, Y. Hara, High-molecular-mass isoform of aminopeptidase N/CD13 in serum from cholestatic patients, Clin Chim Acta 330 (2003) 141-149] in cholestatic patient serum, but is time-consuming. Human sera were electrophoresed on polyacrylamide gel containing Triton X-100 (Triton-PAGE) and stained with leucine-β-naphthylamide (LAP staining). The stained bands were eluted from the gel, treated with N- and O-glycosidase where necessary, and analyzed by Western blotting [M. Kawai, Y. Otake, Y. Hara, High-molecular-mass isoform of aminopeptidase N/CD13 in serum from cholestatic patients, Clin Chim Acta 330 (2003) 141-149]. Triton-PAGE with LAP staining clearly detected fast bands in all the sera examined. Almost in parallel with leucine aminopeptidase activity, slow bands were strongly stained in all 11 cholestatic patients, but clearly stained in only 3 out of 14 patients with hepatobiliary diseases other than cholestasis. PAGE with various concentrations of Triton showed that Triton slows down the slow bands but not the fast bands. Western blotting showed that the Triton-PAGE slow bands of cholestasis contained 140- and 260-kDa APN, and that fast bands were slightly smaller than monomer-size slow bands after glycosidase treatment. Less time-consuming than Western blotting, Triton-PAGE with LAP staining detects novel APN bands that are slowed by Triton and partly composed of the high-molecular-mass isoform in cholestasis. The slow bands appear to be homodimers of APN with transmembrane anchors. The polypeptide of the fast band appears to be processed differently from that of the slow band.
Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas
2016-04-01
Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations that take at most several hours to analyze a common input on a modern desktop station; however, due to multiple invocations for a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, new computer software, mpiWrapper, has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface is used to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. mpiWrapper can launch any conventional Linux application without modification of its original source code, and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .
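The wrapper pattern described above, running many invocations of a non-parallel program concurrently with resubmission on failure, can be sketched with Python threads standing in for MPI ranks on a single POSIX machine. `run_task` and `run_farm` are hypothetical names, not mpiWrapper's API, and the sketch omits the per-node task-management thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task(cmd, retries=1):
    """Run one invocation of a non-parallel program; resubmit on failure,
    mirroring mpiWrapper's resubmission of subtasks on node failure."""
    for _ in range(retries + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
    raise RuntimeError(f"task failed after {retries + 1} attempts: {cmd}")

def run_farm(commands, workers=4):
    """Fan a list of independent command lines out across worker threads
    (the real tool distributes subtasks across supercomputer nodes via MPI)."""
    outputs = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_task, cmd): i for i, cmd in enumerate(commands)}
        for fut in as_completed(futures):
            outputs[futures[fut]] = fut.result()  # preserve submission order
    return [outputs[i] for i in range(len(commands))]
```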
Enhancing sedimentation by improving flow conditions using parallel retrofit baffles.
He, Cheng; Scott, Eric; Rochfort, Quintin
2015-09-01
In this study, placing parallel-connected baffles in the vicinity of the inlet was proposed to improve hydraulic conditions for enhancing TSS (total suspended solids) removal. The purpose of the retrofit baffle design is to divide the large, fast inflow into smaller, slower flows to increase flow uniformity. This avoids short-circuiting and increases residence time in the sedimentation basin. The newly proposed parallel-connected baffle configuration was assessed in the laboratory by comparing its TSS removal performance and optimal flow residence time with those of the widely used series-connected baffles. The experimental results showed that the parallel-connected baffles outperformed the series-connected baffles because they disperse flow faster and in less space by splitting the large inflow into many small branches, instead of depending solely on internal flow friction over a longer flow path, as is the case with series-connected baffles. Being able to dampen faster flow before it enters the sedimentation basin is critical to reducing the possibility of disturbing settled particles, especially under high inflow conditions. Also, for a large sedimentation basin, it may be more economically feasible to deploy the proposed parallel retrofit baffles in the vicinity of the inlet than series-connected baffles throughout the entire settling basin. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
Magnetosheath Filamentary Structures Formed by Ion Acceleration at the Quasi-Parallel Bow Shock
NASA Technical Reports Server (NTRS)
Omidi, N.; Sibeck, D.; Gutynska, O.; Trattner, K. J.
2014-01-01
Results from 2.5-D electromagnetic hybrid simulations show the formation of field-aligned, filamentary plasma structures in the magnetosheath. They begin at the quasi-parallel bow shock and extend far into the magnetosheath. These structures exhibit anticorrelated, spatial oscillations in plasma density and ion temperature. Closer to the bow shock, magnetic field variations associated with density and temperature oscillations may also be present. Magnetosheath filamentary structures (MFS) form primarily in the quasi-parallel sheath; however, they may extend to the quasi-perpendicular magnetosheath. They occur over a wide range of solar wind Alfvénic Mach numbers and interplanetary magnetic field directions. At lower Mach numbers with lower levels of magnetosheath turbulence, MFS remain highly coherent over large distances. At higher Mach numbers, magnetosheath turbulence decreases the level of coherence. Magnetosheath filamentary structures result from localized ion acceleration at the quasi-parallel bow shock and the injection of energetic ions into the magnetosheath. The localized nature of ion acceleration is tied to the generation of fast magnetosonic waves at and upstream of the quasi-parallel shock. The increased pressure in flux tubes containing the shock accelerated ions results in the depletion of the thermal plasma in these flux tubes and the enhancement of density in flux tubes void of energetic ions. This results in the observed anticorrelation between ion temperature and plasma density.
Tegel, Hanna; Yderland, Louise; Boström, Tove; Eriksson, Cecilia; Ukkonen, Kaisa; Vasala, Antti; Neubauer, Peter; Ottosson, Jenny; Hober, Sophia
2011-08-01
Protein production and analysis in a parallel fashion is today applied in laboratories worldwide, and there is a great need to improve the techniques and systems used for this purpose. In order to save time and money, a fast and reliable screening method for analysis of protein production and verification of the protein product is desired. Here, a micro-scale protocol for the parallel production and screening of 96 proteins in plate format is described. Protein capture was achieved using immobilized metal affinity chromatography, and the product was verified using matrix-assisted laser desorption/ionization time-of-flight MS. In order to obtain sufficiently high cell densities and product yield in the small-volume cultivations, the EnBase® cultivation technology was applied, which enables cultivation in volumes as small as 150 μL. Here, the efficiency of the method is demonstrated by producing 96 human, recombinant proteins, both in micro-scale and using a standard full-scale protocol, and comparing the results in regard to both protein identity and sample purity. The results obtained are highly comparable to those acquired through standard full-scale purification protocols, thus validating this method as a successful initial screening step before protein production at a larger scale. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Waite, Gregory P.; Schutt, D.L.; Smith, Robert B.
2005-01-01
Teleseismic shear wave splitting measured at 56 continuous and temporary seismographs deployed in a 500 km by 600 km area around the Yellowstone hot spot indicates that fast anisotropy in the mantle is parallel to the direction of plate motion under most of the array. The average split time from all stations of 0.9 s is typical of continental stations. There is little evidence for plume-induced radial strain, suggesting that any contribution of gravitationally spreading plume material is undetectably small with respect to the plate motion velocity. Two stations within Yellowstone have splitting measurements indicating the apparent fast anisotropy direction (ϕ) is nearly perpendicular to plate motion. These stations are ∼30 km from stations with ϕ parallel to plate motion. The 70° rotation over 30 km suggests a shallow source of anisotropy; however, split times for these stations are more than 2 s. We suggest melt-filled, stress-oriented cracks in the lithosphere are responsible for the anomalous ϕ orientations within Yellowstone. Stations southeast of Yellowstone have measurements of ϕ oriented NNW to WNW at high angles to the plate motion direction. The Archean lithosphere beneath these stations may have significant anisotropy capable of producing the observed splitting.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Zuwei; Zhao, Haibo, E-mail: klinsmannzhb@163.com; Zheng, Chuguang
2015-01-15
This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining a Markov jump model, a weighted majorant kernel, and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles so as to capture the time evolution of the particle size distribution with low statistical noise over the full size range, and as far as possible to reduce the number of time loopings. Three coagulation rules are highlighted, and it is found that constructing an appropriate coagulation rule provides a route to a compromise between the accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates used for acceptance-rejection processing by a single loop over all particles; meanwhile, the mean time step of a coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is greatly reduced, becoming proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are processed in parallel by the many cores of a GPU, which can execute massively threaded data-parallel tasks to obtain a remarkable speedup (compared with CPU computation, the speedup of GPU parallel computing is as high as 200 in a case of 100 cells with 10,000 simulation particles per cell). These accelerating approaches to PBMC are demonstrated in a physically realistic Brownian coagulation case. The computational accuracy is validated against the benchmark solution of a discrete-sectional method. The simulation results show that the comprehensive approach attains a very favorable improvement in cost without sacrificing computational accuracy.
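The majorant-kernel acceptance-rejection step, drawing candidate pairs uniformly and accepting with probability kernel/majorant instead of double-looping over all pairs, can be sketched as follows. This equally-weighted single event is a simplification of the paper's differentially-weighted scheme, and the function name is invented for illustration.

```python
import random

def coagulation_event(volumes, kernel, k_max, rng):
    """One acceptance-rejection coagulation event (illustrative sketch).

    k_max must bound kernel(v, u) over all current pairs, so a candidate pair
    drawn uniformly can be accepted with probability kernel/k_max -- this
    replaces the O(n^2) double loop over all particle pairs."""
    while True:
        i, j = rng.sample(range(len(volumes)), 2)
        if rng.random() < kernel(volumes[i], volumes[j]) / k_max:
            break                   # pair accepted
    volumes[i] += volumes[j]        # merge the accepted pair
    volumes.pop(j)
    return volumes
```

Whatever pair is accepted, total particle volume is conserved and the particle count drops by one, which is the invariant any coagulation step must satisfy.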
An Approach in Radiation Therapy Treatment Planning: A Fast, GPU-Based Monte Carlo Method.
Karbalaee, Mojtaba; Shahbazi-Gahrouei, Daryoush; Tavakoli, Mohammad B
2017-01-01
An accurate and fast radiation dose calculation is essential for successful radiotherapy. The aim of this study was to implement a new graphics processing unit (GPU) based radiation therapy treatment planning method for accurate and fast dose calculation in radiotherapy centers. A program was written to run in parallel on the GPU. The code was validated against EGSnrc/DOSXYZnrc. Moreover, a semi-automatic, rotary, asymmetric phantom was designed and produced using bone, lung, and soft-tissue equivalent materials. All measurements were performed using a Mapcheck dosimeter. The accuracy of the code was validated using the experimental data obtained from the anthropomorphic phantom as the gold standard. The findings showed that, compared with DOSXYZnrc in the virtual phantom, most voxels (>95%) met a <3% dose difference or 3 mm distance-to-agreement (DTA) criterion. Moreover, for the anthropomorphic phantom, compared to the Mapcheck dose measurements, a <5% dose difference or 5 mm DTA was observed. The fast calculation speed and high accuracy of the GPU-based Monte Carlo method in dose calculation may be useful in routine radiation therapy centers as the core component of a treatment-planning verification system.
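The 3%/3 mm criterion quoted above can be illustrated with a simplified one-dimensional pass-rate check; real treatment-plan validation is done in 3-D (often as a gamma analysis), so the function below is only a sketch of the dose-difference/DTA logic, with an invented name.

```python
import numpy as np

def pass_rate(dose_eval, dose_ref, spacing_mm, dd=0.03, dta_mm=3.0):
    """Fraction of voxels passing a dose-difference OR distance-to-agreement
    test -- a 1-D stand-in for the 3%/3 mm comparison in the abstract."""
    tol = dd * dose_ref.max()                    # dose tolerance, % of max dose
    r = int(round(dta_mm / spacing_mm))          # DTA search radius in voxels
    passed = 0
    for k in range(len(dose_ref)):
        if abs(dose_eval[k] - dose_ref[k]) <= tol:
            passed += 1                          # passes on dose difference
            continue
        lo, hi = max(0, k - r), min(len(dose_ref), k + r + 1)
        if np.any(np.abs(dose_ref[lo:hi] - dose_eval[k]) <= tol):
            passed += 1                          # passes on distance-to-agreement
    return passed / len(dose_ref)
```

The DTA branch is what lets a steep dose gradient that is merely shifted by a millimeter or two still count as agreement.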
A Comparison of Two Sensors Used to Measure High-Voltage, Fast-Risetime Signals in Coaxial Cable
NASA Astrophysics Data System (ADS)
Farr, Everett G.; Atchley, Lanney M.; Ellibee, Donald E.; Carey, William J.; Altgilbers, Larry L.
We consider here two sensors that are commonly used to measure high-voltage fast-risetime signals in coaxial cable. One sensor measures the current in the cable, and is called a Current-Viewing Resistor, or CVR. In this design, the cable jacket is cut, a portion of the cable jacket is removed, and a number of resistors are inserted in parallel across the gap, thereby creating a low resistance in series with the outer cable jacket. The voltage across these resistors is proportional to the current in the coax. The second sensor measures the derivative of the voltage in the coax. It is fabricated from a "sawed-off" SMA connector that is inserted through a small hole in the cable jacket. In this paper we characterize the accuracy of both sensors when used with RG-220 cable, and we discuss the situations when one might prefer one measurement type over the other.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array
NASA Astrophysics Data System (ADS)
Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul
2008-04-01
This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
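The "work farm" pattern described above — one input stream fanned out to a parallel set of identical worker objects, merged into one output stream — can be sketched on a conventional multicore machine. This is a generic illustration of the pattern, not MPPA code; the worker function is a hypothetical stand-in for per-frame sensor processing.

```python
from multiprocessing import Pool

def worker(frame):
    # Stand-in for per-frame sensor processing (here: a mean over samples).
    return sum(frame) / len(frame)

def work_farm(frames, n_workers=4):
    """One input stream -> homogeneous parallel workers -> one output stream.
    Output order is preserved, as in a statically configured farm."""
    with Pool(n_workers) as pool:
        return pool.map(worker, frames)
```

On the MPPA the workers would be objects on distinct RISC processors exchanging data over self-synchronizing channels; `Pool.map` merely mimics that dataflow with OS processes.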
Performance of the Wavelet Decomposition on Massively Parallel Architectures
NASA Technical Reports Server (NTRS)
El-Ghazawi, Tarek A.; LeMoigne, Jacqueline; Zukor, Dorothy (Technical Monitor)
2001-01-01
Traditionally, Fourier transforms have been utilized for performing signal analysis and representation. Although it is straightforward to reconstruct a signal from its Fourier transform, no local description of the signal is included in its Fourier representation. To alleviate this problem, windowed Fourier transforms and then wavelet transforms were introduced, and it has been proven that wavelets give better localization than traditional Fourier transforms, as well as a better division of the time- or space-frequency plane than windowed Fourier transforms. Because of these properties, and following the development of several fast algorithms for computing the wavelet representation of a signal, in particular the Multi-Resolution Analysis (MRA) developed by Mallat, wavelet transforms have increasingly been applied to signal analysis problems, especially real-life problems in which speed is critical. In this paper we present and compare efficient wavelet decomposition algorithms on different parallel architectures. We report and analyze experimental measurements using NASA remotely sensed images. Results show that our algorithms achieve significant performance gains on current high-performance parallel systems and meet the requirements of scientific and multimedia applications. The extensive performance measurements collected over a number of high-performance computer systems have revealed important architectural characteristics of these systems in relation to the processing demands of the wavelet decomposition of digital images.
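One level of Mallat's multiresolution pyramid is easy to sketch with the Haar wavelet, the simplest case: the signal splits into a low-pass (approximation) half and a high-pass (detail) half, with perfect reconstruction. This is a minimal single-machine illustration, not the paper's parallel implementation.

```python
import numpy as np

def haar_step(signal):
    """One level of the Mallat pyramid with the Haar wavelet:
    a length-2n signal becomes n approximation + n detail coefficients."""
    s = np.asarray(signal, float)
    even, odd = s[0::2], s[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass: local averages
    detail = (even - odd) / np.sqrt(2.0)   # high-pass: local differences
    return approx, detail

def haar_inverse(approx, detail):
    """Perfect reconstruction of the original signal from one level."""
    even = (approx + detail) / np.sqrt(2.0)
    odd = (approx - detail) / np.sqrt(2.0)
    out = np.empty(2 * len(approx))
    out[0::2], out[1::2] = even, odd
    return out
```

The full MRA recurses `haar_step` on the approximation coefficients; the independence of the per-pair computations is what makes the decomposition attractive for parallel architectures.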
NASA Astrophysics Data System (ADS)
Puzyrev, Vladimir; Torres-Verdín, Carlos; Calo, Victor
2018-05-01
The interpretation of resistivity measurements acquired in high-angle and horizontal wells is a critical technical problem in formation evaluation. We develop an efficient parallel 3-D inversion method to estimate the spatial distribution of electrical resistivity in the neighbourhood of a well from deep directional electromagnetic induction measurements. The methodology places no restriction on the spatial distribution of the electrical resistivity around arbitrary well trajectories. The fast forward modelling of triaxial induction measurements performed with multiple transmitter-receiver configurations employs a parallel direct solver. The inversion uses a pre-conditioned gradient-based method whose accuracy is improved using the Wolfe conditions to estimate optimal step lengths at each iteration. The large transmitter-receiver offsets, used in the latest generation of commercial directional resistivity tools, improve the depth of investigation to over 30 m from the wellbore. Several challenging synthetic examples confirm the feasibility of the full 3-D inversion-based interpretations for these distances, hence enabling the integration of resistivity measurements with seismic amplitude data to improve the forecast of the petrophysical and fluid properties. Employing parallel direct solvers for the triaxial induction problems allows for large reductions in computational effort, thereby opening the possibility to invert multiposition 3-D data in practical CPU times.
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
NASA Astrophysics Data System (ADS)
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, a supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with a fast quad-core 1.2 GHz ARMv8 64-bit processor, 1 GB of RAM, and a 32 GB microSD card for local storage. The cluster therefore has a total of 128 GB of RAM distributed across the individual nodes, a flash capacity of 4 TB, and 512 processor cores, while benefiting from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage communication between nodes. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance computing (HPC) and the handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows, with the goal of achieving a massively parallelized scalable code. We present benchmarking results for the computational performance across varying numbers of RPi nodes. We believe our project could inspire scientists and students to consider this unconventional cluster architecture as a mainstream, feasible learning platform for challenging engineering and scientific problems.
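The message-passing pattern such a cluster relies on — each node reducing its local chunk, a root process gathering the partial results — can be sketched in miniature with operating-system processes and pipes. This is a stand-in for MPI send/recv, not the Buckeye-Pi code; on the real cluster each "node" is a separate RPi and the transport is MPI over the network.

```python
from multiprocessing import Process, Pipe

def node(rank, conn, data):
    # Toy "compute node": reduce the local chunk, send the partial to the root.
    conn.send((rank, sum(data)))
    conn.close()

def cluster_sum(chunks):
    """Scatter chunks to worker processes, gather partial sums at the root."""
    parents, procs = [], []
    for rank, chunk in enumerate(chunks):
        parent, child = Pipe()
        p = Process(target=node, args=(rank, child, chunk))
        p.start()
        parents.append(parent)
        procs.append(p)
    partials = dict(conn.recv() for conn in parents)  # gather step
    for p in procs:
        p.join()
    return sum(partials.values())
```

The same scatter/reduce/gather shape, expressed with `MPI_Scatter`/`MPI_Reduce`, is the canonical first exercise on a teaching cluster of this kind.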
LSPR chip for parallel, rapid, and sensitive detection of cancer markers in serum.
Aćimović, Srdjan S; Ortega, Maria A; Sanz, Vanesa; Berthelot, Johann; Garcia-Cordero, Jose L; Renger, Jan; Maerkl, Sebastian J; Kreuzer, Mark P; Quidant, Romain
2014-05-14
Label-free biosensing based on metallic nanoparticles supporting localized surface plasmon resonances (LSPR) has recently received growing interest (Anker, J. N., et al. Nat. Mater. 2008, 7, 442-453). Besides its competitive sensitivity (Yonzon, C. R., et al. J. Am. Chem. Soc. 2004, 126, 12669-12676; Svendendahl, M., et al. Nano Lett. 2009, 9, 4428-4433) when compared to the surface plasmon resonance (SPR) approach based on extended metal films, LSPR biosensing features a high-end miniaturization potential and a significant reduction of the interrogation device bulkiness, positioning itself as a promising candidate for point-of-care diagnostic and field applications. Here, we present the first parallel LSPR lab-on-a-chip realization that goes well beyond the state of the art by uniting the latest advances in plasmonics, nanofabrication, microfluidics, and surface chemistry. Our system offers parallel, real-time inspection of 32 sensing sites distributed across 8 independent microfluidic channels with very high reproducibility/repeatability. This enables us to test various sensing strategies for the detection of biomolecules. In particular, we demonstrate the fast detection of relevant cancer biomarkers (human alpha-fetoprotein and prostate-specific antigen) down to concentrations of 500 pg/mL in a complex matrix consisting of 50% human serum.
HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing
Karimi, Ramin; Hajdu, Andras
2016-01-01
Comprehensive efforts toward low-cost sequencing in the past few years have led to the growth of complete genome databases. In parallel, fast and cost-effective methods and applications have been developed to accelerate sequence analysis, of which identification is the very first step. Given the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly desirable. As an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline, HTSFinder (high-throughput signature finder), with a corresponding k-mer generator, GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in arbitrarily selected target and nontarget databases. Hadoop and MapReduce are used as parallel and distributed computing tools on commodity hardware, bringing the power of high-performance computing to ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genomes. The considerable number of unique and common DNA signatures detected in the target database creates opportunities to improve the identification process, not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis. PMID:26884678
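The core k-mer frequency step, and the derivation of candidate unique signatures as target-minus-nontarget k-mer sets, can be sketched as follows. In the actual pipeline the counting (map) and merging (reduce) phases run distributed under Hadoop/MapReduce; this single-machine version only illustrates the logic, and the function names are ours.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count all overlapping k-mers in a DNA sequence (the 'map' logic)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def unique_signatures(target_counts, nontarget_counts):
    """k-mers present in the target set but absent from the non-target set:
    candidate unique DNA signatures for identification assays."""
    return set(target_counts) - set(nontarget_counts)
```

Common signatures are the complementary operation (set intersection across all target genomes); in a MapReduce setting, per-genome `Counter`s are the mapper outputs and the set operations run in the reducer.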
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zugarramurdi, A.; Debiossac, M.; Lunca-Popa, P.
2015-03-09
We present a grazing incidence fast atom diffraction (GIFAD) study of monolayer graphene on 6H-SiC(0001). This system shows a Moiré-like 13 × 13 superlattice above the reconstructed carbon buffer layer. The averaging property of GIFAD results in electronic and geometric corrugations that are well decoupled; the graphene honeycomb corrugation is only observed with the incident beam parallel to the zigzag direction, while the geometric corrugation arising from the superlattice is revealed along the armchair direction. Full-quantum calculations of the diffraction patterns show the very high GIFAD sensitivity to the amplitude of the surface corrugation. The best agreement between the calculated and measured diffraction intensities yields a corrugation height of 0.27 ± 0.03 Å.
Real-time multi-DSP control of three-phase current-source unity power factor PWM rectifier
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiao Wang; Boon-Teck Ooi
1993-07-01
The design of a real-time multi-DSP controller for a high-quality six-valve three-phase current-source unity-power-factor PWM rectifier is discussed in this paper. With the decoupler preprocessor and the dynamic trilogic PWM trigger scheme, each of the three input currents can be controlled independently. Based on the a-b-c frame system model and fast parallel computer control, the pole-placement control method is implemented successfully to achieve fast response in the ac currents. The low-frequency resonance in the ac filter L-C networks has been damped effectively. The experimental results are obtained from a 1-kVA bipolar-transistor current-source PWM rectifier with a real-time controller using three TMS320C25 DSPs.
NASA Astrophysics Data System (ADS)
Caragiulo, P.; Dragone, A.; Markovic, B.; Herbst, R.; Nishimura, K.; Reese, B.; Herrmann, S.; Hart, P.; Blaj, G.; Segal, J.; Tomada, A.; Hasi, J.; Carini, G.; Kenney, C.; Haller, G.
2015-05-01
ePix10k is a variant of a novel class of integrating pixel ASICs architectures optimized for the processing of signals in second generation LINAC Coherent Light Source (LCLS) X-Ray cameras. The ASIC is optimized for high dynamic range application requiring high spatial resolution and fast frame rates. ePix ASICs are based on a common platform composed of a random access analog matrix of pixel with global shutter, fast parallel column readout, and dedicated sigma-delta analog to digital converters per column. The ePix10k variant has 100um×100um pixels arranged in a 176×192 matrix, a resolution of 140e- r.m.s. and a signal range of 3.5pC (10k photons at 8keV). In its final version it will be able to sustain a frame rate of 2kHz. A first prototype has been fabricated and characterized. Performance in terms of noise, linearity, uniformity, cross-talk, together with preliminary measurements with bump bonded sensors are reported here.
NASA Astrophysics Data System (ADS)
Takagi, R.; Okada, T.; Yoshida, K.; Townend, J.; Boese, C. M.; Baratin, L. M.; Chamberlain, C. J.; Savage, M. K.
2016-12-01
We estimate shear-wave velocity anisotropy in the shallow crust near the Alpine fault using seismic interferometry of borehole vertical arrays. We utilized four borehole observations: two sensors deployed in two boreholes of the Deep Fault Drilling Project on the hanging-wall side, and two sites located on the footwall side. Surface sensors deployed just above each borehole complete the vertical arrays. By cross-correlating rotated horizontal seismograms observed by the borehole and surface sensors, we extracted polarized shear waves propagating from the bottom to the surface of each borehole. The extracted shear waves show a polarization-angle dependence of travel time, indicating shear-wave anisotropy between the two sensors. On the hanging-wall side, the estimated fast shear-wave directions are parallel to the Alpine fault. Strong anisotropy of 20% is observed at the site within 100 m of the fault. The hanging wall consists of mylonite and schist characterized by fault-parallel foliation, and acoustic borehole imaging reveals fractures parallel to the Alpine fault. The fault-parallel anisotropy thus suggests that structural anisotropy is predominant in the hanging wall, demonstrating the consistency of the geological and seismological observations. On the footwall side, by contrast, the angle between the fast direction and the strike of the Alpine fault is 33-40 degrees. Since the footwall is composed of granitoid that may lack planar structure, stress-induced anisotropy is possibly predominant there. The direction of maximum horizontal stress (SHmax), estimated from focal mechanisms of regional earthquakes, is at 55 degrees to the Alpine fault. A possible interpretation of the difference between the fast direction and the SHmax direction is a depth rotation of the stress field near the Alpine fault; a similar rotation is observed in the SAFOD borehole at the San Andreas fault.
Fast Computation and Assessment Methods in Power System Analysis
NASA Astrophysics Data System (ADS)
Nagata, Masaki
Power system analysis is essential for efficient and reliable power system operation and control. Online security assessment has recently grown in importance, as more efficient use of power networks is increasingly required. In this article, fast power system analysis techniques such as contingency screening, parallel processing, and intelligent-systems applications are briefly surveyed from the viewpoint of their application to online dynamic security assessment.
The Anisotropic Structure of South China Sea: Using OBS Data to Constrain Mantle Flow
NASA Astrophysics Data System (ADS)
Li, L.; Xue, M.; Yang, T.; Liu, C.; Hua, Q.; Xia, S.; Huang, H.; Le, B. M.; Huo, D.; Pan, M.
2015-12-01
The dynamic mechanism behind the formation of the South China Sea (SCS) has been debated for decades. Its anisotropic structure can provide useful insight into the complex evolution of the SCS by indicating mantle flow direction and strength. In this study, we apply shear wave splitting methods to two half-year seismic datasets collected from 10 and 6 passive-source Ocean Bottom Seismometers (OBSs), respectively. These OBSs were deployed along both sides of the extinct ridge in the central basin of the SCS by Tongji University in 2012 and 2013, and were successfully recovered in 2013 and 2015, respectively. After processing and inspecting the global and regional earthquakes of the 2012 dataset (local events are still being processed), measurements were made for 2 global events and 24 regional events at 5 OBSs using the tangential-energy minimization, smallest-eigenvalue minimization, and correlation methods. We also implement cluster analysis on the splitting results obtained for different time windows and for different frequency bands. For teleseismic core phases such as SKS and PKS, we find the fast polarization direction beneath the central basin is approximately NE-SW, nearly parallel to the extinct ridge, whereas for regional events, splitting analysis of S, PS, and ScS phases shows much more complicated fast directions as the ray path varies between phases.
The observed fast directions can be divided into three groups: (1) for events from the Eurasian plate, the fast polarization direction rotates gradually from NNE-SSW to NEE-SWW along the path from the inner Eurasian plate to the central SCS, implying that the mantle flow is controlled by the India-Eurasia collision; (2) for events located at the junction of the Pacific and Philippine plates, the dominant fast direction is NW-SE, almost perpendicular to the Ryukyu Trench and sub-parallel to the absolute motion direction of the Philippine plate; (3) for events that occurred to the SE, near the Philippine Fault zone, the observed NE-SW fast direction is sub-parallel to the subduction direction of the Philippine plate.
JSD: Parallel Job Accounting on the IBM SP2
NASA Technical Reports Server (NTRS)
Saphir, William; Jones, James Patton; Walter, Howard (Technical Monitor)
1995-01-01
The IBM SP2 is one of the most promising parallel computers for scientific supercomputing - it is fast and usually reliable. One of its biggest problems is a lack of robust and comprehensive system software. Among other things, this software allows a collection of Unix processes to be treated as a single parallel application. It does not, however, provide accounting for parallel jobs other than what is provided by AIX for the individual process components. Without parallel job accounting, it is not possible to monitor system use, measure the effectiveness of system administration strategies, or identify system bottlenecks. To address this problem, we have written jsd, a daemon that collects accounting data for parallel jobs. jsd records information in a format that is easily machine- and human-readable, allowing us to extract the most important accounting information with very little effort. jsd also notifies system administrators in certain cases of system failure.
Final report for the Tera Computer TTI CRADA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davidson, G.S.; Pavlakos, C.; Silva, C.
1997-01-01
Tera Computer and Sandia National Laboratories have completed a CRADA, which examined the Tera Multi-Threaded Architecture (MTA) for use with large codes of importance to industry and DOE. The MTA is an innovative architecture that uses parallelism to mask latency between memories and processors. The physical implementation is a parallel computer with high cross-section bandwidth and GaAs processors designed by Tera, which support many small computation threads and fast, lightweight context switches between them. When any thread blocks while waiting for memory accesses to complete, another thread immediately begins execution so that high CPU utilization is maintained. The Tera MTA parallel computer has a single, global address space, which is appealing when porting existing applications to a parallel computer. This ease of porting is further enabled by compiler technology that helps break computations into parallel threads. DOE and Sandia National Laboratories were interested in working with Tera to further develop this computing concept. While Tera Computer would continue the hardware development and compiler research, Sandia National Laboratories would work with Tera to ensure that their compilers worked well with important Sandia codes, most particularly CTH, a shock physics code used for weapon safety computations. In addition to that important code, Sandia National Laboratories would complete research on a robotic path planning code, SANDROS, which is important in manufacturing applications, and would evaluate the MTA performance on this code. Finally, Sandia would work directly with Tera to develop 3D visualization codes, which would be appropriate for use with the MTA. Each of these tasks has been completed to the extent possible, given that Tera has just completed the MTA hardware. All of the CRADA work had to be done on simulators.
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-01-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008
Accelerated Adaptive MGS Phase Retrieval
NASA Technical Reports Server (NTRS)
Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang
2011-01-01
The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This implementation significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm that allows parallel processing of applications which apply a single instruction to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip, and computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited to this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time, performing its matrix calculations on NVIDIA graphics cards. The graphics processing unit (GPU) is hardware specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model called CUDA is used to transparently scale multicore parallelism in hardware; this technology gives computationally intensive applications access to the processing power of NVIDIA GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these technologies to accelerate optical phase-error characterization. With a single PC containing four NVIDIA GTX-280 graphics cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
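The computational kernel the GPU accelerates here is the FFT-based forward model from exit pupil to focal plane (pupil field in, focal-plane intensity out). A minimal CPU sketch, with NumPy standing in for the CUDA kernels and the function name being ours:

```python
import numpy as np

def focal_intensity(amplitude, phase):
    """Forward model at the heart of image-based wavefront sensing:
    form the complex pupil field, propagate it to the focal plane with a
    2-D FFT, and take the squared magnitude. In the GPU version, these
    FFTs and the elementwise complex arithmetic are what the card runs."""
    pupil = amplitude * np.exp(1j * phase)
    field = np.fft.fftshift(np.fft.fft2(pupil))
    return np.abs(field) ** 2
```

An MGS-style retrieval iterates this model (and its inverse) over several defocused images, updating the phase estimate each pass, which is why batching the FFTs across four GPUs gives a near-linear speedup.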
A mixed finite difference/Galerkin method for three-dimensional Rayleigh-Benard convection
NASA Technical Reports Server (NTRS)
Buell, Jeffrey C.
1988-01-01
A fast and accurate numerical method, for nonlinear conservation equation systems whose solutions are periodic in two of the three spatial dimensions, is presently implemented for the case of Rayleigh-Benard convection between two rigid parallel plates in the parameter region where steady, three-dimensional convection is known to be stable. High-order streamfunctions secure the reduction of the system of five partial differential equations to a system of only three. Numerical experiments are presented which verify both the expected convergence rates and the absolute accuracy of the method.
High performance data acquisition, identification, and monitoring for active magnetic bearings
NASA Technical Reports Server (NTRS)
Herzog, Raoul; Siegwart, Roland
1994-01-01
Future active magnetic bearing systems (AMB) must feature easier on-site tuning, higher stiffness and damping, better robustness with respect to undesirable vibrations in housing and foundation, and enhanced monitoring and identification abilities. To get closer to these goals we developed a fast parallel link from the digitally controlled AMB to Matlab, which is used on a host computer for data processing, identification, and controller layout. This enables the magnetic bearing to take its frequency responses without using any additional measurement equipment. These measurements can be used for AMB identification.
A High-Performance Parallel Implementation of the Certified Reduced Basis Method
2010-12-15
…point of view of model reduction due to the "curse of dimensionality". We consider transient thermal conduction in a three-dimensional "Swiss cheese" problem (see Figure 7a), for which there are 54 unique ordered pairs in I. A histogram of 〈δµ〉 values computed for the ntrain = 106 case is given in… Our primal-dual RB method yields a very fast and accurate output approximation for the "Swiss cheese" problem. Our goal in this final subsection is…
MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation
Harrison, Robert J.; Beylkin, Gregory; Bischoff, Florian A.; ...
2016-01-01
We present MADNESS (Multiresolution Adaptive Numerical Environment for Scientific Simulation), a high-level software environment for solving integral and differential equations in many dimensions using adaptive, fast harmonic-analysis methods with guaranteed precision, based on multiresolution analysis and separated representations. Underpinning the numerical capabilities is a powerful petascale parallel programming environment that aims to increase both programmer productivity and code scalability. This paper describes the features and capabilities of MADNESS and briefly discusses current applications in chemistry and several areas of physics.
Equatorial anisotropy of the Earth's inner inner core from autocorrelations of earthquake coda
NASA Astrophysics Data System (ADS)
Wang, T.; Song, X.; Xia, H.
2014-12-01
The anisotropic structure of the inner core appears complex, with significant depth and lateral variations. An innermost inner core has been suggested with a distinct form of anisotropy, but considerable uncertainties remain in its form, size, and even existence. All previous inner-core anisotropy models have assumed a cylindrical anisotropy with the symmetry axis parallel (or nearly parallel) to the Earth's spin axis. In this study, we obtain inner-core phases, PKIIKP2 and PKIKP2 (the round-trip phases between a station and its antipode that pass straight through the center of the Earth and that reflect from the inner-core boundary, respectively), from stackings of autocorrelations of earthquake coda at seismic station clusters around the world. The differential travel times PKIIKP2 - PKIKP2, which are sensitive to inner-core structure, show fast arrivals at high latitudes. However, we also observed large variations of up to 10 s along equatorial paths. These observations can be explained by a cylindrical anisotropy in the inner inner core (IIC) (with a radius of slightly less than half the inner-core radius) that has a fast axis aligned near the equator, and a cylindrical anisotropy in the outer inner core (OIC) that has a fast axis along the north-south direction. The equatorial fast axis of the IIC lies near Central America and Southeast Asia. The form of the anisotropy in the IIC is distinctly different from that in the OIC, and the anisotropy amplitude in the IIC is about 70% stronger than in the OIC. The different forms of anisotropy may be explained by a two-phase system of iron in the inner core (hcp in the OIC and bcc in the IIC). These results may suggest a major shift in the tectonics of the inner core during its formation and growth.
Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis.
Däumer, Martin; Kaiser, Rolf; Klein, Rolf; Lengauer, Thomas; Thiele, Bernhard; Thielen, Alexander
2011-05-13
Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While highly predictive when performed on clonal samples, sensitivity for predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk sequencing. Massively parallel sequencing (MPS) detects single clones and is therefore much more sensitive. Using this technology, we aimed to improve genotypic prediction of coreceptor usage. Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Prediction based on bulk sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the Trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4 usage were found in all samples, irrespective of phenotype. When using the default false-positive rate of geno2pheno[coreceptor] (10%) and defining a minority cutoff of 5%, the results were concordant in all but one isolate. The combination of MPS and coreceptor-usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4 viruses in all isolates suggests that coreceptor usage as well as the fitness of minorities is important for therapy outcome. The high sensitivity of this technology, in combination with a quantitative description of the viral population, may allow implementing meaningful cutoffs for predicting response to CCR5 antagonists in the presence of X4 minorities.
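The minority-cutoff decision rule described above can be sketched as follows. The function name and the representation of reads by their per-read false-positive rates (FPRs) are illustrative assumptions; the 10% FPR threshold and 5% minority cutoff are the values from the text.

```python
def call_tropism(read_fprs, fpr_cutoff=10.0, minority_cutoff=0.05):
    """Classify a viral population from per-read FPRs as returned by a
    geno2pheno-style predictor: a read is called X4 when its FPR falls
    below fpr_cutoff; the sample is called X4 when the X4 minority
    reaches minority_cutoff of all reads, and R5 otherwise."""
    x4_reads = sum(1 for fpr in read_fprs if fpr < fpr_cutoff)
    x4_fraction = x4_reads / len(read_fprs)
    return ("X4" if x4_fraction >= minority_cutoff else "R5", x4_fraction)
```

With ~7600 reads per isolate, even a 1-2% X4 minority is represented by dozens of reads, which is what makes a quantitative cutoff like this meaningful for MPS but not for bulk sequencing.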
Fast encryption of RGB color digital images using a tweakable cellular automaton based schema
NASA Astrophysics Data System (ADS)
Faraoun, Kamel Mohamed
2014-12-01
We propose a new tweakable construction of block-ciphers using second-order reversible cellular automata, and we apply it to encipher RGB-colored images. The proposed construction permits parallel encryption of the image content by extending the standard definition of a block cipher to take into account a supplementary parameter used as a tweak (nonce) to control the behavior of the cipher from one region of the image to the other, and hence avoids the need for slow sequential encryption operating modes. The proposed construction defines a flexible pseudorandom permutation that can be used effectively to solve the electronic code book problem without the need for a specific sequential mode. Results from various experiments show that the proposed schema achieves high security and execution performance, and enables an interesting mode of selective-area decryption due to the parallel character of the approach.
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.
1991-01-01
Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
NASA Astrophysics Data System (ADS)
Shi, Wei; Hu, Xiaosong; Jin, Chao; Jiang, Jiuchun; Zhang, Yanru; Yip, Tony
2016-05-01
With the development and popularization of electric vehicles, it is urgent and necessary to develop effective management and diagnosis technology for battery systems. In this work, we design a parallel battery model, according to equivalent circuits of parallel voltage and branch current, to study effects of imbalanced currents on parallel large-format LiFePO4/graphite battery systems. Taking a 60 Ah LiFePO4/graphite battery system manufactured by ATL (Amperex Technology Limited, China) as an example, causes of imbalanced currents in the parallel connection are analyzed using our model, and the associated effect mechanisms on long-term stability of each single battery are examined. Theoretical and experimental results show that continuously increasing imbalanced currents during cycling are mainly responsible for the capacity fade of LiFePO4/graphite parallel batteries. Suppressing variations of branch currents is thus an effective way to avoid fast performance fade of parallel battery systems.
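The branch-current imbalance can be illustrated with a minimal steady-state sketch (a hypothetical helper, not the authors' equivalent-circuit model): each parallel branch is reduced to an open-circuit voltage in series with a resistance, and Kirchhoff's laws give the common terminal voltage and hence each branch current:

```python
def branch_currents(i_total, ocv, r):
    """Split a pack current among parallel branches, each modeled as an
    open-circuit voltage ocv_j in series with a resistance r_j.

    Kirchhoff's laws: sum_j (ocv_j - V) / r_j = i_total for the common
    terminal voltage V, then i_j = (ocv_j - V) / r_j per branch.
    """
    g = [1.0 / rj for rj in r]                       # branch conductances
    v = (sum(oj * gj for oj, gj in zip(ocv, g)) - i_total) / sum(g)
    return [(oj - v) * gj for oj, gj in zip(ocv, g)]
```

Even with identical open-circuit voltages, unequal branch resistances split the current unevenly, which is the kind of imbalance the paper tracks over cycling.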
Zeki, Semir
2016-10-01
Results from a variety of sources, some many years old, lead ineluctably to a re-appraisal of the twin strategies of hierarchical and parallel processing used by the brain to construct an image of the visual world. Contrary to common supposition, there are at least three 'feed-forward' anatomical hierarchies that reach the primary visual cortex (V1) and the specialized visual areas outside it, in parallel. These anatomical hierarchies do not conform to the temporal order with which visual signals reach the specialized visual areas through V1. Furthermore, neither the anatomical hierarchies nor the temporal order of activation through V1 predicts the perceptual hierarchies. The latter show that we see (and become aware of) different visual attributes at different times, with colour leading form (orientation) and directional visual motion, even though signals from fast-moving, high-contrast stimuli are among the earliest to reach the visual cortex (of area V5). Parallel processing, on the other hand, is much more ubiquitous than commonly supposed but is subject to a barely noticed yet fundamental aspect of brain operations, namely that different parallel systems operate asynchronously with respect to each other and reach perceptual endpoints at different times. This re-assessment leads to the conclusion that the visual brain is constituted of multiple, parallel and asynchronously operating task- and stimulus-dependent hierarchies (STDH); which of these parallel anatomical hierarchies has temporal and perceptual precedence at any given moment is stimulus- and task-related, and dependent on the visual brain's ability to undertake multiple operations asynchronously. © 2016 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Lee, Hangyeore; Mun, Dong-Gi; Bae, Jingi; Kim, Hokeun; Oh, Se Yeon; Park, Young Soo; Lee, Jae-Hyuk; Lee, Sang-Won
2015-08-21
We report a new and simple design of a fully automated dual-online ultra-high pressure liquid chromatography system. The system employs only two nano-volume switching valves (a two-position four port valve and a two-position ten port valve) that direct solvent flows from two binary nano-pumps for parallel operation of two analytical columns and two solid phase extraction (SPE) columns. Despite the simple design, the sDO-UHPLC offers many advantageous features that include high duty cycle, back flushing sample injection for fast and narrow zone sample injection, online desalting, high separation resolution and high intra/inter-column reproducibility. This system was applied to analyze proteome samples not only in high throughput deep proteome profiling experiments but also in high throughput MRM experiments.
Chang, Gregory; Friedrich, Klaus M; Wang, Ligong; Vieira, Renata L R; Schweitzer, Mark E; Recht, Michael P; Wiggins, Graham C; Regatte, Ravinder R
2010-03-01
To determine the feasibility of performing MRI of the wrist at 7 Tesla (T) with parallel imaging and to evaluate how acceleration factors (AF) affect signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), and image quality. This study had institutional review board approval. A four-transmit eight-receive channel array coil was constructed in-house. Nine healthy subjects were scanned on a 7T whole-body MR scanner. Coronal and axial images of cartilage and trabecular bone micro-architecture (3D-Fast Low Angle Shot (FLASH) with and without fat suppression, repetition time/echo time = 20 ms/4.5 ms, flip angle = 10 degrees, 0.169-0.195 x 0.169-0.195 mm, 0.5-1 mm slice thickness) were obtained with AF 1, 2, 3, 4. T1-weighted fast spin-echo (FSE), proton density-weighted FSE, and multiple-echo data image combination (MEDIC) sequences were also performed. SNR and CNR were measured. Three musculoskeletal radiologists rated image quality. Linear correlation analysis and paired t-tests were performed. At higher AF, SNR and CNR decreased linearly for cartilage, muscle, and trabecular bone (r < -0.98). At AF 4, reductions in SNR/CNR were: 52%/60% (cartilage), 72%/63% (muscle), 45%/50% (trabecular bone). Radiologists scored images with AF 1 and 2 as near-excellent, AF 3 as good-to-excellent (P = 0.075), and AF 4 as average-to-good (P = 0.11). It is feasible to perform high-resolution 7T MRI of the wrist with parallel imaging. SNR and CNR decrease with higher AF, but image quality remains above-average.
LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS*
Meng, Xiangrui; Saunders, Michael A.; Mahoney, Michael W.
2014-01-01
We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to min_{x ∈ ℝ^n} ‖Ax − b‖_2, where A ∈ ℝ^{m×n} with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size ⌈γ min(m, n)⌉ × min(m, n), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK's DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster. PMID:25419094
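The preconditioning idea can be sketched in a few lines of NumPy (an illustrative toy under simplifying assumptions: a dense full-rank A with m ≫ n, and a direct least-squares solve standing in for LSQR; `lsrn_sketch` is not the authors' implementation):

```python
import numpy as np

def lsrn_sketch(A, b, gamma=2.0, seed=0):
    """LSRN-style solve for an overdetermined system (toy sketch).

    Precondition with a random normal projection of A, then solve the
    well-conditioned problem (here via a dense solve instead of LSQR).
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    s = int(np.ceil(gamma * n))            # sketch size, gamma > 1
    G = rng.standard_normal((s, m))        # random normal projection
    # SVD of the small sketched matrix yields the right preconditioner
    # N = V * diag(1/sigma), so that A @ N is well-conditioned.
    _, sigma, Vt = np.linalg.svd(G @ A, full_matrices=False)
    N = Vt.T / sigma
    # Solve min ||(A N) y - b||_2, then map back with x = N y.
    y, *_ = np.linalg.lstsq(A @ N, b, rcond=None)
    return N @ y
```

Because N is invertible in the full-rank case, the minimizer of the preconditioned problem maps back to the least-squares solution of the original system.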
Walter, Alexander M; Pinheiro, Paulo S; Verhage, Matthijs; Sørensen, Jakob B
2013-01-01
Neurotransmitter release depends on the fusion of secretory vesicles with the plasma membrane and the release of their contents. The final fusion step displays higher-order Ca(2+) dependence, but also upstream steps depend on Ca(2+). After deletion of the Ca(2+) sensor for fast release - synaptotagmin-1 - slower Ca(2+)-dependent release components persist. These findings have provoked working models involving parallel releasable vesicle pools (Parallel Pool Models, PPM) driven by alternative Ca(2+) sensors for release, but no slow release sensor acting on a parallel vesicle pool has been identified. We here propose a Sequential Pool Model (SPM), assuming a novel Ca(2+)-dependent action: a Ca(2+)-dependent catalyst that accelerates both forward and reverse priming reactions. While both models account for fast fusion from the Readily-Releasable Pool (RRP) under control of synaptotagmin-1, the origins of slow release differ. In the SPM the slow release component is attributed to the Ca(2+)-dependent refilling of the RRP from a Non-Releasable upstream Pool (NRP), whereas the PPM attributes slow release to a separate slowly-releasable vesicle pool. Using numerical integration we compared model predictions to data from mouse chromaffin cells. Like the PPM, the SPM explains biphasic release, Ca(2+)-dependence and pool sizes in mouse chromaffin cells. In addition, the SPM accounts for the rapid recovery of the fast component after strong stimulation, where the PPM fails. The SPM also predicts the simultaneous changes in release rate and amplitude seen when mutating the SNARE-complex. Finally, it can account for the loss of fast- and the persistence of slow release in the synaptotagmin-1 knockout by assuming that the RRP is depleted, leading to slow and Ca(2+)-dependent fusion from the NRP. 
We conclude that the elusive 'alternative Ca(2+) sensor' for slow release might be the upstream priming catalyst, and that a sequential model effectively explains Ca(2+)-dependent properties of secretion without assuming parallel pools or sensors.
Walter, Alexander M.; Pinheiro, Paulo S.; Verhage, Matthijs; Sørensen, Jakob B.
2013-01-01
Neurotransmitter release depends on the fusion of secretory vesicles with the plasma membrane and the release of their contents. The final fusion step displays higher-order Ca2+ dependence, but also upstream steps depend on Ca2+. After deletion of the Ca2+ sensor for fast release – synaptotagmin-1 – slower Ca2+-dependent release components persist. These findings have provoked working models involving parallel releasable vesicle pools (Parallel Pool Models, PPM) driven by alternative Ca2+ sensors for release, but no slow release sensor acting on a parallel vesicle pool has been identified. We here propose a Sequential Pool Model (SPM), assuming a novel Ca2+-dependent action: a Ca2+-dependent catalyst that accelerates both forward and reverse priming reactions. While both models account for fast fusion from the Readily-Releasable Pool (RRP) under control of synaptotagmin-1, the origins of slow release differ. In the SPM the slow release component is attributed to the Ca2+-dependent refilling of the RRP from a Non-Releasable upstream Pool (NRP), whereas the PPM attributes slow release to a separate slowly-releasable vesicle pool. Using numerical integration we compared model predictions to data from mouse chromaffin cells. Like the PPM, the SPM explains biphasic release, Ca2+-dependence and pool sizes in mouse chromaffin cells. In addition, the SPM accounts for the rapid recovery of the fast component after strong stimulation, where the PPM fails. The SPM also predicts the simultaneous changes in release rate and amplitude seen when mutating the SNARE-complex. Finally, it can account for the loss of fast- and the persistence of slow release in the synaptotagmin-1 knockout by assuming that the RRP is depleted, leading to slow and Ca2+-dependent fusion from the NRP. 
We conclude that the elusive ‘alternative Ca2+ sensor’ for slow release might be the upstream priming catalyst, and that a sequential model effectively explains Ca2+-dependent properties of secretion without assuming parallel pools or sensors. PMID:24339761
The role of current sheet formation in driven plasmoid reconnection in laser-produced plasma bubbles
NASA Astrophysics Data System (ADS)
Lezhnin, Kirill; Fox, William; Bhattacharjee, Amitava
2017-10-01
We conduct a multiparametric study of driven magnetic reconnection relevant to recent experiments on colliding magnetized laser-produced plasmas using the PIC code PSC. Varying the background plasma density, plasma resistivity, and plasma bubble geometry, the results demonstrate a variety of reconnection behavior and show the coupling between magnetic reconnection and global fluid evolution of the system. We consider both collision of two radially expanding bubbles where reconnection is driven through an X-point, and collision of two parallel fields where reconnection must be initiated by the tearing instability. Under various conditions, we observe transitions between fast, collisionless reconnection to a Sweet-Parker-like slow reconnection to complete stalling of the reconnection. By varying plasma resistivity, we observe the transition between fast and slow reconnection at Lundquist number S ~ 10^3. The transition from plasmoid reconnection to a single X-point reconnection also happens around S ~ 10^3. We find that the criterion δ/d_i < 1 is necessary for fast reconnection onset. Finally, at sufficiently high background density, magnetic reconnection can be suppressed, leading to bouncing motion of the magnetized plasma bubbles.
Development of fast parallel multi-technique scanning X-ray imaging at Synchrotron Soleil
NASA Astrophysics Data System (ADS)
Medjoubi, K.; Leclercq, N.; Langlois, F.; Buteau, A.; Lé, S.; Poirier, S.; Mercère, P.; Kewish, C. M.; Somogyi, A.
2013-10-01
A fast multimodal scanning X-ray imaging scheme is prototyped at Soleil Synchrotron. It permits the simultaneous acquisition of complementary information on the sample structure, composition and chemistry by measuring transmission, differential phase contrast, small-angle scattering, and X-ray fluorescence by dedicated detectors with ms dwell time per pixel. The results of the proof of principle experiments are presented in this paper.
Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing.
Duan, Ling-Yu; Sun, Wei; Zhang, Xinfeng; Wang, Shiqi; Chen, Jie; Yin, Jianxiong; See, Simon; Huang, Tiejun; Kot, Alex C; Gao, Wen
2018-05-01
The compact descriptors for visual search (CDVS) standard from the ISO/IEC Moving Picture Experts Group has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of a CDVS encoder unfortunately hinders its wide deployment in industry for large-scale visual search. In this paper, we revisit the merits of the low-complexity design of CDVS core techniques and present a very fast CDVS encoder by leveraging the massive parallel execution resources of the graphics processing unit (GPU). We shift the computation-intensive and parallel-friendly modules to state-of-the-art GPU platforms, in which the thread block allocation as well as the memory access mechanism are jointly optimized to eliminate performance loss. In addition, operations with heavy data dependence are allocated to the CPU to relieve the GPU of extra, unnecessary computation burden. Furthermore, we demonstrate that the proposed fast CDVS encoder works well with convolutional neural network approaches, which makes it possible to leverage the advantages of GPU platforms harmoniously and yields significant performance improvements. Comprehensive experimental results over benchmarks show that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.
FAST MAGNETOACOUSTIC WAVE TRAINS OF SAUSAGE SYMMETRY IN CYLINDRICAL WAVEGUIDES OF THE SOLAR CORONA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shestov, S.; Kuzin, S.; Nakariakov, V. M., E-mail: sshestov@gmail.com
2015-12-01
Fast magnetoacoustic waves guided along the magnetic field by plasma non-uniformities, in particular coronal loops, fibrils, and plumes, are known to be highly dispersive, which leads to the formation of quasi-periodic wave trains excited by a broadband impulsive driver, e.g., a solar flare. We investigated the effects of cylindrical geometry on the fast sausage wave train formation. We performed magnetohydrodynamic numerical simulations of fast magnetoacoustic perturbations of a sausage symmetry, propagating from a localized impulsive source along a field-aligned plasma cylinder with a smooth radial profile of the fast speed. The wave trains are found to have pronounced period modulation, with the longer instant period seen in the beginning of the wave train. The wave trains also have a pronounced amplitude modulation. Wavelet spectra of the wave trains have characteristic tadpole features, with the broadband large-amplitude heads preceding low-amplitude quasi-monochromatic tails. The mean period of the wave train is about the transverse fast magnetoacoustic transit time across the cylinder. The mean parallel wavelength is about the diameter of the wave-guiding plasma cylinder. Instant periods are longer than the sausage wave cutoff period. The wave train characteristics depend on the fast magnetoacoustic speed in both the internal and external media, the smoothness of the transverse profile of the equilibrium quantities, and also the spatial size of the initial perturbation. If the initial perturbation is localized at the axis of the cylinder, the wave trains contain higher radial harmonics that have shorter periods.
PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions
NASA Astrophysics Data System (ADS)
Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin
2017-01-01
We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT), on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in a good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using the benchmark problems coming from the references and obtain repeatable results. The extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
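A one-dimensional analogue of the split-operator/FFT propagation step reads as follows (a minimal sketch with unit constants; the actual PCTDSE solver works on a 3D MPI-decomposed Cartesian grid with time-dependent vector potentials):

```python
import numpy as np

def split_operator_step(psi, V, dx, dt, hbar=1.0, mass=1.0):
    """One Strang-split step of the TDSE on a 1-D periodic grid:
    half potential kick in x, full kinetic drift in k, half kick in x."""
    n = psi.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)            # momentum grid
    half_kick = np.exp(-0.5j * V * dt / hbar)            # exp(-i V dt / 2)
    drift = np.exp(-0.5j * hbar * k**2 * dt / mass)      # exp(-i T dt)
    psi = half_kick * psi
    psi = np.fft.ifft(drift * np.fft.fft(psi))           # kinetic step in k-space
    return half_kick * psi
```

Each factor is a pure phase, so the step is unitary and the discrete norm of the wavefunction is conserved to machine precision, a useful correctness check for this class of solvers.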
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced.
This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
BFL: a node and edge betweenness based fast layout algorithm for large scale networks
Hashimoto, Tatsunori B; Nagasaki, Masao; Kojima, Kaname; Miyano, Satoru
2009-01-01
Background Network visualization would serve as a useful first step for analysis. However, current graph layout algorithms for biological pathways are insensitive to biologically important information, e.g. subcellular localization, biological node and graph attributes, or/and not available for large scale networks, e.g. more than 10000 elements. Results To overcome these problems, we propose the use of a biologically important graph metric, betweenness, a measure of network flow. This metric is highly correlated with many biological phenomena such as lethality and clusters. We devise a new fast parallel algorithm calculating betweenness to minimize the preprocessing cost. Using this metric, we also invent a node and edge betweenness based fast layout algorithm (BFL). BFL places the high-betweenness nodes to optimal positions and allows the low-betweenness nodes to reach suboptimal positions. Furthermore, BFL reduces the runtime by combining a sequential insertion algorithm with betweenness. For a graph with n nodes, this approach reduces the expected runtime of the algorithm to O(n^2) when considering edge crossings, and to O(n log n) when considering only density and edge lengths. Conclusion Our BFL algorithm is compared against fast graph layout algorithms and approaches requiring intensive optimizations. For gene networks, we show that our algorithm is faster than all layout algorithms tested while providing readability on par with intensive optimization algorithms. We achieve a 1.4 second runtime for a graph with 4000 nodes and 12000 edges on a standard desktop computer. PMID:19146673
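For reference, the betweenness metric that drives BFL can be computed with Brandes' standard algorithm; the sequential sketch below (not the paper's parallel implementation) counts, for each node, the dependency of all shortest paths on that node:

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for node betweenness on an unweighted graph.

    adj: dict mapping node -> list of neighbours.
    Returns dict node -> betweenness (each undirected pair counted twice).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:                                    # BFS from source s
            v = q.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                # back-propagate dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On a path a-b-c, only the middle node lies on shortest paths between other nodes, so it alone receives nonzero betweenness.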
Parallel processing in a host plus multiple array processor system for radar
NASA Technical Reports Server (NTRS)
Barkan, B. Z.
1983-01-01
Host plus multiple array processor architecture is demonstrated to yield a modular, fast, and cost-effective system for radar processing. Software methodology for programming such a system is developed. Parallel processing with pipelined data flow among the host, array processors, and discs is implemented. Theoretical analysis of performance is made and experimentally verified. The broad class of problems to which the architecture and methodology can be applied is indicated.
NASA Astrophysics Data System (ADS)
Sato, Yuki; Fukuda, Naoki; Takeda, Hiroyuki; Kameda, Daisuke; Suzuki, Hiroshi; Shimizu, Yohei; Ahn, DeukSoon; Murai, Daichi; Inabe, Naohito; Shimaoka, Takehiro; Tsubota, Masakatsu; Kaneko, Junichi H.; Chayahara, Akiyoshi; Umezawa, Hitoshi; Shikata, Shinichi; Kumagai, Hidekazu; Murakami, Hiroyuki; Sato, Hiromi; Yoshida, Koichi; Kubo, Toshiyuki
A multiple sampling ionization chamber (MUSIC) and parallel-plate avalanche counters (PPACs) were installed within the superconducting in-flight separator, named BigRIPS, at the RIKEN Nishina Center for particle identification of RI beams. The MUSIC detector showed negligible charge collection inefficiency from recombination of electrons and ions, up to a 99-kcps incidence rate for high-energy heavy ions. For the PPAC detectors, the electrical discharge durability for incident heavy ions was improved by changing the electrode material. Finally, we designed a single crystal diamond detector, which is under development for TOF measurements of high-energy heavy ions, that has a very fast response time (pulse width <1 ns).
NASA Astrophysics Data System (ADS)
Eilert, Tobias; Beckers, Maximilian; Drechsler, Florian; Michaelis, Jens
2017-10-01
The analysis tool and software package Fast-NPS can be used to analyse smFRET data to obtain quantitative structural information about macromolecules in their natural environment. In the algorithm a Bayesian model gives rise to a multivariate probability distribution describing the uncertainty of the structure determination. Since Fast-NPS aims to be an easy-to-use general-purpose analysis tool for a large variety of smFRET networks, we established an MCMC based sampling engine that approximates the target distribution and requires no parameter specification by the user at all. For an efficient local exploration we automatically adapt the multivariate proposal kernel according to the shape of the target distribution. In order to handle multimodality, the sampler is equipped with a parallel tempering scheme that is fully adaptive with respect to temperature spacing and number of chains. Since the molecular surrounding of a dye molecule affects its spatial mobility and thus the smFRET efficiency, we introduce dye models which can be selected for every dye molecule individually. These models allow the user to represent the smFRET network in great detail leading to an increased localisation precision. Finally, a tool to validate the chosen model combination is provided. Programme Files doi:http://dx.doi.org/10.17632/7ztzj63r68.1 Licencing provisions: Apache-2.0 Programming language: GUI in MATLAB (The MathWorks) and the core sampling engine in C++ Nature of problem: Sampling of highly diverse multivariate probability distributions in order to solve for macromolecular structures from smFRET data. Solution method: MCMC algorithm with fully adaptive proposal kernel and parallel tempering scheme.
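The parallel tempering idea used by the sampling engine can be sketched on a 1-D toy target (a fixed temperature ladder and plain Metropolis moves; Fast-NPS itself adapts both the proposal kernel and the temperature spacing, and samples multivariate structures):

```python
import math
import random

def parallel_tempering(logp, temps, n_steps, step=0.5, seed=1):
    """Toy parallel tempering sampler for a 1-D target log-density.

    Runs one Metropolis chain per temperature; neighbouring chains
    propose state swaps each sweep. Returns the cold-chain samples.
    """
    rng = random.Random(seed)
    x = [0.0] * len(temps)
    samples = []
    for _ in range(n_steps):
        for i, T in enumerate(temps):               # within-chain moves
            prop = x[i] + rng.gauss(0.0, step)
            if math.log(rng.random() + 1e-300) < (logp(prop) - logp(x[i])) / T:
                x[i] = prop
        for i in range(len(temps) - 1):             # neighbour swaps
            a = (logp(x[i + 1]) - logp(x[i])) * (1.0 / temps[i] - 1.0 / temps[i + 1])
            if math.log(rng.random() + 1e-300) < a:
                x[i], x[i + 1] = x[i + 1], x[i]
        samples.append(x[0])                        # record the cold chain
    return samples
```

The hot chains explore broadly and hand good states down the ladder, which is how tempering handles the multimodality mentioned in the abstract.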
Saeedi, Ehsan; Kong, Yinan
2017-01-01
In this paper, we propose a novel parallel architecture for fast hardware implementation of elliptic curve point multiplication (ECPM), which is the key operation of an elliptic curve cryptography processor. The point multiplication over binary fields is synthesized on both FPGA and ASIC technology by designing fast elliptic curve group operations in Jacobian projective coordinates. A novel combined point doubling and point addition (PDPA) architecture is proposed for group operations to achieve high speed and low hardware requirements for ECPM. It has been implemented over the binary field which is recommended by the National Institute of Standards and Technology (NIST). The proposed ECPM supports two Koblitz and random curves for the key sizes 233 and 163 bits. For group operations, a finite-field arithmetic operation, e.g. multiplication, is designed on a polynomial basis. The delay of a 233-bit point multiplication is only 3.05 and 3.56 μs, in a Xilinx Virtex-7 FPGA, for Koblitz and random curves, respectively, and 0.81 μs in an ASIC 65-nm technology, which are the fastest hardware implementation results reported in the literature to date. In addition, a 163-bit point multiplication is also implemented in FPGA and ASIC for fair comparison which takes around 0.33 and 0.46 μs, respectively. The area-time product of the proposed point multiplication is very low compared to similar designs. The performance (1/(Area × Time) = 1/AT) and Area × Time × Energy (ATE) product of the proposed design are far better than the most significant studies found in the literature. PMID:28459831
Fast Confocal Raman Imaging Using a 2-D Multifocal Array for Parallel Hyperspectral Detection.
Kong, Lingbo; Navas-Moreno, Maria; Chan, James W
2016-01-19
We present the development of a novel confocal hyperspectral Raman microscope capable of imaging at speeds up to 100 times faster than conventional point-scan Raman microscopy under high noise conditions. The microscope utilizes scanning galvomirrors to generate a two-dimensional (2-D) multifocal array at the sample plane, generating Raman signals simultaneously at each focus of the array pattern. The signals are combined into a single beam and delivered through a confocal pinhole before being focused through the slit of a spectrometer. To separate the signals from each row of the array, a synchronized scan mirror placed in front of the spectrometer slit positions the Raman signals onto different pixel rows of the detector. We devised an approach to deconvolve the superimposed signals and retrieve the individual spectra at each focal position within a given row. The galvomirrors were programmed to scan different focal arrays following Hadamard encoding patterns. A key feature of the Hadamard detection is the reconstruction of individual spectra with improved signal-to-noise ratio. Using polystyrene beads as test samples, we demonstrated not only that our system images faster than a conventional point-scan method but that it is especially advantageous under noisy conditions, such as when the CCD detector operates at fast read-out rates and high temperatures. This is the first demonstration of multifocal confocal Raman imaging in which parallel spectral detection is implemented along both axes of the CCD detector chip. We envision this novel 2-D multifocal spectral detection technique can be used to develop faster imaging spontaneous Raman microscopes with lower cost detectors.
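The Hadamard encoding/decoding step can be illustrated with a toy ±1 multiplexing scheme (the instrument applies Hadamard patterns to the focal array via the galvomirrors; the matrix orthogonality that enables exact demultiplexing, and the associated multiplex SNR advantage, are the same):

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def hadamard_demux(measurements, H):
    """Recover per-focus spectra from multiplexed measurements m = H s.

    H is orthogonal up to scale (H H^T = n I), so H^{-1} = H^T / n.
    """
    return (H.T @ measurements) / H.shape[0]
```

Each detector row records a ±1-weighted superposition of the focal spectra; multiplying by H^T / n inverts the encoding exactly, and averaging over n encoded measurements is what improves the signal-to-noise ratio under detector-noise-dominated conditions.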
Hossain, Md Selim; Saeedi, Ehsan; Kong, Yinan
2017-01-01
In this paper, we propose a novel parallel architecture for fast hardware implementation of elliptic curve point multiplication (ECPM), which is the key operation of an elliptic curve cryptography processor. The point multiplication over binary fields is synthesized on both FPGA and ASIC technology by designing fast elliptic curve group operations in Jacobian projective coordinates. A novel combined point doubling and point addition (PDPA) architecture is proposed for group operations to achieve high speed and low hardware requirements for ECPM. It has been implemented over the binary field which is recommended by the National Institute of Standards and Technology (NIST). The proposed ECPM supports two Koblitz and random curves for the key sizes 233 and 163 bits. For group operations, a finite-field arithmetic operation, e.g. multiplication, is designed on a polynomial basis. The delay of a 233-bit point multiplication is only 3.05 and 3.56 μs, in a Xilinx Virtex-7 FPGA, for Koblitz and random curves, respectively, and 0.81 μs in an ASIC 65-nm technology, which are the fastest hardware implementation results reported in the literature to date. In addition, a 163-bit point multiplication is also implemented in FPGA and ASIC for fair comparison which takes around 0.33 and 0.46 μs, respectively. The area-time product of the proposed point multiplication is very low compared to similar designs. The performance (1/(Area × Time) = 1/AT) and Area × Time × Energy (ATE) product of the proposed design are far better than the most significant studies found in the literature.
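The double-and-add structure that the PDPA hardware fuses can be illustrated in software. This is a hedged sketch only: the paper works over NIST binary fields in Jacobian projective coordinates, while this toy uses a small prime-field curve (modulus 97 and base point chosen purely for illustration) in affine coordinates.

```python
# Toy scalar multiplication k*G by left-to-right double-and-add on the
# curve y^2 = x^3 + A*x + B over GF(P). Parameters are hypothetical,
# not NIST values; None represents the point at infinity.

P = 97          # toy field modulus
A, B = 2, 3     # toy curve coefficients

def point_add(Pt, Qt):
    if Pt is None: return Qt
    if Qt is None: return Pt
    (x1, y1), (x2, y2) = Pt, Qt
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                          # P + (-P) = O
    if Pt == Qt:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P     # doubling slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P            # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(k, Pt):
    """One doubling per key bit, plus an addition when the bit is set."""
    R = None
    for bit in bin(k)[2:]:
        R = point_add(R, R)              # point doubling
        if bit == '1':
            R = point_add(R, Pt)         # point addition
    return R

G = (3, 6)                               # a point on the toy curve
Q = scalar_mult(3, G)
```

The doubling and addition on consecutive bits are exactly the pair of group operations that a combined PDPA datapath executes per key bit, which is where the hardware speedup comes from.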
Self-calibrated correlation imaging with k-space variant correlation functions.
Li, Yu; Edalati, Masoud; Du, Xingfu; Wang, Hui; Cao, Jie J
2018-03-01
Correlation imaging is a previously developed high-speed MRI framework that converts parallel imaging reconstruction into the estimate of correlation functions. The presented work aims to demonstrate this framework can provide a speed gain over parallel imaging by estimating k-space variant correlation functions. Because of Fourier encoding with gradients, outer k-space data contain higher spatial-frequency image components arising primarily from tissue boundaries. As a result of tissue-boundary sparsity in the human anatomy, neighboring k-space data correlation varies from the central to the outer k-space. By estimating k-space variant correlation functions with an iterative self-calibration method, correlation imaging can benefit from neighboring k-space data correlation associated with both coil sensitivity encoding and tissue-boundary sparsity, thereby providing a speed gain over parallel imaging that relies only on coil sensitivity encoding. This new approach is investigated in brain imaging and free-breathing neonatal cardiac imaging. Correlation imaging performs better than existing parallel imaging techniques in simulated brain imaging acceleration experiments. The higher speed enables real-time data acquisition for neonatal cardiac imaging in which physiological motion is fast and non-periodic. With k-space variant correlation functions, correlation imaging gives a higher speed than parallel imaging and offers the potential to image physiological motion in real-time. Magn Reson Med 79:1483-1494, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Seismic properties of lawsonite eclogites from the southern Motagua fault zone, Guatemala
NASA Astrophysics Data System (ADS)
Kim, Daeyeong; Wallis, Simon; Endo, Shunsuke; Ree, Jin-Han
2016-05-01
We present new data on the crystal preferred orientation (CPO) and seismic properties of omphacite and lawsonite in extremely fresh eclogite from the southern Motagua fault zone, Guatemala, to discuss the seismic anisotropy of subducting oceanic crust. The CPO of omphacite is characterized by (010)[001], and it shows P-wave seismic anisotropies (AVP) of 1.4%-3.2% and S-wave seismic anisotropies (AVS) of 1.4%-2.7%. Lawsonite exhibits (001) planes parallel to the foliation and [010] axes parallel to the lineation, and seismic anisotropies of 1.7%-6.6% AVP and 3.4%-14.7% AVS. The seismic anisotropy of a rock mass consisting solely of omphacite and lawsonite is 1.2%-4.1% AVP and 1.8%-6.8% AVS. For events that propagate more or less parallel to the maximum extension direction, X, the fast S-wave velocity (VS) polarization is parallel to the Z in the Y-Z section (rotated from the X-Z section), causing trench-normal seismic anisotropy for orthogonal subduction. Based on the high modal abundance and strong fabric of lawsonite, the AVS of eclogites is estimated as ~ 11.7% in the case that lawsonite makes up ~ 75% of the rock mass. On this basis, we suggest that lawsonite in both blueschist and eclogite may play important roles in the formation of complex pattern of seismic anisotropy observed in NE Japan: weak trench-parallel anisotropy in the forearc basin domains and trench-normal anisotropy in the backarc region.
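The AVP/AVS percentages quoted above follow the standard anisotropy definition from the extreme phase velocities. A minimal sketch, with hypothetical velocities chosen only to illustrate the ~11.7% lawsonite-rich estimate, not the measured Motagua data:

```python
# Standard seismic anisotropy percentage:
#   AV = 200 * (V_max - V_min) / (V_max + V_min)
# i.e. the velocity spread normalized by the mean velocity, in percent.

def anisotropy_percent(v_max, v_min):
    return 200.0 * (v_max - v_min) / (v_max + v_min)

# Hypothetical fast/slow shear-wave velocities (km/s) for a
# lawsonite-rich aggregate:
avs = anisotropy_percent(5.0, 4.45)   # about 11.6 percent
```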
Introducing GAMER: A fast and accurate method for ray-tracing galaxies using procedural noise
DOE Office of Scientific and Technical Information (OSTI.GOV)
Groeneboom, N. E.; Dahle, H., E-mail: nicolaag@astro.uio.no
2014-03-10
We developed a novel approach for fast and accurate ray-tracing of galaxies using procedural noise fields. Our method allows for efficient and realistic rendering of synthetic galaxy morphologies, where individual components such as the bulge, disk, stars, and dust can be synthesized in different wavelengths. These components follow empirically motivated overall intensity profiles but contain an additional procedural noise component that gives rise to complex natural patterns that mimic interstellar dust and star-forming regions. These patterns produce more realistic-looking galaxy images than using analytical expressions alone. The method is fully parallelized and creates accurate high- and low-resolution images that can be used, for example, in codes simulating strong and weak gravitational lensing. In addition to having a user-friendly graphical user interface, the C++ software package GAMER is easy to implement into an existing code.
THE VIOLATION OF THE TAYLOR HYPOTHESIS IN MEASUREMENTS OF SOLAR WIND TURBULENCE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klein, K. G.; Howes, G. G.; TenBarge, J. M.
2014-08-01
Motivated by the upcoming Solar Orbiter and Solar Probe Plus missions, qualitative and quantitative predictions are made for the effects of the violation of the Taylor hypothesis on the magnetic energy frequency spectrum measured in the near-Sun environment. The synthetic spacecraft data method is used to predict observational signatures of the violation for critically balanced Alfvénic turbulence or parallel fast/whistler turbulence. The violation of the Taylor hypothesis can occur in the slow flow regime, leading to a shift of the entire spectrum to higher frequencies, or in the dispersive regime, in which the dissipation range spectrum flattens at high frequencies. It is found that Alfvénic turbulence will not significantly violate the Taylor hypothesis, but whistler turbulence will. The flattening of the frequency spectrum is therefore a key observational signature for fast/whistler turbulence.
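The essence of the Taylor hypothesis is the frequency mapping sketched below. This is a hedged illustration with invented numbers (not mission predictions): the spacecraft-frame frequency combines the plasma-frame wave frequency with the Doppler shift from advection, and Taylor's approximation drops the plasma-frame term.

```python
import math

# Spacecraft-frame frequency for a fluctuation of wavenumber k advected at
# solar-wind speed v_sw (k taken parallel to the flow for simplicity):
#   f_sc = (omega_plasma + k * v_sw) / (2 * pi)
# The Taylor hypothesis assumes omega_plasma << k * v_sw and keeps only
# the Doppler term.

def spacecraft_frequency(k, v_sw, omega_plasma=0.0):
    return (omega_plasma + k * v_sw) / (2.0 * math.pi)

k = 1e-4            # wavenumber in rad/m (assumed)
v_sw = 3e5          # slow-flow solar-wind speed in m/s (assumed)
omega_wave = 2.0    # plasma-frame wave frequency in rad/s (assumed)

f_taylor = spacecraft_frequency(k, v_sw)               # Taylor approximation
f_actual = spacecraft_frequency(k, v_sw, omega_wave)   # with plasma-frame term
violation = (f_actual - f_taylor) / f_taylor           # fractional shift
```

For dispersive whistler fluctuations, omega_plasma grows rapidly with k, so the fractional shift grows toward small scales; that is the spectral distortion the synthetic-data method is designed to detect.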
Fast reversible learning based on neurons functioning as anisotropic multiplex hubs
NASA Astrophysics Data System (ADS)
Vardi, Roni; Goldental, Amir; Sheinin, Anton; Sardi, Shira; Kanter, Ido
2017-05-01
Neural networks are composed of neurons and synapses, which are responsible for learning in a slow adaptive dynamical process. Here we experimentally show that neurons act like independent anisotropic multiplex hubs, which relay and mute incoming signals following their input directions. Theoretically, the observed information routing enriches the computational capabilities of neurons by allowing, for instance, equalization among different information routes in the network, as well as high-frequency transmission of complex time-dependent signals constructed via several parallel routes. In addition, this kind of hubs adaptively eliminate very noisy neurons from the dynamics of the network, preventing masking of information transmission. The timescales for these features are several seconds at most, as opposed to the imprint of information by the synaptic plasticity, a process which exceeds minutes. Results open the horizon to the understanding of fast and adaptive learning realities in higher cognitive brain's functionalities.
Introducing GAMER: A Fast and Accurate Method for Ray-tracing Galaxies Using Procedural Noise
NASA Astrophysics Data System (ADS)
Groeneboom, N. E.; Dahle, H.
2014-03-01
We developed a novel approach for fast and accurate ray-tracing of galaxies using procedural noise fields. Our method allows for efficient and realistic rendering of synthetic galaxy morphologies, where individual components such as the bulge, disk, stars, and dust can be synthesized in different wavelengths. These components follow empirically motivated overall intensity profiles but contain an additional procedural noise component that gives rise to complex natural patterns that mimic interstellar dust and star-forming regions. These patterns produce more realistic-looking galaxy images than using analytical expressions alone. The method is fully parallelized and creates accurate high- and low- resolution images that can be used, for example, in codes simulating strong and weak gravitational lensing. In addition to having a user-friendly graphical user interface, the C++ software package GAMER is easy to implement into an existing code.
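The basic recipe the abstract describes, a smooth empirical intensity profile modulated by procedural noise, can be sketched compactly. This is an illustrative stand-in, not the GAMER implementation: the exponential disk scale, noise lattice, and modulation strengths are all hypothetical.

```python
import math, random

# A 16x16 "galaxy" image: exponential disk profile times a bilinearly
# interpolated value-noise field that mimics patchy dust/star-forming regions.

random.seed(1)
GRID = [[random.random() for _ in range(17)] for _ in range(17)]  # noise lattice

def value_noise(x, y):
    """Bilinearly interpolated lattice noise, smooth and in [0, 1)."""
    i, j = int(x) % 16, int(y) % 16
    fx, fy = x - int(x), y - int(y)
    v00, v10 = GRID[j][i], GRID[j][i + 1]
    v01, v11 = GRID[j + 1][i], GRID[j + 1][i + 1]
    top = v00 + (v10 - v00) * fx
    bot = v01 + (v11 - v01) * fx
    return top + (bot - top) * fy

def disk_intensity(x, y, scale=4.0):
    """Smooth exponential disk profile times a noisy dust modulation."""
    r = math.hypot(x, y)
    smooth = math.exp(-r / scale)
    return smooth * (0.7 + 0.6 * value_noise(x + 8.0, y + 8.0))

image = [[disk_intensity(x - 8, y - 8) for x in range(16)] for y in range(16)]
```

Because every pixel is an independent function evaluation, the loop parallelizes trivially, which is the property that makes the procedural approach fast to render at any resolution.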
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of a layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10 times performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
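The block-volume idea above can be sketched as: split a 3D volume into distributable slabs, map a per-voxel kernel over them in a worker pool, and reassemble. This is a hypothetical stand-in for the authors' platform; a thread pool (`multiprocessing.dummy`) substitutes for their distributed scheduler, and a simple threshold stands in for an operation like electronic cleansing.

```python
from multiprocessing.dummy import Pool  # thread-pool stand-in for process/cluster workers

def make_volume(n):
    """A synthetic n x n x n volume indexed as vol[z][y][x]."""
    return [[[(x + y + z) % 7 for x in range(n)] for y in range(n)]
            for z in range(n)]

def split_blocks(vol, bz):
    """Split along z into slabs of bz planes (the last slab may be smaller)."""
    return [vol[i:i + bz] for i in range(0, len(vol), bz)]

def threshold_block(block):
    # Per-voxel kernel applied independently to each block.
    return [[[1 if v > 3 else 0 for v in row] for row in plane]
            for plane in block]

def process_volume(vol, bz=4, workers=2):
    blocks = split_blocks(vol, bz)
    with Pool(workers) as pool:
        done = pool.map(threshold_block, blocks)   # blocks processed in parallel
    return [plane for block in done for plane in block]  # reassemble along z

out = process_volume(make_volume(10))
```

Because each block carries everything its kernel needs, the same decomposition works whether the workers are cores, cluster nodes, or cloud instances, which is the portability argument the abstract makes.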
Fast Particle Methods for Multiscale Phenomena Simulations
NASA Technical Reports Server (NTRS)
Koumoutsakos, P.; Wray, A.; Shariff, K.; Pohorille, Andrew
2000-01-01
We are developing particle methods oriented at improving computational modeling capabilities of multiscale physical phenomena in: (i) high Reynolds number unsteady vortical flows, (ii) particle laden and interfacial flows, (iii) molecular dynamics studies of nanoscale droplets and studies of the structure, functions, and evolution of the earliest living cell. The unifying computational approach involves particle methods implemented in parallel computer architectures. The inherent adaptivity, robustness and efficiency of particle methods make them a multidisciplinary computational tool capable of bridging the gap between micro-scale and continuum flow simulations. Using efficient tree data structures, multipole expansion algorithms, and improved particle-grid interpolation, particle methods allow for simulations using millions of computational elements, making possible the resolution of a wide range of length and time scales of these important physical phenomena. The current challenges in these simulations are: (i) the proper formulation of particle methods at the molecular and continuum levels for the discretization of the governing equations; (ii) the resolution of the wide range of time and length scales governing the phenomena under investigation; (iii) the minimization of numerical artifacts that may interfere with the physics of the systems under consideration; and (iv) the parallelization of processes such as tree traversal and grid-particle interpolations. We are conducting simulations using vortex methods, molecular dynamics and smooth particle hydrodynamics, exploiting their unifying concepts, such as the solution of the N-body problem on parallel computers, highly accurate particle-particle and grid-particle interpolations, parallel FFTs, and the formulation of processes such as diffusion in the context of particle methods. This approach enables us to bridge seemingly unrelated areas of research.
Competitive Genomic Screens of Barcoded Yeast Libraries
Urbanus, Malene; Proctor, Michael; Heisler, Lawrence E.; Giaever, Guri; Nislow, Corey
2011-01-01
By virtue of advances in next generation sequencing technologies, we have access to new genome sequences almost daily. The tempo of these advances is accelerating, promising greater depth and breadth. In light of these extraordinary advances, the need for fast, parallel methods to define gene function becomes ever more important. Collections of genome-wide deletion mutants in yeasts and E. coli have served as workhorses for functional characterization of gene function, but this approach is not scalable: current gene-deletion approaches require each of the thousands of genes that comprise a genome to be deleted and verified. Only after this work is complete can we pursue high-throughput phenotyping. Over the past decade, our laboratory has refined a portfolio of competitive, miniaturized, high-throughput genome-wide assays that can be performed in parallel. This parallelization is possible because of the inclusion of DNA 'tags', or 'barcodes', into each mutant, with the barcode serving as a proxy for the mutation; measuring barcode abundance assesses mutant fitness. In this study, we seek to fill the gap between DNA sequence and barcoded mutant collections. To accomplish this we introduce a combined transposon disruption-barcoding approach that opens up parallel barcode assays to newly sequenced, but poorly characterized microbes. To illustrate this approach we present a new Candida albicans barcoded disruption collection and describe how both microarray-based and next generation sequencing-based platforms can be used to collect 10,000-1,000,000 gene-gene and drug-gene interactions in a single experiment. PMID:21860376
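The barcode-as-proxy idea reduces, computationally, to counting tag occurrences in sequencing reads. A minimal sketch with made-up barcodes and reads (real pipelines also handle sequencing errors and mixed tag positions):

```python
from collections import Counter

# Map known barcode sequences to mutant identities, then tally reads whose
# barcode region exactly matches a known tag; relative counts serve as the
# fitness proxy described above.

BARCODES = {"ACGTAC": "mutantA", "TTGCAA": "mutantB", "GGATCC": "mutantC"}

def count_barcodes(reads, barcodes, offset=0, length=6):
    tally = Counter()
    for read in reads:
        tag = read[offset:offset + length]     # fixed barcode position assumed
        if tag in barcodes:
            tally[barcodes[tag]] += 1
    return tally

reads = ["ACGTACGGG", "TTGCAATTT", "ACGTACAAA", "NNNNNNNNN"]
abundance = count_barcodes(reads, BARCODES)
```

Comparing abundances between a treated and an untreated pool, rather than absolute counts, is what makes the assay competitive and internally controlled.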
Fast prediction of RNA-RNA interaction using heuristic algorithm.
Montaseri, Soheila
2015-01-01
Interaction between two RNA molecules plays a crucial role in many medical and biological processes such as gene expression regulation. In this process, an RNA molecule prohibits the translation of another RNA molecule by establishing stable interactions with it. Several algorithms have been developed to predict the structure of the RNA-RNA interaction. High computational time is a common challenge in most of the presented algorithms. In this context, a heuristic method is introduced to accurately predict the interaction between two RNAs based on minimum free energy (MFE). This algorithm uses a few dot matrices for finding the secondary structure of each RNA and the binding sites between the two RNAs. Furthermore, a parallel version of this method is presented. We describe the algorithm's concurrency and parallelism for a multicore chip. The proposed algorithm has been evaluated on datasets including CopA-CopT, R1inv-R2inv, Tar-Tar*, DIS-DIS, and IncRNA54-RepZ in Escherichia coli bacteria. The method has high accuracy and efficiency, and runs in low computational time in comparison to other approaches.
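The dot-matrix step mentioned above can be sketched simply: mark every position where bases of the two strands can pair, then look for diagonal runs as candidate binding sites. This is a toy illustration with invented sequences, ignoring strand orientation and energies, not the paper's MFE heuristic.

```python
# Watson-Crick pairs plus the GU wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def dot_matrix(seq1, seq2):
    """matrix[i][j] == 1 when seq1[i] can pair with seq2[j]."""
    return [[1 if (b1, b2) in PAIRS else 0 for b2 in seq2] for b1 in seq1]

def longest_diagonal(matrix):
    """Length of the longest run of consecutive pairs: a candidate duplex."""
    best = 0
    rows, cols = len(matrix), len(matrix[0])
    for r in range(rows):
        for c in range(cols):
            n = 0
            while r + n < rows and c + n < cols and matrix[r + n][c + n]:
                n += 1
            best = max(best, n)
    return best

m = dot_matrix("GCAUG", "CAUGC")   # hypothetical RNA fragments
site = longest_diagonal(m)
```

An MFE method then scores such candidate sites by their stacking and loop energies instead of their raw length; each row of the matrix is independent, which is what makes the multicore parallelization straightforward.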
A High Order, Locally-Adaptive Method for the Navier-Stokes Equations
NASA Astrophysics Data System (ADS)
Chan, Daniel
1998-11-01
I have extended the FOSLS method of Cai, Manteuffel and McCormick (1997) and implemented it within the framework of a spectral element formulation using the Legendre polynomial basis function. The FOSLS method solves the Navier-Stokes equations as a system of coupled first-order equations and provides the ellipticity that is needed for fast iterative matrix solvers like multigrid to operate efficiently. Each element is treated as an object and its properties are self-contained. Only C^0 continuity is imposed across element interfaces; this design allows local grid refinement and coarsening without the burden of an elaborate data structure, since only information along element boundaries is needed. With the FORTRAN 90 programming environment, I can maintain high computational efficiency by employing a hybrid parallel processing model. OpenMP directives provide loop-level parallelism executed within a shared-memory SMP, while the MPI protocol distributes elements across a cluster of SMPs connected via a commodity network. This talk will provide timing results and a comparison with a second-order finite difference method.
Highly parallel implementation of non-adiabatic Ehrenfest molecular dynamics
NASA Astrophysics Data System (ADS)
Kanai, Yosuke; Schleife, Andre; Draeger, Erik; Anisimov, Victor; Correa, Alfredo
2014-03-01
While the adiabatic Born-Oppenheimer approximation tremendously lowers computational effort, many questions in modern physics, chemistry, and materials science require an explicit description of coupled non-adiabatic electron-ion dynamics. Electronic stopping, i.e. the energy transfer of a fast projectile atom to the electronic system of the target material, is a notorious example. We recently implemented real-time time-dependent density functional theory based on the plane-wave pseudopotential formalism in the Qbox/qb@ll codes. We demonstrate that explicit integration using a fourth-order Runge-Kutta scheme is very suitable for modern highly parallelized supercomputers. Applying the new implementation to systems with hundreds of atoms and thousands of electrons, we achieved excellent performance and scalability on a large number of nodes both on the BlueGene based ``Sequoia'' system at LLNL as well as the Cray architecture of ``Blue Waters'' at NCSA. As an example, we discuss our work on computing the electronic stopping power of aluminum and gold for hydrogen projectiles, showing an excellent agreement with experiment. These first-principles calculations allow us to gain important insight into the fundamental physics of electronic stopping.
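The explicit RK4 propagation credited above for its parallel scalability can be illustrated on a toy problem. This is a hedged sketch: a two-level Schrödinger equation with a Rabi-coupling Hamiltonian (hbar = 1), not the plane-wave TDDFT states the real code propagates.

```python
import math

# Integrate i d(psi)/dt = H psi with classic fourth-order Runge-Kutta.
# For H = [[0,1],[1,0]] and psi(0) = (1,0), the exact population of the
# first level is |psi_0(t)|^2 = cos(t)^2.

H = [[0.0, 1.0], [1.0, 0.0]]   # toy 2x2 Hamiltonian (Rabi coupling)

def deriv(psi):
    # d(psi)/dt = -i H psi
    return [-1j * sum(H[r][c] * psi[c] for c in range(2)) for r in range(2)]

def rk4_step(psi, dt):
    k1 = deriv(psi)
    k2 = deriv([p + 0.5 * dt * k for p, k in zip(psi, k1)])
    k3 = deriv([p + 0.5 * dt * k for p, k in zip(psi, k2)])
    k4 = deriv([p + dt * k for p, k in zip(psi, k3)])
    return [p + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for p, a, b, c, d in zip(psi, k1, k2, k3, k4)]

psi = [1.0 + 0j, 0.0 + 0j]
dt, steps = 0.01, 100          # propagate to t = 1
for _ in range(steps):
    psi = rk4_step(psi, dt)

population = abs(psi[0]) ** 2  # compare against cos(1)^2
```

Each RK4 stage is a single application of the Hamiltonian, a matrix-vector-like operation with no sequential dependence across the state's components, which is why the scheme maps so well onto massively parallel machines.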
A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajbhandari, Samyam; Kim, Jinsung; Krishnamoorthy, Sriram
This paper describes the design and implementation of a layered domain-specific compiler to support MADNESS---Multiresolution ADaptive Numerical Environment for Scientific Simulation. MADNESS is a high-level software environment for the solution of integral and differential equations in many dimensions, using adaptive and fast harmonic analysis methods with guaranteed precision. MADNESS uses k-d trees to represent spatial functions and implements operators like addition, multiplication, differentiation, and integration on the numerical representation of functions. The MADNESS runtime system provides global namespace support and a task-based execution model including futures. MADNESS is currently deployed on massively parallel supercomputers and has enabled many science advances. Due to the highly irregular and statically unpredictable structure of the k-d trees representing the spatial functions encountered in MADNESS applications, only purely runtime approaches to optimization have previously been implemented in the MADNESS framework. This paper describes a layered domain-specific compiler developed to address some performance bottlenecks in MADNESS. The newly developed static compile-time optimizations, in conjunction with the MADNESS runtime support, enable significant performance improvement for the MADNESS framework.
Communication Studies of DMP and SMP Machines
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and the communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems, bitonic sorting and the Fast Fourier Transform, are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability, and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors is consistent with the size of messages. The SP-2 is sensitive to message size but yields much higher communication overlap because of its communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields low communication overlap. Bitonic sorting yields lower performance than FFT due to its smaller computation-to-communication ratio.
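The bitonic sorting benchmark has a fixed, data-independent compare-exchange pattern, which is exactly what determines its communication cost on a parallel machine. A sequential sketch of the network (each compare-exchange stage would run concurrently on a parallel machine, with half-distance exchanges becoming message traffic):

```python
# Recursive bitonic sort for power-of-two inputs: build a bitonic sequence
# (ascending half followed by descending half), then merge it with
# stride-halving compare-exchange stages.

def bitonic_sort(a, ascending=True):
    if len(a) <= 1:
        return a
    half = len(a) // 2
    left = bitonic_sort(a[:half], True)    # ascending run
    right = bitonic_sort(a[half:], False)  # descending run
    return bitonic_merge(left + right, ascending)

def bitonic_merge(a, ascending):
    if len(a) <= 1:
        return a
    half = len(a) // 2
    a = a[:]
    for i in range(half):                  # one compare-exchange stage
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    return (bitonic_merge(a[:half], ascending)
            + bitonic_merge(a[half:], ascending))

data = [7, 3, 6, 2, 5, 1, 4, 0]            # length must be a power of two
sorted_data = bitonic_sort(data)
```

Every element participates in O(log^2 n) compare-exchanges but does little arithmetic in between, which is why the abstract finds bitonic sorting more communication-bound than the FFT.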
A critical analysis of shock models for chondrule formation
NASA Astrophysics Data System (ADS)
Stammler, Sebastian M.; Dullemond, Cornelis P.
2014-11-01
In recent years many models of chondrule formation have been proposed. One of those models is the processing of dust in shock waves in protoplanetary disks. In this model, the dust and the chondrule precursors are overrun by shock waves, which heat them up by frictional heating and thermal exchange with the gas. In this paper we reanalyze the nebular shock model of chondrule formation and focus on the downstream boundary condition. We show that for large-scale plane-parallel chondrule-melting shocks the postshock equilibrium temperature is too high to avoid volatile loss. Even if we include radiative cooling in lateral directions out of the disk plane in our model (thereby breaking strict plane-parallel geometry), we find that for a realistic vertical extent of the solar nebula disk the temperature decline is not fast enough. On the other hand, if we assume that the shock is entirely optically thin so that particles can radiate freely, the cooling rates are too high to produce the observed chondrule textures. Global nebular shocks are therefore problematic as the primary sources of chondrules.
Influence of fast advective flows on pattern formation of Dictyostelium discoideum
Bae, Albert; Zykov, Vladimir; Bodenschatz, Eberhard
2018-01-01
We report experimental and numerical results on pattern formation of self-organizing Dictyostelium discoideum cells in a microfluidic setup under a constant buffer flow. The external flow advects the signaling molecule cyclic adenosine monophosphate (cAMP) downstream, while the chemotactic cells attached to the solid substrate are not transported with the flow. At high flow velocities, elongated cAMP waves are formed that cover the whole length of the channel and propagate both parallel and perpendicular to the flow direction. While the wave period and transverse propagation velocity are constant, parallel wave velocity and the wave width increase linearly with the imposed flow. We also observe that the acquired wave shape is highly dependent on the wave generation site and the strength of the imposed flow. We compared the wave shape and velocity with numerical simulations performed using a reaction-diffusion model and found excellent agreement. These results are expected to play an important role in understanding the process of pattern formation and aggregation of D. discoideum that may experience fluid flows in its natural habitat. PMID:29590179
Formation of laser-induced periodic surface structures on niobium by femtosecond laser irradiation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pan, A.; Dias, A.; Gomez-Aranzadi, M.
2014-05-07
The surface morphology of a niobium sample, irradiated in air by a femtosecond laser with a wavelength of 800 nm and pulse duration of 100 fs, was examined. The period of the micro/nanostructures, parallel and perpendicularly oriented to the linearly polarized fs-laser beam, was studied by means of 2D Fast Fourier Transform analysis. The observed Laser-Induced Periodic Surface Structures (LIPSS) were classified as Low Spatial Frequency LIPSS (periods about 600 nm) and High Spatial Frequency LIPSS, showing a periodicity around 300 nm, both of them perpendicularly oriented to the polarization of the incident laser wave. Moreover, parallel high spatial frequency LIPSS were observed with periods around 100 nm, located at the peripheral areas of the laser fingerprint and overwritten on the perpendicular periodic gratings. The results indicate that this method of micro/nanostructuring allows controlling the niobium grating period by the number of pulses applied, so the scan speed, and not the fluence, is the key control parameter. A discussion of the mechanism of the surface topology evolution is also presented.
NASA Astrophysics Data System (ADS)
von Zanthier, Christoph; Holl, Peter; Kemmer, Josef; Lechner, Peter; Maier, B.; Soltau, Heike; Stoetter, R.; Braeuninger, Heinrich W.; Dennerl, Konrad; Haberl, Frank; Hartmann, R.; Hartner, Gisela D.; Hippmann, H.; Kastelic, E.; Kink, W.; Krause, N.; Meidinger, Norbert; Metzner, G.; Pfeffermann, Elmar; Popp, M.; Reppin, Claus; Stoetter, Diana; Strueder, Lothar; Truemper, Joachim; Weber, U.; Carathanassis, D.; Engelhard, S.; Gebhart, Th.; Hauff, D.; Lutz, G.; Richter, R. H.; Seitz, H.; Solc, P.; Bihler, Edgar; Boettcher, H.; Kendziorra, Eckhard; Kraemer, J.; Pflueger, Bernhard; Staubert, Ruediger
1998-04-01
The concept and performance of the fully depleted pn-junction CCD system, developed for the European XMM and the German ABRIXAS satellite missions for soft x-ray imaging and spectroscopy in the 0.1 keV to 15 keV photon range, is presented. The 58 mm × 60 mm large pn-CCD array uses pn-junctions for registers and for the backside instead of MOS registers. This concept naturally allows the detector volume to be fully depleted, making it an efficient detector for photons with energies up to 15 keV. For high detection efficiency in the soft x-ray region down to 100 eV, an ultrathin pn-CCD backside deadlayer has been realized. Each pn-CCD channel is equipped with an on-chip JFET amplifier which, in combination with the CAMEX amplifier and multiplexing chip, facilitates parallel readout with a pixel read rate of 3 MHz and an electronic noise floor of ENC < e-. With the complete parallel readout, very fast pn-CCD readout modes can be implemented in the system which allow for high resolution photon spectroscopy of even the brightest x-ray sources in the sky.
High performance cellular level agent-based simulation with FLAME for the GPU.
Richmond, Paul; Walker, Dawn; Coakley, Simon; Romano, Daniela
2010-05-01
Driven by the availability of experimental data and ability to simulate a biological scale which is of immediate interest, the cellular scale is fast emerging as an ideal candidate for middle-out modelling. As with 'bottom-up' simulation approaches, cellular level simulations demand a high degree of computational power, which in large-scale simulations can only be achieved through parallel computing. The flexible large-scale agent modelling environment (FLAME) is a template driven framework for agent-based modelling (ABM) on parallel architectures ideally suited to the simulation of cellular systems. It is available for both high performance computing clusters (www.flame.ac.uk) and GPU hardware (www.flamegpu.com) and uses a formal specification technique that acts as a universal modelling format. This not only creates an abstraction from the underlying hardware architectures, but avoids the steep learning curve associated with programming them. In benchmarking tests and simulations of advanced cellular systems, FLAME GPU has reported massive improvement in performance over more traditional ABM frameworks. This allows the time spent in the development and testing stages of modelling to be drastically reduced and creates the possibility of real-time visualisation for simple visual face-validation.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing.
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-08
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered a comprehensive data-intensive and computing-intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves a 4× speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy improves performance by about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing-intensive and data-intensive issues in SAR raw data simulation, and is easily extended to large-scale computing to achieve higher acceleration.
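The irregular accumulation that MapReduce handles here can be sketched in miniature: mappers emit (range-bin, contribution) pairs for the targets they own, and the reducer sums all contributions sharing a bin. This pure-Python toy stands in for the Hadoop pipeline; the target list and the "smear into a neighbouring bin" contribution model are invented for illustration.

```python
from collections import defaultdict

def map_target(target):
    """Mapper: emit (bin, echo-contribution) pairs for one point target."""
    pos, amplitude = target
    # A target deposits energy into its own bin and half into the next one.
    return [(pos, amplitude), (pos + 1, 0.5 * amplitude)]

def reduce_pairs(pair_lists):
    """Shuffle/reduce: accumulate all contributions that share a range bin."""
    acc = defaultdict(float)
    for pairs in pair_lists:
        for k, v in pairs:
            acc[k] += v
    return dict(acc)

targets = [(0, 1.0), (1, 2.0), (1, 0.5)]   # (range bin, amplitude), invented
raw = reduce_pairs(map(map_target, targets))
```

Because different targets contribute to overlapping bins in an unpredictable pattern, this shuffle-then-sum structure avoids the write conflicts that hurt GPU implementations of the same accumulation.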
High speed civil transport: Sonic boom softening and aerodynamic optimization
NASA Technical Reports Server (NTRS)
Cheung, Samson
1994-01-01
An improvement in sonic boom extrapolation techniques has been the desire of aerospace designers for years. This is because the linear acoustic theory developed in the 1960s is incapable of predicting the nonlinear phenomenon of shock wave propagation. On the other hand, CFD techniques are too computationally expensive to employ on sonic boom problems. Therefore, this research focused on the development of a fast and accurate sonic boom extrapolation method that solves the Euler equations for axisymmetric flow. This new technique has brought sonic boom extrapolation up to the standards of the 1990s. Parallel computing is a fast-growing subject in the field of computer science because of its promising speed. A new optimizer (IIOWA) for the parallel computing environment has been developed and tested for aerodynamic drag minimization. This is a promising method for CFD optimization making use of the computational resources of workstations, which, unlike supercomputers, spend most of their time idle. Finally, the OAW concept is attractive because of its overall theoretical performance. In order to fully understand the concept, a wind-tunnel model was built and is currently being tested at NASA Ames Research Center. The CFD calculations performed under this cooperative agreement helped to identify the problem of flow separation, and also aided the design by optimizing the wing deflection for roll trim.
Klix, Sabrina; Hezel, Fabian; Fuchs, Katharina; Ruff, Jan; Dieringer, Matthias A.; Niendorf, Thoralf
2014-01-01
Purpose: Design, validation and application of an accelerated fast spin-echo (FSE) variant that uses a split-echo approach for self-calibrated parallel imaging. Methods: For self-calibrated, split-echo FSE (SCSE-FSE), extra displacement gradients were incorporated into FSE to decompose odd and even echo groups, which were independently phase encoded to derive coil sensitivity maps and to generate undersampled data (reduction factor up to R = 3). Reference and undersampled data were acquired simultaneously. SENSE reconstruction was employed. Results: The feasibility of SCSE-FSE was demonstrated in phantom studies. Point spread function performance of SCSE-FSE was found to be competitive with traditional FSE variants. The immunity of SCSE-FSE to motion-induced mis-registration between reference and undersampled data was shown using a dynamic left ventricular model and cardiac imaging. The applicability of black-blood prepared SCSE-FSE for cardiac imaging was demonstrated in healthy volunteers, including accelerated multi-slice per breath-hold imaging and accelerated high spatial resolution imaging. Conclusion: SCSE-FSE obviates the need for external reference scans in SENSE-reconstructed parallel imaging with FSE. SCSE-FSE reduces the risk of mis-registration between reference scans and accelerated acquisitions. SCSE-FSE is feasible for imaging of the heart and of large cardiac vessels, and also meets the needs of brain, abdominal and liver imaging. PMID:24728341
Low-Frequency Waves in Cold Three-Component Plasmas
NASA Astrophysics Data System (ADS)
Fu, Qiang; Tang, Ying; Zhao, Jinsong; Lu, Jianyong
2016-09-01
The dispersion relation and electromagnetic polarization of plasma waves are comprehensively studied in cold electron, proton, and heavy charged particle plasmas. Three modes are classified as the fast, intermediate, and slow mode waves according to their different phase velocities. When the plasma contains positively charged heavy particles, the fast and intermediate modes can interact at small propagation angles, whereas the two modes are separate at large propagation angles. The near-parallel intermediate and slow waves exhibit linear polarization, then circular polarization, and linear polarization again as the wave number increases. The wave number regime corresponding to this circular polarization shrinks as the propagation angle increases. Moreover, the fast and intermediate modes undergo a reversal of electromagnetic polarization at a particular wave number. When the heavy particles carry negative charges, the dispersion relations of the fast and intermediate modes are always separate, independent of the propagation angle. Furthermore, this study gives new expressions for the three resonance frequencies of highly oblique propagating waves in general three-component plasmas, and shows the dependence of the resonance frequencies on the propagation angle, the concentration of the heavy particles, and the mass ratios among the different kinds of particles. Supported by the National Natural Science Foundation of China (Nos. 11303099, 41531071 and 41574158), and the Youth Innovation Promotion Association CAS
Zhou, Jian; Deng, Zixuan; Lu, Jingyi; Li, Hong; Zhang, Xiuzhen; Peng, Yongde; Mo, Yifei; Bao, Yuqian; Jia, Weiping
2015-04-01
Dyslipidemia is commonly seen in patients with type 2 diabetes mellitus (T2DM). The current study sought to compare the effects of nateglinide and acarbose, two antihyperglycemic agents, on both fasting and postprandial lipid profiles in Chinese subjects with T2DM. For this multicenter, open-label, randomized, active-controlled, parallel-group study, 103 antihyperglycemic agent-naive patients with T2DM were recruited from four hospitals in China. In total, 85 subjects (44 in the nateglinide group, 41 in the acarbose group) with a known complete lipid profile underwent the entire clinical trial and were included in the final analysis. Serum was collected in the fasting state and 30 and 120 min after a standardized meal (postprandial states) to measure the baseline lipid profiles; the same testing was performed upon completion of a 2-week course of nateglinide (120 mg three times a day) or acarbose (50 mg three times a day). Fasting triglyceride (TG) levels were significantly reduced by both nateglinide and acarbose (P<0.001), with acarbose providing a significantly more robust improvement (vs. nateglinide, P=0.005). Additionally, the TG levels at both postprandial times were significantly reduced by acarbose (P<0.001 at 30 min and P=0.002 at 120 min), whereas nateglinide treatment only significantly reduced the 30-min postprandial TG (P=0.029). Neither nateglinide nor acarbose treatment had significant impact on total cholesterol, high-density lipoprotein, low-density lipoprotein, or non-high-density lipoprotein cholesterol. Compared with nateglinide, acarbose has superior therapeutic efficacy for reducing fasting and postprandial TG levels in patients with T2DM.
Houser, Dorian S; Champagne, Cory D; Crocker, Daniel E
2013-11-01
Insulin resistance in modern society is perceived as a pathological consequence of excess energy consumption and reduced physical activity. Its presence in relation to the development of cardiovascular risk factors has been termed the metabolic syndrome, which produces increased mortality and morbidity and which is rapidly increasing in human populations. Ironically, insulin resistance likely evolved to assist animals during food shortages by increasing the availability of endogenous lipid for catabolism while protecting protein from use in gluconeogenesis and eventual oxidation. Some species that incorporate fasting as a predictable component of their life history demonstrate physiological traits similar to the metabolic syndrome during prolonged fasts. One such species is the northern elephant seal (Mirounga angustirostris), which fasts from food and water for periods of up to 4 months. During this time, ∼90% of the seal's metabolic demands are met through fat oxidation, and circulating non-esterified fatty acids are high (0.7-3.2 mM). All life history stages of the elephant seal studied to date demonstrate insulin resistance and fasting hyperglycemia, as well as variations in hormones and adipocytokines that reflect the metabolic syndrome to some degree. Elephant seals demonstrate some intriguing adaptations with the potential for medical advancement; for example, ketosis is negligible despite significant and prolonged fatty acid oxidation, and investigation of this feature might provide insight into the treatment of diabetic ketoacidosis. The parallels to the metabolic syndrome are likely reflected to varying degrees in other marine mammals, most of which evolved on diets high in lipid and protein content but essentially devoid of carbohydrate. Utilization of these natural models of insulin resistance may further our understanding of the pathophysiology of the metabolic syndrome in humans and better assist the development of preventative measures and therapies.
An embedded multi-core parallel model for real-time stereo imaging
NASA Astrophysics Data System (ADS)
He, Wenjing; Hu, Jian; Niu, Jingyu; Li, Chuanrong; Liu, Guangyu
2018-04-01
Real-time processing based on embedded systems will enhance the application capability of stereo imaging for LiDAR and hyperspectral sensors. Research on task partitioning and scheduling strategies for embedded multiprocessor systems started relatively late, compared with that for PC computers. In this paper, aimed at an embedded multi-core processing platform, a parallel model for stereo imaging is studied and verified. After analyzing the computing load, throughput capacity and buffering requirements, a two-stage pipeline parallel model based on message transmission is established. This model can be applied to fast stereo imaging for airborne sensors with various characteristics. To demonstrate the feasibility and effectiveness of the parallel model, parallel software was designed using test flight data, based on the 8-core DSP processor TMS320C6678. The results indicate that the design performed well in workload distribution and achieved a speed-up ratio of up to 6.4.
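A two-stage pipeline based on message transmission, as described above, can be sketched with thread-safe queues standing in for the DSP cores' message channels. This is a generic illustration, not the paper's stereo-imaging code; the stage bodies are placeholders:

```python
import threading
import queue

def stage1(inbox, outbox):
    """Stage 1: e.g. per-scan-line preprocessing (placeholder transform)."""
    while (item := inbox.get()) is not None:
        outbox.put(item * 2)        # stand-in for the real computation
    outbox.put(None)                # propagate the end-of-stream marker

def stage2(inbox, results):
    """Stage 2: e.g. stereo intersection (placeholder transform)."""
    while (item := inbox.get()) is not None:
        results.append(item + 1)    # stand-in for the real computation

# Bounded queues model the finite message buffers between cores.
q01, q12, results = queue.Queue(maxsize=4), queue.Queue(maxsize=4), []
t1 = threading.Thread(target=stage1, args=(q01, q12))
t2 = threading.Thread(target=stage2, args=(q12, results))
t1.start(); t2.start()
for scan_line in range(5):          # producer feeds the pipeline
    q01.put(scan_line)
q01.put(None)
t1.join(); t2.join()
```

Because the two stages run concurrently, scan line k can be in stage 2 while scan line k+1 is in stage 1; the bounded queues provide the back-pressure that the buffering analysis in the paper sizes.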
Experimental and numerical investigation of the Fast-SAGD process
NASA Astrophysics Data System (ADS)
Shin, Hyundon
The SAGD process has been tested in the field, and is now in a commercial stage in Western Canadian oil sands areas. The Fast-SAGD method can partly solve the drilling difficulty and reduce costs in a SAGD operation requiring paired parallel wells one above the other. This method also enhances the thermal efficiency in the reservoir. In this research, the reservoir parameters and operating conditions for the SAGD and Fast-SAGD processes are investigated by numerical simulation in the three Alberta oil sands areas. Scaled physical model experiments, which are operated by an automated process control system, are conducted under high temperature and high pressure conditions. The results of the study indicate that the shallow Athabasca-type reservoir, which is thick with high permeability (high kxh), is a good candidate for SAGD application, whereas Cold Lake- and Peace River-type reservoirs, which are thin with low permeability, are not as good candidates for conventional SAGD implementation. The simulation results indicate improved energy efficiency and productivity in most cases for the Fast-SAGD process; in those cases, the project economics were enhanced compared to the SAGD process. Both Cold Lake- and Peace River-type reservoirs are good candidates for a Fast-SAGD application rather than a conventional SAGD application. This new process demonstrates improved efficiency and lower costs for extracting heavy oil from these important reservoirs. A new economic indicator, called simple thermal efficiency parameter (STEP), was developed and validated to evaluate the performance of a SAGD project. STEP is based on cumulative steam-oil ratio (CSOR), calendar day oil rate (CDOR) and recovery factor (RF) for the time prior to the steam-oil ratio (SOR) attaining 4. STEP can be used as a financial metric quantitatively as well as qualitatively for this type of thermal project. 
An automated process control system was set up and validated, with the capability of controlling and handling steam injection processes such as the steam-assisted gravity drainage process. The results of these preliminary experiments showed the overall cumulative oil production to be larger in the Fast-SAGD case, but the end-point CSOR to be lower in the SAGD case. History matching results indicated that the steam quality was as low as 0.3 in the SAGD experiments, and even lower in the Fast-SAGD experiments after starting the CSS.
High-speed extended-term time-domain simulation for online cascading analysis of power system
NASA Astrophysics Data System (ADS)
Fu, Chuan
A high-speed extended-term (HSET) time domain simulator (TDS), intended to become part of an energy management system (EMS), has been newly developed for use in online extended-term dynamic cascading analysis of power systems. HSET-TDS includes the following attributes for providing situational awareness of high-consequence events: (i) online analysis, including n-1 and n-k events, (ii) ability to simulate both fast and slow dynamics 1-3 hours in advance, (iii) inclusion of rigorous protection-system modeling, (iv) intelligence for corrective action identification, storage, and fast retrieval, and (v) high-speed execution. Very fast online computational capability is the most desired attribute of this simulator. Based on the process of solving the differential algebraic equations describing power system dynamics, HSET-TDS seeks computational efficiency at each of the following hierarchical levels: (i) hardware, (ii) strategies, (iii) integration methods, (iv) nonlinear solvers, and (v) linear solver libraries. This thesis first describes the Hammer-Hollingsworth 4 (HH4) implicit integration method. Like the trapezoidal rule, HH4 is symmetrically A-stable, but it possesses higher-order precision (h⁴) than the trapezoidal rule. Such precision enables larger integration steps and therefore improves simulation efficiency for variable step size implementations. This thesis provides the underlying theory on which we advocate use of HH4 over other numerical integration methods for power system time-domain simulation. Second, motivated by the need to perform high-speed extended-term time domain simulation for online purposes, this thesis presents principles for designing numerical solvers of the differential algebraic systems associated with power system time-domain simulation, including DAE construction strategies (Direct Solution Method), integration methods (HH4), nonlinear solvers (Very Dishonest Newton), and linear solvers (SuperLU).
We have implemented a design appropriate for HSET-TDS, and we compare it to various solvers, including the commercial-grade PSSE program, with respect to computational efficiency and accuracy, using as examples the New England 39-bus system, an expanded 8775-bus system, and a 13029-bus PJM system. Third, we have explored a stiffness-decoupling method, intended to be part of a parallel design of time domain simulation software for supercomputers. The stiffness-decoupling method combines the advantages of implicit methods (A-stability) and explicit methods (less computation). With the new stiffness detection method proposed herein, the stiffness can be captured. An expanded 975-bus system is used to test simulation efficiency. Finally, several parallel strategies for supercomputer deployment to simulate power system dynamics are proposed and compared. Design A partitions the task by scale, using the stiffness-decoupling method, waveform relaxation, and a parallel linear solver. Design B partitions the task along the time axis, using a highly precise integration method, the Kuntzmann-Butcher method of order 8 (KB8). The event-partitioning strategy partitions the whole simulation along the time axis through the simulated sequence of cascading events. Of all the strategies proposed, the strategy of partitioning cascading events is recommended, since the sub-tasks for each processor are totally independent and therefore minimal communication time is needed.
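The thesis pairs an A-stable implicit integrator (HH4) with a "Very Dishonest Newton" nonlinear solver, i.e. a Newton iteration that freezes and reuses the Jacobian instead of re-evaluating it. The HH4 coefficients are not reproduced in the abstract, so the sketch below substitutes the trapezoidal rule (the simpler symmetrically A-stable method it is compared against) with a frozen-Jacobian Newton solve, on a hypothetical scalar stiff test equation:

```python
def trapezoidal_step(f, dfdy, y, h, newton_iters=10, tol=1e-12):
    """One A-stable implicit (trapezoidal) step for y' = f(y): solve
    g(z) = z - y - (h/2)*(f(y) + f(z)) = 0 for z. The Newton iteration
    is 'dishonest': the Jacobian of g is evaluated once, at the start
    point, and reused for every iteration."""
    J = 1.0 - 0.5 * h * dfdy(y)   # frozen Jacobian of g
    z = y
    for _ in range(newton_iters):
        g = z - y - 0.5 * h * (f(y) + f(z))
        if abs(g) < tol * (abs(y) + 1e-30):   # relative convergence test
            break
        z -= g / J
    return z

# Hypothetical stiff test equation y' = -50*y, y(0) = 1; an explicit
# Euler step of the same size (h = 0.01 > 2/50 boundary scaled) would
# still be stable here, but A-stability holds for ANY h > 0.
f = lambda y: -50.0 * y
dfdy = lambda y: -50.0
y, h = 1.0, 0.01
for _ in range(100):              # integrate to t = 1; solution decays
    y = trapezoidal_step(f, dfdy, y, h)
```

For this linear problem the frozen Jacobian is exact, so Newton converges in one update; for the nonlinear DAE systems in the thesis, reusing the factorized Jacobian across iterations (and steps) is what saves the linear-solver cost.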
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.
Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio
2014-07-05
A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines, combined with a volumetric 3D fast Fourier transform (3D-FFT), was developed and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limit on the degree of parallelization because of the limitations of the slab-type 3D-FFT; the volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on a large, fine calculation cell (2048³ grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent parallel scalability on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of a chymotrypsin inhibitor 2 mutant. The results demonstrate that the massively parallel 3D-RISM program is effective for analyzing the hydration properties of large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
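The reason the volumetric 3D-FFT lifts the parallelization limit is that a slab decomposition of an n³ grid assigns whole planes to ranks and so can occupy at most n ranks, while splitting all three axes allows far more. Serially, the staged structure of any distributed 3D-FFT (1-D transforms along one axis, a data redistribution, then the next axis) composes to the full 3-D transform, which this sketch checks on a small grid, with numpy standing in for the distributed FFT library:

```python
import numpy as np

# For a 2048^3 grid, a slab decomposition caps the run at 2048 ranks,
# far below 16,384 nodes x 8 cores; a volumetric decomposition splits
# all three axes and removes that cap.
n = 8
rng = np.random.default_rng(0)
x = rng.standard_normal((n, n, n))

# A distributed 3D-FFT applies 1-D FFTs axis by axis, redistributing
# ("transposing") the data between stages; executed serially here, the
# three stages compose to the full 3-D transform.
stage1 = np.fft.fft(x, axis=0)       # each rank owns pencils along axis 0
stage2 = np.fft.fft(stage1, axis=1)  # after the first redistribution
stage3 = np.fft.fft(stage2, axis=2)  # after the second redistribution
```

The all-to-all redistributions between stages are the communication cost that the volumetric scheme trades for the extra parallelism.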
Integrated modeling applications for tokamak experiments with OMFIT
NASA Astrophysics Data System (ADS)
Meneghini, O.; Smith, S. P.; Lao, L. L.; Izacard, O.; Ren, Q.; Park, J. M.; Candy, J.; Wang, Z.; Luna, C. J.; Izzo, V. A.; Grierson, B. A.; Snyder, P. B.; Holland, C.; Penna, J.; Lu, G.; Raum, P.; McCubbin, A.; Orlov, D. M.; Belli, E. A.; Ferraro, N. M.; Prater, R.; Osborne, T. H.; Turnbull, A. D.; Staebler, G. M.
2015-08-01
One Modeling Framework for Integrated Tasks (OMFIT) is a comprehensive integrated modeling framework which has been developed to enable physics codes to interact in complicated workflows, and to support scientists at all stages of the modeling cycle. The OMFIT development follows a unique bottom-up approach, where the framework design and capabilities organically evolve to support progressive integration of the components that are required to accomplish physics goals of increasing complexity. OMFIT provides a workflow for easily generating full kinetic equilibrium reconstructions that are constrained by magnetic and motional Stark effect measurements, and kinetic profile information that includes fast-ion pressure modeled by a transport code. It was found that magnetic measurements can be used to quantify the amount of anomalous fast-ion diffusion present in DIII-D discharges, and provide an estimate that is consistent with what would be needed for transport simulations to match the measured neutron rates. OMFIT was used to streamline edge-stability analyses and evaluate the effect of resonant magnetic perturbations (RMPs) on pedestal stability, which was found to be consistent with the experimental observations. The framework also supported the development of a five-dimensional numerical fluid model for estimating the effects of the interaction between magnetohydrodynamics (MHD) and microturbulence, and its systematic verification against analytic models. OMFIT was used for optimizing an innovative high-harmonic fast wave system proposed for DIII-D. For a parallel refractive index n∥ > 3, the conditions for strong electron-Landau damping were found to be independent of the launched n∥ and poloidal angle. OMFIT has been the platform of choice for developing a neural-network based approach to efficiently perform a non-linear multivariate regression of local transport fluxes as a function of local dimensionless parameters.
Transport predictions for thousands of DIII-D discharges showed excellent agreement with the power balance calculations across the whole plasma radius and over a broad range of operating regimes. Concerning predictive transport simulations, the framework made possible the design and automation of a workflow that enables self-consistent predictions of kinetic profiles and the plasma equilibrium. It is found that the feedback between the transport fluxes and plasma equilibrium can significantly affect the kinetic profiles predictions. Such a rich set of results provide tangible evidence of how bottom-up approaches can potentially provide a fast track to integrated modeling solutions that are functional, cost-effective, and in sync with the research effort of the community.
Zhou, Lili; Clifford Chao, K S; Chang, Jenghwa
2012-11-01
Simulated projection images of digital phantoms constructed from CT scans have been widely used for clinical and research applications but their quality and computation speed are not optimal for real-time comparison with the radiography acquired with an x-ray source of different energies. In this paper, the authors performed polyenergetic forward projections using open computing language (OpenCL) in a parallel computing ecosystem consisting of CPU and general purpose graphics processing unit (GPGPU) for fast and realistic image formation. The proposed polyenergetic forward projection uses a lookup table containing the NIST published mass attenuation coefficients (μ∕ρ) for different tissue types and photon energies ranging from 1 keV to 20 MeV. The CT images of interested sites are first segmented into different tissue types based on the CT numbers and converted to a three-dimensional attenuation phantom by linking each voxel to the corresponding tissue type in the lookup table. The x-ray source can be a radioisotope or an x-ray generator with a known spectrum described as weight w(n) for energy bin E(n). The Siddon method is used to compute the x-ray transmission line integral for E(n) and the x-ray fluence is the weighted sum of the exponential of line integral for all energy bins with added Poisson noise. To validate this method, a digital head and neck phantom constructed from the CT scan of a Rando head phantom was segmented into three (air, gray∕white matter, and bone) regions for calculating the polyenergetic projection images for the Mohan 4 MV energy spectrum. To accelerate the calculation, the authors partitioned the workloads using the task parallelism and data parallelism and scheduled them in a parallel computing ecosystem consisting of CPU and GPGPU (NVIDIA Tesla C2050) using OpenCL only. The authors explored the task overlapping strategy and the sequential method for generating the first and subsequent DRRs. 
A dispatcher was designed to drive the high-degree parallelism of the task overlapping strategy. Numerical experiments were conducted to compare the performance of the OpenCL/GPGPU-based implementation with the CPU-based implementation. The projection images were similar to typical portal images obtained with a 4 or 6 MV x-ray source. For a phantom size of 512 × 512 × 223, the time for calculating the line integrals for a 512 × 512 image panel was 16.2 ms on the GPGPU for one energy bin, compared with 8.83 s on the CPU. The total computation time for generating one 512 × 512 polyenergetic projection image was 0.3 s (141 s on the CPU). The relative difference between the projection images obtained with the CPU-based and OpenCL/GPGPU-based implementations was on the order of 10⁻⁶, and the images were virtually indistinguishable. The task overlapping strategy was 5.84 and 1.16 times faster than the sequential method for the first and subsequent digitally reconstructed radiographs (DRRs), respectively. The authors have successfully built digital phantoms using anatomic CT images and NIST μ/ρ tables for simulating realistic polyenergetic projection images, and optimized the processing speed with parallel computing using a GPGPU/OpenCL-based implementation. The computation time (0.3 s per projection image) is fast enough for real-time IGRT (image-guided radiotherapy) applications.
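The fluence model described above — a weighted sum over energy bins of the exponentiated line integral, with added Poisson noise — can be sketched for a single detector pixel as follows. The three-bin spectrum and line-integral values are hypothetical, not taken from the paper:

```python
import numpy as np

def polyenergetic_fluence(line_integrals, weights, n_photons=1e5, seed=0):
    """Polyenergetic fluence at one detector pixel: the weighted sum over
    energy bins E_n of exp(-line integral), scaled by the photon count,
    then sampled with Poisson noise. line_integrals[n] is the ray sum of
    mu(E_n, tissue) times path length (the Siddon line integral)."""
    rng = np.random.default_rng(seed)
    expected = n_photons * np.sum(weights * np.exp(-line_integrals))
    return rng.poisson(expected)

# Hypothetical 3-bin spectrum (weights w(n) summing to 1) and the
# corresponding per-bin line integrals along one ray.
weights = np.array([0.2, 0.5, 0.3])
line_integrals = np.array([2.1, 1.4, 0.9])
pixel_value = polyenergetic_fluence(line_integrals, weights)
```

In the paper's GPU implementation, the expensive part is the per-bin Siddon line integral over the segmented phantom; the exponential-weighted sum and noise step shown here are cheap by comparison, which is why the per-bin line integrals are what get parallelized.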
Plasma Physics Calculations on a Parallel Macintosh Cluster
NASA Astrophysics Data System (ADS)
Decyk, Viktor; Dauger, Dean; Kokelaar, Pieter
2000-03-01
We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content of the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment, involving digital band-pass filtering and background noise subtraction followed by 3D Fourier reconstruction. This process is rather slow on a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four dual-core AMD Opteron shared-memory processors, reducing the computational burden of the filtration task significantly. The results show that the parallel filtration code achieved a speedup factor of 46.66 over the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform the 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 were achieved during the reconstruction process and oximetry computation, respectively, for a data set with 23 × 23 × 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through the local network (NIHnet). The experimental results demonstrate that parallel computing provides the computational power needed to obtain biophysical parameters from 3D EPR oximetric imaging almost in real time.
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators
Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew
2014-01-01
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable across architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as the Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved a substantial speedup over the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs over both the sequential implementation and a parallelized OpenMP implementation. An OpenMP implementation on the Intel MIC coprocessor provided speedups with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs.
Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
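The case-study computation being accelerated is an explicit 2D wave-propagation stencil. As a hedged illustration, the sketch below is one update of a generic 2-D wave equation (not the cardiac action potential model itself); its doubly nested loop structure is exactly what OpenACC pragmas, OpenCL kernels, or OpenMP directives parallelize:

```python
import numpy as np

def wave_step(u, u_prev, c2dt2):
    """One explicit finite-difference step of the 2-D wave equation.
    c2dt2 = (c * dt / dx)^2 must satisfy the CFL bound (<= 0.5 in 2-D).
    Vectorized here; a C/OpenACC version would be two nested loops
    decorated with `#pragma acc parallel loop`."""
    u_next = np.copy(u)
    # 5-point Laplacian on interior points (boundaries held fixed).
    lap = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
           - 4.0 * u[1:-1, 1:-1])
    u_next[1:-1, 1:-1] = (2.0 * u[1:-1, 1:-1] - u_prev[1:-1, 1:-1]
                          + c2dt2 * lap)
    return u_next

# Point disturbance in the middle of the grid, zero initial velocity.
n = 33
u = np.zeros((n, n)); u[16, 16] = 1.0
u_prev = u.copy()
for _ in range(10):
    u, u_prev = wave_step(u, u_prev, 0.25), u   # stable: 0.25 <= 0.5
```

Every interior point is updated independently from the previous two time levels, so the loop nest is embarrassingly parallel within a time step, which is why a few OpenACC pragmas suffice.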
Data Acquisition with GPUs: The DAQ for the Muon g-2 Experiment at Fermilab
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gohn, W.
Graphical Processing Units (GPUs) have recently become a valuable computing tool for the acquisition of data at high rates and at relatively low cost. The devices work by parallelizing the code into thousands of threads, each executing a simple process, such as identifying pulses from a waveform digitizer. The CUDA programming library can be used to effectively write code to parallelize such tasks on Nvidia GPUs, providing a significant upgrade in performance over CPU-based acquisition systems. The muon g-2 experiment at Fermilab relies heavily on GPUs to process its data. The data acquisition system for this experiment must have the ability to create deadtime-free records from 700 μs muon spills at a raw data rate of 18 GB per second. Data will be collected using 1296 channels of μTCA-based 800 MSPS, 12-bit waveform digitizers and processed in a layered array of networked commodity processors, with 24 GPUs working in parallel to perform a fast recording of the muon decays during the spill. The described data acquisition system is currently being constructed, and will be fully operational before the start of the experiment in 2017.
Depth-varying azimuthal anisotropy in the Tohoku subduction channel
NASA Astrophysics Data System (ADS)
Liu, Xin; Zhao, Dapeng
2017-09-01
We determine a detailed 3-D model of azimuthal anisotropy tomography of the Tohoku subduction zone from the Japan Trench outer-rise to the back-arc near the Japan Sea coast, using a large number of high-quality P and S wave arrival-time data of local earthquakes recorded by the dense seismic network on the Japan Islands. Depth-varying seismic azimuthal anisotropy is revealed in the Tohoku subduction channel. The shallow portion of the Tohoku megathrust zone (<30 km depth) generally exhibits trench-normal fast-velocity directions (FVDs) except for the source area of the 2011 Tohoku-oki earthquake (Mw 9.0) where the FVD is nearly trench-parallel, whereas the deeper portion of the megathrust zone (at depths of ∼30-50 km) mainly exhibits trench-parallel FVDs. Trench-normal FVDs are revealed in the mantle wedge beneath the volcanic front and the back-arc. The Pacific plate mainly exhibits trench-parallel FVDs, except for the top portion of the subducting Pacific slab where visible trench-normal FVDs are revealed. A qualitative tectonic model is proposed to interpret such anisotropic features, suggesting transposition of earlier fabrics in the oceanic lithosphere into subduction-induced new structures in the subduction channel.
Advances in locally constrained k-space-based parallel MRI.
Samsonov, Alexey A; Block, Walter F; Arunachalam, Arjun; Field, Aaron S
2006-02-01
In this article, several theoretical and methodological developments regarding k-space-based, locally constrained parallel MRI (pMRI) reconstruction are presented. A connection between the Parallel MRI with Adaptive Radius in k-Space (PARS) and GRAPPA methods is demonstrated. The analysis provides a basis for a unified treatment of both methods. Additionally, a weighted PARS reconstruction is proposed, which may absorb different weighting strategies for improved image reconstruction. Next, a fast and efficient method for pMRI reconstruction of data sampled on non-Cartesian trajectories is described. In the new technique, the computational burden associated with the numerous matrix inversions in the original PARS method is drastically reduced by limiting direct calculation of reconstruction coefficients to only a few reference points. The remaining coefficients are found by interpolating between the reference sets, which is possible because of the similar configuration of points participating in the reconstruction for highly symmetric trajectories, such as radial and spiral. As a result, the time requirements are drastically reduced, which makes it practical to use pMRI with non-Cartesian trajectories in many applications. The new technique was demonstrated with simulated and actual data sampled on radial trajectories. Copyright 2006 Wiley-Liss, Inc.
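The local k-space fit shared by PARS and GRAPPA — predicting an unacquired sample from a weighted combination of acquired neighbors across all coils, with the weights obtained from calibration data — can be sketched on synthetic data. The data below are contrived (a linear ramp per coil, so the two-neighbor model is exact); real reconstructions solve many such least-squares fits, and the interpolation trick in this paper reuses fitted coefficients between nearby reference points:

```python
import numpy as np

# Synthetic multi-coil k-space line: coil i sees gain g_i times a linear
# ramp, so every odd sample equals the average of its even neighbors.
n_coils, n = 4, 32
coil_gain = np.arange(1, n_coils + 1, dtype=float)[:, None]
line = coil_gain * np.arange(n, dtype=float)          # (n_coils, n)

# Calibration: fit weights that predict the "skipped" odd samples from
# the acquired even neighbors across all coils.
targets = line[:, 1:-1:2]                             # odd samples
sources = np.vstack([line[:, 0:-2:2], line[:, 2::2]]) # even neighbors
weights, *_ = np.linalg.lstsq(sources.T, targets.T, rcond=None)

# Reconstruction: apply the fitted weights to the acquired samples.
recon = sources.T @ weights                           # -> targets.T
```

For non-Cartesian trajectories the neighbor configuration changes from point to point, which is why the original PARS must re-solve this fit everywhere; on radial and spiral trajectories the configurations repeat, enabling the interpolation shortcut described above.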
High-contrast imaging in the cloud with klipReduce and Findr
NASA Astrophysics Data System (ADS)
Haug-Baltzell, Asher; Males, Jared R.; Morzinski, Katie M.; Wu, Ya-Lin; Merchant, Nirav; Lyons, Eric; Close, Laird M.
2016-08-01
Astronomical data sets are growing ever larger, and the area of high contrast imaging of exoplanets is no exception. With the advent of fast, low-noise detectors operating at 10 to 1000 Hz, huge numbers of images can be taken during a single hours-long observation. High frame rates offer several advantages, such as improved registration, frame selection, and speckle calibration. However, advanced image processing algorithms are computationally challenging to apply. Here we describe a parallelized, cloud-based data reduction system developed for the Magellan Adaptive Optics VisAO camera, which is capable of rapidly exploring tens of thousands of parameter sets affecting the Karhunen-Loève image processing (KLIP) algorithm to produce high-quality direct images of exoplanets. We demonstrate these capabilities with a visible wavelength high contrast data set of a hydrogen-accreting brown dwarf companion.
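The core KLIP step that such pipelines sweep parameters over can be sketched as a dense PCA projection. This is a minimal sketch, not the klipReduce implementation: real pipelines work per annulus with many tuning parameters, and the toy data here are random.

```python
import numpy as np

def klip_subtract(science, refs, k):
    """Project the science frame onto the first k Karhunen-Loeve modes of the
    mean-subtracted reference stack and subtract that projection (PSF model)."""
    mu = refs.mean(axis=0)
    R = refs - mu                       # mean-subtracted references (n_ref, n_pix)
    _, _, vt = np.linalg.svd(R, full_matrices=False)
    Z = vt[:k]                          # first k eigen-images (KL modes)
    s = science - mu
    psf_model = Z.T @ (Z @ s)           # projection onto the KL subspace
    return s - psf_model                # residual: speckles suppressed

rng = np.random.default_rng(1)
refs = rng.standard_normal((20, 64))    # 20 toy reference frames, 64 pixels each
science = refs[0] + 0.1 * rng.standard_normal(64)
residual = klip_subtract(science, refs, k=5)
```

The number of retained modes `k` is exactly the kind of parameter the abstract's cloud system explores by the tens of thousands.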
Rectifier cabinet static breaker
Costantino, Jr, Roger A.; Gliebe, Ronald J.
1992-09-01
A rectifier cabinet static breaker replaces a blocking diode pair with an SCR and installs a power transistor in parallel with the latch contactor to commutate the SCR to the off state. The SCR serves as a static breaker with fast turnoff capability, providing an alternative way of achieving reactor scram in addition to performing the function of the replaced blocking diodes. The control circuitry for the rectifier cabinet static breaker includes on-line test capability and an LED indicator light to denote successful test completion. Current limit circuitry provides high-speed protection in the event of overload.
NASA Astrophysics Data System (ADS)
Masuda, Nobuyuki; Sugie, Takashige; Ito, Tomoyoshi; Tanaka, Shinjiro; Hamada, Yu; Satake, Shin-ichi; Kunugi, Tomoaki; Sato, Kazuho
2010-12-01
We have designed a PC cluster system with special purpose computer boards for visualization of fluid flow using digital holographic particle tracking velocimetry (DHPTV). The board carries a Field Programmable Gate Array (FPGA) chip that implements a pipeline for calculating the intensity of an object from a hologram by fast Fourier transform (FFT). This cluster system can create 1024 reconstructed images from a 1024×1024-grid hologram in 0.77 s. It is expected that this system will contribute to the analysis of fluid flow using DHPTV.
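The FFT-centered reconstruction that dominates such a pipeline can be sketched with the angular spectrum method, one common way to refocus a digital hologram at a chosen depth. This is a generic sketch, not the paper's FPGA pipeline; the plane-wave input and all parameter values are illustrative.

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a complex field by distance z with the angular spectrum method:
    one FFT, a per-frequency phase factor, one inverse FFT. One such pair per
    reconstruction depth is why FFT throughput dominates digital holography."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2j * np.pi * z * np.sqrt(np.maximum(arg, 0.0))  # evanescent cut-off
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(kz))

holo = np.ones((64, 64))                  # toy on-axis plane-wave "hologram"
img = angular_spectrum(holo, wavelength=0.6e-6, dx=1e-5, z=1e-3)
```

A plane wave should propagate with only a global phase shift, i.e. unit magnitude everywhere, which makes a convenient sanity check.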
The sun and heliosphere at solar maximum
NASA Technical Reports Server (NTRS)
Smith, E. J.; Marsden, R. G.; Balogh, A.; Gloeckler, G.; Geiss, J.; McComas, D. J.; McKibben, R. B.; MacDowall, R. J.; Lanzerotti, L. J.; Krupp, N.;
2003-01-01
Recent Ulysses observations from the Sun's equator to the poles reveal fundamental properties of the three-dimensional heliosphere at the maximum in solar activity. The heliospheric magnetic field originates from a magnetic dipole oriented nearly perpendicular to, instead of nearly parallel to, the Sun's rotation axis. Magnetic fields, solar wind, and energetic charged particles from low-latitude sources reach all latitudes, including the polar caps. The very fast high-latitude wind and polar coronal holes disappear and reappear together. Solar wind speed continues to be inversely correlated with coronal temperature. The cosmic ray flux is reduced symmetrically at all latitudes.
AOM optimization with ultra stable high power CO2 lasers for fast laser engraving
NASA Astrophysics Data System (ADS)
Bohrer, Markus
2015-05-01
A new ultra stable CO2 laser in carbon fibre resonator technology with an average power of more than 600W has been developed especially as basis for the use with AOMs. Stability of linear polarisation and beam pointing stability are important issues as well as appropriate shaping of the incident beam. AOMs are tested close to the laser-induced damage threshold with pulses on demand close to one megahertz. Transversal and rotational optimization of the AOMs benefits from the parallel-kinematic principle of a hexapod used for this research.
Portion sizes and obesity: responses of fast-food companies.
Young, Lisa R; Nestle, Marion
2007-07-01
Because the sizes of food portions, especially of fast food, have increased in parallel with rising rates of overweight, health authorities have called on fast-food chains to decrease the sizes of menu items. From 2002 to 2006, we examined responses of fast-food chains to such calls by determining the current sizes of sodas, French fries, and hamburgers at three leading chains and comparing them to sizes observed in 1998 and 2002. Although McDonald's recently phased out its largest offerings, current items are similar to 1998 sizes and greatly exceed those offered when the company opened in 1955. Burger King and Wendy's have increased portion sizes, even while health authorities are calling for portion size reductions. Fast-food portions in the United States are larger than in Europe. These observations suggest that voluntary efforts by fast-food companies to reduce portion sizes are unlikely to be effective, and that policy approaches are needed to reduce energy intake from fast food.
Fast word reading in pure alexia: "fast, yet serial".
Bormann, Tobias; Wolfer, Sascha; Hachmann, Wibke; Neubauer, Claudia; Konieczny, Lars
2015-01-01
Pure alexia is a severe impairment of word reading in which individuals process letters serially with a pronounced length effect. Yet, there is considerable variation in the performance of alexic readers with generally very slow, but also occasionally fast, responses, an observation rarely addressed in previous reports. It has been suggested that "fast" responses in pure alexia reflect residual parallel letter processing or that they may even be subserved by an independent reading system. Four experiments assessed fast and slow reading in a participant (DN) with pure alexia. Two behavioral experiments investigated frequency, neighborhood, and length effects in forced fast reading. Two further experiments measured eye movements when DN was forced to read quickly, or could respond faster because words were easier to process. Taken together, there was little support for the proposal that "qualitatively different" mechanisms or reading strategies underlie both types of responses in DN. Instead, fast responses are argued to be generated by the same serial-reading strategy.
NASA Astrophysics Data System (ADS)
Kunz, A.; Pihet, P.; Arend, E.; Menzel, H. G.
1990-12-01
A portable area monitor for the measurement of dose-equivalent quantities in practical radiation-protection work has been developed. The detector applied is a low-pressure proportional counter (TEPC) used in microdosimetry. The complex analysis system required has been optimized with regard to low power consumption and small size to achieve a truly operational survey meter. The newly designed electronics include complete analog, digital, and microprocessor boards. They provide fast pulse-height processing over a large (5 decades) dynamic range. Three original circuits have been specifically developed, consisting of: (1) a miniaturized adjustable high-voltage power supply with low ripple and high stability; (2) a double spectroscopy amplifier with constant gain ratio and common pole-zero stage; and (3) an analog-to-digital converter with quasi-logarithmic characteristics based on a flash converter using fast comparators associated in parallel. With the incorporated single-board computer, the maximal total power consumption is 5 W, enabling 40 hours of battery operation. With minor adaptations the equipment is proposed as a low-cost solution for various measuring problems in environmental studies.
Fast reconstruction of optical properties for complex segmentations in near infrared imaging
NASA Astrophysics Data System (ADS)
Jiang, Jingjing; Wolf, Martin; Sánchez Majos, Salvador
2017-04-01
The intrinsic ill-posed nature of the inverse problem in near infrared imaging makes the reconstruction of fine details of objects deeply embedded in turbid media challenging even for the large amounts of data provided by time-resolved cameras. In addition, most reconstruction algorithms for this type of measurements are only suitable for highly symmetric geometries and rely on a linear approximation to the diffusion equation since a numerical solution of the fully non-linear problem is computationally too expensive. In this paper, we will show that a problem of practical interest can be successfully addressed making efficient use of the totality of the information supplied by time-resolved cameras. We set aside the goal of achieving high spatial resolution for deep structures and focus on the reconstruction of complex arrangements of large regions. We show numerical results based on a combined approach of wavelength-normalized data and prior geometrical information, defining a fully parallelizable problem in arbitrary geometries for time-resolved measurements. Fast reconstructions are obtained using a diffusion approximation and Monte-Carlo simulations, parallelized in a multicore computer and a GPU respectively.
NASA Astrophysics Data System (ADS)
Shen, Wei; Li, Dongsheng; Zhang, Shuaifang; Ou, Jinping
2017-07-01
This paper presents a hybrid method that combines the B-spline wavelet on the interval (BSWI) finite element method and spectral analysis based on fast Fourier transform (FFT) to study wave propagation in One-Dimensional (1D) structures. BSWI scaling functions are utilized to approximate the theoretical wave solution in the spatial domain and construct a high-accuracy dynamic stiffness matrix. Dynamic reduction on element level is applied to eliminate the interior degrees of freedom of BSWI elements and substantially reduce the size of the system matrix. The dynamic equations of the system are then transformed and solved in the frequency domain through FFT-based spectral analysis which is especially suitable for parallel computation. A comparative analysis of four different finite element methods is conducted to demonstrate the validity and efficiency of the proposed method when utilized in high-frequency wave problems. Other numerical examples are utilized to simulate the influence of crack and delamination on wave propagation in 1D rods and beams. Finally, the errors caused by FFT and their corresponding solutions are presented.
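The FFT-based frequency-domain solve that makes the method above parallel-friendly can be sketched for a small damped system. This is a generic sketch, not the paper's BSWI formulation: the 2-DOF matrices and forcing are illustrative, and each frequency line is an independent linear solve, which is the property the abstract exploits for parallel computation.

```python
import numpy as np

def spectral_solve(M, C, K, f, dt):
    """Solve M u'' + C u' + K u = f(t) for a periodic load by FFT: each
    frequency line gives an independent system (K + i*w*C - w^2*M) u_hat = f_hat,
    so all lines can be solved in parallel. Minimal dense sketch."""
    n = f.shape[0]                           # number of time samples
    w = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)
    fh = np.fft.fft(f, axis=0)
    uh = np.empty_like(fh, dtype=complex)
    for i, wi in enumerate(w):               # embarrassingly parallel loop
        D = K + 1j * wi * C - wi**2 * M      # dynamic stiffness at this frequency
        uh[i] = np.linalg.solve(D, fh[i])
    return np.real(np.fft.ifft(uh, axis=0))

# 2-DOF toy system under periodic forcing (steady-state response).
M = np.eye(2)
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
C = 0.05 * K                                 # light proportional damping
t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
f = np.stack([np.sin(4 * t), np.zeros_like(t)], axis=1)
u = spectral_solve(M, C, K, f, dt=t[1] - t[0])
```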
Computer Science Techniques Applied to Parallel Atomistic Simulation
NASA Astrophysics Data System (ADS)
Nakano, Aiichiro
1998-03-01
Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) Space-time multiresolution schemes; ii) fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and a M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
NASA Astrophysics Data System (ADS)
Dong, Dai; Li, Xiaoning
2015-03-01
A high-pressure solenoid valve with high flow rate and high speed is a key component in an underwater driving system. However, a traditional single-spool pilot-operated valve cannot meet the demands of both high flow rate and high speed simultaneously. A new structure for a high pressure solenoid valve is needed to meet the demand of the underwater driving system. A novel parallel-spool pilot operated high-pressure solenoid valve is proposed to overcome the drawback of the current single spool design. Mathematical models of the opening process and flow rate of the valve are established. Opening response time of the valve is subdivided into 4 parts to analyze the properties of the opening response. Corresponding formulas to solve the 4 parts of the response time are derived. Key factors that influence the opening response time are analyzed. According to the mathematical model of the valve, a simulation of the opening process is carried out in MATLAB. Parameters are chosen based on theoretical analysis to design the test prototype of the new type of valve. Opening response time of the designed valve is tested by measuring the response of the current in the coil and the displacement of the main valve spool. The experimental results are in agreement with the simulated results, therefore the validity of the theoretical analysis is verified. Experimental opening response time of the valve is 48.3 ms at a working pressure of 10 MPa. The flow capacity test shows that the largest effective area is 126 mm² and the largest air flow rate is 2320 L/s. According to the result of the load driving test, the valve can meet the demands of the driving system. The proposed valve with parallel spools provides a new method for the design of a high-pressure valve with fast response and large flow rate.
Exploration of high harmonic fast wave heating on the National Spherical Torus Experiment
NASA Astrophysics Data System (ADS)
Wilson, J. R.; Bell, R. E.; Bernabei, S.; Bitter, M.; Bonoli, P.; Gates, D.; Hosea, J.; LeBlanc, B.; Mau, T. K.; Medley, S.; Menard, J.; Mueller, D.; Ono, M.; Phillips, C. K.; Pinsker, R. I.; Raman, R.; Rosenberg, A.; Ryan, P.; Sabbagh, S.; Stutman, D.; Swain, D.; Takase, Y.; Wilgen, J.
2003-05-01
High harmonic fast wave (HHFW) heating has been proposed as a particularly attractive means for plasma heating and current drive in the high beta plasmas that are achievable in spherical torus (ST) devices. The National Spherical Torus Experiment (NSTX) [M. Ono, S. M. Kaye, S. Neumeyer et al., in Proceedings of the 18th IEEE/NPSS Symposium on Fusion Engineering, Albuquerque, 1999 (IEEE, Piscataway, NJ, 1999), p. 53] is such a device. An rf heating system has been installed on the NSTX to explore the physics of HHFW heating, current drive via rf waves and for use as a tool to demonstrate the attractiveness of the ST concept as a fusion device. To date, experiments have demonstrated many of the theoretical predictions for HHFW. In particular, strong wave absorption on electrons over a wide range of plasma parameters and wave parallel phase velocities, wave acceleration of energetic ions, and indications of current drive for directed wave spectra have been observed. In addition HHFW heating has been used to explore the energy transport properties of NSTX plasmas, to create H-mode discharges with a large fraction of bootstrap current and to control the plasma current profile during the early stages of the discharge.
Hardware-efficient implementation of digital FIR filter using fast first-order moment algorithm
NASA Astrophysics Data System (ADS)
Cao, Li; Liu, Jianguo; Xiong, Jun; Zhang, Jing
2018-03-01
As the digital finite impulse response (FIR) filter can be transformed into the shift-add form of multiple small-sized first-order moments, based on the existing fast first-order moment algorithm, this paper presents a novel multiplier-less structure to calculate any number of sequential filtering results in parallel. The theoretical analysis on its hardware and time-complexities reveals that by appropriately setting the degree of parallelism and the decomposition factor of a fixed word width, the proposed structure may achieve better area-time efficiency than the existing two-dimensional (2-D) memoryless-based filter. To evaluate the performance concretely, the proposed designs for different taps, along with the existing 2-D memoryless-based filters, are synthesized by Synopsys Design Compiler with a 0.18-μm SMIC library. The comparisons show that the proposed design has less area-time complexity and power consumption when the number of filter taps is larger than 48.
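The first-order-moment trick referred to above can be illustrated for one output sample. This is a minimal numerical sketch, not the paper's hardware structure: taps are grouped by the integer value of their input sample, coefficients in each group are summed, and the output becomes a first-order moment sum m * group[m], which hardware can realize with shifts and adds instead of general multipliers.

```python
import numpy as np

def fir_by_first_order_moment(h, x, levels):
    """One FIR output sample y = sum_k h[k] * x[k] computed as a first-order
    moment over integer-quantized sample values in [0, levels)."""
    assert len(h) == len(x)
    group = np.zeros(levels)
    for hk, xk in zip(h, x):          # multiplier-less grouping pass
        group[xk] += hk
    # First-order moment: sum_m m * group[m] (shift-add friendly).
    return sum(m * group[m] for m in range(levels))

h = np.array([0.25, 0.5, 0.25, -0.1])   # illustrative taps
x = np.array([3, 1, 3, 0])              # 2-bit input samples
y = fir_by_first_order_moment(h, x, levels=4)
```

The grouping pass costs only additions; the moment sum involves at most `levels - 1` scaled accumulations regardless of the tap count, which is the source of the area-time savings the abstract analyzes.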
A Parallel Multigrid Solver for Viscous Flows on Anisotropic Structured Grids
NASA Technical Reports Server (NTRS)
Prieto, Manuel; Montero, Ruben S.; Llorente, Ignacio M.; Bushnell, Dennis M. (Technical Monitor)
2001-01-01
This paper presents an efficient parallel multigrid solver for speeding up the computation of a 3-D model that treats the flow of a viscous fluid over a flat plate. The main interest of this simulation lies in exhibiting some basic difficulties that prevent optimal multigrid efficiencies from being achieved. As the computing platform, we have used Coral, a Beowulf-class system based on Intel Pentium processors and equipped with GigaNet cLAN and switched Fast Ethernet networks. Our study not only examines the scalability of the solver but also includes a performance evaluation of Coral where the investigated solver has been used to compare several of its design choices, namely, the interconnection network (GigaNet versus switched Fast-Ethernet) and the node configuration (dual nodes versus single nodes). As a reference, the performance results have been compared with those obtained with the NAS-MG benchmark.
NASA Technical Reports Server (NTRS)
Mccormick, S.; Quinlan, D.
1989-01-01
The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids (global and local) to provide adaptive resolution and fast solution of PDEs. Like all such methods, it offers parallelism by using possibly many disconnected patches per level, but is hindered by the need to handle these levels sequentially. The finest levels must therefore wait for processing to be essentially completed on all the coarser ones. A recently developed asynchronous version of FAC, called AFAC, completely eliminates this bottleneck to parallelism. This paper describes timing results for AFAC, coupled with a simple load balancing scheme, applied to the solution of elliptic PDEs on an Intel iPSC hypercube. These tests include performance of certain processes necessary in adaptive methods, including moving grids and changing refinement. A companion paper reports on numerical and analytical results for estimating convergence factors of AFAC applied to very large scale examples.
NASA Technical Reports Server (NTRS)
Wigton, Larry
1996-01-01
Improvements to the numerical linear algebra routines for use in new Navier-Stokes codes, specifically Tim Barth's unstructured grid code, with spin-offs to TRANAIR, are reported. A fast distance calculation routine for Navier-Stokes codes using the new one-equation turbulence models was written. The primary focus of this work was devoted to improving matrix-iterative methods. New algorithms have been developed which activate the full potential of classical Cray-class computers as well as distributed-memory parallel computers.
Research on the phase adjustment method for dispersion interferometer on HL-2A tokamak
NASA Astrophysics Data System (ADS)
Tongyu, WU; Wei, ZHANG; Haoxi, WANG; Yan, ZHOU; Zejie, YIN
2018-06-01
A synchronous demodulation system is proposed and deployed for the CO2 dispersion interferometer on HL-2A, which aims at high plasma density measurements and real-time feedback control. To ensure that the demodulator and the interferometer signal are synchronous in phase, a phase adjustment (PA) method has been developed for the demodulation system. The method takes advantage of the field programmable gate array's parallel and pipelined processing capabilities to carry out high-performance, low-latency PA. Experimental results show that the PA method is crucial to the synchronous demodulation system and reliable in following fast changes of the electron density. The system can measure the line-integrated density with a high precision of 2.0 × 10¹⁸ m⁻².
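The role a phase adjustment plays in synchronous demodulation can be illustrated with a textbook I/Q lock-in sketch. This is a generic illustration, not the HL-2A FPGA design: the sample rate, carrier frequency, and `phase_offset` parameter are all hypothetical stand-ins for the hardware's reference alignment.

```python
import numpy as np

def demodulate_phase(sig, ref_freq, fs, phase_offset=0.0):
    """I/Q synchronous demodulation: mix with quadrature references and average
    (a crude low-pass) to recover the carrier phase. phase_offset models the
    adjustment that aligns the demodulator with the interferometer signal."""
    t = np.arange(len(sig)) / fs
    i = sig * np.cos(2 * np.pi * ref_freq * t + phase_offset)
    q = sig * -np.sin(2 * np.pi * ref_freq * t + phase_offset)
    return np.arctan2(q.mean(), i.mean())

fs, f0 = 1_000_000.0, 40_000.0           # illustrative sample and carrier rates
t = np.arange(4096) / fs
sig = np.cos(2 * np.pi * f0 * t + 0.7)   # carrier carrying a 0.7 rad phase
phi = demodulate_phase(sig, f0, fs)
```

If the reference is misaligned, the recovered phase is biased by exactly the misalignment, which is why the demodulator and signal must be brought synchronous in phase before the density is inferred.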
Fast realization of nonrecursive digital filters with limits on signal delay
NASA Astrophysics Data System (ADS)
Titov, M. A.; Bondarenko, N. N.
1983-07-01
Attention is given to the problem of achieving a fast realization of nonrecursive digital filters with the aim of reducing signal delay. It is shown that a realization wherein the impulse characteristic of the filter is divided into blocks satisfies the delay requirements and is almost as economical in terms of the number of multiplications as conventional fast convolution. In addition, the block method leads to a reduction in the needed size of the memory and in the number of additions; the short-convolution procedure is substantially simplified. Finally, the block method facilitates the paralleling of computations owing to the simple transfers between subfilters.
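The block realization described above can be sketched with overlap-save block convolution, a standard way to apply fast convolution block by block so that output appears after one block of delay rather than after the whole signal is buffered. This is an illustrative implementation of the general block technique, not the specific realization of the paper.

```python
import numpy as np

def block_fft_filter(h, x, block):
    """Overlap-save block convolution: each block is filtered with one FFT/IFFT
    pair, limiting signal delay to one block and making blocks natural units
    for parallel processing. Assumes len(h) >= 2."""
    m = len(h)
    nfft = block + m - 1
    H = np.fft.rfft(h, nfft)
    y = np.zeros(len(x))
    tail = np.zeros(m - 1)                   # saved input tail (zero initial state)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        frame = np.concatenate([tail, seg])
        Y = np.fft.irfft(np.fft.rfft(frame, nfft) * H, nfft)
        y[start:start + len(seg)] = Y[m - 1:m - 1 + len(seg)]  # drop wrapped part
        tail = frame[-(m - 1):]              # overlap carried into the next block
    return y

rng = np.random.default_rng(4)
h = rng.standard_normal(5)
x = rng.standard_normal(50)
y = block_fft_filter(h, x, block=16)
```

Because each block only needs the previous block's tail, the per-block FFTs are independent and the computation parallelizes across subfilters, as the abstract notes.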
NASA Astrophysics Data System (ADS)
Kiyani, Khurom; Chapman, Sandra; Osman, Kareem; Sahraoui, Fouad; Hnat, Bogdan
2014-05-01
The anisotropic nature of the scaling properties of solar wind magnetic turbulence fluctuations is investigated scale by scale using high cadence in situ magnetic field measurements from the Cluster, ACE and STEREO spacecraft missions in both fast and slow quiet solar wind conditions. The data span five decades in scales from the inertial range to the electron Larmor radius. We find a clear transition in scaling behaviour between the inertial and kinetic range of scales, which provides a direct, quantitative constraint on the physical processes that mediate the cascade of energy through these scales. In the inertial (magnetohydrodynamic) range the statistical nature of turbulent fluctuations is known to be anisotropic, both in the vector components of the magnetic field fluctuations (variance anisotropy) and in the spatial scales of these fluctuations (wavevector or k-anisotropy). We show for the first time that, when measuring parallel to the local magnetic field direction, the full statistical signature of the magnetic and Elsasser field fluctuations is that of a non-Gaussian globally scale-invariant process. This is distinct from the classic multi-exponent statistics observed when the local magnetic field is perpendicular to the flow direction. These observations suggest the weakness, or absence, of a parallel magnetofluid turbulence energy cascade. In contrast to the inertial range, there is a successive increase toward isotropy between parallel and transverse power at scales below the ion Larmor radius, with isotropy being achieved at the electron Larmor radius. Computing higher-order statistics, we show that the full statistical signature of both parallel and perpendicular fluctuations at scales below the ion Larmor radius is that of an isotropic globally scale-invariant non-Gaussian process. Lastly, we perform a survey of multiple intervals of quiet solar wind sampled under different plasma conditions (fast, slow wind; plasma beta etc.)
and find that the above results on the scaling transition between inertial and kinetic range scales are qualitatively robust, and that quantitatively, there is a spread in the values of the scaling exponents.
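The scaling-exponent analysis behind statements like "globally scale-invariant" versus "multi-exponent" can be sketched with structure functions. This is a generic sketch on a synthetic random walk, not the authors' spacecraft-data pipeline; the lags and orders are illustrative. For a self-similar signal the exponents zeta(p) are linear in p, while curvature in zeta(p) signals multi-exponent (intermittent) statistics.

```python
import numpy as np

def structure_exponents(b, lags, orders):
    """Scaling exponents zeta(p) from structure functions
    S_p(l) = <|b(t+l) - b(t)|^p>, estimated by fitting log S_p vs log l."""
    zeta = []
    for p in orders:
        s = [np.mean(np.abs(b[l:] - b[:-l]) ** p) for l in lags]
        zeta.append(np.polyfit(np.log(lags), np.log(s), 1)[0])
    return np.array(zeta)

# Synthetic test signal: a random walk (Hurst exponent 1/2), for which
# zeta(p) = p/2 exactly in the ensemble limit.
rng = np.random.default_rng(2)
b = np.cumsum(rng.standard_normal(200_000))
zeta = structure_exponents(b, lags=[2, 4, 8, 16, 32], orders=[1, 2, 3])
```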
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Jiajia; Wang, Yuming; McIntosh, Scott W.
We combine observations of the Coronal Multi-channel Polarimeter and the Atmospheric Imaging Assembly on board the Solar Dynamics Observatory to study the characteristic properties of (propagating) Alfvénic motions and quasi-periodic intensity disturbances in polar plumes. This unique combination of instruments highlights the physical richness of the processes taking place at the base of the (fast) solar wind. The (parallel) intensity perturbations with intensity enhancements around 1% have an apparent speed of 120 km s⁻¹ (in both the 171 and 193 Å passbands) and a periodicity of 15 minutes, while the (perpendicular) Alfvénic wave motions have a velocity amplitude of 0.5 km s⁻¹, a phase speed of 830 km s⁻¹, and a shorter period of 5 minutes on the same structures. These observations illustrate a scenario where the excited Alfvénic motions are propagating along an inhomogeneously loaded magnetic field structure such that the combination could be a potential progenitor of the magnetohydrodynamic turbulence required to accelerate the fast solar wind.
The Five-Hundred Aperture Spherical Radio Telescope (FAST) Project
NASA Astrophysics Data System (ADS)
Nan, Rendong; Li, Di; Jin, Chengjin; Wang, Qiming; Zhu, Lichun; Zhu, Wenbai; Zhang, Haiyan; Yue, Youling; Qian, Lei
Five-hundred-meter Aperture Spherical radio Telescope (FAST) is a Chinese mega-science project to build the largest single dish radio telescope in the world. Its innovative engineering concept and design pave a new road to realize a huge single dish in the most effective way. FAST also represents the Chinese contribution to the international efforts to build the square kilometer array (SKA). Being the most sensitive single dish radio telescope, FAST will enable astronomers to jump-start many science goals, such as surveying the neutral hydrogen in the Milky Way and other galaxies, detecting faint pulsars, looking for the first shining stars, hearing the possible signals from other civilizations, etc. The idea of siting a large spherical dish in a karst depression is rooted in the Arecibo telescope. FAST is an Arecibo-type antenna with three outstanding aspects: the karst depression used as the site, which is large enough to host the 500-meter telescope and deep enough to allow a zenith angle of 40 degrees; the active main reflector correcting for spherical aberration on the ground to achieve a full polarization and a wide band without involving complex feed systems; and the light-weight feed cabin driven by cables and servomechanism plus a parallel robot as a secondary adjustable system to move with high precision. The feasibility studies for FAST have been carried out for 14 years, supported by Chinese and world astronomical communities. Funding for FAST was approved by the National Development and Reform Commission in July of 2007 with a capital budget of ~700 million RMB. The project time is 5.5 years from the commencement of work in March of 2011 and the first light is expected to be in 2016. This review intends to introduce the project of FAST with emphasis on the recent progress since 2006. In this paper, the subsystems of FAST are described in modest detail, followed by discussions of the fundamental science goals and examples of early science projects.
NASA Astrophysics Data System (ADS)
Homuth, B.; Löbl, U.; Batte, A. G.; Link, K.; Kasereka, C. M.; Rümpker, G.
2016-09-01
Shear-wave splitting measurements from local and teleseismic earthquakes are used to investigate the seismic anisotropy in the upper mantle beneath the Rwenzori region of the East African Rift system. At most stations, shear-wave splitting parameters obtained from individual earthquakes exhibit only minor variations with backazimuth. We therefore employ a joint inversion of SKS waveforms to derive hypothetical one-layer parameters. The corresponding fast polarizations are generally rift parallel and the average delay time is about 1 s. Shear phases from local events within the crust are characterized by an average delay time of 0.04 s. Delay times from local mantle earthquakes are in the range of 0.2 s. This observation suggests that the dominant source region for seismic anisotropy beneath the rift is located within the mantle. We use finite-frequency waveform modeling to test different models of anisotropy within the lithosphere/asthenosphere system of the rift. The results show that the rift-parallel fast polarizations are consistent with horizontal transverse isotropy (HTI anisotropy) caused by rift-parallel magmatic intrusions or lenses located within the lithospheric mantle, as would be expected during the early stages of continental rifting. Furthermore, the short-scale spatial variations in the fast polarizations observed in the southern part of the study area can be explained by effects due to sedimentary basins of low isotropic velocity in combination with a shift in the orientation of anisotropic fabrics in the upper mantle. A uniform anisotropic layer in relation to large-scale asthenospheric mantle flow is less consistent with the observed splitting parameters.
National Laboratory for Advanced Scientific Visualization at UNAM - Mexico
NASA Astrophysics Data System (ADS)
Manea, Marina; Constantin Manea, Vlad; Varela, Alfredo
2016-04-01
In 2015, the National Autonomous University of Mexico (UNAM) joined the family of universities and research centers where advanced visualization and computing plays a key role to promote and advance missions in research, education, community outreach, as well as business-oriented consulting. This initiative provides access to a great variety of advanced hardware and software resources and offers a range of consulting services that spans a variety of areas related to scientific visualization, among which are: neuroanatomy, embryonic development, genome related studies, geosciences, geography, physics and mathematics related disciplines. The National Laboratory for Advanced Scientific Visualization delivers services through three main infrastructure environments: the 3D fully immersive display system Cave, the high resolution parallel visualization system Powerwall, and the high resolution spherical display Earth Simulator. The entire visualization infrastructure is interconnected to a high-performance computing cluster (HPCC) called ADA in honor of Ada Lovelace, considered to be the first computer programmer. The Cave is an extra large, 3.6 m wide room with images projected on the front, left, and right walls, as well as the floor. Specialized crystal eyes LCD-shutter glasses provide a strong stereo depth perception, and a variety of tracking devices allow software to track the position of a user's hand, head and wand. The Powerwall is designed to bring large amounts of complex data together through parallel computing for team interaction and collaboration. This system is composed of 24 (6x4) high-resolution ultra-thin (2 mm) bezel monitors connected to a high-performance GPU cluster. The Earth Simulator is a large (60") high-resolution spherical display used for global-scale data visualization like geophysical, meteorological, climate and ecology data.
The HPCC-ADA is a 1000+ computing-core system, which offers parallel computing resources to applications that require large amounts of memory as well as large and fast parallel storage systems. The entire system temperature is controlled by an energy and space efficient cooling solution, based on large rear door liquid cooled heat exchangers. This state-of-the-art infrastructure will boost research activities in the region, offer a powerful scientific tool for teaching at undergraduate and graduate levels, and enhance association and cooperation with business-oriented organizations.
A fast ultrasonic simulation tool based on massively parallel implementations
NASA Astrophysics Data System (ADS)
Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain
2014-02-01
This paper presents an optimized ultrasonic inspection simulation tool for CIVA, which takes advantage of the power of massively parallel architectures: graphical processing units (GPUs) and multi-core general-purpose processors (GPPs). The tool is based on the classical approach used in CIVA: the interaction model is based on Kirchhoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are: multi- and mono-element probes, planar specimens made of simple isotropic materials, and planar rectangular defects or side-drilled holes of small diameter. Validations of the model accuracy and performance measurements are presented.
Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Tong, Charles; Swarztrauber, Paul N.
1991-01-01
This paper evaluates alternative designs of ordered radix-2 decimation-in-frequency FFT algorithms for massively parallel hypercube multiprocessors, with attention to reducing communication, which dominates computation time. The ordering and computational phases of the FFT are accordingly combined, in conjunction with sequence-to-processor maps that reduce communication. Two orderings, 'standard' and 'cyclic', in which the order of the transform is the same as that of the input sequence, can be implemented with ease on the Connection Machine (where orderings are determined by geometries and priorities). A parallel method for computing the trigonometric coefficients is presented that employs neither trigonometric functions nor interprocessor communication.
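As an illustrative sketch (not the Connection Machine implementation), a serial radix-2 decimation-in-frequency FFT makes the two phases explicit: the butterfly (computation) phase leaves results in bit-reversed order, and a separate ordering phase restores natural order. The paper's contribution is to combine these phases and map them onto hypercube processors.

```python
import numpy as np

def fft_dif_ordered(x):
    """Radix-2 decimation-in-frequency FFT with an explicit ordering phase."""
    a = np.asarray(x, dtype=complex).copy()
    n = a.size
    assert n and (n & (n - 1)) == 0, "length must be a power of two"
    # Computation phase: DIF butterflies; the span halves each stage.
    span = n
    while span > 1:
        half = span // 2
        w = np.exp(-2j * np.pi * np.arange(half) / span)  # twiddle factors
        for start in range(0, n, span):
            top = a[start:start + half].copy()
            bot = a[start + half:start + span]
            a[start:start + half] = top + bot
            a[start + half:start + span] = (top - bot) * w
        span = half
    # Ordering phase: bit-reversal permutation restores natural order.
    bits = n.bit_length() - 1
    idx = [int(format(i, f'0{bits}b')[::-1], 2) for i in range(n)]
    return a[idx]
```

In the hypercube setting, each butterfly stage and each step of the reordering maps to an exchange along one cube dimension, which is why interleaving the two phases reduces total communication.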
Parallel ICA and its hardware implementation in hyperspectral image analysis
NASA Astrophysics Data System (ADS)
Du, Hongtao; Qi, Hairong; Peterson, Gregory D.
2004-04-01
Advances in hyperspectral imaging have dramatically boosted remote sensing applications by providing abundant information across hundreds of contiguous spectral bands. However, the high volume of information also imposes an excessive computational burden. Since most materials have distinctive characteristics only at certain bands, much of this information is redundant. This property of hyperspectral images has motivated many researchers to study various dimensionality reduction algorithms, including Projection Pursuit (PP), Principal Component Analysis (PCA), wavelet transform, and Independent Component Analysis (ICA), of which ICA is one of the most popular techniques. ICA searches for a linear or nonlinear transformation that minimizes the statistical dependence between spectral bands. Through this process, ICA can eliminate superfluous information while retaining useful information, given only the observations of hyperspectral images. One hurdle in applying ICA to hyperspectral image (HSI) analysis, however, is its long computation time, especially for high-volume hyperspectral data sets. Even the most efficient method, FastICA, is very time-consuming. In this paper, we present a parallel ICA (pICA) algorithm derived from FastICA. During the unmixing process, pICA divides the estimation of the weight matrix into sub-processes which can be conducted in parallel on multiple processors. The decorrelation process is decomposed into internal decorrelation and external decorrelation, which perform weight vector decorrelations within individual processors and between cooperating processors, respectively. To further improve the performance of pICA, we seek hardware solutions for its implementation. To date, there are very few hardware designs for ICA-related processes because of their complicated and iterative computation. This paper discusses the capacity limitations of FPGA implementations of pICA in HSI analysis.
An Application-Specific Integrated Circuit (ASIC) synthesis is designed for pICA-based dimensionality reduction in HSI analysis. The pICA design is implemented using standard-height cells and targets the TSMC 0.18 micron process. During the synthesis procedure, three ICA-related reconfigurable components are developed for reuse and retargeting purposes. Preliminary results show that standard-height-cell-based ASIC synthesis provides an effective solution for pICA and ICA-related processes in HSI analysis.
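The internal/external decorrelation split described above can be mimicked in a toy numpy sketch, with sequential groups standing in for processors. The one-unit FastICA update and Gram-Schmidt decorrelation below are standard; the grouping is only a serial illustration of the paper's parallel scheme, not its implementation.

```python
import numpy as np

def parallel_ica_sketch(X, n_components, group_size=2, seed=0):
    """Toy serial sketch of pICA-style deflationary FastICA (tanh contrast).

    Weight vectors are estimated in groups, stand-ins for per-processor
    sub-processes.  Each candidate vector is decorrelated against vectors
    found in its own group ("internal") and against vectors from earlier
    groups ("external") via Gram-Schmidt.
    """
    rng = np.random.default_rng(seed)
    # Center and whiten the observed mixtures.
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    Z = (E / np.sqrt(d)) @ E.T @ Xc
    done = []                            # finalized weight vectors
    while len(done) < n_components:
        group = []                       # vectors owned by this "processor"
        for _ in range(min(group_size, n_components - len(done))):
            w = rng.standard_normal(Z.shape[0])
            w /= np.linalg.norm(w)
            for _ in range(200):         # one-unit FastICA iteration
                wx = np.tanh(w @ Z)
                w_new = (Z * wx).mean(axis=1) - (1.0 - wx ** 2).mean() * w
                for v in group + done:   # internal + external decorrelation
                    w_new -= (w_new @ v) * v
                w_new /= np.linalg.norm(w_new)
                converged = abs(abs(w_new @ w) - 1.0) < 1e-10
                w = w_new
                if converged:
                    break
            group.append(w)
        done.extend(group)
    return np.array(done)
```

In a real pICA run, the inner group loop executes concurrently on separate processors, and only the external decorrelation requires communication between them.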
Molecular Electronic Devices Based On Electrooptical Behavior Of Heme-Like Molecules
NASA Astrophysics Data System (ADS)
Simic-Glavaski, B.
1986-02-01
This paper discusses applications of the electrically modulated and unusually strong Raman-emitted light produced by an adsorbed monolayer of phthalocyanine molecules on silver electrode or silver bromide substrates and on neural membranes. The analysis of electronic energy levels in semiconducting silver bromide and the adsorbed phthalocyanine molecules suggests a lasing mechanism as a possible origin of the high enhancement factor in surface-enhanced Raman scattering. Electrically modulated Raman scattering may be used as a carrier of information drawn from the fast intramolecular electron transfer and the multiplicity of quantum wells in phthalocyanine molecules. Fast switching times on the order of 10^-13 seconds have been measured at room temperature. Multilevel and multioutput optical signals have also been obtained from such an electrically modulated adsorbed monolayer of phthalocyanine molecules, which can be precisely addressed and interrogated. This may be of practical use in developing molecular electronic devices with high-density memory and fast parallel processing systems, with a typical capacity of 10^20 gate Hz/cm^2 at room temperature, for use in optical computers. The paper also discusses the electrooptical modulation of Raman signals obtained from adsorbed bio-compatible phthalocyanine molecules on nerve membranes. This optical probe of neural systems can be used in studies of complex information processing in neural nets and provides a possible method for interfacing natural and man-made information processing devices.
A high-speed tracking algorithm for dense granular media
NASA Astrophysics Data System (ADS)
Cerda, Mauricio; Navarro, Cristóbal A.; Silva, Juan; Waitukaitis, Scott R.; Mujica, Nicolás; Hitschfeld, Nancy
2018-06-01
Many fields of study, including medical imaging, granular physics, colloidal physics, and active matter, require the precise identification and tracking of particle-like objects in images. While many algorithms exist to track particles in diffuse conditions, these often perform poorly when particles are densely packed together, as in, for example, solid-like systems of granular materials. Incorrect particle identification can have significant effects on the calculation of physical quantities, which makes the development of more precise and faster tracking algorithms a worthwhile endeavor. In this work, we present a new tracking algorithm to identify particles in dense systems that is both highly accurate and fast. We demonstrate the efficacy of our approach by analyzing images of dense, solid-state granular media, where we achieve an identification error of 5% in the worst evaluated cases. Going further, we propose a parallelization strategy for our algorithm using a GPU, which results in a speedup of up to 10× when compared to a sequential CPU implementation in C and up to 40× when compared to the reference MATLAB library widely used for particle tracking. Our results extend the capabilities of state-of-the-art particle tracking methods by allowing fast, high-fidelity detection in dense media at high resolutions.
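A minimal convolution-based particle finder conveys the flavor of this kind of detection; it is a generic stand-in, not the paper's algorithm: correlate the image with a disk kernel of the expected particle size, then keep thresholded local maxima of the score.

```python
import numpy as np

def detect_particles(img, radius, threshold=0.5):
    """Find disk-like particles: FFT correlation with a normalized disk
    kernel, then keep pixels that are local maxima above a threshold."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x * x + y * y <= radius * radius).astype(float)
    kernel /= kernel.sum()
    # Circular convolution via FFT; the kernel is symmetric, so this
    # equals correlation.  Roll re-centers the kernel on each pixel.
    score = np.real(np.fft.ifft2(np.fft.fft2(img) *
                                 np.fft.fft2(kernel, img.shape)))
    score = np.roll(score, (-radius, -radius), axis=(0, 1))
    peaks = []
    for i in range(radius, img.shape[0] - radius):
        for j in range(radius, img.shape[1] - radius):
            patch = score[i - radius:i + radius + 1,
                          j - radius:j + radius + 1]
            if score[i, j] >= threshold and score[i, j] == patch.max():
                peaks.append((i, j))
    return peaks
```

The doubly nested local-maximum search is exactly the kind of per-pixel, data-independent work that maps well onto a GPU thread grid, which is the essence of the parallelization strategy described above.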
Exploration of High Harmonic Fast Wave Heating on the National Spherical Torus Experiment
DOE Office of Scientific and Technical Information (OSTI.GOV)
J.R. Wilson; R.E. Bell; S. Bernabei
2003-02-11
High Harmonic Fast Wave (HHFW) heating has been proposed as a particularly attractive means for plasma heating and current drive in the high-beta plasmas that are achievable in spherical torus (ST) devices. The National Spherical Torus Experiment (NSTX) [Ono, M., Kaye, S.M., Neumeyer, S., et al., Proceedings, 18th IEEE/NPSS Symposium on Fusion Engineering, Albuquerque, 1999 (IEEE, Piscataway, NJ, 1999), p. 53] is such a device. A radio-frequency (rf) heating system has been installed on NSTX to explore the physics of HHFW heating and current drive via rf waves, and for use as a tool to demonstrate the attractiveness of the ST concept as a fusion device. To date, experiments have demonstrated many of the theoretical predictions for HHFW. In particular, strong wave absorption on electrons over a wide range of plasma parameters and wave parallel phase velocities, wave acceleration of energetic ions, and indications of current drive for directed wave spectra have been observed. In addition, HHFW heating has been used to explore the energy transport properties of NSTX plasmas, to create H-mode (high-confinement mode) discharges with a large fraction of bootstrap current, and to control the plasma current profile during the early stages of the discharge.
Fast parallel molecular algorithms for DNA-based computation: factoring integers.
Chang, Weng-Long; Guo, Minyi; Ho, Michael Shan-Hui
2005-06-01
The RSA public-key cryptosystem is an algorithm that converts input data into an unrecognizable encrypted form and converts the unrecognizable data back into its original form upon decryption. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates how to factor the product of two large prime numbers, a breakthrough in basic biological operations, using a molecular computer. To achieve this, we propose three DNA-based algorithms, for a parallel subtractor, a parallel comparator, and parallel modular arithmetic, and formally verify our designed molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that public-key cryptosystems are perhaps insecure and also presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.
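The logical content of a parallel comparator and subtractor can be mimicked with ordinary bitwise operations that act on all bit positions at once, loosely analogous to the way the DNA algorithms operate on every bit of the strand in parallel. This is a conventional-computer illustration of the arithmetic, not the DNA encoding itself.

```python
def parallel_compare(a, b, n_bits):
    """Bit-parallel comparator: XOR exposes every differing bit at once;
    the most significant difference decides the ordering.
    Returns 1 if a > b, -1 if a < b, 0 if equal."""
    diff = (a ^ b) & ((1 << n_bits) - 1)
    if diff == 0:
        return 0
    msb = diff.bit_length() - 1          # highest differing bit position
    return 1 if (a >> msb) & 1 else -1

def parallel_subtract(a, b, n_bits):
    """Two's-complement subtractor built from bitwise (gate-level) ops:
    a - b = a + (~b + 1), with carries resolved by iterating
    carry = (x & y) << 1, sum = x ^ y until no carry remains."""
    mask = (1 << n_bits) - 1
    x, y = a, ((~b) + 1) & mask          # two's complement of b
    while y:
        carry = (x & y) << 1
        x = x ^ y
        y = carry & mask
    return x & mask                      # (a - b) mod 2**n_bits
```

Each pass of the carry loop is a constant-depth layer of AND/XOR gates over all bit positions, which is the operation the DNA algorithms realize chemically on many strands at once.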
NASA Astrophysics Data System (ADS)
Russell, J. B.; Gaherty, J. B.; Lin, P. P.; Lizarralde, D.; Collins, J. A.; Hirth, G.; Evans, R. L.
2017-12-01
Observations of seismic anisotropy in the ocean basins are important for constraining deformation and melting processes in the upper mantle. The NoMelt OBS array was deployed on relatively pristine, 70 Ma seafloor in the central Pacific with the aim of constraining upper mantle circulation and the evolution of the lithosphere-asthenosphere system. Surface waves traversing the array provide a unique opportunity to estimate a comprehensive set of anisotropic parameters. Azimuthal variations in Rayleigh-wave velocity over a period band of 15-180 s suggest strong anisotropic fabric both in the lithosphere and deep in the asthenosphere. High-frequency ambient noise (4-10 s) provides constraints on average VSV and VSH as well as azimuthal variations in both VS and VP in the upper ~10 km of the mantle. Our best-fitting models require radial anisotropy in the uppermost mantle with VSH > VSV by 3-7% and as much as 2% radial anisotropy in the crust. Additionally, we find a strong azimuthal dependence for Rayleigh- and Love-wave velocities, with the Rayleigh 2θ fast direction parallel to the fossil spreading direction (FSD) and the Love 2θ and 4θ fast directions shifted 90° and 45° from the FSD, respectively. These are some of the first direct observations of the Love 2θ and 4θ azimuthal signals, which allow us to directly invert for the anisotropic terms G, B, and E in the uppermost Pacific lithosphere for the first time. Together, these observations of radial and azimuthal anisotropy provide a comprehensive picture of oceanic mantle fabric and are consistent with horizontal alignment of olivine with the a-axis parallel to fossil spreading and an orthorhombic or hexagonal symmetry.
NASA Astrophysics Data System (ADS)
Watanabe, Eriko; Ishikawa, Mami; Ohta, Maiko; Murakami, Yasuo; Kodate, Kashiko
2006-01-01
Medical errors and patient safety have always received a great deal of attention, as errors can be life-threatening and highly significant. Hospitals and medical personnel try their utmost to avoid them. Currently in the medical field, patients' records are identified through PIN numbers and ID cards. However, for patients who cannot speak or move, or who suffer from memory disturbances, alternative methods would be more desirable, and necessary in some cases. The authors previously proposed and fabricated a specially designed correlator called FARCO (Fast Face Recognition Optical Correlator) based on the VanderLugt correlator, which operates at a speed of 1000 faces/s. Combined with high-speed display devices, four-channel processing can achieve operational speeds as high as 4000 faces/s. Running trial experiments on a 1-to-N identification basis using the optical parallel correlator, we achieved low error rates of 1% FMR and 2.3% FNMR. In this paper, we propose a robust face recognition system using the FARCO, focusing on safety and security in the medical field. We apply our face recognition system to the registration of inpatients, in particular children and infants, before and after medical treatments or operations. The proposed system has recorded a higher recognition rate by multiplexing both input and database facial images from moving images. The system was also tested and evaluated for further practical use, with excellent results. Hence, our face recognition system could function effectively as an integral part of a medical system, meeting the essential requirements of safety, security, and privacy.
Fast Simulation of Dynamic Ultrasound Images Using the GPU.
Storve, Sigurd; Torp, Hans
2017-10-01
Simulated ultrasound data is a valuable tool for development and validation of quantitative image analysis methods in echocardiography. Unfortunately, simulation time can become prohibitive for phantoms consisting of a large number of point scatterers. The COLE algorithm by Gao et al. is a fast convolution-based simulator that trades simulation accuracy for improved speed. We present highly efficient parallelized CPU and GPU implementations of the COLE algorithm with an emphasis on dynamic simulations involving moving point scatterers. We argue that it is crucial to minimize the amount of data transfers from the CPU to achieve good performance on the GPU. We achieve this by storing the complete trajectories of the dynamic point scatterers as spline curves in the GPU memory. This leads to good efficiency when simulating sequences consisting of a large number of frames, such as B-mode and tissue Doppler data for a full cardiac cycle. In addition, we propose a phase-based subsample delay technique that efficiently eliminates flickering artifacts seen in B-mode sequences when COLE is used without enough temporal oversampling. To assess the performance, we used a laptop computer and a desktop computer, each equipped with a multicore Intel CPU and an NVIDIA GPU. Running the simulator on a high-end TITAN X GPU, we observed two orders of magnitude speedup compared to the parallel CPU version, three orders of magnitude speedup compared to simulation times reported by Gao et al. in their paper on COLE, and a speedup of 27000 times compared to the multithreaded version of Field II, using numbers reported in a paper by Jensen. We hope that by releasing the simulator as an open-source project we will encourage its use and further development.
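The convolution idea behind COLE can be sketched in a few lines: scatterers are projected onto the scanline as delayed impulses, and a single convolution with the pulse waveform produces the RF line. All parameter values below are illustrative assumptions, and plain linear interpolation stands in for the paper's phase-based subsample delay technique.

```python
import numpy as np

def cole_scanline(depths_m, amplitudes, fs=50e6, c=1540.0, f0=5e6, n=2000):
    """Toy COLE-style convolution simulator for a single scanline."""
    line = np.zeros(n)
    for depth, amp in zip(depths_m, amplitudes):
        t = 2.0 * depth / c * fs          # two-way travel time, in samples
        i = int(np.floor(t))
        frac = t - i
        if 0 <= i < n - 1:
            # Subsample delay: split the impulse between neighbors.
            line[i] += amp * (1.0 - frac)
            line[i + 1] += amp * frac
    # Gaussian-windowed sinusoid as the pulse-echo waveform.
    tt = np.arange(-60, 61) / fs
    pulse = np.cos(2 * np.pi * f0 * tt) * np.exp(-(tt * f0 * 2.0) ** 2)
    return np.convolve(line, pulse, mode='same')
```

In a GPU version, each scatterer projection is an independent (atomic) accumulation, and the trajectory of each scatterer over a cardiac cycle can be stored as a spline so that only the spline coefficients ever cross the CPU-GPU boundary, which is the data-transfer point the paper emphasizes.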
Parallel processing of embossing dies with ultrafast lasers
NASA Astrophysics Data System (ADS)
Jarczynski, Manfred; Mitra, Thomas; Brüning, Stephan; Du, Keming; Jenke, Gerald
2018-02-01
Functionalization of surfaces equips products and components with new features such as hydrophilic behavior, adjustable gloss level, and light-management properties. Small feature sizes demand diffraction-limited spots and fluence adapted to different materials. With the availability of high-power, fast-repeating ultrashort-pulsed lasers and efficient optical processing heads delivering diffraction-limited spot sizes of around 10 μm, it is feasible to achieve fluences higher than adequate patterning requires. Hence, parallel processing is becoming of interest to increase throughput and allow mass production of micro-machined surfaces. The first step on the roadmap of parallel processing for cylinder embossing dies was realized with an eight-spot processing head based on a ns fiber laser, with passive optical beam splitting, individual spot switching by acousto-optical modulation, and advanced imaging. Patterning of cylindrical embossing dies shows a high efficiency of nearly 80%, with diffraction-limited, equally spaced spots at pitches down to 25 μm, achieved by compression using cascaded prism arrays. Due to the nanosecond laser pulses, the ablation shows the surrounding material deposition typical of a hot process. In the next step, the processing head was adapted to a picosecond laser source, and the 500 W fiber laser was replaced by an ultrashort-pulsed laser with 300 W, 12 ps pulses, and a repetition frequency of up to 6 MHz. This paper presents details of the processing head design and an analysis of ablation rates and patterns on steel, copper, and brass dies. Furthermore, it gives an outlook on scaling the parallel processing head from eight to 16 individually switched beamlets to increase processing throughput and optimize utilization of the available ultrashort-pulsed laser energy.
Scalable direct Vlasov solver with discontinuous Galerkin method on unstructured mesh.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, J.; Ostroumov, P. N.; Mustapha, B.
2010-12-01
This paper presents the development of parallel direct Vlasov solvers using the discontinuous Galerkin (DG) method for beam and plasma simulations in four dimensions. Both the physical and velocity spaces are two-dimensional (2P2V), with unstructured meshes. In contrast to the standard particle-in-cell (PIC) approach for kinetic space plasma simulations, i.e., solving the Vlasov-Maxwell equations, a direct method is used in this paper. There are several benefits to solving the Vlasov equation directly, such as avoiding the noise associated with a finite number of particles and the capability to capture fine structure in the plasma. The most challenging part of a direct Vlasov solver comes from the higher dimensionality, as the computational cost increases as N^(2d), where d is the dimension of the physical space. Recently, thanks to the fast development of supercomputers, this possibility has become more realistic. Many efforts have been made to solve Vlasov equations in low dimensions; interest has now shifted to higher dimensions. Different numerical methods have been tried so far, such as the finite difference method, the Fourier spectral method, the finite volume method, and the spectral element method. This paper builds on our previous efforts to use the DG method. The DG method has proven very successful in solving Maxwell's equations, and this paper is our first effort to apply it to Vlasov equations. DG offers several advantages, such as a local mass matrix, strong stability, and easy parallelization, which are particularly suitable for Vlasov equations. Domain decomposition in high dimensions has been used for parallelization, including a highly scalable parallel two-dimensional Poisson solver. Benchmark results are shown, and simulation results will be reported.
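The simplest grid-based building block of a direct Vlasov solver, one step of the free-streaming term, can be sketched as follows. This uses first-order upwind finite volumes rather than DG, purely to illustrate why direct methods avoid particle noise: the distribution function lives on a grid and is advected deterministically.

```python
import numpy as np

def free_streaming_step(f, v, dx, dt):
    """One conservative upwind step of df/dt + v df/dx = 0 on a
    periodic grid.  f has shape (n_x, n_v); each velocity column is
    advected independently, which is also the natural axis (along with
    spatial domain decomposition) to parallelize over."""
    out = np.empty_like(f)
    for j, vj in enumerate(v):
        a = vj * dt / dx                 # CFL number for this column
        if vj >= 0:
            out[:, j] = f[:, j] - a * (f[:, j] - np.roll(f[:, j], 1))
        else:
            out[:, j] = f[:, j] - a * (np.roll(f[:, j], -1) - f[:, j])
    return out
```

A full 2P2V solver alternates such transport steps with velocity-space acceleration and a Poisson solve for the self-consistent field; the conservative form guarantees that total mass (the sum of f) is preserved exactly.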
Csathó, Árpád; Birkás, Béla
2018-01-01
Life history theory posits that behavioral adaptation to the various environmental (ecological and/or social) conditions encountered during childhood is regulated by a wide variety of traits resulting in various behavioral strategies. Unpredictable and harsh conditions tend to produce fast life history strategies, characterized by early maturation, a higher number of sexual partners to whom one is less attached, and less parenting of offspring. Unpredictability and harshness not only affect dispositional social and emotional functioning, but may also promote the development of personality traits linked to higher rates of instability in social relationships or more self-interested behavior. Similarly, detrimental childhood experiences, such as poor parental care or high parent-child conflict, affect personality development and may create a more distrustful, malicious interpersonal style. The aim of this brief review is to survey and summarize findings on the impact of negative early-life experiences on the development of personality and fast life history strategies. By demonstrating that there are parallels in adaptations to adversity in these two domains, we hope to lend weight to current and future attempts to provide comprehensive insight into personality traits and their functions at the ultimate and proximate levels.
A fast combination calibration of foreground and background for pipelined ADCs
NASA Astrophysics Data System (ADS)
Kexu, Sun; Lenian, He
2012-06-01
This paper describes a fast digital calibration scheme for pipelined analog-to-digital converters (ADCs). The proposed method corrects the nonlinearity caused by finite opamp gain and capacitor mismatch in multiplying digital-to-analog converters (MDACs). The calibration technique takes advantage of both foreground and background calibration schemes. In this combination calibration algorithm, a novel parallel background calibration with signal-shifted correlation is proposed, and its calibration cycle is very short. The details of the technique are described using the example of a 14-bit 100 Msample/s pipelined ADC. The high convergence speed of the background calibration is achieved by three means. First, a modified 1.5-bit stage is proposed to allow the injection of a large pseudo-random dither without missing codes. Second, before the signal is correlated, it is shifted according to the input signal so that the correlation error converges quickly. Finally, the front pipeline stages are calibrated simultaneously rather than stage by stage to reduce the calibration tracking constants. Simulation results confirm that the combination calibration has a fast startup process and a short background calibration cycle of 2 × 2^21 conversions.
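The signal-shifting details are specific to this paper, but the underlying correlation-based background idea it builds on can be sketched in a few lines: inject a known ±1 pseudo-random dither into the stage, then correlate the digital output against the PN sequence; the signal term averages away and the actual interstage gain falls out. All values below are illustrative assumptions.

```python
import numpy as np

def estimate_gain_error(n=500_000, true_gain=1.98, dither=0.1, seed=1):
    """Toy model of correlation-based background gain calibration."""
    rng = np.random.default_rng(seed)
    residue = rng.uniform(-0.1, 0.1, n)   # stage input, uncorrelated with PN
    pn = rng.choice([-1.0, 1.0], n)       # injected pseudo-random dither
    out = true_gain * (residue + dither * pn)
    # The PN sequence is known, so correlating against it isolates the
    # dither path while the signal term averages toward zero.
    return (out * pn).mean() / dither
```

The estimated gain (here close to the assumed 1.98 instead of the ideal 2.0) is what the digital correction logic uses; the paper's contribution is shrinking the number of conversions this average needs before it converges.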
Preferential Heating of Oxygen 5+ Ions by Finite-Amplitude Oblique Alfven Waves
NASA Technical Reports Server (NTRS)
Maneva, Yana G.; Vinas, Adolfo; Araneda, Jamie; Poedts, Stefaan
2016-01-01
Minor ions in the fast solar wind are known to have higher temperatures and to flow faster than protons in the interplanetary space. In this study we combine previous research on parametric instability theory and 2.5D hybrid simulations to study the onset of preferential heating of Oxygen 5+ ions by large-scale finite-amplitude Alfven waves in the collisionless fast solar wind. We consider initially non-drifting isotropic multi-species plasma, consisting of isothermal massless fluid electrons, kinetic protons and kinetic Oxygen 5+ ions. The external energy source for the plasma heating and energization are oblique monochromatic Alfven-cyclotron waves. The waves have been created by rotating the direction of initial parallel pump, which is a solution of the multi-fluid plasma dispersion relation. We consider propagation angles theta less than or equal to 30 deg. The obliquely propagating Alfven pump waves lead to strong diffusion in the ion phase space, resulting in highly anisotropic heavy ion velocity distribution functions and proton beams. We discuss the application of the model to the problems of preferential heating of minor ions in the solar corona and the fast solar wind.
Computational electromagnetics: the physics of smooth versus oscillatory fields.
Chew, W C
2004-03-15
This paper starts by discussing the difference in the physics between solutions to Laplace's equation (static) and Maxwell's equations for dynamic problems (Helmholtz equation). Their differing physical characters are illustrated by how the two fields convey information away from their source point. The paper elucidates the fact that their differing physical characters affect the use of Laplacian field and Helmholtz field in imaging. They also affect the design of fast computational algorithms for electromagnetic scattering problems. Specifically, a comparison is made between fast algorithms developed using wavelets, the simple fast multipole method, and the multi-level fast multipole algorithm for electrodynamics. The impact of the physical characters of the dynamic field on the parallelization of the multi-level fast multipole algorithm is also discussed. The relationship of diagonalization of translators to group theory is presented. Finally, future areas of research for computational electromagnetics are described.
Fast 3D Surface Extraction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sewell, Christopher Meyer; Patchett, John M.; Ahrens, James P.
Ocean scientists searching for isosurfaces and/or thresholds of interest in high-resolution 3D datasets previously faced a tedious and time-consuming interactive exploration experience. PISTON research and development activities are enabling ocean scientists to rapidly and interactively explore isosurfaces and thresholds in their large data sets using a simple slider, with real-time calculation and visualization of these features. Ocean scientists can now visualize more features in less time, helping them gain a better understanding of the high-resolution data sets they work with on a daily basis. Isosurface timings (512^3 grid): VTK 7.7 s; parallel VTK (48-core) 1.3 s; PISTON OpenMP (48-core) 0.2 s; PISTON CUDA (Quadro 6000) 0.1 s.
Ding, Zhaoxiong; Zhang, Dongying; Wang, Guanghui; Tang, Minghui; Dong, Yumin; Zhang, Yixin; Ho, Ho-Pui; Zhang, Xuping
2016-09-21
In this paper, an in-line, low-cost, miniature and portable spectrophotometric detection system is presented and used for fast protein determination and calibration in centrifugal microfluidics. Our portable detection system is configured with paired emitter and detector diodes (PEDD), where the light beam between the two LEDs is collimated, enhancing system tolerance. To our knowledge, this is the first time a clear physical model of PEDD has been presented: the pair can be modelled as a photosensitive RC oscillator. A portable centrifugal microfluidic system containing a wireless port in real-time communication with a smartphone has been built to show that PEDD is an effective strategy for conducting rapid protein bioassays, with detection performance comparable to that of a UV-vis spectrophotometer. The choice of centrifugal microfluidics offers the unique benefit of highly parallel fluidic actuation at high accuracy with no need for a pump, as inertial forces are present within the entire spinning disc and accurately controlled by varying the spinning speed. As a demonstration experiment, we conducted the Bradford assay for bovine serum albumin (BSA) concentration calibration from 0 to 2 mg mL(-1). Moreover, a novel centrifugal disc with a spiral microchannel is proposed for automatic distribution and metering of the sample to all the parallel reactions at one time. The reported lab-on-a-disc scheme with PEDD detection may offer a solution for high-throughput assays, such as protein density calibration, drug screening, and drug solubility measurement, that require the handling of a large number of reactions in parallel.
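The photosensitive-RC-oscillator view can be captured in a toy model: the detector LED's junction capacitance, charged to a starting voltage, is discharged by a photocurrent proportional to light intensity, and the time to cross a logic threshold is the measured quantity. All component values here are illustrative assumptions, not the paper's.

```python
def pedd_discharge_time(light_intensity, c_farads=1e-9, v_start=5.0,
                        v_threshold=1.7, resp_a_per_unit=1e-9):
    """Time for a capacitance discharged by a light-proportional
    photocurrent to fall from v_start to v_threshold; the discharge
    time scales inversely with intensity, so less transmitted light
    (a more concentrated sample) means a longer measured interval."""
    i_photo = resp_a_per_unit * light_intensity
    return c_farads * (v_start - v_threshold) / i_photo
```

Because the measurement is a time interval rather than an analog voltage, it digitizes naturally on a microcontroller timer, which is what makes the PEDD arrangement cheap and portable.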
Sabouni, Abas; Pouliot, Philippe; Shmuel, Amir; Lesage, Frederic
2014-01-01
This paper introduces a fast and efficient solver for simulating the induced (eddy) current distribution in the brain during a transcranial magnetic stimulation procedure. The solver has been integrated with MRI and neuronavigation software to accurately model the electromagnetic field and show eddy currents in the head in near real time. To examine the performance of the proposed technique, we used a 3D anatomically accurate MRI model of a 25-year-old female subject.
Quartic scaling MP2 for solids: A highly parallelized algorithm in the plane wave basis
NASA Astrophysics Data System (ADS)
Schäfer, Tobias; Ramberger, Benjamin; Kresse, Georg
2017-03-01
We present a low-complexity algorithm to calculate the correlation energy of periodic systems in second-order Møller-Plesset (MP2) perturbation theory. In contrast to previous approximation-free MP2 codes, our implementation possesses quartic scaling, O(N^4), with respect to the system size N and offers almost ideal parallelization efficiency. The general issue that the correlation energy converges slowly with the number of basis functions is eased by an internal basis-set extrapolation. The key concept for reducing the scaling is to eliminate all summations over virtual orbitals, which can be achieved elegantly in the Laplace-transformed MP2 formulation using plane-wave basis sets and fast Fourier transforms. Analogously, this approach could allow us to calculate second-order screened exchange as well as particle-hole ladder diagrams with similarly low complexity. Hence, the presented method can be considered a step towards systematically improved correlation energies.
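The identity at the heart of Laplace-transformed MP2 is that for x > 0, 1/x equals the integral of exp(-x t) over t ≥ 0. Applied to the orbital-energy denominator x = ε_a + ε_b - ε_i - ε_j, the exponential factorizes as exp(-ε_a t) exp(-ε_b t) exp(ε_i t) exp(ε_j t), decoupling the four orbital indices; this is what removes the explicit sums over virtual orbitals. A minimal numerical check of the identity (plain trapezoidal quadrature; production codes use optimized grids with on the order of ten points):

```python
import numpy as np

def laplace_inverse(x, t_max=50.0, n=5000):
    """Approximate 1/x for x > 0 via trapezoidal quadrature of
    integral_0^t_max exp(-x t) dt (the tail beyond t_max is dropped)."""
    t = np.linspace(0.0, t_max, n)
    vals = np.exp(-x * t)
    dt = t[1] - t[0]
    # Trapezoid rule written out explicitly for portability.
    return dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
```

Once the denominator is replaced by a short quadrature sum over t, each quadrature point involves only index-separable exponential factors, and the remaining contractions become convolutions that plane-wave FFTs evaluate at quartic cost.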
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, S.; Labanca, I.; Rech, I.
2014-10-15
Fluorescence correlation spectroscopy (FCS) is a well-established technique to study binding interactions or the diffusion of fluorescently labeled biomolecules in vitro and in vivo. Fast FCS experiments require parallel data acquisition and analysis, which can be achieved by exploiting a multi-channel Single Photon Avalanche Diode (SPAD) array and a corresponding multi-input correlator. This paper reports a 32-channel FPGA-based correlator able to perform 32 auto/cross-correlations simultaneously over lag times ranging from 10 ns up to 150 ms. The correlator is included in a 32 × 1 SPAD array module, providing a compact and flexible instrument for high-throughput FCS experiments. However, some inherent features of SPAD arrays, namely afterpulsing and optical crosstalk, may introduce distortions in the measurement of auto- and cross-correlation functions. We investigated these limitations to assess their impact on the module and evaluate possible workarounds.
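Hardware correlators of this kind typically use a multi-tau structure to span the many decades between nanosecond and millisecond lags. A software sketch shows the idea (a generic illustration, not the FPGA design itself): a few linear lags at full time resolution, then the photon-count trace is binned 2x per level so lag spacing grows logarithmically.

```python
import numpy as np

def multitau_autocorr(counts, n_linear=8, n_levels=4):
    """Normalized multi-tau autocorrelation g(tau) of a photon-count trace."""
    x = np.asarray(counts, dtype=float)
    lags, g = [], []
    scale = 1
    for level in range(n_levels):
        # Level 0 covers all linear lags; later levels only the upper half,
        # since shorter lags were already covered at finer resolution.
        start = 1 if level == 0 else n_linear // 2
        for k in range(start, n_linear):
            lags.append(k * scale)
            g.append((x[:-k] * x[k:]).mean() / x.mean() ** 2)
        if x.size % 2:                   # coarsen: adjacent counts add
            x = x[:-1]
        x = x[0::2] + x[1::2]
        scale *= 2
    return np.array(lags), np.array(g)
```

For an uncorrelated (constant or Poisson) trace, g(τ) is flat at 1; real FCS traces decay on the diffusion time scale, and it is exactly this curve that afterpulsing and crosstalk distort at short lags.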
Random-subset fitting of digital holograms for fast three-dimensional particle tracking [invited].
Dimiduk, Thomas G; Perry, Rebecca W; Fung, Jerome; Manoharan, Vinothan N
2014-09-20
Fitting scattering solutions to time series of digital holograms is a precise way to measure three-dimensional dynamics of microscale objects such as colloidal particles. However, this inverse-problem approach is computationally expensive. We show that the computational time can be reduced by an order of magnitude or more by fitting to a random subset of the pixels in a hologram. We demonstrate our algorithm on experimentally measured holograms of micrometer-scale colloidal particles, and we show that 20-fold increases in speed, relative to fitting full frames, can be attained while introducing errors in the particle positions of 10 nm or less. The method is straightforward to implement and works for any scattering model. It also enables a parallelization strategy wherein random-subset fitting is used to quickly determine initial guesses that are subsequently used to fit full frames in parallel. This approach may prove particularly useful for studying rare events, such as nucleation, that can only be captured with high frame rates over long times.
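The core mechanics of random-subset fitting can be shown with a model that is linear in its parameters; the paper fits nonlinear scattering models, but the subset idea is the same: draw a random sample of pixel indices and solve the least-squares problem on that subset only. The pattern-plus-background model here is an illustrative stand-in.

```python
import numpy as np

def fit_random_subset(image, pattern, n_pixels, seed=0):
    """Fit amplitude and background of a known pattern to a random
    subset of pixels: a single lstsq call on n_pixels rows instead of
    a fit over every pixel of the frame."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(image.size, size=n_pixels, replace=False)
    basis = pattern.ravel()[idx]
    A = np.column_stack([basis, np.ones_like(basis)])  # design matrix
    params, *_ = np.linalg.lstsq(A, image.ravel()[idx], rcond=None)
    return params                                      # (amplitude, background)
```

Because the fit cost scales with the number of pixels used, a subset of a few hundred pixels can stand in for a megapixel frame, and, as the paper suggests, such cheap subset fits make good initial guesses for full-frame refinement run in parallel.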
Low complexity 1D IDCT for 16-bit parallel architectures
NASA Astrophysics Data System (ADS)
Bivolarski, Lazar
2007-09-01
This paper shows that, by using the Loeffler, Ligtenberg, and Moschytz factorization of the one-dimensional (1-D) 8-point IDCT [2] as a fast approximation of the Discrete Cosine Transform (DCT) and using only 16-bit numbers, it is possible to create an IEEE 1180-1990 compliant, multiplierless algorithm with low computational complexity. Owing to its structure, the algorithm is efficiently implemented on parallel high performance architectures, while its low complexity makes it suitable for a wide range of other architectures. An additional constraint on this work was compliance with the existing MPEG standards; hardware implementation complexity and low resource usage were also part of the design criteria. The implementation is likewise compliant with the precision requirements described in the MPEG IDCT precision specification ISO/IEC 23002-1. Complexity analysis extends the simple count of shifts and adds for the multiplierless algorithm by including additional operations in the measure, to better describe the actual implementation complexity of the transform.
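The "multiplierless" idea replaces each constant multiplication with shifts and adds. As a generic illustration (not the actual Loeffler-Ligtenberg-Moschytz butterfly), here is a fixed-point multiply by the constant 181, which approximates 256/sqrt(2), a scaling that appears in some fixed-point DCT/IDCT implementations:

```python
def mul_181(x):
    """Multiply by 181 using only shifts and adds: 181 = 128 + 32 + 16 + 4 + 1."""
    return (x << 7) + (x << 5) + (x << 4) + (x << 2) + x

def scale_inv_sqrt2(x):
    """Fixed-point multiply by ~1/sqrt(2) with rounding: (x * 181 + 128) >> 8."""
    return (mul_181(x) + 128) >> 8
```

Every constant in a multiplierless transform is decomposed this way, so the complexity measure the abstract mentions counts shifts and adds rather than multiplications.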
NASA Astrophysics Data System (ADS)
Lin, Na; Jia, Zhe; Wang, Zhihui; Zhao, Hui; Ai, Guo; Song, Xiangyun; Bai, Ying; Battaglia, Vincent; Sun, Chengdong; Qiao, Juan; Wu, Kai; Liu, Gao
2017-10-01
The structural degradation of commercial lithium-ion battery (LIB) graphite anodes at different cycle numbers and charge rates was investigated by focused ion beam (FIB) milling and scanning electron microscopy (SEM). Cross-section images of the graphite anode obtained by FIB milling show that cracks, resulting from the volume expansion of the graphite electrode during long-term cycling, form parallel to the current collector. Cracking initiates in the bulk of graphite particles near the lithium-insertion surface, likely driven by the stress induced during lithiation and de-lithiation cycles. Cracks subsequently propagate along grain boundaries of the polycrystalline graphite, but only in the direction parallel to the current collector. Furthermore, fast-charged graphite electrodes are more prone to cracking, since the tensile strength of graphite is more likely to be exceeded at higher charge rates. Therefore, for long-term or high-charge-rate LIB applications, the tensile strength of the graphite anode should be taken into account.
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction
Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...
1995-01-01
In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
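Strassen's recursion, the algorithm whose tensor-product formulation the article studies, computes a 2^n × 2^n product from seven half-size products instead of eight. A direct recursive sketch in Python (the article's contribution is a nonrecursive, storage-reduced tensor-product formulation, which this simple version does not reproduce):

```python
import numpy as np

def strassen(A, B):
    """Strassen multiplication for 2^n x 2^n matrices: 7 recursive products instead of 8."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])

A = np.arange(16.0).reshape(4, 4)
C = strassen(A, A)  # agrees with the ordinary product A @ A
```

The naive recursion allocates temporaries at every level, which is exactly the O(7^n) working-storage problem the article's modified formulation reduces to O(4^n).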
Yue, Chao; Li, Wen; Reeves, Geoffrey D.; ...
2016-07-01
Interactions between interplanetary (IP) shocks and the Earth's magnetosphere manifest many important space physics phenomena, including low-energy ion flux enhancements and particle acceleration. In order to investigate the mechanisms driving shock-induced enhancement of low-energy ion flux, we have examined two IP shock events that occurred when the Van Allen Probes were located near the equator while ionospheric and ground observations were available around the spacecraft footprints. We have found that, associated with the shock arrival, electromagnetic fields intensified and low-energy ion fluxes, including H+, He+, and O+, were enhanced dramatically in both the parallel and perpendicular directions. During the 2 October 2013 shock event, both parallel and perpendicular flux enhancements lasted more than 20 min, with larger fluxes observed in the perpendicular direction. In contrast, for the 15 March 2013 shock event, the low-energy perpendicular ion fluxes increased only in the first 5 min, during an impulse of electric field, while the parallel flux enhancement lasted more than 30 min. In addition, ionospheric outflows were observed after shock arrivals. From a simple particle motion calculation, we found that the rapid response of low-energy ions is due to drift of the plasmaspheric population under the enhanced electric field. Furthermore, the fast acceleration in the perpendicular direction cannot be explained solely by E × B drift; betatron acceleration also plays a role. Adiabatic acceleration may also explain the fast response of the enhanced parallel ion fluxes, while ion outflows may contribute to the enhanced parallel fluxes that last longer than the perpendicular fluxes.
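The drift invoked here is the E × B drift, v = (E × B)/|B|^2, which is energy-independent and therefore moves the whole cold plasmaspheric population together. A quick numerical check with illustrative (not event-specific) field values:

```python
import numpy as np

def exb_drift(E, B):
    """E x B drift velocity v = (E x B) / |B|^2, SI units (V/m, T -> m/s)."""
    E, B = np.asarray(E, dtype=float), np.asarray(B, dtype=float)
    return np.cross(E, B) / np.dot(B, B)

# Illustrative values: a 10 mV/m electric field impulse perpendicular
# to a 100 nT equatorial magnetic field gives a 100 km/s drift.
v = exb_drift([10e-3, 0.0, 0.0], [0.0, 0.0, 100e-9])
```

Since |v| = E/B, even modest shock-induced electric field impulses produce drifts of tens to hundreds of km/s, consistent with the rapid flux response described above.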
Nael, Kambiz; Fenchel, Michael; Krishnam, Mayil; Finn, J Paul; Laub, Gerhard; Ruehm, Stefan G
2007-06-01
To evaluate the technical feasibility of high spatial resolution contrast-enhanced magnetic resonance angiography (CE-MRA) with highly accelerated parallel acquisition at 3.0 T using a 32-channel phased array coil, and a high relaxivity contrast agent. Ten adult healthy volunteers (5 men, 5 women, aged 21-66 years) underwent high spatial resolution CE-MRA of the pulmonary circulation. Imaging was performed at 3 T using a 32-channel phased array coil. After intravenous injection of 1 mL of gadobenate dimeglumine (Gd-BOPTA) at 1.5 mL/s, a timing bolus was used to measure the transit time from the arm vein to the main pulmonary artery. Subsequently, following intravenous injection of 0.1 mmol/kg of Gd-BOPTA at the same rate, isotropic high spatial resolution (1 x 1 x 1 mm3) CE-MRA data sets of the entire pulmonary circulation were acquired using a fast gradient-recalled echo sequence (TR/TE 3/1.2 milliseconds, FA 18 degrees) and highly accelerated parallel acquisition (GRAPPA x 6) during a 20-second breath hold. The presence of artifact, noise, and image quality of the pulmonary arterial segments were evaluated independently by 2 radiologists. Phantom measurements were performed to assess the signal-to-noise ratio (SNR). Statistical analysis of data was performed by using the Wilcoxon rank sum test and 2-sample Student t test. The interobserver variability was tested by kappa coefficient. All studies were of diagnostic quality as determined by both observers. The pulmonary arteries were routinely identified up to fifth-order branches, with definition in the diagnostic range and excellent interobserver agreement (kappa = 0.84, 95% confidence interval 0.77-0.90). Phantom measurements showed significantly lower SNR (P < 0.01) using GRAPPA (17.3 +/- 18.8) compared with measurements without parallel acquisition (58 +/- 49.4). 
The described 3 T CE-MRA protocol, in combination with the high T1 relaxivity of Gd-BOPTA, provides sufficient SNR to support highly accelerated parallel acquisition (GRAPPA x 6), resulting in acquisition of isotropic (1 x 1 x 1 mm3) voxels over the entire pulmonary circulation in 20 seconds.
Computer-intensive simulation of solid-state NMR experiments using SIMPSON.
Tošner, Zdeněk; Andersen, Rasmus; Stevensson, Baltzar; Edén, Mattias; Nielsen, Niels Chr; Vosegaard, Thomas
2014-09-01
Conducting large-scale solid-state NMR simulations requires fast computer software, potentially in combination with efficient computational resources, to complete within a reasonable time frame. Such simulations may involve large spin systems, multiple-parameter fitting of experimental spectra, or multiple-pulse experiment design using parameter scans, non-linear optimization, or optimal control procedures. To efficiently accommodate such simulations, we here present an improved version of the widely distributed open-source SIMPSON NMR simulation software package adapted to contemporary high performance hardware setups. The software is optimized for fast performance on standard stand-alone computers, multi-core processors, and large clusters of identical nodes. We describe the novel features for fast computation, including internal matrix manipulations, propagator setups, and acquisition strategies. For efficient calculation of powder averages, we implemented the interpolation method of Alderman, Solum, and Grant, as well as the recently introduced fast Wigner transform interpolation technique. The potential of the optimal control toolbox is greatly enhanced by higher precision gradients in combination with the efficient optimization algorithm known as limited-memory Broyden-Fletcher-Goldfarb-Shanno. In addition, advanced parallelization can be used in all types of calculations, providing significant time reductions. SIMPSON thus reflects current knowledge in the field of numerical simulations of solid-state NMR experiments. The efficiency and novel features are demonstrated on representative simulations. Copyright © 2014 Elsevier Inc. All rights reserved.
Electrostatic waves driven by electron beam in lunar wake plasma
NASA Astrophysics Data System (ADS)
Sreeraj, T.; Singh, S. V.; Lakhina, G. S.
2018-05-01
A linear analysis of electrostatic waves propagating parallel to the ambient magnetic field in a four-component, homogeneous, collisionless, magnetised plasma comprising fluid protons, fluid He++, an electron beam, and suprathermal electrons following a kappa distribution is presented. In the absence of electron beam streaming, numerical analysis of the dispersion relation shows six modes: two electron acoustic modes (modes 1 and 6), two fast ion acoustic modes (modes 2 and 5), and two slow ion acoustic modes (modes 3 and 4). Modes 1, 2, and 3 and modes 4, 5, and 6 have positive and negative phase speeds, respectively. With an increase in electron beam speed, mode 6 is affected the most, its phase speed turning from negative to positive. Mode 6 thus starts to merge with modes 2 and 3, rendering the electron-beam-driven fast and slow ion acoustic waves unstable with finite growth rates. The electron-beam-driven slow ion-acoustic waves occur at lower wavenumbers, whereas the fast ion-acoustic waves occur at larger wavenumbers. The effect of various other parameters has also been studied. We have applied this analysis to the electrostatic waves observed in the lunar wake during the first flyby of the ARTEMIS mission. The analysis shows that the low (high) frequency waves observed in the lunar wake could be the electron-beam-driven slow (fast) ion-acoustic modes.
A Fast Algorithm for Massively Parallel, Long-Term, Simulation of Complex Molecular Dynamics Systems
NASA Technical Reports Server (NTRS)
Jaramillo-Botero, Andres; Goddard, William A, III; Fijany, Amir
1997-01-01
The advances in theory and computing technology over the last decade have led to enormous progress in applying atomistic molecular dynamics (MD) methods to the characterization, prediction, and design of chemical, biological, and material systems.
Parallel VLSI architecture emulation and the organization of APSA/MPP
NASA Technical Reports Server (NTRS)
Odonnell, John T.
1987-01-01
The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a VAX. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bertelli, N.; Valeo, E. J.; Green, D. L.
At the power levels required for significant heating and current drive in magnetically-confined toroidal plasma, modification of the particle distribution function from a Maxwellian shape is likely (Stix 1975 Nucl. Fusion 15 737), with consequent changes in wave propagation and in the location and amount of absorption. In order to study these effects computationally, both the finite-Larmor-radius and the high-harmonic fast wave (HHFW) versions of the full-wave, hot-plasma toroidal simulation code TORIC (Brambilla 1999 Plasma Phys. Control. Fusion 41 1 and Brambilla 2002 Plasma Phys. Control. Fusion 44 2423) have been extended to allow the prescription of arbitrary velocity distributions of the form f(v∥, v⊥, ψ, θ). For hydrogen (H) minority heating of a deuterium (D) plasma with anisotropic Maxwellian H distributions, the fractional H absorption varies significantly with changes in parallel temperature but is essentially independent of perpendicular temperature. On the other hand, for the HHFW regime with an anisotropic Maxwellian fast ion distribution, the fractional beam ion absorption varies mainly with changes in the perpendicular temperature. The evaluation of the wave-field and power absorption, through the full wave solver, with the ion distribution function provided by either Monte-Carlo particle or Fokker-Planck codes is also examined for Alcator C-Mod and NSTX plasmas. Non-Maxwellian effects generally tend to increase the absorption with respect to the equivalent Maxwellian distribution.
NASA Astrophysics Data System (ADS)
Xue, M.; Li, L.; Chen, L.
2016-12-01
The South China Sea (SCS) is located in the continental margin of the Eurasia plate, where different geological blocks and tectonic plates interact. The dynamic mechanism behind the formation of the SCS has been debated for decades. In this study, we first synthesize our geophysical results obtained in the South China Sea, including an updated 3D velocity model from surface tomography using surrounding land stations and regional earthquakes, and shear wave splitting results obtained at surrounding land stations and OBS, using local, regional, and teleseismic earthquakes. The observed splitting results in the South China Sea are complex: the fast polarization direction beneath the central basin is approximately NE-SW, nearly parallel to the extinct ridge in the central basin of the SCS; however, the fast axis within the slab is trench-parallel outside the ridge subduction region. In the 3D velocity models, subducting slabs appear as dipping high velocity anomalies, and discontinuous low velocities are observed above the subducting slab, as well as in the basin. How are the splitting observations connected with the velocity models? How are the observations linked to one another? How are the observations in the central basin linked with those in the surrounding region? We aim to link these observations with one another as well as with newly published results from geophysics, geochemistry, and geology in this region. Such a synthesis will improve our understanding of the evolution of the South China Sea and facilitate new ideas.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-01
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, making simulation both a data-intensive and a computing-intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation in raw data simulation, an access pattern that greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward concerning the programming model, HDFS configuration, and scheduling. The experimental results show that the cloud computing based algorithm achieves a 4× speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy improves performance by about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing-intensive and data-intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration. PMID:28075343
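The irregular accumulation the MapReduce model targets, with each scatterer contributing an echo to a shared raw-data array, can be sketched serially (a toy point-target model with made-up amplitudes and delays, not the paper's SAR chirp simulation or Hadoop pipeline):

```python
from functools import reduce
import numpy as np

t = np.linspace(0.0, 1e-5, 2048)  # fast-time axis, seconds

def echo(target):
    """Map step: one scatterer's delayed echo (a toy sinusoid standing in for a chirp)."""
    amp, delay = target
    return amp * np.sin(2e6 * np.pi * (t - delay)) * (t >= delay)

# Hypothetical point targets: (amplitude, delay).
targets = [(1.0, 1e-6), (0.5, 3e-6), (0.2, 6e-6)]
raw = reduce(np.add, map(echo, targets))  # reduce step: accumulate partial echoes
```

In the cloud version, the map step runs per target (or per target block) on separate nodes, and the reduce step merges the partial raw-data arrays, with HDFS absorbing the I/O load.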
Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu
2018-04-20
A parallel computation method for large-size Fresnel computer-generated hologram (CGH) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.
NASA Astrophysics Data System (ADS)
Layes, Vincent; Monje, Sascha; Corbella, Carles; Schulz-von der Gathen, Volker; von Keudell, Achim; de los Arcos, Teresa
2017-05-01
In-vacuum characterization of magnetron targets after High Power Impulse Magnetron Sputtering (HiPIMS) has been performed by X-ray photoelectron spectroscopy (XPS). Al-Cr composite targets (circular, 50 mm diameter) mounted in two different geometries were investigated: an Al target with a small Cr disk embedded at the racetrack position and a Cr target with a small Al disk embedded at the racetrack position. The HiPIMS discharge and the target surface composition were characterized in parallel for low, intermediate, and high power conditions, thus covering both the Ar-dominated and the metal-dominated HiPIMS regimes. The HiPIMS plasma was investigated using optical emission spectroscopy and fast imaging using a CCD camera; the spatially resolved XPS surface characterization was performed after in-vacuum transfer of the magnetron target to the XPS chamber. This parallel evaluation showed that (i) target redeposition of sputtered species was markedly more effective for Cr atoms than for Al atoms; (ii) oxidation at the target racetrack was observed even though the discharge ran in pure Ar gas without O2 admixture, the oxidation depended on the discharge power and target composition; and (iii) a bright emission spot fixed on top of the inserted Cr disk appeared for high power conditions.
Orthorectification by Using Gpgpu Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power exceeding a teraflop. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors with very fast computing capability and much higher memory bandwidth than central processing units (CPUs). Data-parallel computation can be described briefly as mapping data elements to parallel processing threads. The rapid development of GPU programmability and capability has attracted the attention of researchers dealing with complex problems that require heavy computation; this interest gave rise to the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". Graphics processors are powerful yet cheap and affordable hardware, and have therefore become an alternative to conventional processors. Graphics chips that began as fixed-function application hardware have been transformed into modern, powerful, and programmable processors to meet broader needs, a shift that has become especially pronounced in recent years. The biggest obstacle is that graphics processing units use programming models unlike conventional programming methods: efficient GPU programming requires re-coding the existing algorithm to account for the limitations and structure of the graphics hardware, and traditional event-driven programming methods cannot be used to program these many-core processors. GPUs are especially effective at repeating the same computing steps over many data elements when high accuracy is needed. 
Computation thus proceeds more quickly and accurately; by comparison, CPUs, which perform one computation at a time under flow control, are slower. This structure can be exploited in various applications of computer technology. This study covers how general-purpose parallel programming and the computational power of GPUs can be used in photogrammetric applications, especially direct georeferencing. The direct georeferencing algorithm was coded using the GPGPU method and the CUDA (Compute Unified Device Architecture) programming language, and the results were compared with those of a traditional CPU implementation. In a second application, projective rectification was coded using the GPGPU method and CUDA; sample images of various sizes were processed and the results evaluated. The GPGPU method is especially useful for repeating the same computations on highly dense data, finding the solution quickly.
On some Aitken-like acceleration of the Schwarz method
NASA Astrophysics Data System (ADS)
Garbey, M.; Tromeur-Dervout, D.
2002-12-01
In this paper we present a family of domain decomposition methods based on Aitken-like acceleration of the Schwarz method, seen as an iterative procedure with a linear rate of convergence. We first present the so-called Aitken-Schwarz procedure for linear differential operators. The solver can be a direct solver when applied to the Helmholtz problem with a five-point finite difference scheme on regular grids. We then introduce the Steffensen-Schwarz variant, an iterative domain decomposition solver that can be applied to linear and nonlinear problems. We show that these solvers have reasonable numerical efficiency compared to classical fast solvers for the Poisson problem or multigrid methods for more general linear and nonlinear elliptic problems. However, the salient feature of our method is its high tolerance to slow networks in the context of distributed parallel computing, making it attractive, generally speaking, for computer architectures whose performance is limited by memory bandwidth rather than by the flop performance of the CPU. This is nowadays the case for most parallel computers using the RISC processor architecture. We will illustrate this highly desirable property of our algorithm with large-scale computing experiments.
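The Aitken delta-squared formula underlying this acceleration extrapolates from three consecutive iterates of a linearly convergent sequence: x_hat = x2 - (x2 - x1)^2 / (x2 - 2*x1 + x0). A scalar sketch (the paper applies the idea to Schwarz interface traces, not to a scalar fixed point):

```python
import math

def aitken(seq):
    """Aitken delta-squared extrapolation from the last three iterates."""
    x0, x1, x2 = seq[-3:]
    return x2 - (x2 - x1) ** 2 / (x2 - 2 * x1 + x0)

# Linearly convergent fixed-point iteration x <- cos(x), root ~ 0.7390851.
xs = [0.5]
for _ in range(6):
    xs.append(math.cos(xs[-1]))
acc = aitken(xs)  # far closer to the root than the last plain iterate
```

When the iteration error contracts by a nearly constant factor each step, as for the Schwarz method on the problems above, the extrapolated value removes the dominant geometric error term in one shot.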
Fast Photon Monte Carlo for Water Cherenkov Detectors
NASA Astrophysics Data System (ADS)
Latorre, Anthony; Seibert, Stanley
2012-03-01
We present Chroma, a high performance optical photon simulation for large particle physics detectors, such as the water Cherenkov far detector option for LBNE. This software takes advantage of the CUDA parallel computing platform to propagate photons using modern graphics processing units. In a computer model of a 200 kiloton water Cherenkov detector with 29,000 photomultiplier tubes, Chroma can propagate 2.5 million photons per second, around 200 times faster than the same simulation with Geant4. Chroma uses a surface-based approach to modeling geometry, which offers many benefits over the solid-based approach used in other simulations such as Geant4.
NASA Astrophysics Data System (ADS)
Laubscher, Markus; Bourquin, Stéphane; Froehly, Luc; Karamata, Boris; Lasser, Theo
2004-07-01
Current spectroscopic optical coherence tomography (OCT) methods rely on a posteriori numerical calculation. We present an experimental alternative for accessing spectroscopic information in OCT without post-processing based on wavelength de-multiplexing and parallel detection using a diffraction grating and a smart pixel detector array. Both a conventional A-scan with high axial resolution and the spectrally resolved measurement are acquired simultaneously. A proof-of-principle demonstration is given on a dynamically changing absorbing sample. The method's potential for fast spectroscopic OCT imaging is discussed. The spectral measurements obtained with this approach are insensitive to scan non-linearities or sample movements.
Mathematical and Numerical Aspects of the Adaptive Fast Multipole Poisson-Boltzmann Solver
Zhang, Bo; Lu, Benzhuo; Cheng, Xiaolin; ...
2013-01-01
This paper summarizes the mathematical and numerical theories and computational elements of the adaptive fast multipole Poisson-Boltzmann (AFMPB) solver. We introduce and discuss the following components in order: the Poisson-Boltzmann model, boundary integral equation reformulation, surface mesh generation, the node-patch discretization approach, Krylov iterative methods, the new version of fast multipole methods (FMMs), and a dynamic prioritization technique for scheduling parallel operations. For each component, we also remark on feasible approaches for further improvements in efficiency, accuracy and applicability of the AFMPB solver to large-scale long-time molecular dynamics simulations. Lastly, the potential of the solver is demonstrated with preliminary numerical results.
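Of the components listed, the Krylov iteration can be illustrated with the simplest member of the family, conjugate gradients for a symmetric positive-definite system (a generic sketch only; AFMPB itself solves a boundary-integral system with FMM-accelerated matrix-vector products rather than an explicit dense matrix):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=500):
    """Minimal conjugate-gradient Krylov solver for SPD linear systems."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # first search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)   # well-conditioned SPD test matrix
b = rng.standard_normal(50)
x = conjugate_gradient(A, b)
```

The key structural point for FMM-based solvers is that the iteration touches A only through products A @ p, so the dense matrix can be replaced by a fast multipole evaluation without changing the algorithm.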
Pyramidal Wavefront Sensor Demonstrator at INO
NASA Astrophysics Data System (ADS)
Martin, Olivier; Véran, Jean-Pierre; Anctil, Geneviève; Bourqui, Pascal; Châteauneuf, François; Gauvin, Jonny; Goyette, Philippe; Lagacé, François; Turbide, Simon; Wang, Min
2014-08-01
Wavefront sensing is one of the key elements of an adaptive optics system. Although Shack-Hartmann WFSs are the most commonly used, whether for astronomical or biomedical applications, the high sensitivity and large dynamic range of the pyramid WFS (P-WFS) technology are promising and need to be further investigated for proper justification in future Extremely Large Telescope (ELT) applications. At INO, a center for applied research in optics and technology transfer in Quebec City, Canada, we have recently set out to develop a pyramid wavefront sensor (P-WFS), a technology with which no other research group in Canada had experience. A first version was built and tested in 2013 in collaboration with NRC-HIA Victoria. Here we present a second iteration of the demonstrator with an extended spectral range, fast modulation capability, and a low-noise, fast-acquisition EMCCD sensor. The system has been designed with compactness and robustness in mind to allow on-sky testing at the Mont Mégantic facility, in parallel with a Shack-Hartmann sensor, so as to compare both options.
NASA Technical Reports Server (NTRS)
Mandt, M. E.; Lee, L. C.
1991-01-01
The high correlation of Pc 1 events with magnetospheric compressions is known. A mechanism is proposed which leads to the generation of Pc 1 waves. The interaction of a dynamic pressure pulse with the earth's bow shock leads to the formation of a weak fast-mode shock propagating into the magnetosheath. The shock wave can pass right through a tangential discontinuity (magnetopause) and into the magnetosphere, without disturbing either of the structures. In a quasiperpendicular geometry, the shock wave exhibits anisotropic heating. This anisotropy drives unstable ion-cyclotron waves which can contribute to the generation of the Pc 1 waves which are detected. The viability of the mechanism is demonstrated with simulations. This mechanism could explain the peak in the occurrence of observed Pc 1 waves in the postnoon sector, where a field-aligned discontinuity in the solar wind would most often be parallel to the magnetopause surface due to the average Parker-spiral magnetic-field configuration.
Fast segmentation of stained nuclei in terabyte-scale, time resolved 3D microscopy image stacks.
Stegmaier, Johannes; Otte, Jens C; Kobitski, Andrei; Bartschat, Andreas; Garcia, Ariel; Nienhaus, G Ulrich; Strähle, Uwe; Mikut, Ralf
2014-01-01
Automated analysis of multi-dimensional microscopy images has become an integral part of modern research in life science. Most available algorithms that provide sufficient segmentation quality, however, are infeasible for a large amount of data due to their high complexity. In this contribution we present a fast parallelized segmentation method that is especially suited for the extraction of stained nuclei from microscopy images, e.g., of developing zebrafish embryos. The idea is to transform the input image based on gradient and normal directions in the proximity of detected seed points such that it can be handled by straightforward global thresholding like Otsu's method. We evaluate the quality of the obtained segmentation results on a set of real and simulated benchmark images in 2D and 3D and show the algorithm's superior performance compared to other state-of-the-art algorithms. We achieve an up to ten-fold decrease in processing times, allowing us to process large data sets while still providing reasonable segmentation results.
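The last step of the pipeline described above is plain global thresholding; as a hedged illustration, here is a minimal pure-Python sketch of Otsu's method (the function name and the 8-bit histogram are assumptions of this sketch, not the authors' implementation):

```python
def otsu_threshold(pixels, levels=256):
    # Histogram of gray values (assumes integer pixels in [0, levels)).
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_b = 0.0        # cumulative intensity of the "background" class
    w_b = 0            # cumulative pixel count of the "background" class
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b              # background mean
        m_f = (sum_all - sum_b) / w_f  # foreground mean
        # Otsu maximizes the between-class variance.
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A bimodal image separates at the lower mode:
print(otsu_threshold([10] * 50 + [200] * 50))  # -> 10
```

On real microscopy data the transform step described in the abstract is what makes such a simple global threshold sufficient.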
Circulating metabolite predictors of glycemia in middle-aged men and women.
Würtz, Peter; Tiainen, Mika; Mäkinen, Ville-Petteri; Kangas, Antti J; Soininen, Pasi; Saltevo, Juha; Keinänen-Kiukaanniemi, Sirkka; Mäntyselkä, Pekka; Lehtimäki, Terho; Laakso, Markku; Jula, Antti; Kähönen, Mika; Vanhala, Mauno; Ala-Korpela, Mika
2012-08-01
Metabolite predictors of deteriorating glucose tolerance may elucidate the pathogenesis of type 2 diabetes. We investigated associations of circulating metabolites from high-throughput profiling with fasting and postload glycemia cross-sectionally and prospectively on the population level. Oral glucose tolerance was assessed in two Finnish, population-based studies consisting of 1,873 individuals (mean age 52 years, 58% women) and reexamined after 6.5 years for 618 individuals in one of the cohorts. Metabolites were quantified by nuclear magnetic resonance spectroscopy from fasting serum samples. Associations were studied by linear regression models adjusted for established risk factors. Nineteen circulating metabolites, including amino acids, gluconeogenic substrates, and fatty acid measures, were cross-sectionally associated with fasting and/or postload glucose (P < 0.001). Among these metabolic intermediates, branched-chain amino acids, phenylalanine, and α1-acid glycoprotein were predictors of both fasting and 2-h glucose at 6.5-year follow-up (P < 0.05), whereas alanine, lactate, pyruvate, and tyrosine were uniquely associated with 6.5-year postload glucose (P = 0.003-0.04). None of the fatty acid measures were prospectively associated with glycemia. Changes in fatty acid concentrations were associated with changes in fasting and postload glycemia during follow-up; however, changes in branched-chain amino acids did not follow glucose dynamics, and gluconeogenic substrates only paralleled changes in fasting glucose. Alterations in branched-chain and aromatic amino acid metabolism precede hyperglycemia in the general population. Further, alanine, lactate, and pyruvate were predictive of postchallenge glucose exclusively. These gluconeogenic precursors are potential markers of long-term impaired insulin sensitivity that may relate to attenuated glucose tolerance later in life.
A Fusion Nuclear Science Facility for a fast-track path to DEMO
Garofalo, Andrea M.; Abdou, M.; Canik, John M.; ...
2014-10-01
An accelerated fusion energy development program, a “fast-track” approach, requires developing an understanding of fusion nuclear science (FNS) in parallel with research on ITER to study burning plasmas. A Fusion Nuclear Science Facility (FNSF) in parallel with ITER provides the capability to resolve FNS feasibility issues related to power extraction, tritium fuel sustainability, and reliability, and to begin construction of DEMO upon the achievement of Q~10 in ITER. Fusion nuclear components, including the first wall (FW)/blanket, divertor, heating/fueling systems, etc. are complex systems with many inter-related functions and different materials, fluids, and physical interfaces. These in-vessel nuclear components must operate continuously and reliably with: (a) plasma exposure, surface particle & radiation loads, (b) high-energy neutron fluxes and their interactions in materials (e.g. peaked volumetric heating with steep gradients, tritium production, activation, atomic displacements, gas production, etc.), (c) strong magnetic fields with temporal and spatial variations (electromagnetic coupling to the plasma including off-normal events like disruptions), and (d) a high-temperature, high-vacuum, chemically active environment. While many of these conditions and effects are being studied with separate and multiple effect experimental test stands and modeling, fusion nuclear conditions cannot be completely simulated outside the fusion environment. This means there are many new multi-physics, multi-scale phenomena and synergistic effects yet to be discovered and accounted for in the understanding, design and operation of fusion as a self-sustaining, energy producing system, and significant experimentation and operational experience in a true fusion environment is an essential requirement.
In the following sections we discuss the FNSF objectives, describe the facility requirements and a facility concept and operation approach that can accomplish those objectives, and assess the readiness to construct with respect to several key FNSF issues: materials, steady-state operation, disruptions, power exhaust, and breeding blanket. Finally, we present our conclusions.
BarraCUDA - a fast short read sequence aligner using graphics processing units
2012-01-01
Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude boost in alignment throughput compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of the GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
Mark, Alicja Budek; Poulsen, Malene Wibe; Andersen, Stine; Andersen, Jeanette Marker; Bak, Monika Judyta; Ritz, Christian; Holst, Jens Juul; Nielsen, John; de Courten, Barbora; Dragsted, Lars Ove; Bügel, Susanne Gjedsted
2014-01-01
OBJECTIVE High-heat cooking of food induces the formation of advanced glycation end products (AGEs), which are thought to impair glucose metabolism in type 2 diabetic patients. High intake of fructose might additionally affect endogenous formation of AGEs. This parallel intervention study investigated whether the addition of fructose or cooking methods influencing the AGE content of food affect insulin sensitivity in overweight individuals. RESEARCH DESIGN AND METHODS Seventy-four overweight women were randomized to follow either a high- or low-AGE diet for 4 weeks, together with consumption of either fructose or glucose drinks. Glucose and insulin concentrations (after fasting and 2 h after an oral glucose tolerance test) were measured before and after the intervention. Homeostasis model assessment of insulin resistance (HOMA-IR) and the insulin sensitivity index were calculated. Dietary and urinary AGE concentrations were measured (liquid chromatography tandem mass spectrometry) to estimate AGE intake and excretion. RESULTS When adjusted for changes in anthropometric measures during the intervention, the low-AGE diet decreased urinary AGEs, fasting insulin concentrations, and HOMA-IR compared with the high-AGE diet. Addition of fructose did not affect any outcomes. CONCLUSIONS Diets with high AGE content may increase the development of insulin resistance. AGEs can be reduced by modulation of cooking methods but are unaffected by moderate fructose intake.
Parallel equilibrium current effect on existence of reversed shear Alfvén eigenmodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Hua-sheng, E-mail: huashengxie@gmail.com; Xiao, Yong, E-mail: yxiao@zju.edu.cn
2015-02-15
A new fast global eigenvalue code, where the terms are segregated according to their physics contents, is developed to study Alfvén modes in tokamak plasmas, particularly the reversed shear Alfvén eigenmode (RSAE). Numerical calculations show that the parallel equilibrium current corresponding to the kink term is strongly unfavorable for the existence of the RSAE. An improved criterion for RSAE existence is given for cases with and without the parallel equilibrium current. In the limits of ideal magnetohydrodynamics (MHD) and zero pressure, the toroidicity effect is the main possible favorable factor for the existence of the RSAE, but it is usually small. This suggests that it is necessary to include additional physics, such as a kinetic term, in the MHD model to overcome the strong unfavorable effect of the parallel current and enable the existence of the RSAE.
Mantle flow through a tear in the Nazca slab inferred from shear wave splitting
NASA Astrophysics Data System (ADS)
Lynner, Colton; Anderson, Megan L.; Portner, Daniel E.; Beck, Susan L.; Gilbert, Hersh
2017-07-01
A tear in the subducting Nazca slab is located between the end of the Pampean flat slab and normally subducting oceanic lithosphere. Tomographic studies suggest mantle material flows through this opening. The best way to probe this hypothesis is through observations of seismic anisotropy, such as shear wave splitting. We examine patterns of shear wave splitting using data from two seismic deployments in Argentina that lie updip of the slab tear. We observe a simple pattern of plate-motion-parallel fast splitting directions, indicative of plate-motion-parallel mantle flow, beneath the majority of the stations. Our observed splitting contrasts with previous observations to the north and south of the flat slab region. Since plate-motion-parallel splitting occurs only coincident with the slab tear, we propose that mantle material flows through the opening, resulting in Nazca plate-motion-parallel flow in both the subslab mantle and the mantle wedge.
Fast parallel 3D profilometer with DMD technology
NASA Astrophysics Data System (ADS)
Hou, Wenmei; Zhang, Yunbo
2011-12-01
The confocal microscope has been a powerful tool for three-dimensional profile analysis, but single-spot confocal microscopy is limited by its scanning speed. This paper presents a 3D profilometer prototype of a parallel confocal microscope based on a DMD (Digital Micromirror Device). In this system the DMD takes the place of the Nipkow disk, a classical parallel scanning scheme, to realize parallel lateral scanning. Operated with a suitable pattern, the DMD generates a virtual pinhole array that separates the light into multiple beams. The key parameters that affect the measurement (pinhole size and lateral scanning distance) can be configured conveniently via the patterns sent to the DMD chip. To avoid crosstalk between virtual pinholes working at the same time, a scanning strategy is adopted. Depth response curves, both axial and abaxial, were extracted. Measurement experiments have been carried out on a silicon structured sample, and an axial resolution of 55 nm is achieved.
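The virtual-pinhole scanning described above can be sketched generically: a sparse grid of "on" mirrors keeps simultaneously active pinholes far apart, and stepping the grid phase covers the full field. The layout, pitch value, and function names below are illustrative assumptions, not the authors' parameters:

```python
def pinhole_pattern(rows, cols, pitch, phase=(0, 0)):
    # 1 = mirror on (a virtual pinhole), 0 = mirror off. A pitch of several
    # mirrors keeps active pinholes far apart, suppressing crosstalk between
    # neighbouring detection channels.
    return [[1 if (r % pitch == phase[0] and c % pitch == phase[1]) else 0
             for c in range(cols)]
            for r in range(rows)]

def scan_sequence(rows, cols, pitch):
    # Scanning strategy: pitch**2 phase-shifted patterns together switch
    # every mirror on exactly once.
    return [pinhole_pattern(rows, cols, pitch, (pr, pc))
            for pr in range(pitch) for pc in range(pitch)]
```

With pitch 4, for example, 16 DMD frames scan the whole field while only 1/16 of the pinholes are active in any frame.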
Parallel Continuous Flow: A Parallel Suffix Tree Construction Tool for Whole Genomes
Farreras, Montse
2014-01-01
Abstract The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in the bioinformatics domain. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. The methodologies required to analyze these data have also become more complex every day, requiring fast queries against multiple genomes. In this article, we present parallel continuous flow (PCF), a parallel suffix tree construction method that is suitable for very long genomes. We tested our method on the suffix tree construction of the entire human genome, about 3 GB. We showed that PCF can scale gracefully as the size of the input genome grows. Our method can work with an efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 processes. PMID:24597675
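The efficiency figures above translate into speedups via the standard definition E = S/p; a quick check (the function name is mine):

```python
def implied_speedup(efficiency, processes):
    # Parallel efficiency is E = S / p, so the implied speedup is S = E * p.
    return efficiency * processes

# Figures quoted in the abstract:
for eff, p in [(0.90, 36), (0.55, 172)]:
    print(f"{p} processes: efficiency {eff:.0%} -> speedup ~{implied_speedup(eff, p):.1f}x")
```

So 90% efficiency on 36 processors corresponds to roughly a 32x speedup, and 55% on 172 processors to roughly 95x.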
A Massively Parallel Code for Polarization Calculations
NASA Astrophysics Data System (ADS)
Akiyama, Shizuka; Höflich, Peter
2001-03-01
We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
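The decomposition described above can be sketched generically: split the Monte-Carlo sample budget into chunks, each with its own independent, reproducible random stream, so that chunks can be assigned to distributed ranks or shared-memory threads. Everything below is illustrative only; a pi-estimation toy stands in for the radiation transport, and no MPI or threading layer is shown:

```python
import random

def mc_chunk(n_samples, seed):
    # Each worker gets its own seeded generator -> independent, reproducible
    # random streams, the same requirement a parallel transport code has.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            hits += 1
    return hits

def estimate_pi(n_chunks=8, per_chunk=20000):
    # In a hybrid code the chunks would run on MPI ranks and CPU threads;
    # here they run sequentially to keep the sketch self-contained.
    hits = sum(mc_chunk(per_chunk, seed) for seed in range(n_chunks))
    return 4.0 * hits / (n_chunks * per_chunk)
```

Because each chunk is tied to a seed rather than to a worker, the result is identical however the chunks are distributed across nodes and CPUs.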
One-step trinary signed-digit arithmetic using an efficient encoding scheme
NASA Astrophysics Data System (ADS)
Salim, W. Y.; Fyath, R. S.; Ali, S. A.; Alam, Mohammad S.
2000-11-01
The trinary signed-digit (TSD) number system is of interest for ultrafast optoelectronic computing systems since it permits parallel carry-free addition and borrow-free subtraction of two arbitrary-length numbers in constant time. In this paper, a simple coding scheme is proposed to encode decimal numbers directly into TSD form. The coding scheme enables one to perform parallel one-step TSD arithmetic operations. The proposed coding scheme uses only a 5-combination coding table instead of the 625-combination table reported recently for the recoded TSD arithmetic technique.
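A TSD number uses the digits {-1, 0, 1}. As a hedged illustration of the representation only (this is standard balanced-ternary encoding, not the paper's 5-combination coding table or its optical one-step adder):

```python
def to_trinary_signed_digit(n):
    # Balanced-ternary digits in {-1, 0, 1}, least significant first.
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        r = n % 3          # Python: remainder in {0, 1, 2} for any sign of n
        if r == 2:         # rewrite digit 2 as -1 and carry 1
            r = -1
            n += 1
        digits.append(r)
        n //= 3
    return digits

def tsd_value(digits):
    # Inverse mapping, for checking: sum of d_i * 3**i.
    return sum(d * 3 ** i for i, d in enumerate(digits))
```

Negation in this representation is just digit-wise sign flipping, which is one reason signed-digit systems suit borrow-free subtraction.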
Design, fabrication and characterization of a micro-fluxgate intended for parallel robot application
NASA Astrophysics Data System (ADS)
Kirchhoff, M. R.; Bogdanski, G.; Büttgenbach, S.
2009-05-01
This paper presents a micro-magnetometer based on the fluxgate principle. Fluxgates detect the magnitude and direction of DC and low-frequency AC magnetic fields. The detectable flux density typically ranges from several 10 nT to about 1 mT. The introduced fluxgate sensor is fabricated using MEMS technologies, principally UV depth lithography and electroplating for manufacturing high-aspect-ratio structures. It consists of helical copper coils around a soft magnetic nickel-iron (NiFe) core. The core is designed in so-called racetrack geometry, whereby the directional sensitivity of the sensor is considerably higher than that of common ring-core fluxgates. The electrical operation is based on analyzing the 2nd harmonic of the AC output signal. Configuration, manufacturing and selected characteristics of the fluxgate magnetometer are discussed in this work. The fluxgate forms the basis of an innovative angular sensor system for a parallel robot with HEXA structure. Integrated into the passive joints of the parallel robot, the fluxgates are combined with permanent magnets rotating on the joint shafts. The magnet transmits the angular information via its magnetic orientation. In this way, the angles between the kinematic elements are measured, which allows self-calibration of the robot and the fast analytical solution of direct kinematics for advanced workspace monitoring.
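The 2nd-harmonic readout principle can be illustrated with a single-bin DFT on a synthetic signal; this is a generic digital sketch (all names and numbers are mine), whereas the actual sensor front-end is dedicated electronics:

```python
import math

def harmonic_amplitude(signal, bin_k):
    # Amplitude of DFT bin k, i.e. the component at k cycles per record.
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * bin_k * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * bin_k * i / n) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

# Synthetic sensor output: excitation at 4 cycles per record; an external
# field makes the saturating core respond asymmetrically, producing a
# 2nd harmonic (8 cycles) whose amplitude tracks the field.
n = 400
sig = [1.0 * math.sin(2 * math.pi * 4 * i / n) +
       0.3 * math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
print(round(harmonic_amplitude(sig, 8), 3))  # -> 0.3
```

The single-bin projection recovers the 2nd-harmonic amplitude exactly for an integer number of excitation periods per record.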
NASA Astrophysics Data System (ADS)
Adamek, J.; Seidl, J.; Horacek, J.; Komm, M.; Eich, T.; Panek, R.; Cavalier, J.; Devitre, A.; Peterka, M.; Vondracek, P.; Stöckel, J.; Sestak, D.; Grover, O.; Bilkova, P.; Böhm, P.; Varju, J.; Havranek, A.; Weinzettl, V.; Lovell, J.; Dimitrova, M.; Mitosinkova, K.; Dejarnac, R.; Hron, M.; The COMPASS Team; The EUROfusion MST1 Team
2017-11-01
A new system of probes was recently installed in the divertor of the tokamak COMPASS in order to investigate the ELM energy density with high spatial and temporal resolution. The new system consists of two arrays of rooftop-shaped Langmuir probes (LPs), used to measure the floating potential or the ion saturation current density, and one array of ball-pen probes (BPPs), used to measure the plasma potential, with a spatial resolution of ~3.5 mm. The combination of floating BPPs and LPs yields the electron temperature with microsecond temporal resolution. We report on the design of the new divertor probe arrays and first results of electron temperature profile measurements in ELMy H-mode and L-mode. We also present comparative measurements of the parallel heat flux using the new probe arrays and fast infrared thermography (IR) data during L-mode, with excellent agreement between both techniques using a heat power transmission coefficient γ = 7. The parallel ELM energy density ε∥ was measured during a set of NBI-assisted ELMy H-mode discharges. The peak values of ε∥ were compared with model predictions and with experimental data from JET, AUG and MAST, with good agreement.
Hesford, Andrew J; Tillett, Jason C; Astheimer, Jeffrey P; Waag, Robert C
2014-08-01
Accurate and efficient modeling of ultrasound propagation through realistic tissue models is important to many aspects of clinical ultrasound imaging. Simplified problems with known solutions are often used to study and validate numerical methods. Greater confidence in a time-domain k-space method and a frequency-domain fast multipole method is established in this paper by analyzing results for realistic models of the human breast. Models of breast tissue were produced by segmenting magnetic resonance images of ex vivo specimens into seven distinct tissue types. After confirming with histologic analysis by pathologists that the model structures mimicked in vivo breast, the tissue types were mapped to variations in sound speed and acoustic absorption. Calculations of acoustic scattering by the resulting model were performed on massively parallel supercomputer clusters using parallel implementations of the k-space method and the fast multipole method. The efficient use of these resources was confirmed by parallel efficiency and scalability studies using large-scale, realistic tissue models. Comparisons between the temporal and spectral results were performed in representative planes by Fourier transforming the temporal results. An RMS field error less than 3% throughout the model volume confirms the accuracy of the methods for modeling ultrasound propagation through human breast.
Catastrophic onset of fast magnetic reconnection with a guide field
NASA Astrophysics Data System (ADS)
Cassak, P. A.; Drake, J. F.; Shay, M. A.
2007-05-01
It was recently shown that the slow (collisional) Sweet-Parker and the fast (collisionless) Hall magnetic reconnection solutions simultaneously exist for a wide range of resistivities; reconnection is bistable [Cassak, Shay, and Drake, Phys. Rev. Lett., 95, 235002 (2005)]. When the thickness of the dissipation region becomes smaller than a critical value, the Sweet-Parker solution disappears and fast reconnection ensues, potentially explaining how large amounts of magnetic free energy can accrue without significant release before the onset of fast reconnection. Two-fluid numerical simulations extending the previous results for anti-parallel reconnection (where the critical thickness is the ion skin depth) to component reconnection with a large guide field (where the critical thickness is the thermal ion Larmor radius) are presented. Applications to laboratory experiments of magnetic reconnection and the sawtooth crash are discussed.
NASA Astrophysics Data System (ADS)
Reerink, Thomas J.; van de Berg, Willem Jan; van de Wal, Roderik S. W.
2016-11-01
This paper accompanies the second OBLIMAP open-source release. The package is developed to map climate fields between a general circulation model (GCM) and an ice sheet model (ISM) in both directions by using optimally aligned oblique projections, which minimize distortions. The curvatures of the surfaces of the GCM and ISM grids differ, both grids may be irregularly spaced, and the resolution ratio of the grids may be large. OBLIMAP's stand-alone version is able to map data sets that differ in various aspects onto the same ISM grid. Each grid may coincide with the surface of a sphere, an ellipsoid or a flat plane, and the grid types may differ. Re-projection of, for example, ISM data sets is also facilitated. This is demonstrated by relevant applications concerning the major ice caps. As the stand-alone version also applies to the reverse mapping direction, it can be used as an offline coupler. Furthermore, OBLIMAP 2.0 is an embeddable GCM-ISM coupler, suited for high-frequency online coupled experiments. A new fast scan method is presented for structured grids as an alternative to the former time-consuming grid search strategy, realising a performance gain of several orders of magnitude and enabling the mapping of high-resolution data sets with a much larger number of grid nodes. Further, a highly flexible masked mapping option is added. The limitation of the fast scan method with respect to unstructured and adaptive grids is discussed, together with a possible future parallel Message Passing Interface (MPI) implementation.
High-Throughput, Adaptive FFT Architecture for FPGA-Based Spaceborne Data Processors
NASA Technical Reports Server (NTRS)
NguyenKobayashi, Kayla; Zheng, Jason X.; He, Yutao; Shah, Biren N.
2011-01-01
Exponential growth in microelectronics technology such as field-programmable gate arrays (FPGAs) has enabled high-performance spaceborne instruments with increasing onboard data processing capabilities. As a commonly used digital signal processing (DSP) building block, the fast Fourier transform (FFT) has been of great interest in onboard data processing applications, which need to strike a reasonable balance between high performance (throughput, block size, etc.) and low resource usage (power, silicon footprint, etc.). It is also desirable for a single design to be reusable and adaptable for instruments with different requirements. The Multi-Pass Wide Kernel FFT (MPWK-FFT) architecture was developed, in which the high-throughput benefits of the parallel FFT structure and the low resource usage of Singleton's single-butterfly method are exploited. The result is a wide-kernel, multipass, adaptive FFT architecture. The 32K-point MPWK-FFT architecture includes 32 radix-2 butterflies, 64 FIFOs to store the real inputs, 64 FIFOs to store the imaginary inputs, complex twiddle factor storage, and FIFO logic to route the outputs to the correct FIFO. The inputs are stored in sequential fashion in the FIFOs, and the outputs of each butterfly are written sequentially, first into the even FIFO, then the odd FIFO. Because of the order in which the outputs are written into the FIFOs, the depth of the even FIFOs, 768 each, is 1.5 times larger than that of the odd FIFOs, 512 each. The total memory needed for data storage, assuming that each sample is 36 bits, is 2.95 Mbits. The twiddle factors are stored in internal ROM inside the FPGA for fast access time. The total memory size to store the twiddle factors is 589.9 Kbits. This FFT structure combines the benefits of high throughput from the parallel FFT kernels and low resource usage from the multi-pass FFT kernels with the desired adaptability.
Space instrument missions that need onboard FFT capabilities, such as the proposed DESDynI, SWOT (Surface Water Ocean Topography), and Europa sounding radar missions, would greatly benefit from this technology, with significant reductions in non-recurring cost and risk.
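The storage figures quoted in the architecture description can be checked with quick arithmetic; the N/2 twiddle count below is an assumption about the radix-2 organization, not a detail stated in the text:

```python
# Data FIFOs: 64 real + 64 imaginary, split evenly between "even" FIFOs of
# depth 768 and "odd" FIFOs of depth 512 (the 1.5x ratio), 36-bit samples.
bits_per_sample = 36
data_samples = 64 * 768 + 64 * 512
data_bits = data_samples * bits_per_sample
print(data_bits / 1e6)    # ~2.95 Mbits, matching the quoted figure

# Twiddle ROM: a 32K-point radix-2 FFT needs N/2 complex twiddle factors.
n_points = 32 * 1024
twiddle_bits = (n_points // 2) * bits_per_sample
print(twiddle_bits / 1e3)  # ~589.8 Kbits, close to the quoted 589.9
```

Both quoted totals are thus consistent with the stated FIFO depths and 36-bit samples.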
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
NASA Astrophysics Data System (ADS)
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable fast turnaround times, which are often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth, due to the massive amount of data that need to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called the HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of layered parallel software libraries that allow image processing applications to share common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on Amdahl's law for symmetric multicore chips showed the potential for high performance scalability of the HPC 3D-MIP platform when a larger number of cores is available.
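The scalability analysis mentioned above rests on Amdahl's law; in its basic form (a generic sketch, not the symmetric-multicore extension used in the paper):

```python
def amdahl_speedup(parallel_fraction, cores):
    # S(n) = 1 / ((1 - f) + f / n): the serial fraction (1 - f) caps speedup.
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / cores)

# A 12-fold gain on 12 cores implies the workload is almost entirely parallel:
print(amdahl_speedup(1.00, 12))               # ideal: the full 12x
print(round(amdahl_speedup(0.95, 12), 1))     # a 5% serial part drops it to ~7.7
```

Conversely, the law gives the asymptotic limit 1/(1 - f) no matter how many cores are added.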
GPU Lossless Hyperspectral Data Compression System
NASA Technical Reports Server (NTRS)
Aranki, Nazeeh I.; Keymeulen, Didier; Kiely, Aaron B.; Klimesh, Matthew A.
2014-01-01
Hyperspectral imaging systems onboard aircraft or spacecraft can acquire large amounts of data, putting a strain on limited downlink and storage resources. Onboard data compression can mitigate this problem but may require a system capable of a high throughput. In order to achieve a high throughput with a software compressor, a graphics processing unit (GPU) implementation of a compressor was developed targeting the current state-of-the-art GPUs from NVIDIA(R). The implementation is based on the fast lossless (FL) compression algorithm reported in "Fast Lossless Compression of Multispectral-Image Data" (NPO- 42517), NASA Tech Briefs, Vol. 30, No. 8 (August 2006), page 26, which operates on hyperspectral data and achieves excellent compression performance while having low complexity. The FL compressor uses an adaptive filtering method and achieves state-of-the-art performance in both compression effectiveness and low complexity. The new Consultative Committee for Space Data Systems (CCSDS) Standard for Lossless Multispectral & Hyperspectral image compression (CCSDS 123) is based on the FL compressor. The software makes use of the highly-parallel processing capability of GPUs to achieve a throughput at least six times higher than that of a software implementation running on a single-core CPU. This implementation provides a practical real-time solution for compression of data from airborne hyperspectral instruments.
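The FL approach described above is predictive: encode the residual between each sample and an adaptive prediction. The sketch below uses the simplest possible predictor (the previous sample) purely to illustrate the predict/residual/reconstruct structure; it is not the FL or CCSDS-123 algorithm, and all names are mine:

```python
def encode_residuals(samples):
    # Previous-sample predictor; a real compressor entropy-codes these
    # residuals, which cluster near zero for smooth data and so cost
    # fewer bits than the raw samples.
    residuals, prev = [], 0
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals

def decode_residuals(residuals):
    # Exact inverse: accumulate residuals back into samples (lossless).
    out, prev = [], 0
    for r in residuals:
        prev += r
        out.append(prev)
    return out

band = [100, 102, 101, 105, 110, 108]
assert decode_residuals(encode_residuals(band)) == band  # lossless round trip
```

In FL the predictor additionally adapts across spectral bands, which is where most of its compression gain on hyperspectral data comes from.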
Evaluating the performance of the particle finite element method in parallel architectures
NASA Astrophysics Data System (ADS)
Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.
2014-05-01
This paper presents a high-performance implementation of the particle-mesh based method called the particle finite element method two (PFEM-2). It consists of a material-derivative based formulation of the equations with a hybrid spatial discretization which uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible while keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the goodness of the method with large simulations, it is imperative to use parallel environments. Parallel strategies for the finite element method have been widely studied, and many libraries can be used to solve the Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed with the selected parallel strategy in mind. The main drawback of PFEM-2 is the large amount of memory needed, which limits its application to large problems on a single computer. Therefore, a distributed-memory implementation is urgently needed. Unlike a shared-memory approach, with domain decomposition the memory is automatically isolated, thus avoiding race conditions; however, new issues appear due to data distribution over the processes. Thus, a domain decomposition strategy for both particles and mesh is adopted, which minimizes the communication between processes. Finally, performance analyses run on multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However, the total CPU time for the cases presented is lower than that obtained with classical Eulerian strategies.
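The closing remark about the Courant-Friedrichs-Lewy number can be made concrete with its 1-D advection definition (a generic formula, not PFEM-2 code; the example numbers are mine):

```python
def cfl_number(velocity, dt, dx):
    # CFL = u * dt / dx. Explicit Eulerian schemes typically require CFL <= 1;
    # PFEM-2's Lagrangian particle stage is what lets it run stably at larger
    # CFL, which shifts the balance of work between the parallel stages.
    return velocity * dt / dx

print(cfl_number(velocity=2.0, dt=0.25, dx=0.125))  # -> 4.0
```

A larger admissible CFL means fewer, heavier time steps, which changes how evenly the particle work partitions across processes.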
Design of a real-time wind turbine simulator using a custom parallel architecture
NASA Technical Reports Server (NTRS)
Hoffman, John A.; Gluck, R.; Sridhar, S.
1995-01-01
The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU allows several tasks to be done in each cycle, including an I/O operation and a combined multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's interface to each other and to other portions of the simulator through special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors, which usually have a throughput limit because of rigid bus architecture.
NASA Astrophysics Data System (ADS)
Ohtani, S.; Nose, M.; Miyashita, Y.; Lui, A.
2014-12-01
We investigate the responses of different ion species (H+, He+, He++, and O+) to fast plasma flows and local dipolarization in the plasma sheet in terms of energy density. We use energetic (9-210 keV) ion composition measurements made by the Geotail satellite at r = 10~31 RE. The results are summarized as follows: (1) whereas the O+-to-H+ ratio decreases with earthward flow velocity, it increases with tailward flow velocity with Vx dependence steeper for perpendicular flows than for parallel flows; (2) for fast earthward flows, the energy density of each ion species increases without any clear preference for heavy ions; (3) for fast tailward flows the ion energy density increases initially, then it decreases to below pre-flow levels except for O+; (4) the O+-to-H+ ratio does not increase through local dipolarization irrespective of dipolarization amplitude, background BZ, X distance, and VX; (5) in general, the H+ and He++ ions behave similarly. Result (1) can be attributed to radial transport along with the earthward increase of the background O+-to-H+ ratio. Results (2) and (4) indicate that ion energization associated with local dipolarization is not mass-dependent possibly because in the energy range of our interest the ions are not magnetized irrespective of species. In the tailward outflow region of reconnection, where the plasma sheet becomes thinner, the H+ ions escape along the field line more easily than the O+ ions, which possibly explains result (3). Result (5) suggests that the solar wind is the primary source of the high-energy H+ ions.
A method of fast mosaic for massive UAV images
NASA Astrophysics Data System (ADS)
Xiang, Ren; Sun, Min; Jiang, Cheng; Liu, Lei; Zheng, Hui; Li, Xiaodong
2014-11-01
With the development of UAV technology, UAVs are widely used in multiple fields such as agriculture, forest protection, mineral exploration, natural disaster management, and surveillance of public security events. In contrast to traditional manned aerial remote sensing platforms, UAVs are cheaper and more flexible to use, so users can obtain massive image data with UAVs. Processing that image data, however, takes a long time; for example, Pix4UAV needs approximately 10 hours to process 1000 images on a high-performance PC, whereas disaster management and many other fields require a quick response that is hard to achieve with massive image data. To address the disadvantages of high time consumption and manual interaction, this article presents a solution for fast UAV image stitching. GPS and POS data are used to pre-process the original images from the UAV; flight belts and the relations between belts and images are recognized automatically by the program, and at the same time useless images are discarded. This speeds up the search for match points between images. The Levenberg-Marquardt algorithm is modified so that parallel computing can be applied to shorten the time of global optimization notably. Besides the traditional mosaic result, the system can also generate a superoverlay result for Google Earth, which provides a fast and easy way to display the result data. To verify the feasibility of this method, a fast mosaic system for massive UAV images was developed; it is fully automated, and no manual interaction is needed after the original images and GPS data are provided. A test using 800 images of the Kelan River in Xinjiang Province shows that this system reduces time consumption by 35%-50% compared with traditional methods and greatly increases the response speed of UAV image processing.
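The GPS-based pre-processing step can be illustrated with a minimal sketch; the function name, planar coordinates, and distance threshold are assumptions for illustration, not the system's actual implementation:

```python
import math

def gps_neighbor_pairs(positions, max_dist_m):
    """Return index pairs of images whose GPS positions are within
    max_dist_m of each other, so match points are searched only
    between images that can plausibly overlap."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            (x1, y1), (x2, y2) = positions[i], positions[j]
            if math.hypot(x2 - x1, y2 - y1) <= max_dist_m:
                pairs.append((i, j))
    return pairs

# Four images on one flight belt, 50 m apart; only adjacent frames overlap.
pos = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0), (150.0, 0.0)]
print(gps_neighbor_pairs(pos, 60.0))  # → [(0, 1), (1, 2), (2, 3)]
```

Culling the candidate pairs this way turns an all-pairs matching problem into a near-linear one along each belt.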
NASA Astrophysics Data System (ADS)
Fontaine, Fabrice J.; Rabinowicz, M.; Cannat, M.
2017-05-01
We present numerical models to explore possible couplings along the axis of fast-spreading ridges, between hydrothermal convection in the upper crust and magmatic flow in the lower crust. In an end-member category of models corresponding to effective viscosities μM lower than 10^13 Pa·s in a melt-rich lower crustal along-axis corridor and permeability k not exceeding ~10^-16 m^2 in the upper crust, the hot, melt-rich, gabbroic lower crust convects as a viscous fluid, with convection rolls parallel to the ridge axis. In these models, we show that the magmatic-hydrothermal interface settles at realistic depths for fast ridges, i.e., 1-2 km below the seafloor. Convection cells in both horizons are strongly coupled, and kilometer-wide hydrothermal upflows/plumes, spaced by 8-10 km, arise on top of the magmatic upflows. Such magmatic-hydrothermal convective couplings may explain the distribution of vent fields along the East (EPR) and South-East Pacific Rise (SEPR). The lower crustal plumes deliver melt locally at the top of the magmatic horizon, possibly explaining the observed distribution of melt-rich regions/pockets in the axial melt lenses of the EPR and SEPR. Crystallization of this melt provides the necessary latent heat to sustain permanent ~100 MW vent fields. Our models also contribute to current discussions on how the lower crust forms at fast ridges: they provide a possible mechanism for focused transport of melt-rich crystal mushes from Moho level to the axial melt lens, where they further crystallize, feed eruptions, and are transported both along- and off-axis to produce the lower crust.
Postprandial Monocyte Activation in Individuals With Metabolic Syndrome
Khan, Ilvira M.; Pokharel, Yashashwi; Dadu, Razvan T.; Lewis, Dorothy E.; Hoogeveen, Ron C.; Wu, Huaizhu
2016-01-01
Context: Postprandial hyperlipidemia has been suggested to contribute to atherogenesis by inducing proinflammatory changes in monocytes. Individuals with metabolic syndrome (MS), shown to have higher blood triglyceride concentration and delayed triglyceride clearance, may thus have increased risk for development of atherosclerosis. Objective: Our objective was to examine fasting levels and effects of a high-fat meal on phenotypes of monocyte subsets in individuals with obesity and MS and in healthy controls. Design, Setting, Participants, Intervention: Individuals with obesity and MS and gender- and age-matched healthy controls were recruited. Blood was collected from participants after an overnight fast (baseline) and at 3 and 5 hours after ingestion of a high-fat meal. At each time point, monocyte phenotypes were examined by multiparameter flow cytometry. Main Outcome Measures: Baseline levels of activation markers and postprandial inflammatory response in each of the three monocyte subsets were measured. Results: At baseline, individuals with obesity and MS had higher proportions of circulating lipid-laden foamy monocytes than controls, which were positively correlated with fasting triglyceride levels. Additionally, the MS group had increased counts of nonclassical monocytes, higher CD11c, CX3CR1, and human leukocyte antigen-DR levels on intermediate monocytes, and higher CCR5 and tumor necrosis factor-α levels on classical monocytes in the circulation. Postprandial triglyceride increases in both groups were paralleled by upregulation of lipid-laden foamy monocytes. MS, but not control, subjects had significant postprandial increases of CD11c and percentages of IL-1β+ and tumor necrosis factor-α+ cells in nonclassical monocytes. Conclusions: Compared to controls, individuals with obesity and MS had increased fasting and postprandial monocyte lipid accumulation and activation. PMID:27575945
NASA Astrophysics Data System (ADS)
Malloy, Matt; Thiel, Brad; Bunday, Benjamin D.; Wurm, Stefan; Jindal, Vibhu; Mukhtar, Maseeh; Quoi, Kathy; Kemen, Thomas; Zeidler, Dirk; Eberle, Anna Lena; Garbowski, Tomasz; Dellemann, Gregor; Peters, Jan Hendrik
2015-09-01
The new device architectures and materials being introduced for sub-10nm manufacturing, combined with the complexity of multiple patterning and the need for improved hotspot detection strategies, have pushed current wafer inspection technologies to their limits. In parallel, gaps in mask inspection capability are growing as new generations of mask technologies are developed to support these sub-10nm wafer manufacturing requirements. In particular, the challenges associated with nanoimprint and extreme ultraviolet (EUV) mask inspection require new strategies that enable fast inspection at high sensitivity. The tradeoffs between sensitivity and throughput for optical and e-beam inspection are well understood. Optical inspection offers the highest throughput and is the current workhorse of the industry for both wafer and mask inspection. E-beam inspection offers the highest sensitivity but has historically lacked the throughput required for widespread adoption in the manufacturing environment. It is unlikely that continued incremental improvements to either technology will meet tomorrow's requirements, and therefore a new inspection technology approach is required: one that combines the high-throughput performance of optical with the high-sensitivity capabilities of e-beam inspection. To support the industry in meeting these challenges, SUNY Poly SEMATECH has evaluated disruptive technologies that can meet the requirements for high volume manufacturing (HVM), for both the wafer fab [1] and the mask shop. High-speed massively parallel e-beam defect inspection has been identified as the leading candidate for addressing the key gaps limiting today's patterned defect inspection techniques. As of late 2014 SUNY Poly SEMATECH completed a review, system analysis, and proof of concept evaluation of multiple e-beam technologies for defect inspection. A champion approach has been identified based on a multibeam technology from Carl Zeiss.
This paper includes a discussion on the need for high-speed e-beam inspection and then provides initial imaging results from EUV masks and wafers from 61 and 91 beam demonstration systems. Progress towards high resolution and consistent intentional defect arrays (IDA) is also shown.
Fast and confident: postdicting eyewitness identification accuracy in a field study.
Sauerland, Melanie; Sporer, Siegfried L
2009-03-01
The combined postdictive value of postdecision confidence, decision time, and Remember-Know-Familiar (RKF) judgments as markers of identification accuracy was evaluated with 10 targets and 720 participants. In a pedestrian area, passers-by were asked for directions. Identifications were made from target-absent or target-present lineups. Fast (optimum time boundary at 6 seconds) and confident (optimum confidence boundary at 90%) witnesses were highly accurate, whereas slow and nonconfident witnesses were highly inaccurate. Although this combination of postdictors was clearly superior to using either postdictor by itself, these combinations refer only to a subsample of choosers. Know answers were associated with higher identification performance than Familiar answers, with no difference between Remember and Know answers. The results of participants' post hoc decision time estimates paralleled those with measured decision times. To explore the decision strategies of nonchoosers, three subgroups were formed according to the reasons given for rejecting the lineup. Nonchoosers indicating that the target had simply been absent made faster and more confident decisions than nonchoosers stating lack of confidence or lack of memory. There were no significant differences in identification performance across nonchooser groups. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
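The combined decision rule above can be restated as a toy classifier; the function and its defaults simply encode the study's reported optimum boundaries, and are not the authors' analysis code:

```python
def postdict_accuracy(decision_time_s, confidence_pct,
                      time_boundary=6.0, conf_boundary=90.0):
    """Postdict that a chooser's identification is likely accurate only
    when the decision was both fast and confident; the default
    boundaries are the optima reported in the study."""
    return decision_time_s <= time_boundary and confidence_pct >= conf_boundary

print(postdict_accuracy(4.2, 95.0), postdict_accuracy(8.0, 95.0))  # → True False
```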
X-PROP: a fast and robust diffusion-weighted propeller technique.
Li, Zhiqiang; Pipe, James G; Lee, Chu-Yu; Debbins, Josef P; Karis, John P; Huo, Donglai
2011-08-01
Diffusion-weighted imaging (DWI) has shown great benefits in clinical MR exams. However, current DWI techniques suffer from sensitivity to distortion, long scan times, or both. Diffusion-weighted echo-planar imaging (EPI) is fast but suffers from severe geometric distortion. Periodically rotated overlapping parallel lines with enhanced reconstruction diffusion-weighted imaging (PROPELLER DWI) is free of geometric distortion, but the scan time is usually long and imposes a high specific absorption rate (SAR), especially at high fields. TurboPROP was proposed to accelerate the scan by combining signal from gradient echoes, but off-resonance artifacts from the gradient echoes can still degrade image quality. In this study, a new method called X-PROP is presented. Like TurboPROP, it uses gradient echoes to reduce the scan time. By separating the gradient and spin echoes into individual blades and removing the off-resonance phase, the off-resonance artifacts in X-PROP are minimized. Special reconstruction processes are applied to these blades to correct for motion artifacts. In vivo results show its advantages over the EPI, PROPELLER DWI, and TurboPROP techniques. Copyright © 2011 Wiley-Liss, Inc.
REPEATING FAST RADIO BURSTS FROM HIGHLY MAGNETIZED PULSARS TRAVELING THROUGH ASTEROID BELTS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Z. G.; Wang, J. S.; Huang, Y. F.
Very recently, Spitler et al. and Scholz et al. reported their detections of 16 additional bright bursts in the direction of the fast radio burst (FRB) 121102. This repeating FRB is inconsistent with all of the catastrophic event models put forward previously for hypothetically non-repeating FRBs. Here, we propose a different model, in which highly magnetized pulsars travel through the asteroid belts of other stars. We show that a repeating FRB could originate from such a pulsar encountering a large number of asteroids in the belt. During each pulsar-asteroid impact, an electric field induced outside of the asteroid has such a large component parallel to the stellar magnetic field that electrons are torn off the asteroidal surface and accelerated to ultra-relativistic energies instantaneously. The subsequent movement of these electrons along magnetic field lines will cause coherent curvature radiation, which can account for all of the properties of an FRB. In addition, this model can self-consistently explain the typical duration, luminosity, and repetitive rate of the 17 bursts of FRB 121102. The predicted occurrence rate of repeating FRB sources may imply that our model would be testable in the next few years.
GPU-based ultra-fast dose calculation using a finite size pencil beam model.
Gu, Xuejun; Choi, Dongju; Men, Chunhua; Pan, Hubert; Majumdar, Amitava; Jiang, Steve B
2009-10-21
Online adaptive radiation therapy (ART) is an attractive concept that promises the ability to deliver an optimal treatment in response to the inter-fraction variability in patient anatomy. However, it has yet to be realized due to technical limitations. Fast dose deposit coefficient calculation is a critical component of the online planning process that is required for plan optimization of intensity-modulated radiation therapy (IMRT). Computer graphics processing units (GPUs) are well suited to provide the requisite fast performance for the data-parallel nature of dose calculation. In this work, we develop a dose calculation engine based on a finite-size pencil beam (FSPB) algorithm and a GPU parallel computing framework. The developed framework can accommodate any FSPB model. We test our implementation in the case of a water phantom and the case of a prostate cancer patient with varying beamlet and voxel sizes. All testing scenarios achieved speedup ranging from 200 to 400 times when using a NVIDIA Tesla C1060 card in comparison with a 2.27 GHz Intel Xeon CPU. The computational time for calculating dose deposition coefficients for a nine-field prostate IMRT plan with this new framework is less than 1 s. This indicates that the GPU-based FSPB algorithm is well suited for online re-planning for adaptive radiotherapy.
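The data-parallel character of pencil-beam dose calculation can be sketched in NumPy; a hypothetical Gaussian lateral kernel stands in for the FSPB model here, and on a GPU each voxel's independent accumulation would map to one thread:

```python
import numpy as np

def dose_fspb(voxel_xy, beamlet_xy, weights, sigma=5.0):
    """Accumulate dose at each voxel from every beamlet through a
    hypothetical Gaussian lateral kernel. Each voxel's sum is
    independent of the others, which is what makes the calculation
    data-parallel; NumPy broadcasting plays the GPU's role here."""
    # Pairwise squared distances, shape (n_voxels, n_beamlets).
    d2 = ((voxel_xy[:, None, :] - beamlet_xy[None, :, :]) ** 2).sum(-1)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    return kernel @ weights  # dose per voxel, shape (n_voxels,)

voxels = np.array([[0.0, 0.0], [10.0, 0.0]])  # positions in mm
beamlets = np.array([[0.0, 0.0]])
print(dose_fspb(voxels, beamlets, np.array([1.0])))
```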
Ramdane, Said; Daoudi-Gueddah, Doria
2011-08-01
We retrospectively examined the concurrent relationships between fasting plasma total cholesterol, triglyceride, and glucose levels and Alzheimer's disease (AD) in a clinical setting-based study. Total cholesterol level was higher in patients with AD compared to elderly controls; triglyceride and glucose levels did not significantly differ between the 2 groups. The respective plotted trajectories of change in cholesterol level across age were fairly parallel. No significant difference in total cholesterol levels was recorded between patients with AD classified by Clinical Dementia Rating (CDR) score subgroups. These results suggest that patients with AD have relative mild total hypercholesterolemia, normal triglyceridemia, and normal fasting plasma glucose levels. Mild total hypercholesterolemia seems to be permanent across age and across dementia severity staging, and fairly parallels the trajectory of age-related change in total cholesterolemia of healthy controls. We speculate that this pattern of biochemical parameters may be present long before (a decade at least) the symptomatic onset of the disease.
Seismic anisotropy and large-scale deformation of the Eastern Alps
NASA Astrophysics Data System (ADS)
Bokelmann, Götz; Qorbani, Ehsan; Bianchi, Irene
2013-12-01
Mountain chains at the Earth's surface result from deformation processes within the Earth. Such deformation processes can be observed by seismic anisotropy, via the preferred alignment of elastically anisotropic minerals. The Alps show complex deformation at the Earth's surface. In contrast, we show here that observations of seismic anisotropy suggest a relatively simple pattern of internal deformation. Together with earlier observations from the Western Alps, the SKS shear-wave splitting observations presented here show one of the clearest examples yet of mountain chain-parallel fast orientations worldwide, with a simple pattern nearly parallel to the trend of the mountain chain. In the Eastern Alps, the fast orientations do not connect with neighboring mountain chains, neither the present-day Carpathians, nor the present-day Dinarides. In that region, the lithosphere is thin and the observed anisotropy thus resides within the asthenosphere. The deformation is consistent with the eastward extrusion toward the Pannonian basin that was previously suggested based on seismicity and surface geology.
A fast algorithm for computer aided collimation gamma camera (CACAO)
NASA Astrophysics Data System (ADS)
Jeanguillaume, C.; Begot, S.; Quartuccio, M.; Douiri, A.; Franck, D.; Pihet, P.; Ballongue, P.
2000-08-01
The computer aided collimation gamma camera is aimed at breaking the resolution-sensitivity trade-off of the conventional parallel hole collimator. It uses larger and longer holes, with an added linear movement during the acquisition sequence. A dedicated algorithm including shift and sum, deconvolution, parabolic filtering, and rotation is described. Examples of reconstruction are given. This work shows that a simple and fast algorithm, based on a diagonally dominant approximation of the problem, can be derived. It gives a practical solution to the CACAO reconstruction problem.
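The shift-and-sum step of such a reconstruction can be sketched in one dimension; a toy version, with the shift list standing in for the collimator's known linear movement, not the dedicated CACAO algorithm:

```python
import numpy as np

def shift_and_sum(frames, shifts):
    """Undo the collimator's linear movement: shift each acquired frame
    back by its known offset (in bins) and sum the stack."""
    out = np.zeros_like(frames[0], dtype=float)
    for frame, s in zip(frames, shifts):
        out += np.roll(frame, -s, axis=0)
    return out

# Two frames of the same point source, the second acquired one bin later.
frames = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])]
print(shift_and_sum(frames, [0, 1]).tolist())  # → [2.0, 0.0, 0.0, 0.0]
```

Aligning the frames before summation is what concentrates the counts back onto the source position; deconvolution and filtering would follow in the full pipeline.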
Large-Constraint-Length, Fast Viterbi Decoder
NASA Technical Reports Server (NTRS)
Collins, O.; Dolinar, S.; Hsu, In-Shek; Pollara, F.; Olson, E.; Statman, J.; Zimmerman, G.
1990-01-01
Scheme for efficient interconnection makes VLSI design feasible. Concept for fast Viterbi decoder provides for processing of convolutional codes of constraint length K up to 15 and rates of 1/2 to 1/6. Fully parallel (but bit-serial) architecture developed for decoder of K = 7 implemented in single dedicated VLSI circuit chip. Contains six major functional blocks. VLSI circuits perform branch metric computations, add-compare-select operations, and then store decisions in traceback memory. Traceback processor reads appropriate memory locations and puts out decoded bits. Used as building block for decoders of larger K.
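The add-compare-select operation at the heart of such a decoder can be sketched for a single trellis state; a minimum-metric convention is assumed, and the returned bit is what would be stored in the traceback memory:

```python
def acs(pm0, pm1, bm0, bm1):
    """Add-compare-select for one trellis state: add the branch metrics
    to the two predecessor path metrics, keep the smaller sum, and
    return the decision bit that would go into traceback memory."""
    c0, c1 = pm0 + bm0, pm1 + bm1
    return (c0, 0) if c0 <= c1 else (c1, 1)

print(acs(1.0, 2.0, 0.5, 0.1))  # → (1.5, 0): the path through predecessor 0 survives
```

In the VLSI chip this butterfly runs for every state in parallel each bit period; the traceback processor then reads the stored decision bits to emit the decoded sequence.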
Houser, Dorian S.; Champagne, Cory D.; Crocker, Daniel E.
2013-01-01
Insulin resistance in modern society is perceived as a pathological consequence of excess energy consumption and reduced physical activity. Its presence in relation to the development of cardiovascular risk factors has been termed the metabolic syndrome, which produces increased mortality and morbidity and which is rapidly increasing in human populations. Ironically, insulin resistance likely evolved to assist animals during food shortages by increasing the availability of endogenous lipid for catabolism while protecting protein from use in gluconeogenesis and eventual oxidation. Some species that incorporate fasting as a predictable component of their life history demonstrate physiological traits similar to the metabolic syndrome during prolonged fasts. One such species is the northern elephant seal (Mirounga angustirostris), which fasts from food and water for periods of up to 4 months. During this time, ∼90% of the seal's metabolic demands are met through fat oxidation, and circulating non-esterified fatty acids are high (0.7–3.2 mM). All life history stages of the elephant seal studied to date demonstrate insulin resistance and fasting hyperglycemia, as well as variations in hormones and adipocytokines that reflect the metabolic syndrome to some degree. Elephant seals demonstrate some intriguing adaptations with the potential for medical advancement; for example, ketosis is negligible despite significant and prolonged fatty acid oxidation, and investigation of this feature might provide insight into the treatment of diabetic ketoacidosis. The parallels to the metabolic syndrome are likely reflected to varying degrees in other marine mammals, most of which evolved on diets high in lipid and protein content but essentially devoid of carbohydrate. Utilization of these natural models of insulin resistance may further our understanding of the pathophysiology of the metabolic syndrome in humans and better assist the development of preventative measures and therapies.
PMID:24198811
An implementation of a tree code on a SIMD, parallel computer
NASA Technical Reports Server (NTRS)
Olson, Kevin M.; Dorband, John E.
1994-01-01
We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y, and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k-processor MasPar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,536 particles. We also simulate the formation of structure in an expanding model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) computers can be used for these simulations. The cost/performance ratio for SIMD machines like the MasPar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
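The sort-and-bisect tree construction can be sketched serially; a minimal illustration assuming a power-of-two particle count, not the authors' SIMD implementation (where the sorts themselves run in parallel):

```python
def build_tree(particles, depth=0):
    """Build a completely balanced tree by sorting the particle list
    along x, y, z in turn and splitting it in half, until each leaf
    holds exactly one particle (assumes a power-of-two count)."""
    if len(particles) == 1:
        return particles[0]
    axis = depth % 3                       # cycle through x, y, z
    ordered = sorted(particles, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (build_tree(ordered[:mid], depth + 1),
            build_tree(ordered[mid:], depth + 1))

corners = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
print(build_tree(corners))  # → (((0, 0, 0), (0, 1, 0)), ((1, 0, 0), (1, 1, 0)))
```

Because each split halves the list exactly, the tree is perfectly balanced and nearby particles end up in nearby subtrees, which is the spatial locality the abstract mentions.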
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-07
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
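The master-worker pattern described above can be sketched with Python's multiprocessing standing in for a cloud cluster and MPI; the toy "histories" below are stand-ins for EGS5 particle transport, and the batch splitting mirrors how independent histories make MC embarrassingly parallel:

```python
import multiprocessing as mp
import random

def simulate(args):
    """Worker node: run one independent batch of toy particle
    'histories' (a stand-in for an EGS5 run)."""
    n_histories, seed = args
    rng = random.Random(seed)            # independent, reproducible stream
    return sum(rng.random() for _ in range(n_histories))

def run_distributed(total_histories, n_workers):
    """Master node: split the histories evenly across workers, farm the
    batches out, and aggregate the partial results -- the same
    master/worker pattern the paper implements over MPI."""
    batches = [(total_histories // n_workers, seed) for seed in range(n_workers)]
    with mp.Pool(n_workers) as pool:
        return sum(pool.map(simulate, batches))

if __name__ == "__main__":
    print(run_distributed(1000, 4))
```

Since histories are independent, simulation time scales inversely with worker count until per-node overhead dominates, which matches the near-linear scaling reported.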
Bio-inspired approach for intelligent unattended ground sensors
NASA Astrophysics Data System (ADS)
Hueber, Nicolas; Raymond, Pierre; Hennequin, Christophe; Pichler, Alexander; Perrot, Maxime; Voisin, Philippe; Moeglin, Jean-Pierre
2015-05-01
Improving the surveillance capacity over wide zones requires a set of smart battery-powered Unattended Ground Sensors capable of issuing an alarm to a decision-making center. Only high-level information has to be sent when a relevant suspicious situation occurs. In this paper we propose an innovative bio-inspired approach that mimics the human bi-modal vision mechanism and the parallel processing ability of the human brain. The designed prototype exploits two levels of analysis: a low-level panoramic motion analysis, the peripheral vision, and a high-level event-focused analysis, the foveal vision. By tracking moving objects and fusing multiple criteria (size, speed, trajectory, etc.), the peripheral vision module acts as a fast relevant-event detector. The foveal vision module focuses on the detected events to extract more detailed features (texture, color, shape, etc.) in order to improve the recognition efficiency. The implemented recognition core is able to acquire human knowledge and to classify a huge amount of heterogeneous data in real time thanks to its natively parallel hardware structure. This UGS prototype validates our system approach under laboratory tests. The peripheral analysis module demonstrates a low false alarm rate, whereas the foveal vision correctly focuses on the detected events. A parallel FPGA implementation of the recognition core succeeds in fulfilling the embedded application requirements. These results pave the way for future reconfigurable virtual field agents. By locally processing the data and sending only high-level information, their energy requirements and electromagnetic signature are optimized. Moreover, the embedded Artificial Intelligence core enables these bio-inspired systems to recognize and learn new significant events. By duplicating human expertise in potentially hazardous places, our miniature visual event detector will allow early warning and contribute to better human decision making.
González-Ortiz, Manuel; Balcázar-Muñoz, Blanca R; Mora-Martínez, José M; Martínez-Abundis, Esperanza
2004-09-01
The objective of this study was to evaluate the effect of a high-fat or high-carbohydrate breakfast on the postprandial lipid profile in healthy subjects with or without a family history of type 2 diabetes mellitus. A single-blind, controlled clinical trial with parallel groups was performed in 20 healthy subjects: 10 subjects with a family history of type 2 diabetes mellitus and 10 individuals without that background. Each group was randomized to receive a high-fat or high-carbohydrate breakfast. A metabolic profile that included fasting and postprandial lipids, as well as an assessment of insulin sensitivity, was performed. Lower high-density lipoprotein cholesterol (p < 0.02) and apolipoprotein A1 (p < 0.03) concentrations were found in subjects with a family history of type 2 diabetes mellitus than in those without that background. In this same group, the high-carbohydrate breakfast produced significant increments in apolipoprotein B at minute 300 (p < 0.03) and in triglycerides at minute 360 (p < 0.03). In the group without a family history of diabetes that received the high-fat breakfast, there were increments in triglycerides (p < 0.03) and very-low-density lipoprotein concentrations at minute 180 (p < 0.03). In conclusion, healthy subjects with a family history of type 2 diabetes showed some atherogenic characteristics in their metabolic profile, and the high-carbohydrate breakfast produced in them increments in apolipoprotein B and in triglycerides, whereas in subjects without such a background the high-fat breakfast produced unfavorable effects on their lipid concentrations.
Evaluation and application of a fast module in a PLC based interlock and control system
NASA Astrophysics Data System (ADS)
Zaera-Sanz, M.
2009-08-01
The LHC Beam Interlock System requires a controller performing a simple matrix function to collect the different beam dump requests. To satisfy the expected safety level of the interlock, the system should be robust and reliable. The PLC is a promising candidate to fulfil both aspects but is too slow to meet the expected response time, which is of the order of microseconds. Siemens has introduced a so-called fast module (the FM352-5 Boolean Processor). It provides independent and extremely fast control of a process within a larger control system using an onboard processor, a Field Programmable Gate Array (FPGA), to execute code in parallel, which results in extremely fast scan times. It is therefore interesting to investigate its features and to evaluate it as a possible candidate for the beam interlock system. This paper presents the results of this study, which may also be useful for other applications requiring fast processing with a PLC.
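The matrix function itself is simple Boolean logic; a sketch with illustrative names (not the actual interlock code -- the real FM352-5 evaluates logic of this kind in its FPGA every scan):

```python
def beam_dump(requests, masks):
    """Interlock matrix in its simplest form: dump the beam when any
    unmasked user-system input raises a request. Inputs are lists of
    0/1 flags, one per connected user system (illustrative layout)."""
    return any(r and not m for r, m in zip(requests, masks))

print(beam_dump([0, 1, 0], [0, 0, 0]))  # → True: an unmasked request dumps the beam
print(beam_dump([0, 1, 0], [0, 1, 0]))  # → False: the only request is masked out
```

The safety argument is that this evaluation must complete, deterministically, within microseconds; that is exactly what a conventional PLC scan cycle cannot guarantee and the FPGA-based module can.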
Liu, Tiemin; Kong, Dong; Shah, Bhavik P.; Ye, Chianping; Koda, Shuichi; Saunders, Arpiar; Ding, Jun B.; Yang, Zongfang; Sabatini, Bernardo L.; Lowell, Bradford B.
2012-01-01
AgRP neuron activity drives feeding and weight gain while that of nearby POMC neurons does the opposite. However, the role of excitatory glutamatergic input in controlling these neurons is unknown. To address this question, we generated mice lacking NMDA receptors (NMDARs) on either AgRP or POMC neurons. Deletion of NMDARs from AgRP neurons markedly reduced weight, body fat, and food intake, whereas deletion from POMC neurons had no effect. Activation of AgRP neurons by fasting, as assessed by c-Fos, Agrp and Npy mRNA expression, AMPA receptor-mediated EPSCs, depolarization and firing rates, required NMDARs. Furthermore, AgRP but not POMC neurons have dendritic spines, and the increased glutamatergic input onto AgRP neurons caused by fasting was paralleled by an increase in spines, suggesting fasting-induced synaptogenesis and spinogenesis. Thus, glutamatergic synaptic transmission and its modulation by NMDARs play key roles in controlling AgRP neurons and determining the cellular and behavioral response to fasting. PMID:22325203
NASA Astrophysics Data System (ADS)
Yu, Y.; Reed, C. A.; Gao, S. S.; Liu, K. H.; Massinque, B.; Mdala, H. S.; Chindandali, P. R. N.; Moidaki, M.; Mutamina, D. M.
2014-12-01
In spite of numerous geoscientific studies, the mechanisms responsible for the initiation and development of continental rifts are still poorly understood. The key information required to constrain various geodynamic models on rift initiation can be derived from the crust/mantle structure and anisotropy beneath incipient rifts such as the Southern and Southwestern branches of the East African Rift System. As part of a National Science Foundation funded interdisciplinary project, 50 PASSCAL broadband seismic stations were deployed across the Malawi, Luangwa, and Okavango rift zones from the summer of 2012 to the summer of 2014. Preliminary results from these 50 SAFARI (Seismic Arrays for African Rift Initiation) and adjacent stations are presented utilizing shear-wave splitting (SWS) and P-S receiver function techniques. 1109 pairs of high-quality SWS measurements, consisting of fast polarization orientations and splitting times, have been obtained from a total of 361 seismic events. The results demonstrate dominantly NE-SW fast orientations throughout Botswana as well as along the northwestern flank of the Luangwa rift valley. Meanwhile, fast orientations beneath the eastern Luangwa rift flank rotate from NNW to NNE along the western border of the Malawi rift. Stations located alongside the western Malawi rift border faults yield ENE fast orientations, with stations situated in Mozambique exhibiting more E-W orientations. In the northern extent of the study region, fast orientations parallel the trend of the Rukwa and Usangu rift basins. Receiver function results reveal that, relative to the adjacent Pan-African mobile belts, the Luangwa rift zone has a thin (30 to 35 km) crust. The crustal thickness within the Okavango rift basin is highly variable. Preliminary findings indicate a northeastward thinning along the southeast Okavango border fault system congruent with decreasing extension toward the southwest. 
The Vp/Vs measurements in the Okavango basin are roughly 1.75 on average, suggesting an unmodified crustal composition, while those of the Luangwa and southern Malawi rift zones are relatively high, probably suggesting ancient or ongoing magmatic emplacement. The Pan-African mobile belts enveloping the rift zones are mostly characterized by more felsic and thicker crust.
Particle-in-cell studies of fast-ion slowing-down rates in cool tenuous magnetized plasma
NASA Astrophysics Data System (ADS)
Evans, Eugene S.; Cohen, Samuel A.; Welch, Dale R.
2018-04-01
We report on 3D-3V particle-in-cell simulations of fast-ion energy-loss rates in a cold, weakly-magnetized, weakly-coupled plasma where the electron gyroradius, ρe, is comparable to or less than the Debye length, λDe, and the fast-ion velocity exceeds the electron thermal velocity, a regime in which the electron response may be impeded. These simulations use explicit algorithms, spatially resolve ρe and λDe, and temporally resolve the electron cyclotron and plasma frequencies. For mono-energetic dilute fast ions with isotropic velocity distributions, these scaling studies of the slowing-down time, τs, versus fast-ion charge are in agreement with unmagnetized slowing-down theory; with an applied magnetic field, no consistent anisotropy between τs in the cross-field and field-parallel directions could be resolved. Scaling the fast-ion charge is confirmed as a viable way to reduce the required computational time for each simulation. The implications of these slowing down processes are described for one magnetic-confinement fusion concept, the small, advanced-fuel, field-reversed configuration device.
NASA Astrophysics Data System (ADS)
Zhu, Dan; Shang, Jing; Ye, Xiaodong; Shen, Jian
2016-12-01
Understanding macromolecular structures and interactions is important but difficult, because macromolecules adopt versatile conformations and aggregate states that vary with environmental conditions and history. In this work, two polyamides with parallel or anti-parallel dipoles along the linear backbone, named ABAB (parallel) and AABB (anti-parallel), have been studied. Using a combination of methods, the phase behaviors of the polymers during aggregation and gelation, i.e., the formation or dissociation of nuclei and fibrils, clusters of fibrils, and cluster-cluster aggregates, have been revealed. These abundant phase behaviors are dominated by the inter-chain interactions, including dispersion, polarity, and hydrogen bonding, and correlate with the solubility parameters of the solvents, the temperature, and the polymer concentration. The results of X-ray diffraction and fast-mode dielectric relaxation indicate that AABB possesses a more rigid conformation than ABAB; because of this, AABB forms aggregates of long fibers while ABAB forms hairy fibril clusters, and the gelation concentration in toluene is 1 w/v% for AABB, lower than the 3 w/v% for ABAB.
Method and apparatus for offloading compute resources to a flash co-processing appliance
Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing -bung
2015-10-13
Solid-State Drive (SSD) burst buffer nodes are interposed into a parallel supercomputing cluster to enable fast burst checkpoint of cluster memory to or from nearby interconnected solid-state storage, with asynchronous migration between the burst buffer nodes and slower, more distant disk storage. The SSD nodes also perform tasks offloaded from the compute nodes or associated with the checkpoint data. For example, the data for the next job is preloaded into the SSD node and uploaded very quickly to the respective compute node just before the next job starts. During a job, the SSD nodes perform fast visualization and statistical analysis upon the checkpoint data. The SSD nodes can also perform data reduction and encryption of the checkpoint data.
Fast iterative censoring CFAR algorithm for ship detection from SAR images
NASA Astrophysics Data System (ADS)
Gu, Dandan; Yue, Hui; Zhang, Yuan; Gao, Pengcheng
2017-11-01
Ship detection is one of the essential techniques for ship recognition from synthetic aperture radar (SAR) images. This paper presents a fast iterative detection procedure to eliminate the influence of target returns on the estimation of local sea clutter distributions for constant false alarm rate (CFAR) detectors. A fast block detector is first employed to extract potential target sub-images; then, an iterative censoring CFAR algorithm is used to detect ship candidates from each target block adaptively and efficiently, where parallel detection is available and the statistical parameters of the G0 distribution, which fits local sea clutter well, can be quickly estimated using an integral image operator. Experimental results on TerraSAR-X images demonstrate the effectiveness of the proposed technique.
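The integral image idea the abstract relies on can be sketched in a few lines: after one cumulative-sum pass, the mean and variance of any local clutter window follow in O(1) per pixel. The toy below substitutes a Gaussian mean-plus-T-sigma threshold for the paper's G0 fit and omits the iterative censoring step; all names and parameters are illustrative.

```python
import math

def integral_image(img):
    # S[r][c] holds the sum of img[0:r][0:c]; the extra zero row/column
    # makes every windowed sum a four-term lookup.
    H, W = len(img), len(img[0])
    S = [[0.0] * (W + 1) for _ in range(H + 1)]
    for r in range(H):
        acc = 0.0
        for c in range(W):
            acc += img[r][c]
            S[r + 1][c + 1] = S[r][c + 1] + acc
    return S

def window_sum(S, r0, c0, r1, c1):
    # sum of img[r0:r1][c0:c1] in O(1)
    return S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]

def cfar_detect(img, half, T):
    # flag pixels exceeding local clutter mean + T * std over a
    # (2*half+1)^2 neighborhood, clipped at the image borders
    H, W = len(img), len(img[0])
    S = integral_image(img)
    S2 = integral_image([[v * v for v in row] for row in img])
    hits = []
    for r in range(H):
        for c in range(W):
            r0, r1 = max(0, r - half), min(H, r + half + 1)
            c0, c1 = max(0, c - half), min(W, c + half + 1)
            n = (r1 - r0) * (c1 - c0)
            m = window_sum(S, r0, c0, r1, c1) / n
            var = max(window_sum(S2, r0, c0, r1, c1) / n - m * m, 0.0)
            if img[r][c] > m + T * math.sqrt(var):
                hits.append((r, c))
    return hits

# uniform clutter with one bright point target
img = [[1.0] * 32 for _ in range(32)]
img[16][16] = 20.0
print(cfar_detect(img, half=4, T=5.0))  # -> [(16, 16)]
```

In the actual censoring scheme, pixels flagged in one pass would be excluded from the clutter statistics and the detection repeated until the estimate stabilizes.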
Bergö, Martin; Wu, Gengshu; Ruge, Toralph; Olivecrona, Thomas
2002-04-05
During short term fasting, lipoprotein lipase (LPL) activity in rat adipose tissue is rapidly down-regulated. This down-regulation occurs on a posttranslational level; it is not accompanied by changes in LPL mRNA or protein levels. The LPL activity can be restored within 4 h by refeeding. Previously, we showed that during fasting there is a shift in the distribution of lipase protein toward an inactive form with low heparin affinity. To study the nature of the regulatory mechanism, we determined the in vivo turnover of LPL activity, protein mass, and mRNA in rat adipose tissue. When protein synthesis was inhibited with cycloheximide, LPL activity and protein mass decreased rapidly and in parallel with half-lives of around 2 h, and the effect of refeeding was blocked. This indicates that maintaining high levels of LPL activity requires continuous synthesis of new enzyme protein. When transcription was inhibited by actinomycin, LPL mRNA decreased with half-lives of 13.3 and 16.8 h in the fed and fasted states, respectively, demonstrating slow turnover of the LPL transcript. Surprisingly, when actinomycin was given to fed rats, LPL activity was not down-regulated during fasting, indicating that actinomycin interferes with the transcription of a gene that blocks the activation of newly synthesized LPL protein. When actinomycin was given to fasted rats, LPL activity increased 4-fold within 6 h, even in the absence of refeeding. The same effect was seen with alpha-amanitin, another inhibitor of transcription. The response to actinomycin was much less pronounced in aging rats, which are obese and insulin-resistant. These data suggest a default state where LPL protein is synthesized on a relatively stable mRNA and is processed into its active form. During fasting, a gene is switched on whose product prevents the enzyme from becoming active even though synthesis of LPL protein continues unabated.
NASA Astrophysics Data System (ADS)
Shahzad, M.; Rizvi, H.; Panwar, A.; Ryu, C. M.
2017-06-01
We have re-visited the existence criterion of the reverse shear Alfven eigenmodes (RSAEs) in the presence of the parallel equilibrium current by numerically solving the eigenvalue equation using a fast eigenvalue solver code KAES. The parallel equilibrium current can bring in the kink effect and is known to be strongly unfavorable for the RSAE. We have numerically estimated the critical value of the toroidicity factor Qtor in a circular tokamak plasma, above which RSAEs can exist, and compared it to the analytical one. The difference between the numerical and analytical critical values is small for low frequency RSAEs, but it increases as the frequency of the mode increases, becoming greater for higher poloidal harmonic modes.
Parallel processing approach to transform-based image coding
NASA Astrophysics Data System (ADS)
Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.
1991-06-01
This paper describes a flexible parallel processing architecture designed for use in real-time video processing. The system consists of floating point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual-ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform-based decompression algorithms into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, and results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification are described.
A convenient and accurate parallel Input/Output USB device for E-Prime.
Canto, Rosario; Bufalari, Ilaria; D'Ausilio, Alessandro
2011-03-01
Psychological and neurophysiological experiments require the accurate control of timing and synchrony for Input/Output signals. For instance, a typical Event-Related Potential (ERP) study requires an extremely accurate synchronization of stimulus delivery with recordings. This is typically done via computer software such as E-Prime, and fast communications are typically assured by the Parallel Port (PP). However, the PP is an old and disappearing technology that, for example, is no longer available on portable computers. Here we propose a convenient USB device enabling parallel I/O capabilities. We tested this device against the PP on both a desktop and a laptop machine in different stress tests. Our data demonstrate the accuracy of our system, which suggests that it may be a good substitute for the PP with E-Prime.
A transient-enhanced NMOS low dropout voltage regulator with parallel feedback compensation
NASA Astrophysics Data System (ADS)
Han, Wang; Lin, Tan
2016-02-01
This paper presents a transient-enhanced NMOS low-dropout regulator (LDO) for portable applications with parallel feedback compensation. The parallel feedback structure adds a dynamic zero to get an adequate phase margin with a load current variation from 0 to 1 A. A class-AB error amplifier and a fast charging/discharging unit are adopted to enhance the transient performance. The proposed LDO has been implemented in a 0.35 μm BCD process. From experimental results, the regulator can operate with a minimum dropout voltage of 150 mV at a maximum 1 A load and IQ of 165 μA. Under the full range load current step, the voltage undershoot and overshoot of the proposed LDO are reduced to 38 mV and 27 mV respectively.
KMC 2: fast and resource-frugal k-mer counting.
Deorowicz, Sebastian; Kokot, Marek; Grabowski, Szymon; Debudaj-Grabysz, Agnieszka
2015-05-15
Building the histogram of occurrences of every k-symbol long substring of nucleotide data is a standard step in many bioinformatics applications, known under the name of k-mer counting. Its applications include developing de Bruijn graph genome assemblers, fast multiple sequence alignment, and repeat detection. The tremendous amounts of NGS data require fast algorithms for k-mer counting, preferably using moderate amounts of memory. We present a novel method for k-mer counting that on large datasets is about twice as fast as the strongest competitors (Jellyfish 2, KMC 1), using about 12 GB (or less) of RAM. Our disk-based method bears some resemblance to MSPKmerCounter, yet replacing the original minimizers with signatures (a carefully selected subset of all minimizers) and using (k, x)-mers significantly reduces the I/O, and a highly parallel overall architecture achieves unprecedented processing speeds. For example, KMC 2 counts the 28-mers of a human reads collection with 44-fold coverage (106 GB of compressed size) in about 20 min on a 6-core Intel i7 PC with a solid-state disk. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
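The two core operations, canonical k-mer counting and minimizer selection, can be sketched in plain Python. This is a toy illustration of the concepts only, not KMC 2's disk-based, multi-threaded implementation; the function names are made up.

```python
from collections import Counter

COMP = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    # a k-mer and its reverse complement are counted as one object
    rc = kmer.translate(COMP)[::-1]
    return min(kmer, rc)

def count_kmers(seq, k):
    # histogram of canonical k-mers over a single sequence
    return Counter(canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))

def minimizer(kmer, m):
    # lexicographically smallest m-mer inside the k-mer; KMC 2's
    # "signatures" are a filtered refinement of this idea, used to
    # distribute k-mers into disk bins that can be counted in parallel
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))

counts = count_kmers("ACGTACGTAC", 4)
sig = minimizer("ACGTAC", 3)
print(counts["ACGT"], sig)  # -> 2 ACG
```

Because neighboring k-mers usually share the same minimizer, bucketing by minimizer (or signature) keeps related k-mers together and reduces the I/O of the partition-then-count strategy.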
Cortical Specializations Underlying Fast Computations
Volgushev, Maxim
2016-01-01
The time course of behaviorally relevant environmental events sets temporal constraints on neuronal processing. How does the mammalian brain make use of the increasingly complex networks of the neocortex while making decisions and executing behavioral reactions within a reasonable time? The key parameter determining the speed of computations in neuronal networks is the time interval that neuronal ensembles need to process changes at their input and communicate the results of this processing to downstream neurons. Theoretical analysis identified basic requirements for fast processing: use of neuronal populations for encoding, background activity, and fast onset dynamics of action potentials in neurons. Experimental evidence shows that populations of neocortical neurons fulfil these requirements. Indeed, they can change firing rate in response to input perturbations very quickly, within 1 to 3 ms, and encode high-frequency components of the input by phase-locking their spiking to frequencies up to 300 to 1000 Hz. This implies that the time unit of computations by cortical ensembles is only a few (1 to 3) ms, which is considerably faster than the membrane time constant of individual neurons. The ability of cortical neuronal ensembles to communicate on a millisecond time scale allows for complex, multiple-step processing and precise coordination of neuronal activity in parallel processing streams, while keeping the speed of behavioral reactions within environmentally set temporal constraints. PMID:25689988
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishioka, K.; Nakamura, Y.; Nishimura, S.
A moment approach to calculate neoclassical transport in non-axisymmetric torus plasmas composed of multiple ion species is extended to include the external parallel momentum sources due to unbalanced tangential neutral beam injections (NBIs). The momentum sources that are included in the parallel momentum balance are calculated from the collision operators of background particles with fast ions. This method is applied to clarify the physical mechanism of the neoclassical parallel ion flows and the multi-ion-species effect on them in Heliotron J NBI plasmas. It is found that the parallel ion flow can be determined by the balance between the parallel viscosity and the external momentum source in the region where the external source is much larger than the thermodynamic-force-driven source in collisional plasmas. This is because the friction between C6+ and D+ prevents a large difference between the C6+ and D+ flow velocities in such plasmas. The C6+ flow velocities, which are measured by the charge exchange recombination spectroscopy system, are numerically evaluated with this method. It is shown that the experimentally measured C6+ impurity flow velocities do not clearly contradict the neoclassical estimations, and the dependence of the parallel flow velocities on the magnetic field ripples is consistent in both results.
Small-tip fast recovery imaging using non-slice-selective tailored tip-up pulses and RF-spoiling
Nielsen, Jon-Fredrik; Yoon, Daehyun; Noll, Douglas C.
2012-01-01
Small-tip fast recovery (STFR) imaging is a new steady-state imaging sequence that is a potential alternative to balanced steady-state free precession (bSSFP). Under ideal imaging conditions, STFR may provide comparable signal-to-noise ratio (SNR) and image contrast as bSSFP, but without signal variations due to resonance offset. STFR relies on a tailored “tip-up”, or “fast recovery”, RF pulse to align the spins with the longitudinal axis after each data readout segment. The design of the tip-up pulse is based on the acquisition of a separate off-resonance (B0) map. Unfortunately, the design of fast (a few ms) slice- or slab-selective RF pulses that accurately tailor the excitation pattern to the local B0 inhomogeneity over the entire imaging volume remains a challenging and unsolved problem. We introduce a novel implementation of STFR imaging based on non-slice-selective tip-up pulses, which simplifies the RF design problem significantly. Out-of-slice magnetization pathways are suppressed using RF-spoiling. Brain images obtained with this technique show excellent gray/white matter contrast, and point to the possibility of rapid steady-state T2/T1-weighted imaging with intrinsic suppression of cerebrospinal fluid, through-plane vessel signal, and off-resonance artifacts. In the future we expect STFR imaging to benefit significantly from parallel excitation hardware and high-order gradient shim systems. PMID:22511367
Suzuki, Miwa; Lee, Andrew Y; Vázquez-Medina, José Pablo; Viscarra, Jose A; Crocker, Daniel E; Ortiz, Rudy M
2015-05-15
Fibroblast growth factor (FGF)-21 is secreted from the liver, pancreas, and adipose in response to prolonged fasting/starvation to facilitate lipid and glucose metabolism. Northern elephant seals naturally fast for several months, maintaining a relatively elevated metabolic rate to satisfy their energetic requirements. Thus, to better understand the impact of prolonged food deprivation on FGF21-associated changes, we analyzed the expression of FGF21, FGF receptor-1 (FGFR1), β-klotho (KLB; a co-activator of FGFR) in adipose, and plasma FGF21, glucose and 3-hydroxybutyrate in fasted elephant seal pups. Expression of FGFR1 and KLB mRNA decreased 98% and 43%, respectively, with fasting duration. While the 80% decrease in mean adipose FGF21 mRNA expression with fasting did not reach statistical significance, it paralleled the 39% decrease in plasma FGF21 concentrations suggesting that FGF21 is suppressed with fasting in elephant seals. Data demonstrate an atypical response of FGF21 to prolonged fasting in a mammal suggesting that FGF21-mediated mechanisms have evolved differentially in elephant seals. Furthermore, the typical fasting-induced, FGF21-mediated actions such as the inhibition of lipolysis in adipose may not be required in elephant seals as part of a naturally adapted mechanism to support their unique metabolic demands during prolonged fasting. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Alves Júnior, A. A.; Sokoloff, M. D.
2017-10-01
MCBooster is a header-only, C++11-compliant library that provides routines to generate and perform calculations on large samples of phase space Monte Carlo events. To achieve superior performance, MCBooster is capable of performing most of its calculations in parallel using CUDA- and OpenMP-enabled devices. MCBooster is built on top of the Thrust library and runs on Linux systems. This contribution summarizes the main features of MCBooster. A basic description of the user interface and some examples of applications are provided, along with measurements of performance in a variety of environments.
Jacobsohn, D.H.; Merrill, L.C.
1959-01-20
An improved parallel addition unit is described which is especially adapted for use in electronic digital computers and characterized by propagation of the carry signal through each of a plurality of denominationally ordered stages within a minimum time interval. In its broadest aspects, the invention incorporates a fast multistage parallel digital adder including a plurality of adder circuits, carry-propagation circuit means in all but the most significant digit stage, means for conditioning each carry-propagation circuit during the time period in which information is placed into the adder circuits, and means coupling carry-generation portions of the adder circuit to the carry-propagating means.
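The generate/propagate "conditioning" the patent describes corresponds to what is now called carry-lookahead addition: each stage decides whether it generates or propagates a carry while the operands are being loaded, so the carries need not ripple bit by bit. A behavioral sketch (not the patent's circuit), bits LSB first:

```python
def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]  # LSB first

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def lookahead_add(a_bits, b_bits):
    # per-stage generate (both inputs 1) and propagate (exactly one input 1)
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    # carry recurrence c[i+1] = g[i] | (p[i] & c[i]); hardware evaluates
    # these expressions concurrently, this loop only models the logic
    c = [0]
    for i in range(len(a_bits)):
        c.append(g[i] | (p[i] & c[i]))
    # sum bit per stage plus the final carry-out
    return [p[i] ^ c[i] for i in range(len(a_bits))] + [c[-1]]

print(from_bits(lookahead_add(to_bits(13, 4), to_bits(11, 4))))  # -> 24
```

In a circuit, the recurrence is flattened into sum-of-products expressions so that all carries settle in a fixed, small number of gate delays regardless of word length.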
Perceptual learning in visual search: fast, enduring, but non-specific.
Sireteanu, R; Rettenbach, R
1995-07-01
Visual search has been suggested as a tool for isolating visual primitives. Elementary "features" were proposed to involve parallel search, while serial search is necessary for items without a "feature" status, or, in some cases, for conjunctions of "features". In this study, we investigated the role of practice in visual search tasks. We found that, under some circumstances, initially serial tasks can become parallel after a few hundred trials. Learning in visual search is far less specific than learning of visual discriminations and hyperacuity, suggesting that it takes place at another level in the central visual pathway, involving different neural circuits.
A Fast parallel tridiagonal algorithm for a class of CFD applications
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Sun, Xian-He
1996-01-01
The parallel diagonal dominant (PDD) algorithm is an efficient tridiagonal solver. This paper presents a variation of the PDD algorithm, the reduced PDD algorithm. The new algorithm maintains the minimum communication provided by the PDD algorithm but has a reduced operation count. The PDD algorithm also has a smaller operation count than the conventional sequential algorithm for many applications. Accuracy analysis is provided for the reduced PDD algorithm for symmetric Toeplitz tridiagonal (STT) systems. Implementation results on Langley's Intel Paragon and IBM SP2 show that both the PDD and reduced PDD algorithms are efficient and scalable.
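For context, the sequential baseline that PDD-type solvers parallelize is the Thomas algorithm (Gaussian elimination specialized to tridiagonal systems). A minimal sketch follows; the PDD partitioning of the system across processors is not shown, and the example matrix is illustrative.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system; a = sub-, b = main, c = super-diagonal,
    d = right-hand side (all length n; a[0] and c[-1] are unused)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# symmetric Toeplitz tridiagonal example: [[2,1,0],[1,2,1],[0,1,2]] x = [3,4,3]
x = thomas([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [3.0, 4.0, 3.0])
print(x)  # approximately [1.0, 1.0, 1.0]
```

The two sweeps above are inherently sequential; PDD breaks the chain by solving independent sub-systems per processor and then correcting with a small reduced system, which is where the minimum-communication property comes from.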
Shear wave splitting and crustal anisotropy at the Mid-Atlantic Ridge, 35°N
NASA Astrophysics Data System (ADS)
Barclay, Andrew H.; Toomey, Douglas R.
2003-08-01
Shear wave splitting observed in microearthquake data at the axis of the Mid-Atlantic Ridge near 35°N has a fast polarization direction that is parallel to the trend of the axial valley. The time delays between fast and slow S wave arrivals range from 35 to 180 ms, with an average of 90 ms, and show no relationship with ray path length, source-to-receiver azimuth, or receiver location. The anisotropy is attributed to a shallow distribution of vertical, fluid-filled cracks, aligned parallel to the trend of the axial valley. Joint modeling of the shear wave anisotropy and coincident P wave anisotropy results, using recent theoretical models for the elasticity of a porous medium with aligned cracks, suggests that the crack distribution that causes the observed P wave anisotropy can account for at most 10 ms of the shear wave delay. Most of the shear wave delay thus likely accrues within the shallowmost 500 m (seismic layer 2A), and the percent S wave anisotropy within this highly fissured layer is 8-30%. Isolated, fluid-filled cracks at 500 m to 3 km depth that are too thin or too shallow to be detected by the P wave experiment may also contribute to the shear wave delays. The joint analysis of P and S wave anisotropy is an important approach for constraining the crack distributions in the upper oceanic crust and is especially suited for seismically active hydrothermal systems at slow and intermediate spreading mid-ocean ridges.
Ice-sheet modelling accelerated by graphics cards
NASA Astrophysics Data System (ADS)
Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek
2014-11-01
Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.
ClusCo: clustering and comparison of protein models.
Jamroz, Michal; Kolinski, Andrzej
2013-02-22
The development, optimization, and validation of protein modeling methods require efficient tools for structural comparison. Frequently, a large number of models need to be compared with the target native structure. The main reason for developing the ClusCo software was to create a high-throughput tool for all-versus-all comparison, because calculating the similarity matrix is one of the bottlenecks in the protein modeling pipeline. ClusCo is fast and easy-to-use software for high-throughput comparison of protein models with different similarity measures (cRMSD, dRMSD, GDT_TS, TM-Score, MaxSub, Contact Map Overlap) and for clustering of the comparison results with standard methods: K-means clustering or hierarchical agglomerative clustering. The application is highly optimized and written in C/C++, including code for parallel execution on CPU and GPU, which results in a significant speedup over similar clustering and scoring computation programs.
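One of the listed measures, dRMSD, is easy to illustrate: it compares the internal pairwise-distance matrices of two conformations, so it needs no superposition and is invariant to rotation and translation. A toy all-versus-all sketch (not ClusCo's optimized C/C++ code; the model coordinates are invented):

```python
import math

def drmsd(A, B):
    # distance RMSD: RMS difference over all intra-molecular pairwise
    # distances, hence invariant to rigid-body rotation and translation
    n = len(A)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    s = sum((math.dist(A[i], A[j]) - math.dist(B[i], B[j])) ** 2
            for i, j in pairs)
    return math.sqrt(s / len(pairs))

# three toy "models": original, rigidly translated, uniformly stretched
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
B = [(x + 5, y + 5, z + 5) for x, y, z in A]
C = [(2 * x, 2 * y, 2 * z) for x, y, z in A]
models = [A, B, C]
# the all-versus-all similarity matrix that dominates the pipeline's cost
M = [[drmsd(p, q) for q in models] for p in models]
```

Computing M is O(m^2) in the number of models, which is exactly why ClusCo parallelizes this step on CPU and GPU.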
Seismic imaging using finite-differences and parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ober, C.C.
1997-12-31
A key to reducing the risks and costs associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions in US onshore regions. Prestack depth migration generally yields the most accurate images, and one approach to this is to solve the scalar wave equation using finite differences. As part of an ongoing ACTI project funded by the US Department of Energy, a finite difference, 3-D prestack, depth migration code has been developed. The goal of this work is to demonstrate that massively parallel computers can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite difference, prestack, depth migration practical for oil and gas exploration. Several problems had to be addressed to get an efficient code for the Intel Paragon. These include efficient I/O, efficient parallel tridiagonal solves, and high single-node performance. Furthermore, to provide portable code the author has been restricted to the use of high-level programming languages (C and Fortran) and interprocessor communications using MPI. He has been using the SUNMOS operating system, which has affected many of his programming decisions. He will present images created from two verification datasets (the Marmousi Model and the SEG/EAEG 3D Salt Model). Also, he will show recent images from real datasets and point out locations of improved imaging. Finally, he will discuss areas of current research which will hopefully improve image quality and reduce computational costs.
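The numerical kernel referred to, solving the scalar wave equation with finite differences, can be illustrated in one dimension. The sketch below shows only the explicit leapfrog update; the migration itself, absorbing boundaries, I/O, and parallel decomposition are all omitted, and the grid parameters are illustrative.

```python
import math

def wave_step(u_prev, u, c, dx, dt):
    # one leapfrog step of u_tt = c^2 u_xx with fixed (u = 0) ends;
    # stability requires the Courant number c*dt/dx <= 1
    r2 = (c * dt / dx) ** 2
    n = len(u)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        u_next[i] = 2 * u[i] - u_prev[i] + r2 * (u[i + 1] - 2 * u[i] + u[i - 1])
    return u_next

# a Gaussian pulse released at rest splits into two half-amplitude
# pulses travelling in opposite directions at speed c
n, dx, dt, c = 201, 1.0, 0.5, 1.0
u = [math.exp(-((i - 100) * dx) ** 2 / 20.0) for i in range(n)]
u_prev = u[:]                  # approximately zero initial velocity
for _ in range(100):           # advance to t = 50
    u, u_prev = wave_step(u_prev, u, c, dx, dt), u
```

In the 3-D prestack setting the same stencil is applied per grid point per time step over volumes large enough that domain decomposition across many nodes becomes essential, which is the motivation for the parallel work described above.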
rfpipe: Radio interferometric transient search pipeline
NASA Astrophysics Data System (ADS)
Law, Casey J.
2017-10-01
rfpipe supports Python-based analysis of radio interferometric data (especially from the Very Large Array) and searches for fast radio transients. It extends the rtpipe library (ascl:1706.002) with new approaches to parallelization, acceleration, and more portable data products. rfpipe can run in standalone mode or in a cluster environment.
USDA-ARS?s Scientific Manuscript database
New, faster methods have been developed for analysis of vitamin D and triacylglycerols that eliminate hours of wet chemistry and preparative chromatography, while providing more information than classical methods for analysis. Unprecedented detail is provided by combining liquid chromatography with ...
NASA Astrophysics Data System (ADS)
Javidi, Bahram
The present conference discusses topics in the fields of neural networks, acoustooptic signal processing, pattern recognition, phase-only processing, nonlinear signal processing, image processing, optical computing, and optical information processing. Attention is given to the optical implementation of an inner-product neural associative memory, optoelectronic associative recall via motionless-head/parallel-readout optical disk, a compact real-time acoustooptic image correlator, a multidimensional synthetic estimation filter, and a light-efficient joint transform optical correlator. Also discussed are a high-resolution spatial light modulator, compact real-time interferometric Fourier-transform processors, a fast decorrelation algorithm for permutation arrays, the optical interconnection of optical modules, and carry-free optical binary adders.
Posse, Stefan
2011-01-01
The rapid development of fMRI was paralleled early on by the adaptation of MR spectroscopic imaging (MRSI) methods to quantify water relaxation changes during brain activation. This review describes the evolution of multi-echo acquisition from high-speed MRSI to multi-echo EPI and beyond. It highlights milestones in the development of multi-echo acquisition methods, such as the discovery of considerable gains in fMRI sensitivity when combining echo images, advances in quantification of the BOLD effect using analytical biophysical modeling and interleaved multi-region shimming. The review conveys the insight gained from combining fMRI and MRSI methods and concludes with recent trends in ultra-fast fMRI, which will significantly increase temporal resolution of multi-echo acquisition. PMID:22056458
NASA Astrophysics Data System (ADS)
Sato, Yasuhiro; Furuki, Makoto; Tian, Minquan; Iwasa, Izumi; Pu, Lyong Sun; Tatsuura, Satoshi
2002-04-01
We demonstrated ultrafast single-shot multichannel demultiplexing by using a squarylium dye J aggregate film as an optical Kerr medium. High efficiency and fast recovery of the optical Kerr responses were achieved when the signal-pulse wavelength was close to the absorption peak of the J aggregate film with off-resonant excitation. The on/off ratio in demultiplexing of 1 Tb/s signals was improved to approximately 5. By introducing time delay in both horizontal and vertical directions, we succeeded in directly observing the conversion of 1 Tb/s serial signals into two-dimensionally arranged parallel signals.
Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton
2014-11-11
We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonication, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry
Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.
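The randomized-sampling idea behind the HSS compression described above can be illustrated in a few lines: an off-diagonal block of low numerical rank is compressed by multiplying it with a thin random matrix and orthogonalizing the result. The following is a minimal NumPy sketch of a randomized range finder under that idea, not the interpolative decomposition STRUMPACK actually uses; all names and sizes are illustrative.

```python
import numpy as np

def randomized_compress(A, rank, oversample=10):
    """Compress a (numerically) low-rank block as A ~= Q @ B via random sampling."""
    rng = np.random.default_rng(0)
    # Sample the range of A with a thin Gaussian test matrix.
    Y = A @ rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(Y)  # orthonormal basis for the sampled range
    B = Q.T @ A             # project A onto that basis
    return Q, B

# Build an off-diagonal-style block of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 200))
Q, B = randomized_compress(A, rank=5)
err = np.linalg.norm(A - Q @ B) / np.linalg.norm(A)
print(err < 1e-10)
```

Storing Q and B instead of A reduces the block from 200 x 200 entries to two thin factors, which is the source of the lower complexity claimed for the HSS-accelerated multifrontal solver.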
Fine Structure in Helium-like Fluorine by Fast-Beam Laser Spectroscopy
NASA Astrophysics Data System (ADS)
Myers, E. G.; Thompson, J. K.; Silver, J. D.
1998-05-01
With the aim of providing an additional precise test of higher-order corrections to high precision calculations of fine structure in helium and helium-like ions (T. Zhang, Z.-C. Yan and G.W.F. Drake, Phys. Rev. Lett. 77, 1715 (1996)), a measurement of the 2^3P_2,F - 2^3P_1,F' fine structure in ^19F^7+ is in progress. The method involves Doppler-tuned laser spectroscopy using a CO2 laser on a foil-stripped fluorine ion beam. We aim to achieve a higher precision, compared to an earlier measurement (E.G. Myers, P. Kuske, H.J. Andrae, I.A. Armour, H.A. Klein, J.D. Silver, and E. Traebert, Phys. Rev. Lett. 47, 87 (1981)), by using laser beams parallel and anti-parallel to the ion beam, to obtain partial cancellation of the Doppler shift (J.K. Thompson, D.J.H. Howie and E.G. Myers, Phys. Rev. A 57, 180 (1998)). A calculation of the hyperfine structure, allowing for relativistic, QED and nuclear size effects, will be required to obtain the ``hyperfine-free'' fine structure interval from the measurements.
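The parallel/anti-parallel cancellation exploited above follows from the relativistic Doppler formula. For an ion moving at speed $\beta = v/c$, the laboratory laser frequencies that are resonant with a transition of rest frequency $\nu_0$ when propagating parallel and anti-parallel to the ion beam are

$$\nu_{\parallel} = \frac{\nu_0}{\gamma(1-\beta)}, \qquad \nu_{\mathrm{anti}} = \frac{\nu_0}{\gamma(1+\beta)},$$

so their product is independent of the beam velocity:

$$\nu_{\parallel}\,\nu_{\mathrm{anti}} = \frac{\nu_0^2}{\gamma^2(1-\beta^2)} = \nu_0^2 \quad\Longrightarrow\quad \nu_0 = \sqrt{\nu_{\parallel}\,\nu_{\mathrm{anti}}}.$$

The geometric mean of the two measured resonance frequencies thus cancels the Doppler shift to all orders in $\beta$, which is the basis of the "partial cancellation" the abstract refers to (partial in practice because of beam divergence and velocity spread).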
Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry; ...
2016-10-27
Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.
NASA Astrophysics Data System (ADS)
Muraviev, A. V.; Smolski, V. O.; Loparo, Z. E.; Vodopyanov, K. L.
2018-04-01
Mid-infrared spectroscopy offers supreme sensitivity for the detection of trace gases, solids and liquids based on tell-tale vibrational bands specific to this spectral region. Here, we present a new platform for mid-infrared dual-comb Fourier-transform spectroscopy based on a pair of ultra-broadband subharmonic optical parametric oscillators pumped by two phase-locked thulium-fibre combs. Our system provides fast (7 ms for a single interferogram), moving-parts-free, simultaneous acquisition of 350,000 spectral data points, spaced by a 115 MHz intermodal interval over the 3.1-5.5 µm spectral range. Parallel detection of 22 trace molecular species in a gas mixture, including isotopologues containing isotopes such as 13C, 18O, 17O, 15N, 34S, 33S and deuterium, with part-per-billion sensitivity and sub-Doppler resolution is demonstrated. The technique also features absolute optical frequency referencing to an atomic clock, a high degree of mutual coherence between the two mid-infrared combs with a relative comb-tooth linewidth of 25 mHz, coherent averaging and feasibility for kilohertz-scale spectral resolution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Na; Jia, Zhe; Wang, Zhihui
In this paper, the structure degradation of commercial lithium-ion battery (LIB) graphite anodes subjected to different cycling numbers and charge rates was investigated by focused ion beam (FIB) milling and scanning electron microscopy (SEM). Cross-section images of the graphite anode obtained by FIB milling show that cracks, resulting from the volume expansion of the graphite electrode during long-term cycling, were formed parallel to the current collector. Cracking initiates in the bulk of graphite particles near the lithium insertion surface, which might derive from the stress induced during lithiation and de-lithiation cycles. Subsequently, cracking proceeds along grain boundaries of the polycrystalline graphite, but only in the direction parallel to the current collector. Furthermore, fast-charged graphite electrodes are more prone to form cracks, since the tensile strength of graphite is more likely to be surpassed at higher charge rates. Therefore, for long-term or high-charge-rate LIB applications, the tensile strength of the graphite anode should be taken into account.
Lin, Na; Jia, Zhe; Wang, Zhihui; ...
2017-10-01
In this paper, the structure degradation of commercial lithium-ion battery (LIB) graphite anodes subjected to different cycling numbers and charge rates was investigated by focused ion beam (FIB) milling and scanning electron microscopy (SEM). Cross-section images of the graphite anode obtained by FIB milling show that cracks, resulting from the volume expansion of the graphite electrode during long-term cycling, were formed parallel to the current collector. Cracking initiates in the bulk of graphite particles near the lithium insertion surface, which might derive from the stress induced during lithiation and de-lithiation cycles. Subsequently, cracking proceeds along grain boundaries of the polycrystalline graphite, but only in the direction parallel to the current collector. Furthermore, fast-charged graphite electrodes are more prone to form cracks, since the tensile strength of graphite is more likely to be surpassed at higher charge rates. Therefore, for long-term or high-charge-rate LIB applications, the tensile strength of the graphite anode should be taken into account.
NASA Astrophysics Data System (ADS)
Evangelidis, C. P.
2017-12-01
The segmentation and differentiation of subducting slabs have considerable effects on mantle convection and tectonics. The Hellenic subduction zone is a complex convergent margin with strong curvature and fast slab rollback. The upper mantle seismic anisotropy in the region is studied focusing on its western and eastern edges in order to explore the effects of possible slab segmentation on mantle flow and fabrics. Complementary to new SKS shear-wave splitting measurements in regions not adequately sampled so far, the source-side splitting technique is applied to constrain the depth of anisotropy and to densify measurements. In the western Hellenic arc, a trench-normal subslab anisotropy is observed near the trench. In the forearc domain, source-side and SKS measurements reveal a trench-parallel pattern. This indicates subslab trench-parallel mantle flow, associated with return flow due to the fast slab rollback. The passage from continental to oceanic subduction in the western Hellenic zone is illustrated by a forearc transitional anisotropy pattern. This indicates subslab mantle flow parallel to a NE-SW smooth ramp that possibly connects the two subducted slabs. A young tear fault initiated at the Kefalonia Transform Fault is likely not entirely developed, as this trench-parallel anisotropy pattern is observed along the entire western Hellenic subduction system, even following this horizontal offset between the two slabs. At the eastern side of the Hellenic subduction zone, subslab source-side anisotropy measurements show a general trench-normal pattern. These are associated with mantle flow through a possible ongoing tearing of the oceanic lithosphere in the area. Although the exact geometry of this slab tear is relatively unknown, SKS trench-parallel measurements imply that the tear has not reached the surface yet. Further exploration of the Hellenic subduction system is necessary; denser seismic networks should be deployed at both its edges in order to achieve a more definite image of the structure and geodynamics of this area.
SeqMule: automated pipeline for analysis of human exome/genome sequencing data.
Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai
2015-09-18
Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software incompatibility, complicated configuration, and lack of access to high-performance computing facilities, and discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate a high-confidence consensus set. SeqMule integrates 5 alignment tools and 5 variant calling algorithms and accepts various combinations of them through a one-line command, therefore allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and provides quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
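The normalization/intersection step described above can be sketched as a simple set operation: a variant enters the consensus set when it is reported by at least k of the callers. The following Python sketch only illustrates that idea and is not SeqMule's actual implementation; the tuple representation of variants and the support threshold are assumptions.

```python
from collections import Counter

def consensus_calls(callsets, min_support):
    """Keep variants reported by at least `min_support` of the callers.

    Each callset is a set of normalized variants, represented here as
    (chromosome, position, ref_allele, alt_allele) tuples.
    """
    support = Counter(v for calls in callsets for v in set(calls))
    return {v for v, n in support.items() if n >= min_support}

caller_a = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")}
caller_b = {("chr1", 100, "A", "T"), ("chr2", 300, "T", "G")}
caller_c = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")}
consensus = consensus_calls([caller_a, caller_b, caller_c], min_support=2)
print(sorted(consensus))
```

Requiring agreement among independent callers trades sensitivity for specificity, which matches the abstract's observation that the consensus set improves on single callers in Mendelian error rate.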
GPU-based Branchless Distance-Driven Projection and Backprojection
Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong
2017-01-01
Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to implement on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved a 137-fold speedup for forward projection and a 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. The GPU-based branchless DD method was evaluated with iterative reconstruction algorithms on both simulated and real datasets, producing images visually identical to those of the CPU reference algorithm. PMID:29333480
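The three-step factorization named in the abstract (integration, linear interpolation, differentiation) can be sketched in one dimension: the branchy inner loop over boundary cases disappears because each step is a uniform vector operation. This is an illustrative NumPy sketch of the idea, not the authors' GPU implementation; the 1-D setting and all names are assumptions.

```python
import numpy as np

def branchless_dd_resample(src_bounds, src_vals, dst_bounds):
    """Conservatively resample cell-average data between two 1-D grids."""
    # 1) Integration: cumulative integral of the source signal at its boundaries.
    widths = np.diff(src_bounds)
    cum = np.concatenate(([0.0], np.cumsum(src_vals * widths)))
    # 2) Linear interpolation of the cumulative integral at destination boundaries.
    cum_at_dst = np.interp(dst_bounds, src_bounds, cum)
    # 3) Differentiation: adjacent differences give integrals over destination
    #    cells; dividing by the cell widths recovers cell averages.
    return np.diff(cum_at_dst) / np.diff(dst_bounds)

src_bounds = np.array([0.0, 1.0, 2.0, 3.0])
src_vals = np.array([1.0, 3.0, 2.0])       # piecewise-constant source cells
dst_bounds = np.array([0.0, 1.5, 3.0])     # destination cells overlap the source
result = branchless_dd_resample(src_bounds, src_vals, dst_bounds)
print(result)  # cell averages over [0, 1.5] and [1.5, 3]
```

Because no step inspects the relative position of individual boundaries, every thread executes the same instruction sequence, which is what makes the factorized form GPU-friendly; step 2 also maps directly onto the hardware texture interpolation mentioned in the abstract.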
GPU-based Branchless Distance-Driven Projection and Backprojection.
Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong
2017-12-01
Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to implement on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved a 137-fold speedup for forward projection and a 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. The GPU-based branchless DD method was evaluated with iterative reconstruction algorithms on both simulated and real datasets, producing images visually identical to those of the CPU reference algorithm.
High Performance Parallel Architectures
NASA Technical Reports Server (NTRS)
El-Ghazawi, Tarek; Kaewpijit, Sinthop
1998-01-01
Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments, which can collect observations at hundreds of bands, have become operational. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension reduction is a spectral transformation aimed at concentrating the vital information and discarding redundant data. One such transformation, which is widely used in remote sensing, is the Principal Components Analysis (PCA). This report summarizes our progress on the development of a parallel PCA and its implementation on two Beowulf cluster configurations: one with a fast Ethernet switch and the other with a Myrinet interconnection. Details of the implementation and performance results, for typical sets of multispectral and hyperspectral NASA remote sensing data, are presented and analyzed based on the algorithm requirements and the underlying machine configuration. It will be shown that the PCA application is quite challenging and hard to scale on Ethernet-based clusters. However, the measurements also show that a high-performance interconnection network, such as Myrinet, better matches the high communication demand of PCA and can lead to a more efficient PCA execution.
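The dimension-reduction step at the heart of the algorithm above treats each pixel as a vector across spectral bands, computes the band-by-band covariance, and projects onto its leading eigenvectors. This is a minimal serial NumPy sketch of that per-node computation, not the parallel Beowulf implementation; the array shapes and names are assumptions.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Reduce a (pixels, bands) hyperspectral matrix to n_components bands."""
    centered = cube - cube.mean(axis=0)            # remove the per-band mean
    cov = centered.T @ centered / (len(cube) - 1)  # band-by-band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]       # leading principal axes
    return centered @ top                          # project pixels onto them

rng = np.random.default_rng(0)
cube = rng.standard_normal((1000, 64))  # 1000 pixels, 64 spectral bands
reduced = pca_reduce(cube, n_components=8)
print(reduced.shape)
```

In a parallel version, the pixel rows can be distributed across nodes; the covariance is a sum of per-node partial products, and it is this global reduction that generates the heavy communication the abstract attributes to PCA on Ethernet-based clusters.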
Adropin induction of lipoprotein lipase expression in tilapia hepatocytes.
Lian, Anji; Wu, Keqiang; Liu, Tianqiang; Jiang, Nan; Jiang, Quan
2016-01-01
The peptide hormone adropin plays a role in energy homeostasis. However, studies of the biological actions of adropin in non-mammalian species are still lacking. Using tilapia as a model, we examined the role of adropin in lipoprotein lipase (LPL) regulation in hepatocytes. To this end, the structural identity of tilapia adropin was established by 5'/3'-rapid amplification of cDNA ends (RACE). The transcripts of tilapia adropin were ubiquitously expressed in various tissues, with the highest levels in the liver and hypothalamus. Prolonged fasting elevated tilapia hepatic adropin gene expression, whereas no effect of fasting was observed on hypothalamic adropin levels. In primary cultures of tilapia hepatocytes, synthetic adropin was effective in stimulating LPL release, cellular LPL content, and total LPL production. The increase in LPL production also occurred with parallel rises in LPL gene levels. In parallel experiments, adropin could elevate cAMP production and up-regulate protein kinase A (PKA) and PKC activities. Using a pharmacological approach, cAMP/PKA and PLC/inositol trisphosphate (IP3)/PKC cascades were shown to be involved in adropin-stimulated LPL gene expression. Parallel inhibition of p38MAPK and Erk1/2, however, was not effective in these regards. Our findings provide, for the first time, evidence that adropin could stimulate LPL gene expression via direct actions in tilapia hepatocytes through the activation of multiple signaling mechanisms. © 2016 Society for Endocrinology.
Block-Parallel Data Analysis with DIY2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morozov, Dmitriy; Peterka, Tom
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
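The block-structured pattern described above (decompose into blocks, assign blocks to processing elements, iterate over blocks, reduce across them) can be sketched with a thread pool. This Python sketch illustrates the pattern only and is not the DIY2 API (DIY2 itself is a C++ library); all names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(data, n_blocks):
    """Split a flat dataset into contiguous blocks."""
    step = (len(data) + n_blocks - 1) // n_blocks
    return [data[i:i + step] for i in range(0, len(data), step)]

def process_block(block):
    """Per-block computation; here, a local partial sum."""
    return sum(block)

data = list(range(100))
blocks = decompose(data, n_blocks=4)
# Each block is scheduled independently of where it lives, so the same
# per-block code could run single-threaded, multithreaded, or (with MPI)
# across processes -- the flexibility the block abstraction is meant to buy.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial = list(pool.map(process_block, blocks))
total = sum(partial)  # reduction across blocks
print(total)
```

Because the per-block function never assumes which processing element it runs on, a runtime is free to evict blocks to disk or overlap their execution, which is the in-core/out-of-core portability the abstract highlights.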
Very low-carbohydrate versus isocaloric high-carbohydrate diet in dietary obese rats.
Axen, Kathleen V; Axen, Kenneth
2006-08-01
The effects of a very low-carbohydrate (VLC), high-fat (HF) dietary regimen on metabolic syndrome were compared with those of an isocaloric high-carbohydrate (HC), low-fat (LF) regimen in dietary obese rats. Male Sprague-Dawley rats, made obese by 8 weeks of ad libitum consumption of an HF diet, developed features of the metabolic syndrome vs. lean control (C) rats, including greater visceral, subcutaneous, and hepatic fat masses, elevated plasma cholesterol levels, impaired glucose tolerance, and fasting and post-load insulin resistance. Half of the obese rats (VLC) were then fed a popular VLC-HF diet (Weeks 9 and 10 at 5% and Weeks 11 to 14 at 15% carbohydrate), and one-half (HC) were pair-fed an HC-LF diet (Weeks 9 to 14 at 60% carbohydrate). Energy intakes of pair-fed VLC and HC rats were less than those of C rats throughout Weeks 9 to 14. Compared with HC rats, VLC rats exhibited impaired insulin and glycemic responses to an intraperitoneal glucose load at Week 10 and lower plasma triacylglycerol levels but retarded loss of hepatic, retroperitoneal, and total body fat at Week 14. VLC, HC, and C rats no longer differed in body weight, plasma cholesterol, glucose tolerance, or fasting insulin resistance at Week 14. Progressive decreases in fasting insulin resistance in obese groups paralleled concomitant reductions in hepatic, retroperitoneal, and total body fat. When energy intake was matched, the VLC-HF diet provided no advantage in weight loss or in improving those components of the metabolic syndrome induced by dietary obesity and may delay loss of hepatic and visceral fat as compared with an HC-LF diet.
Gaglianese, A; Costagli, M; Ueno, K; Ricciardi, E; Bernardi, G; Pietrini, P; Cheng, K
2015-01-22
The main visual pathway that conveys motion information to the middle temporal complex (hMT+) originates from the primary visual cortex (V1), which, in turn, receives spatial and temporal features of the perceived stimuli from the lateral geniculate nucleus (LGN). In addition, visual motion information reaches hMT+ directly from the thalamus, bypassing V1, through a direct pathway. We aimed at elucidating whether this direct route between LGN and hMT+ represents a 'fast lane' reserved for high-speed motion, as proposed previously, or whether it is merely involved in processing motion information irrespective of speed. We evaluated functional magnetic resonance imaging (fMRI) responses elicited by moving visual stimuli and applied connectivity analyses to investigate the effect of motion speed on the causal influence between LGN and hMT+, independent of V1, using Conditional Granger Causality (CGC) in the presence of slow and fast visual stimuli. Our results showed that at least part of the visual motion information from LGN reaches hMT+, bypassing V1, in response to both slow and fast motion speeds of the perceived stimuli. We also investigated whether motion speeds have different effects on the connections between LGN and functional subdivisions within hMT+: direct connections between LGN and MT-proper carry mainly slow motion information, while connections between LGN and MST carry mainly fast motion information. The existence of a parallel pathway that connects the LGN directly to hMT+ in response to both slow and fast speeds may explain why MT and MST can still respond in the presence of V1 lesions. Copyright © 2014 IBRO. Published by Elsevier Ltd. All rights reserved.