Preibisch, Christine; Wallenhorst, Tim; Heidemann, Robin; Zanella, Friedhelm E; Lanfermann, Heinrich
2008-03-01
To evaluate the parallel acquisition techniques, generalized autocalibrating partially parallel acquisitions (GRAPPA) and modified sensitivity encoding (mSENSE), and determine imaging parameters maximizing sensitivity toward functional activation at 3T. A total of eight imaging protocols with different parallel imaging techniques (GRAPPA and mSENSE) and reduction factors (R = 1, 2, 3) were compared at different matrix sizes (64 and 128) with respect to temporal noise characteristics, artifact behavior, and sensitivity toward functional activation. Echo planar imaging (EPI) with GRAPPA and a reduction factor of 2 revealed image quality and sensitivity similar to those of full k-space EPI. A higher incidence of artifacts and a marked sensitivity loss occurred at R = 3. Even though the same eight-channel head coil was used for signal detection in all experiments, GRAPPA generally showed more benign patterns of spatially-varying noise amplification, and mSENSE was also more susceptible to residual unfolding artifacts than GRAPPA. At 3T and a reduction factor of 2, parallel imaging can be used with only a small penalty in sensitivity. With our implementation and coil setup the performance of GRAPPA was clearly superior to mSENSE. Thus, it seems advisable to pay special attention to the employed parallel imaging method and its implementation.
Wright, Katherine L; Chen, Yong; Saybasili, Haris; Griswold, Mark A; Seiberlich, Nicole; Gulani, Vikas
2014-10-01
Dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) examinations of the kidneys provide quantitative information on renal perfusion and filtration. However, these examinations are often difficult to implement because of respiratory motion and their need for a high spatiotemporal resolution and 3-dimensional coverage. Here, we present a free-breathing quantitative renal DCE-MRI examination acquired with a highly accelerated stack-of-stars trajectory and reconstructed with 3-dimensional (3D) through-time radial generalized autocalibrating partially parallel acquisition (GRAPPA), using half and quarter doses of gadolinium contrast. Data were acquired in 10 asymptomatic volunteers using a stack-of-stars trajectory that was undersampled in-plane by a factor of 12.6 with respect to the Nyquist sampling criterion and using partial Fourier of 6/8 in the partition direction. Data had a high temporal (2.1-2.9 seconds per frame) and spatial (approximately 2.2 mm) resolution with full 3D coverage of both kidneys (350-370 mm × 79-92 mm). Images were successfully reconstructed with 3D through-time radial GRAPPA, and interframe respiratory motion was compensated by using an algorithm developed to automatically use images from multiple points of enhancement as references for registration. Quantitative pharmacokinetic analysis was performed using a separable dual-compartment model. Region-of-interest (ROI) pharmacokinetic analysis provided estimates (mean (SD)) of quantitative renal parameters after a half dose: 218.1 (57.1) mL/min per 100 mL; plasma mean transit time, 4.8 (2.2) seconds; renal filtration, 28.7 (10.0) mL/min per 100 mL; and tubular mean transit time, 131.1 (60.2) seconds in 10 kidneys. The ROI pharmacokinetic analysis provided estimates (mean (SD)) of quantitative renal parameters after a quarter dose: 218.1 (57.1) mL/min per 100 mL; plasma mean transit time, 4.8 (2.2) seconds; renal filtration, 28.7 (10.0) mL/min per 100 mL; and tubular mean transit time
Yin, Xiaoming; Larson, Andrew C
2009-03-01
Multiple gradient-recalled echo (MGRE) methods are commonly used for abdominal R(2)* mapping. Accelerated MGRE acquisitions would offer the potential to shorten requisite breathhold times and/or increase spatial resolution and coverage. In both phantom and normal volunteer studies, view-sharing (VS) methods, generalized autocalibrating partially parallel acquisition (GRAPPA) methods, and newly proposed k-echo time (k-TE) GRAPPA methods were compared for the purpose of accelerating MGRE acquisitions. Utilization of water-selective spatial spectral excitation pulses reduced artifact levels for both VS and k-TE GRAPPA approaches. VS approaches were found to be highly sensitive to off-resonance effects, particularly at increasing acceleration rates. k-TE GRAPPA significantly reduced residual artifact levels compared to GRAPPA approaches while improving the accuracy of accelerated abdominal R(2)* measurements. These initial feasibility studies demonstrate that k-TE GRAPPA is an effective method to reduce scan times during abdominal R(2)*-mapping procedures.
Cauley, Stephen F; Setsompop, Kawin; Bilgic, Berkin; Bhat, Himanshu; Gagoski, Borjan; Wald, Lawrence L
2017-09-01
Fast MRI acquisitions often rely on efficient traversal of k-space, and hardware limitations or other physical effects can cause the k-space trajectory to deviate from a theoretical path in a manner dependent on the image prescription and protocol parameters. Additional measurements or generalized calibrations are typically needed to characterize the discrepancies. We propose an autocalibrated technique to determine these discrepancies. A joint optimization is used to estimate the trajectory simultaneously with the parallel imaging reconstruction, without the need for additional measurements. Model reduction is introduced to make this optimization computationally efficient, and to ensure final image quality. We demonstrate our approach for the wave-CAIPI fast acquisition method that uses a corkscrew k-space path to efficiently encode k-space and spread the voxel aliasing. Model reduction allows for the 3D trajectory to be automatically calculated in fewer than 30 s on standard vendor hardware. The method achieves equivalent accuracy to full-gradient calibration scans. The proposed method allows for high-quality wave-CAIPI reconstruction across wide ranges of protocol parameters, such as field of view (FOV) location/orientation, bandwidth, echo time (TE), resolution, and sinusoidal amplitude/frequency. Our framework should allow for the autocalibration of gradient trajectories from many other fast MRI techniques in clinically relevant time. Magn Reson Med 78:1093-1099, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
ESPIRiT — An Eigenvalue Approach to Autocalibrating Parallel MRI: Where SENSE meets GRAPPA
Uecker, Martin; Lai, Peng; Murphy, Mark J.; Virtue, Patrick; Elad, Michael; Pauly, John M.; Vasanawala, Shreyas S.; Lustig, Michael
2014-01-01
Purpose: Parallel imaging allows the reconstruction of images from undersampled multi-coil data. The two main approaches are SENSE, which explicitly uses coil sensitivities, and GRAPPA, which makes use of learned correlations in k-space. The purpose of this work is to clarify their relationship and to develop and evaluate an improved algorithm. Theory and Methods: A theoretical analysis shows: 1. The correlations in k-space are encoded in the null space of a calibration matrix. 2. Both approaches restrict the solution to a subspace spanned by the sensitivities. 3. The sensitivities appear as the main eigenvector of a reconstruction operator computed from the null space. The basic assumptions and the quality of the sensitivity maps are evaluated in experimental examples. The appearance of additional eigenvectors motivates an extended SENSE reconstruction with multiple maps, which is compared to existing methods. Results: The existence of a null space and the high quality of the extracted sensitivities are confirmed. The extended reconstruction combines all advantages of SENSE with robustness to certain errors similar to GRAPPA. Conclusion: In this paper the gap between both approaches is finally bridged. A new autocalibration technique combines the benefits of both. PMID:23649942
Andre, J.B.; Zaharchuk, G.; Fischbein, N.J.; Augustin, M.; Skare, S.; Straka, M.; Rosenberg, J.; Lansberg, M.G.; Kemp, S.; Wijman, C.A.C.; Albers, G.W.; Schwartz, N.E.; Bammer, R.
2012-01-01
BACKGROUND AND PURPOSE: PI improves routine EPI-based DWI by enabling higher spatial resolution and reducing geometric distortion, though it remains unclear which of these is most important. We evaluated the relative contribution of these factors and assessed their ability to increase lesion conspicuity and diagnostic confidence by using a GRAPPA technique. MATERIALS AND METHODS: Four separate DWI scans were obtained at 1.5T in 48 patients with independent variation of in-plane spatial resolution (1.88 mm² versus 1.25 mm²) and/or reduction factor (R = 1 versus R = 3). A neuroradiologist with access to clinical history and additional imaging sequences provided a reference standard diagnosis for each case. Three blinded neuroradiologists assessed scans for abnormalities and also evaluated multiple imaging-quality metrics by using a 5-point ordinal scale. Logistic regression was used to determine the impact of each factor on subjective image quality and confidence. RESULTS: Reference standard diagnoses in the patient cohort were acute ischemic stroke (n = 30), ischemic stroke with hemorrhagic conversion (n = 4), intraparenchymal hemorrhage (n = 9), or no acute lesion (n = 5). While readers preferred both a higher reduction factor and a higher spatial resolution, the largest effect was due to an increased reduction factor (odds ratio, 47 ± 16). Small lesions were more confidently discriminated from artifacts on R = 3 images. The diagnosis changed in 5 of 48 scans, always toward the reference standard reading and exclusively for posterior fossa lesions. CONCLUSIONS: PI improves DWI primarily by reducing geometric distortion rather than by increasing spatial resolution. This outcome leads to a more accurate and confident diagnosis of small lesions. PMID:22403781
Goerner, Frank L; Duong, Timothy; Stafford, R Jason; Clarke, Geoffrey D
2013-08-01
To investigate the utility of five different standard measurement methods for determining image uniformity for partially parallel imaging (PPI) acquisitions in terms of consistency across a variety of pulse sequences and reconstruction strategies. Images were produced with a phantom using a 12-channel head matrix coil in a 3T MRI system (TIM TRIO, Siemens Medical Solutions, Erlangen, Germany). Images produced using echo-planar, fast spin echo, gradient echo, and balanced steady state free precession pulse sequences were evaluated. Two different PPI reconstruction methods were investigated, the generalized autocalibrating partially parallel acquisition algorithm (GRAPPA) and modified sensitivity-encoding (mSENSE), with acceleration factors (R) of 2, 3, and 4. Additionally, images were acquired with conventional, two-dimensional Fourier imaging methods (R=1). Five measurement methods of uniformity, recommended by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA), were considered. The methods investigated were (1) an ACR method and (2) a NEMA method for calculating the peak deviation nonuniformity, (3) a modification of a NEMA method used to produce a gray scale uniformity map, (4) determining the normalized absolute average deviation uniformity, and (5) a NEMA method that focused on 17 areas of the image to measure uniformity. Changes in uniformity as a function of reconstruction method at the same R-value were also investigated. Two-way analysis of variance (ANOVA) was used to determine whether R-value or reconstruction method had a greater influence on signal intensity uniformity measurements for partially parallel MRI. Two of the methods studied had consistently negative slopes when signal intensity uniformity was plotted against R-value. The results obtained comparing mSENSE against GRAPPA found no consistent difference between GRAPPA and mSENSE with regard to signal intensity uniformity. The results of the two
NASA Astrophysics Data System (ADS)
Lee, Mike M.; Cho, Byung Lok
2001-11-01
In this paper, we propose a new First Partial product Addition (FPA) architecture with a new compressor (or parallel counter) for the CSA tree that adds the partial products in a fast parallel multiplier. The new compressor improves the speed of partial-product calculation by about 20% compared with an existing parallel counter built from full adders. Using the novel FPA architecture, the new circuit reduces the number of CLA bits needed to find the final sum by N/2. A multiplication time of 5.14 ns is obtained for the 16×16 multiplier in 0.25 µm CMOS technology. The multiplier architecture is easily adapted to pipelined design and demonstrates high-speed performance.
The Force Singularity for Partially Immersed Parallel Plates
NASA Astrophysics Data System (ADS)
Bhatnagar, Rajat; Finn, Robert
2016-12-01
In earlier work, we provided a general description of the forces of attraction and repulsion, encountered by two parallel vertical plates of infinite extent and of possibly differing materials, when partially immersed in an infinite liquid bath and subject to surface tension forces. In the present study, we examine some unusual details of the exotic behavior that can occur at the singular configuration separating infinite rise from infinite descent of the fluid between the plates, as the plates approach each other. In connection with this singular behavior, we present also some new estimates on meniscus height details.
Solution of partial differential equations on vector and parallel computers
NASA Technical Reports Server (NTRS)
Ortega, J. M.; Voigt, R. G.
1985-01-01
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Sparsity-Promoting Calibration for GRAPPA Accelerated Parallel MRI Reconstruction
Weller, Daniel S.; Polimeni, Jonathan R.; Grady, Leo; Wald, Lawrence L.; Adalsteinsson, Elfar; Goyal, Vivek K
2013-01-01
The amount of calibration data needed to produce images of adequate quality can prevent auto-calibrating parallel imaging reconstruction methods like Generalized Autocalibrating Partially Parallel Acquisitions (GRAPPA) from achieving a high total acceleration factor. To improve the quality of calibration when the number of auto-calibration signal (ACS) lines is restricted, we propose a sparsity-promoting regularized calibration method that finds a GRAPPA kernel consistent with the ACS fit equations that yields jointly sparse reconstructed coil channel images. Several experiments evaluate the performance of the proposed method relative to un-regularized and existing regularized calibration methods for both low-quality and underdetermined fits from the ACS lines. These experiments demonstrate that the proposed method, like other regularization methods, is capable of mitigating noise amplification, and in addition, the proposed method is particularly effective at minimizing coherent aliasing artifacts caused by poor kernel calibration in real data. Using the proposed method, we can increase the total achievable acceleration while reducing degradation of the reconstructed image better than existing regularized calibration methods. PMID:23584259
Bankson, James A; Stafford, R Jason; Hazle, John D
2005-03-01
Magnetic resonance temperature imaging can be used to monitor the progress of thermal ablation therapies, increasing treatment efficacy and improving patient safety. High temporal resolution is important when therapies rapidly heat tissue, but many approaches to faster image acquisition compromise image resolution, slice coverage, or phase sensitivity. Partially parallel imaging techniques offer the potential for improved temporal resolution without forcing such concessions. Although these techniques perturb image phase, relative phase changes between dynamically acquired phase-sensitive images, such as those acquired for MR temperature imaging, can be reliably measured through partially parallel imaging techniques using reconstruction filters that remain constant across the series. Partially parallel and non-accelerated phase-difference-sensitive data can be obtained through arrays of surface coils using this method. Average phase differences measured through partially parallel and fully Fourier encoded images are virtually identical, while phase noise increases with g√L as in standard partially parallel image acquisitions.
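The noise scaling quoted at the end of this abstract can be written out as a short relation. This is only a sketch under stated assumptions: the abstract does not define its symbols, so g is taken here to be the usual geometry factor of parallel imaging and L to be the acceleration factor, and the subscripted σ symbols are hypothetical names for the phase-noise standard deviation with and without acceleration.

```latex
% Assumed reading of "phase noise increases with g*sqrt(L)":
% g = geometry factor (>= 1), L = acceleration factor (assumption).
\sigma_{\phi,\mathrm{PPI}} \;\approx\; g\,\sqrt{L}\;\sigma_{\phi,\mathrm{full}}
```

Under this reading, an acquisition with L = 2 and g close to 1 would have roughly √2 times the phase noise of the fully Fourier encoded acquisition.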
Parallel Reconstruction Using Null Operations (PRUNO)
Zhang, Jian; Liu, Chunlei; Moseley, Michael E.
2011-01-01
A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD), and an iterative conjugate-gradient approach is proposed to efficiently solve for missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, and stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than generalized autocalibrating partially parallel acquisition (GRAPPA), especially at high acceleration rates. With the aid of PRUNO reconstruction, ultra-high-acceleration parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
Polarization Imaging Apparatus with Auto-Calibration
NASA Technical Reports Server (NTRS)
Zou, Yingyin Kevin (Inventor); Zhao, Hongzhi (Inventor); Chen, Qiushui (Inventor)
2013-01-01
A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned at 22.5 deg, a second variable phase retarder with its optical axis aligned at 45 deg, a linear polarizer, an imaging sensor for sensing the intensity images of the sample, a controller and a computer. The two variable phase retarders are controlled independently by the computer through a controller unit, which generates a sequence of voltages to control the phase retardations of the first and second variable phase retarders. An auto-calibration procedure is incorporated into the polarization imaging apparatus to correct the misalignment of the first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images, I_0, I_1, I_2 and I_3, of the sample is captured by the imaging sensor when the phase retardations of the VPRs are set at (0, 0), (π, 0), (π, π) and (π/2, π), respectively. The four Stokes components of a Stokes image, S_0, S_1, S_2 and S_3, are then calculated using the four intensity images.
An online recursive autocalibration of triaxial accelerometer.
Lin Ye; Su, Steven W; Dong Lei; Nguyen, Hung T
2016-08-01
In this paper, we propose a novel method for autocalibration of a triaxial micro-electro-mechanical systems (MEMS) accelerometer that does not require any sophisticated laboratory facilities. In particular, it is an online calibration method that can be conveniently implemented and significantly improves the accuracy of the MEMS accelerometer. The procedure exploits the fact that the output vector of the accelerometer must match the local gravity under static conditions. To achieve online calibration, the model and the cost function are first linearized, and an online recursive method is then used to identify the unknown parameters and remove the bias caused by linearization. This online recursive method is based on damped recursive least squares estimation (DRLS), which significantly reduces the computational complexity compared with nonlinear optimization methods. In addition, the unknown parameters can be solved in a short time and the estimated parameters remain stable during calibration. Experimentally, the method was tested by comparing the output before and after calibration under different conditions. The output calibrated by the proposed method was more accurate than the raw output obtained with the default factory parameters.
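The static-gravity constraint described above can be illustrated with a small batch fit. This is a minimal sketch, not the paper's DRLS recursion: it assumes a per-axis scale k and bias b model (the abstract does not state the error model), rewrites the constraint sum_i k_i^2 (a_i - b_i)^2 = g^2 in a form that is linear in six auxiliary unknowns, and solves it by ordinary least squares. The function names `solve_linear` and `calibrate` are hypothetical.

```python
import math

def solve_linear(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def calibrate(raw, g=9.81):
    """Fit per-axis scale and bias so that |scale * (raw - bias)| == g for static samples.

    Assumed model (not taken from the paper): diagonal scale, constant bias.
    Linearized form: sum_i u_i * a_i**2 - 2 * v_i * a_i = 1, with unknowns u, v.
    """
    rows = [[a[0] ** 2, a[1] ** 2, a[2] ** 2, -2 * a[0], -2 * a[1], -2 * a[2]]
            for a in raw]
    n = 6
    # Normal equations (R^T R) x = R^T 1 for the least-squares problem R x = 1.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] for r in rows) for i in range(n)]
    x = solve_linear(AtA, Atb)
    u, v = x[:3], x[3:]
    bias = [v[i] / u[i] for i in range(3)]
    # Undo the normalization that made the right-hand side equal to 1.
    r = g * g / (1.0 + sum(v[i] ** 2 / u[i] for i in range(3)))
    scale = [math.sqrt(u[i] * r) for i in range(3)]
    return scale, bias
```

The samples passed to `calibrate` should be static readings taken in many different orientations; readings from only one or two orientations leave the six unknowns underdetermined. A recursive estimator such as the DRLS mentioned in the abstract would update the same kind of linearized fit one sample at a time instead of forming the normal equations in a batch.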
Polarization imaging apparatus with auto-calibration
Zou, Yingyin Kevin; Zhao, Hongzhi; Chen, Qiushui
2013-08-20
A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned at 22.5°, a second variable phase retarder with its optical axis aligned at 45°, a linear polarizer, an imaging sensor for sensing the intensity images of the sample, a controller and a computer. The two variable phase retarders are controlled independently by the computer through a controller unit, which generates a sequence of voltages to control the phase retardations of the first and second variable phase retarders. An auto-calibration procedure is incorporated into the polarization imaging apparatus to correct the misalignment of the first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images, I_0, I_1, I_2 and I_3, of the sample is captured by the imaging sensor when the phase retardations of the VPRs are set at (0, 0), (π, 0), (π, π) and (π/2, π), respectively. The four Stokes components of a Stokes image, S_0, S_1, S_2 and S_3, are then calculated using the four intensity images.
Software Compression for Partially Parallel Imaging with Multi-channels.
Huang, Feng; Vijayakumar, Sathya; Akao, James
2005-01-01
In magnetic resonance imaging, multi-channel phased array coils offer a high signal-to-noise ratio (SNR) and better parallel imaging performance. But as the number of channels increases, reconstruction time and computer memory requirements become unavoidable problems. In this work, principal component analysis is applied to reduce the size of the data while preserving the performance of parallel imaging. Clinical data collected using a 32-channel cardiac coil are used in the experiments. Experimental results show that the proposed method dramatically reduces the processing time without much damage to the reconstructed image.
Adaptive Methods and Parallel Computation for Partial Differential Equations
1992-05-01
E. Batcher, W. C. Meilander, and J. L. Potter, Eds., Proceedings of the International Conference on Parallel Processing, Computer Society Press...11. P. L. Baehmann, S. L. Wittchen, M. S. Shephard, K. R. Grice, and M. A. Yerry, Robust, geometrically based, automatic two-dimensional mesh
Generating Parallel Execution Plans with a Partial Order Planner
1994-05-01
the atomic action assumption and they can be executed in parallel. The semantics stems from the fact that the STRIPS-style repre...1976), O-PLAN (Currie & Tate 1991), MP, and MPI (Kambhampati 1994). The class ... and only if, for all conditions that are relevant to achieving G, the
NASA Technical Reports Server (NTRS)
Toomarian, N.; Fijany, A.; Barhen, J.
1993-01-01
Evolutionary partial differential equations are usually solved by discretization in time and space, and by applying a marching-in-time procedure to data and algorithms potentially parallelized in the spatial domain.
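The conventional approach this sentence describes can be sketched for a concrete case. The example below is an illustration only, assuming the 1D heat equation u_t = alpha * u_xx with fixed boundary values and an explicit forward-Euler scheme; none of these choices comes from the abstract, and the function names are hypothetical.

```python
def heat_step(u, r):
    """Advance one explicit time step; r = alpha * dt / dx**2 (stable for r <= 0.5).

    Every interior update reads only values from the previous time level,
    so the loop over i could be distributed across processors (parallel in space).
    """
    new = u[:]  # boundary values u[0] and u[-1] are held fixed
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new

def march(u0, r, steps):
    """March in time: each step needs the previous one, so this loop is sequential."""
    u = list(u0)
    for _ in range(steps):
        u = heat_step(u, r)
    return u
```

For instance, `march([0.0, 1.0, 0.0], 0.25, 2)` halves the interior value at each step. The sequential outer loop is exactly the dependency that time-parallel methods try to break.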
A parallel performance study of the Cartesian method for partial differential equations on a sphere
Drake, J.B.; Coddington, M.P.
1997-04-01
A 3-D Cartesian method for integration of partial differential equations on a spherical surface is developed for parallel computation. The target computer architectures are distributed memory, message passing computers such as the Intel Paragon. The parallel algorithms are described along with mesh partitioning strategies. Performance of the algorithms is considered for a standard test case of the shallow water equations on the sphere. The authors find that the computation time scales well with increasing numbers of processors.
NASA Technical Reports Server (NTRS)
Nguyen, Howard; Willacy, Karen; Allen, Mark
2012-01-01
KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.
NASA Astrophysics Data System (ADS)
Lyu, Jingyuan; Nakarmi, Ukash; Zhang, Chaoyi; Ying, Leslie
2016-05-01
This paper presents a new approach to highly accelerated dynamic parallel MRI using low-rank matrix completion and a partial separability (PS) model. In data acquisition, k-space data are moderately randomly undersampled at the center k-space navigator locations, but highly undersampled at the outer k-space for each temporal frame. In reconstruction, the navigator data are reconstructed from undersampled data using structured low-rank matrix completion. After all the unacquired navigator data are estimated, the partial separable model is used to obtain partial k-t data. Then the parallel imaging method is used to recover the entire dynamic image series from highly undersampled data. The proposed method has been shown to achieve high quality reconstructions with reduction factors up to 31 and a temporal resolution of 29 ms, in cases where the conventional PS method fails.
Analysis and Modeling of Parallel Photovoltaic Systems under Partial Shading Conditions
NASA Astrophysics Data System (ADS)
Buddala, Santhoshi Snigdha
Since the industrial revolution, fossil fuels like petroleum, coal, oil, natural gas and other non-renewable energy sources have been used as the primary energy source. The consumption of fossil fuels releases various harmful gases into the atmosphere as byproducts, which are hazardous and tend to deplete the protective layers and affect the overall environmental balance. Fossil fuels are also finite resources, and their rapid depletion has prompted the need to investigate alternative, renewable sources of energy. One such promising source of renewable energy is solar/photovoltaic energy. This work focuses on investigating a new solar array architecture with solar cells connected in a parallel configuration. By retaining the structural simplicity of the parallel architecture, a theoretical small signal model of the solar cell is proposed and modeled to analyze the variations in the module parameters when subjected to partial shading conditions. Simulations were run in SPICE to validate the model implemented in Matlab. The voltage limitations of the proposed architecture are addressed by adopting a simple dc-dc boost converter and evaluating the performance of the architecture in terms of efficiencies by comparing it with the traditional architectures. SPICE simulations are used to compare the architectures and identify the best one in terms of power conversion efficiency under partial shading conditions.
White, Melanie Y; Brown, David A; Sheng, Simon; Cole, Robert N; O'Rourke, Brian; Van Eyk, Jennifer E
2011-02-01
The ability to decipher the dynamic protein component of any system is determined by the inherent limitations of the technologies used, the complexity of the sample, and the existence of an annotated genome. In the absence of an annotated genome, large-scale proteomic investigations can be technically difficult. Yet the functional and biological species differences across animal models can lead to selection of partially or nonannotated organisms over those with an annotated genome. The outweighing of biology over technology leads us to investigate the degree to which a parallel approach can facilitate proteome coverage in the absence of complete genome annotation. When studying species without complete genome annotation, a particular challenge is how to ensure high proteome coverage while meeting the bioinformatic stringencies of high-throughput proteomics. A protein inventory of Oryctolagus cuniculus mitochondria was created by overlapping "protein-centric" and "peptide-centric" one-dimensional and two-dimensional liquid chromatography strategies; with additional partitioning into membrane-enriched and soluble fractions. With the use of these five parallel approaches, 2934 unique peptides were identified, corresponding to 558 nonredundant protein groups. 230 of these proteins (41%) were identified by only a single technical approach, confirming the need for parallel techniques to improve annotation. To determine the extent of coverage, a side-by-side comparison with human and mouse cardiomyocyte mitochondrial studies was performed. A nonredundant list of 995 discrete proteins was compiled, of which 244 (25%) were common across species. The current investigation identified 142 unique protein groups, the majority of which were detected here by only one technical approach, in particular peptide- and protein-centric two-dimensional liquid chromatography. Although no single approach achieved more than 40% coverage, the combination of three approaches (protein- and
NASA Technical Reports Server (NTRS)
Hunt, L. R.; Villarreal, Ramiro
1987-01-01
System theorists understand that the same mathematical objects which determine controllability for nonlinear control systems of ordinary differential equations (ODEs) also determine hypoellipticity for linear partial differential equations (PDEs). Moreover, almost any study of ODE systems begins with linear systems. It is remarkable that Hormander's paper on hypoellipticity of second order linear PDEs starts with equations due to Kolmogorov, which are shown to be analogous to the linear PDEs. Eigenvalue placement by state feedback for a controllable linear system can be paralleled for a Kolmogorov equation if an appropriate type of feedback is introduced. Results concerning transformations of nonlinear systems to linear systems are similar to results for transforming a linear PDE to a Kolmogorov equation.
Parallelizing across time when solving time-dependent partial differential equations
Worley, P.H.
1991-09-01
The standard numerical algorithms for solving time-dependent partial differential equations (PDEs) are inherently sequential in the time direction. This paper describes algorithms for the time-accurate solution of certain classes of linear hyperbolic and parabolic PDEs that can be parallelized in both time and space and have serial complexities that are proportional to the serial complexities of the best known algorithms. The algorithms for parabolic PDEs are variants of the waveform relaxation multigrid method (WFMG) of Lubich and Ostermann where the scalar ordinary differential equations (ODEs) that make up the kernel of WFMG are solved using a cyclic reduction type algorithm. The algorithms for hyperbolic PDEs use the cyclic reduction algorithm to solve ODEs along characteristics. 43 refs.
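As a rough illustration of how a scalar linear ODE can be advanced in parallel across time, the sketch below solves the recurrence y[n+1] = a[n]*y[n] + b[n] (as produced, for example, by an implicit Euler discretization) with recursive doubling, a close relative of cyclic reduction. It is not the algorithm of the paper; the function names `affine_scan` and `solve_recurrence` are hypothetical, and NumPy vectorization stands in for true parallel execution.

```python
import numpy as np

def affine_scan(a, b):
    """Inclusive prefix composition of the maps y -> a[n]*y + b[n].

    Uses recursive doubling: about log2(N) sweeps, each of which could run
    in parallel over all time steps.
    """
    A = np.array(a, dtype=float)
    B = np.array(b, dtype=float)
    n = len(A)
    shift = 1
    while shift < n:
        # Copy the earlier partial compositions before overwriting anything.
        A_prev = A[:-shift].copy()
        B_prev = B[:-shift].copy()
        # Compose map i with the partial composition ending at i - shift.
        B[shift:] = A[shift:] * B_prev + B[shift:]
        A[shift:] = A[shift:] * A_prev
        shift *= 2
    return A, B

def solve_recurrence(a, b, y0):
    """Return y[0..N] for y[n+1] = a[n]*y[n] + b[n] with y[0] = y0."""
    A, B = affine_scan(a, b)
    return np.concatenate(([y0], A * y0 + B))
```

The serial work grows from O(N) to O(N log N) in this simple form, which is the usual price of exposing parallelism in the time direction.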
A Simple Application of Compressed Sensing to Further Accelerate Partially Parallel Imaging
Miao, Jun; Guo, Weihong; Narayan, Sreenath; Wilson, David L.
2012-01-01
Compressed Sensing (CS) and partially parallel imaging (PPI) enable fast MR imaging by reducing the amount of k-space data required for reconstruction. Past attempts to combine these two have been limited by the incoherent sampling requirement of CS, since PPI routines typically sample on a regular (coherent) grid. Here, we developed a new method, “CS+GRAPPA,” to overcome this limitation. We decomposed sets of equidistant samples into multiple random subsets. Then, we reconstructed each subset using CS and averaged the results to get a final CS k-space reconstruction. We used both a standard CS and an edge and joint-sparsity guided CS reconstruction. We tested these intermediate results on both synthetic and real MR phantom data, and performed a human observer experiment to determine the effectiveness of decomposition and to optimize the number of subsets. We then used these CS reconstructions to calibrate the GRAPPA complex coil weights. In vivo parallel MR brain and heart data sets were used. An objective image quality evaluation metric, Case-PDM, was used to quantify image quality. Coherent aliasing and noise artifacts were significantly reduced using two decompositions. More decompositions further reduced coherent aliasing and noise artifacts but introduced blurring. However, the blurring was effectively minimized using our new edge and joint-sparsity guided CS with two decompositions. Numerical results on parallel data demonstrated that the combined method greatly improved image quality as compared to standard GRAPPA, on average halving Case-PDM scores across a range of sampling rates. The proposed technique allowed the same Case-PDM scores as standard GRAPPA, using about half the number of samples. We conclude that the new method augments GRAPPA by combining it with CS, allowing CS to work even when the k-space sampling pattern is equidistant. PMID:22902065
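The decomposition step described above can be sketched as follows. This is a minimal illustration only: the abstract does not say how the random subsets are drawn, so the disjoint, roughly equal-sized split used here is an assumption, and the name `decompose_equidistant` and its parameters are hypothetical.

```python
import numpy as np

def decompose_equidistant(n_lines, reduction, n_subsets, seed=0):
    """Split an equidistant phase-encode pattern into random disjoint subsets.

    n_lines   : total number of k-space lines on the full grid
    reduction : acceleration factor of the regular (coherent) pattern
    n_subsets : number of random subsets (assumed disjoint in this sketch)
    """
    rng = np.random.default_rng(seed)
    sampled = np.arange(0, n_lines, reduction)   # regular PPI sampling
    shuffled = rng.permutation(sampled)          # randomize the assignment
    return [np.sort(part) for part in np.array_split(shuffled, n_subsets)]
```

Each subset is irregularly spaced and so closer to the incoherent sampling that CS expects; in the scheme described, the per-subset CS reconstructions would then be averaged.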
Fast high-spatial-resolution MRI of the ankle with parallel imaging using GRAPPA at 3 T.
Bauer, Jan Stefan; Banerjee, Suchandrima; Henning, Tobias D; Krug, Roland; Majumdar, Sharmilla; Link, Thomas M
2007-07-01
The purpose of our study was to compare an autocalibrating parallel imaging technique at 3 T with standard acquisitions at 3 and 1.5 T for small-field-of-view imaging of the ankle. MRI of the ankle was performed in three fresh human cadaver specimens and three healthy volunteers. Axial and sagittal T1-weighted, axial fat-saturated T2-weighted, and coronal intermediate-weighted fast spin-echo sequences, as well as a fat-saturated spoiled gradient-echo sequence, were acquired at 1.5 and 3 T. At 3 T, reduced data sets were reconstructed using a generalized autocalibrating partially parallel acquisition (GRAPPA) technique, with a scan time reduction of approximately 44%. All images were assessed by two radiologists independently concerning image quality. The signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) were measured in every data set. In the cadaver specimens, macroscopic findings after dissection served as a reference for the pathologic evaluation. SNR and CNR in the GRAPPA images were comparable to the standard acquisition at 3 T. The image quality was rated significantly higher at 3 T with both normal and parallel acquisition compared with 1.5 T. There was no significant difference in ligament and cartilage visualization or in image quality between standard and GRAPPA reconstruction at 3 T. Ankle abnormalities were better seen at 3 T than at 1.5 T for both normal and parallel acquisitions. Using higher field strength combined with parallel technique, MR images of the ankle were obtained with excellent diagnostic quality and a scan time reduction of about 44%. In addition, parallel imaging can provide more flexibility in protocol design.
NASA Astrophysics Data System (ADS)
Martin, I.; Tirado, F.; Vazquez, L.
We present a process for solving the two-dimensional nonlinear Schrödinger equation using a multigrid technique on a distributed-memory machine. Some features of the multigrid technique, such as its good convergence and parallel properties, are explained in this paper. These features make the multigrid method the optimal one for solving the systems of equations arising at each time step from an implicit numerical scheme. We give some experimental results on the parallel numerical simulation of this equation on a message-passing parallel machine.
Wang, Yilei; Pillai, Suresh Kumar Raman; Chan-Park, Mary B
2013-09-09
Single-walled carbon nanotubes (SWNTs) are widely thought to be a strong contender for next-generation printed electronic transistor materials. However, large-scale solution-based parallel assembly of SWNTs to obtain high-performance transistor devices is challenging. SWNTs have anisotropic properties and, although partial alignment of the nanotubes has been theoretically predicted to achieve optimum transistor device performance, thus far no parallel solution-based technique can achieve this. Herein a novel solution-based technique, the immersion-cum-shake method, is reported to achieve partially aligned SWNT networks using semiconductive (99% enriched) SWNTs (s-SWNTs). By immersing an aminosilane-treated wafer into a solution of nanotubes placed on a rotary shaker, the repetitive flow of the nanotube solution over the wafer surface during the deposition process orients the nanotubes toward the fluid flow direction. By adjusting the nanotube concentration in the solution, the nanotube density of the partially aligned network can be controlled; linear densities ranging from 5 to 45 SWNTs/μm are observed. Through control of the linear SWNT density and channel length, the optimum SWNT-based field-effect transistor devices achieve outstanding performance metrics (with an on/off ratio of ~3.2 × 10(4) and mobility 46.5 cm(2) /Vs). Atomic force microscopy shows that the partial alignment is uniform over an area of 20 × 20 mm(2) and confirms that the orientation of the nanotubes is mostly along the fluid flow direction, with a narrow orientation scatter characterized by a full width at half maximum (FWHM) of <15° for all but the densest film, which is 35°. This parallel process is large-scale applicable and exploits the anisotropic properties of the SWNTs, presenting a viable path forward for industrial adoption of SWNTs in printed, flexible, and large-area electronics.
New equation for the computation of flow velocity in partially filled pipes arranged in parallel.
Zeghadnia, Lotfi; Djemili, Lakhdar; Houichi, Larbi; Rezgui, Nouredin
2014-01-01
This paper presents a new approach for the computation of flow velocity in pipes arranged in parallel, based on an analytic development. The estimation of the flow parameters using existing methods requires trial-and-error procedures. The assessment of flow velocity is of great importance in flow measurement methods and in the design of drainage networks, among others. In drainage network design, the flow is mostly of the free-surface type. A new method is developed that eliminates the need for trial methods, so that the computation of the flow velocity becomes easy, simple, and direct, with zero deviation compared to results from the Manning equation and from other approaches that have been considered the best existing solutions. This research work shows that these approaches lack accuracy and do not cover the entire range of flow surface angles: 0° ≤ θ ≤ 360°.
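For reference, the Manning equation that serves as the baseline above can be evaluated directly when the water-surface angle θ is known. The sketch below uses the standard geometry of a partially filled circular pipe in SI units; it does not reproduce the paper's new equation, and the function names are hypothetical.

```python
import math

def hydraulic_radius(diameter, theta):
    """Hydraulic radius R = A / P of a partially filled circular pipe.

    theta is the water-surface angle in radians, 0 < theta <= 2*pi.
    """
    area = diameter ** 2 / 8.0 * (theta - math.sin(theta))
    wetted_perimeter = diameter * theta / 2.0
    return area / wetted_perimeter

def manning_velocity(diameter, theta, slope, n):
    """Mean velocity V = R^(2/3) * sqrt(S) / n (Manning, SI units)."""
    return hydraulic_radius(diameter, theta) ** (2.0 / 3.0) * math.sqrt(slope) / n
```

The trial-and-error difficulty mentioned in the abstract arises in the inverse problem: when the discharge is given rather than θ, the angle appears implicitly in both area and hydraulic radius.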
Single-shot magnetic resonance spectroscopic imaging with partial parallel imaging.
Posse, Stefan; Otazo, Ricardo; Tsai, Shang-Yueh; Yoshimoto, Akio Ernesto; Lin, Fa-Hsuan
2009-03-01
A magnetic resonance spectroscopic imaging (MRSI) pulse sequence based on proton-echo-planar-spectroscopic-imaging (PEPSI) is introduced that measures two-dimensional metabolite maps in a single excitation. Echo-planar spatial-spectral encoding was combined with interleaved phase encoding and parallel imaging using SENSE to reconstruct absorption mode spectra. The symmetrical k-space trajectory compensates phase errors due to convolution of spatial and spectral encoding. Single-shot MRSI at short TE was evaluated in phantoms and in vivo on a 3-T whole-body scanner equipped with a 12-channel array coil. Four-step interleaved phase encoding and fourfold SENSE acceleration were used to encode a 16 x 16 spatial matrix with a 390-Hz spectral width. Comparison with conventional PEPSI and PEPSI with fourfold SENSE acceleration demonstrated comparable sensitivity per unit time when taking into account g-factor-related noise increases and differences in sampling efficiency. LCModel fitting enabled quantification of inositol, choline, creatine, and N-acetyl-aspartate (NAA) in vivo with concentration values in the ranges measured with conventional PEPSI and SENSE-accelerated PEPSI. Cramer-Rao lower bounds were comparable to those obtained with conventional SENSE-accelerated PEPSI at the same voxel size and measurement time. This single-shot MRSI method is therefore suitable for applications that require high temporal resolution to monitor temporal dynamics or to reduce sensitivity to tissue movement.
Choongsang Cho; Sangkeun Lee
2016-04-01
Image smoothing has been used for image segmentation, image reconstruction, object classification, and 3D content generation. Several smoothing approaches have been used at the pre-processing step to retain the critical edge, while removing noise and small details. However, they have limited performance, especially in removing small details and smoothing discrete regions. Therefore, to provide fast and accurate smoothing, we propose an effective scheme that uses a weighted combination of the gradient, Laplacian, and diagonal derivatives of a smoothed image. In addition, to reduce computational complexity, we designed and implemented a parallel processing structure for the proposed scheme on a graphics processing unit (GPU). For an objective evaluation of the smoothing performance, the images were linearly quantized into several layers to generate experimental images, and the quantized images were smoothed using several methods for reconstructing the smoothly changed shape and intensity of the original image. Experimental results showed that the proposed scheme has higher objective scores and better successful smoothing performance than similar schemes, while preserving and removing critical and trivial details, respectively. For computational complexity, the proposed smoothing scheme running on a GPU provided 18 and 16 times lower complexity than the proposed smoothing scheme running on a CPU and the L0-based smoothing scheme, respectively. In addition, a simple noise reduction test was conducted to show the characteristics of the proposed approach; it reported that the presented algorithm outperforms the state-of-the art algorithms by more than 5.4 dB. Therefore, we believe that the proposed scheme can be a useful tool for efficient image smoothing.
Sinha, Pranava; Deutsch, Nina; Ratnayaka, Kanishka; He, Dingchao; Peer, Murfad; Kurkluoglu, Mustafa; Nuszkowski, Mark; Montague, Erin; Mikesell, Gerald; Zurakowski, David; Jonas, Richard
2017-06-15
Mechanical assistance of systemic single ventricle is effective in pulling blood through a cavopulmonary circuit. In patients with superior cavopulmonary connection, this strategy can lead to arterial desaturation secondary to increased inferior caval flow. We hypothesized that overall augmentation in cardiac output with mechanical assistance compensates for the drop in oxygen saturation thereby maintaining tissue oxygen delivery (DO2). Bidirectional Glenn (BDG) was established in seven swine (25 kg) after a common atrium had been established by balloon septostomy. Mechanical circulatory assistance of the single ventricle was achieved using an axial flow pump with ventricular inflow and aortic outflow. Cardiac output, mean pulmonary artery pressure (PAP), common atrial pressure (left atrial pressure [LAP]), arterial oxygen saturation (SaO2), partial pressure of arterial oxygen (PaO2), and DO2 were compared between assisted and nonassisted circulation. Significant augmentation of cardiac output was achieved with mechanical assistance in BDG circulation (BDG: median [interquartile range {IQR}], 0.8 [0.9-1.15] L/min versus assisted BDG: median [IQR], 1.5 [1.15-1.7] L/min; p = 0.05). Although oxygen saturations and PaO2 trended to be lower with assistance (SaO2; BDG: median [IQR], 43% [32-57%]; assisted BDG: median [IQR], 32% [24-35%]; p = 0.07) (PaO2; BDG: median [IQR], 24 [20-30] mm Hg; assisted BDG: median [IQR], 20 [17-21] mm Hg; p = 0.08), DO2 was unchanged with mechanical assistance (BDG: median [IQR], 94 [35-99] ml/min; assisted BDG: median [IQR], 79 [63-85] ml/min; p = 0.81). No significant change in the LAP or PAP was observed. In the setting of superior cavopulmonary connection/single ventricle, the systemic ventricular assistance with a ventricular assist device (VAD) leads to increase in cardiac output. Arterial oxygen saturations however may be lower with mechanical assistance, without any change in DO2.
Ye, Lin; Su, Steven W
2015-01-01
Optimum Experimental Design (OED) is an information gathering technique used to estimate parameters, which aims to minimize the variance of parameter estimation and prediction. In this paper, we further investigate an OED for MEMS accelerometer calibration of the 9-parameter auto-calibration model. Based on a linearized 9-parameter accelerometer model, we show the proposed OED is both G-optimal and rotatable, which are the desired properties for the calibration of wearable sensors for which only simple calibration devices are available. The experimental design is carried out with a newly developed wearable health monitoring device and desired experimental results have been achieved.
Direct parallel image reconstructions for spiral trajectories using GRAPPA.
Heidemann, Robin M; Griswold, Mark A; Seiberlich, Nicole; Krüger, Gunnar; Kannengiesser, Stephan A R; Kiefer, Berthold; Wiggins, Graham; Wald, Lawrence L; Jakob, Peter M
2006-08-01
The use of spiral trajectories is an efficient way to cover a desired k-space partition in magnetic resonance imaging (MRI). Compared to conventional Cartesian k-space sampling, it allows faster acquisitions and results in a slight reduction of the high gradient demand in fast dynamic scans, such as in functional MRI (fMRI). However, spiral images are more susceptible to off-resonance effects that cause blurring artifacts and distortions of the point-spread function (PSF), and thereby degrade the image quality. Since off-resonance effects scale with the readout duration, the respective artifacts can be reduced by shortening the readout trajectory. Multishot experiments represent one approach to reduce these artifacts in spiral imaging, but result in longer scan times and potentially increased flow and motion artifacts. Parallel imaging methods are another promising approach to improve image quality through an increase in the acquisition speed. However, non-Cartesian parallel image reconstructions are known to be computationally time-consuming, which is prohibitive for clinical applications. In this study a new and fast approach for parallel image reconstructions for spiral imaging based on the generalized autocalibrating partially parallel acquisitions (GRAPPA) methodology is presented. With this approach the computational burden is reduced such that it becomes comparable to that needed in accelerated Cartesian procedures. The respective spiral images with two- to eightfold acceleration clearly benefit from the advantages of parallel imaging, such as enabling parallel MRI single-shot spiral imaging with the off-resonance behavior of multishot acquisitions. Copyright 2006 Wiley-Liss, Inc.
Instrument Variables for Reducing Noise in Parallel MRI Reconstruction
Lin, Hong
2017-01-01
Generalized autocalibrating partially parallel acquisition (GRAPPA) has been a widely used parallel MRI technique. However, noise deteriorates the reconstructed image when the reduction factor increases, or even at low reduction factors for some noisy datasets. Noise, initially generated by the scanner, propagates as noise-related errors through the fitting and interpolation procedures of GRAPPA and degrades the final reconstructed image quality. The basic idea we propose to improve GRAPPA is to remove noise from a system identification perspective. In this paper, we first analyze the GRAPPA noise problem from a noisy input-output system perspective; then, a new framework based on the errors-in-variables (EIV) model is developed for analyzing the noise generation mechanism in GRAPPA and designing a concrete method, instrument variables (IV) GRAPPA, to remove noise. The proposed EIV framework opens the possibility that noiseless GRAPPA reconstruction could be achieved by existing methods that solve the EIV problem other than the IV method. Experimental results show that the proposed reconstruction algorithm removes noise better than conventional GRAPPA, as validated with both phantom and in vivo brain data. PMID:28197419
NASA Astrophysics Data System (ADS)
Cikalova, Ulana; Schreiber, Jürgen; Hillmann, Susanne; Meyendorf, Norbert
2014-02-01
The magnetic Barkhausen Noise (BN) is well suited to evaluate the effects of mechanical stresses of ferromagnetic materials, e.g. the indirect detection of residual stress states. The most common causes for the occurrence of residual stresses are manufacturing processes, such as casting, welding, machining, forming, heat treatment, etc., consecutive repairs and design changes, and installation or assembly and overloads during the operating life of a construction. A significant calibration effort based on a set of reference values and/or test samples is needed for these measurements, which require a great deal of time and material resources. Additionally, it is impossible to determine the stress states of different components (σxx and σyy) at the surface. Therefore, a new auto-calibration method was developed to analyze two-dimensional stresses. A fixed calibration function based on defined parameters (determined experimentally) was applied. To adjust the auto-calibration function to the experimental reference values by varying functional parameters, a large number of measurement points were used. We present a method that can calculate, based on the multi-dimensional stress state at the measuring point, the stress components σxx and σyy for two perpendicular magnetization directions using the Barkhausen Noise effect.
Prototype of an auto-calibrating, context-aware, hybrid brain-computer interface.
Faller, J; Torrellas, S; Miralles, F; Holzner, C; Kapeller, C; Guger, C; Bund, J; Müller-Putz, G R; Scherer, R
2012-01-01
We present the prototype of a context-aware framework that allows users to control smart home devices and to access internet services via a Hybrid BCI system of an auto-calibrating sensorimotor rhythm (SMR) based BCI and another assistive device (Integra Mouse mouth joystick). While there is extensive literature that describes the merit of Hybrid BCIs, auto-calibrating and co-adaptive ERD BCI training paradigms, specialized BCI user interfaces, context-awareness and smart home control, there is up to now, no system that includes all these concepts in one integrated easy-to-use framework that can truly benefit individuals with severe functional disabilities by increasing independence and social inclusion. Here we integrate all these technologies in a prototype framework that does not require expert knowledge or excess time for calibration. In a first pilot-study, 3 healthy volunteers successfully operated the system using input signals from an ERD BCI and an Integra Mouse and reached average positive predictive values (PPV) of 72 and 98% respectively. Based on what we learned here we are planning to improve the system for a test with a larger number of healthy volunteers so we can soon bring the system to benefit individuals with severe functional disability.
Qi, Haikun; Huang, Feng; Zhou, Hongmei; Chen, Huijun
2017-03-01
k-t principal component analysis (k-t PCA) is a distinguished method for high spatiotemporal resolution dynamic MRI. To further improve the accuracy of k-t PCA, a combination with partial parallel imaging (PPI), k-t PCA/SENSE, has been tested. However, k-t PCA/SENSE suffers from long reconstruction time and limited improvement. This study aims to improve the combination of k-t PCA and PPI in both reconstruction speed and accuracy. A sequential combination scheme called k-t PCA GROWL (GRAPPA operator for wider readout line) was proposed. The GRAPPA operator was performed before k-t PCA to extend each readout line into a wider band, which improved the condition of the encoding matrix in the following k-t PCA reconstruction. k-t PCA GROWL was tested and compared with k-t PCA and k-t PCA/SENSE on cardiac imaging. k-t PCA GROWL consistently resulted in better image quality compared with k-t PCA/SENSE at high acceleration factors for both retrospectively and prospectively undersampled cardiac imaging, with a much lower computation cost. The improvement in image quality became greater with the increase of acceleration factor. By sequentially combining the GRAPPA operator and k-t PCA, the proposed k-t PCA GROWL method outperformed k-t PCA/SENSE in both reconstruction speed and accuracy, suggesting that k-t PCA GROWL is a better combination scheme than k-t PCA/SENSE. Magn Reson Med 77:1058-1067, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Sparse Auto-Calibration for Radar Coincidence Imaging with Gain-Phase Errors
Zhou, Xiaoli; Wang, Hongqiang; Cheng, Yongqiang; Qin, Yuliang
2015-01-01
Radar coincidence imaging (RCI) is a high-resolution staring imaging technique without the limitation of relative motion between target and radar. The sparsity-driven approaches are commonly used in RCI, while the prior knowledge of imaging models needs to be known accurately. However, as one of the major model errors, the gain-phase error exists generally, and may cause inaccuracies of the model and defocus the image. In the present report, the sparse auto-calibration method is proposed to compensate the gain-phase error in RCI. The method can determine the gain-phase error as part of the imaging process. It uses an iterative algorithm, which cycles through steps of target reconstruction and gain-phase error estimation, where orthogonal matching pursuit (OMP) and Newton’s method are used, respectively. Simulation results show that the proposed method can improve the imaging quality significantly and estimate the gain-phase error accurately. PMID:26528981
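The target-reconstruction half of the iteration described above relies on orthogonal matching pursuit. A minimal real-valued OMP sketch is shown below; the gain-phase estimation step (Newton's method in the abstract) is not reproduced, radar data would normally be complex, and the function name `omp` and its signature are hypothetical.

```python
import numpy as np

def omp(A, y, k):
    """Greedy k-sparse approximation of x in y ≈ A @ x (real-valued sketch).

    A : (m, n) sensing matrix, y : (m,) measurements, k : sparsity level.
    """
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx in support:
            break  # no new atom found; stop early
        support.append(idx)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

In an auto-calibration loop of the kind described, a call like this would alternate with an update of the gain-phase error that modifies the matrix `A` before the next reconstruction.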
Multiwavelength digital holography with autocalibration of phase shifts and artificial wavelengths
Carl, Daniel; Fratz, Markus; Pfeifer, Marcel; Giel, Dominik M.; Hoefler, Heinrich
2009-12-01
A novel implementation of lensless multiwavelength digital holography with autocalibration of temporal phase shifts and artificial wavelength is presented. The algorithm we used to calculate the phase shifts was previously proposed [Opt. Lett. 29, 183 (2004), doi:10.1364/OL.29.000183] and, to our knowledge, is now used for the first time in lensless holography. Because precise knowledge of the generated artificial wavelength is crucial for absolute measurement accuracy, a simple and efficient method to determine the artificial wavelength directly is presented. The calibration method is based on a simple modification of the experimental setup and needs just one additional image acquisition per wavelength. The results of shape measurement of a metallic test object with a rough surface and steep edges are shown and the measurement accuracy is discussed.
Gerdes, Lee; Gerdes, Peter; Lee, Sung W; H Tegeler, Charles
2013-01-01
Disturbances of neural oscillation patterns have been reported with many disease states. We introduce methodology for HIRREM™ (high-resolution, relational, resonance-based electroencephalic mirroring), also known as Brainwave Optimization™, a noninvasive technology to facilitate relaxation and auto-calibration of neural oscillations. HIRREM is a precision-guided technology for allostatic therapeutics, intended to help the brain calibrate its own functional set points to optimize fitness. HIRREM technology collects electroencephalic data through two-channel recordings and delivers a series of audible musical tones in near real time. Choices of tone pitch and timing are made by mathematical algorithms, principally informed by the dominant frequency in successive instants of time, to permit resonance between neural oscillatory frequencies and the musical tones. Relaxation of neural oscillations through HIRREM appears to permit auto-calibration toward greater hemispheric symmetry and more optimized proportionation of regional spectral power. To illustrate an application of HIRREM, we present data from a randomized clinical trial of HIRREM as an intervention for insomnia (n = 19). On average, there was reduction of right-dominant temporal lobe high-frequency (23–36 Hz) EEG asymmetry over the course of eight successive HIRREM sessions. There was a trend for correlation between reduction of right temporal lobe dominance and magnitude of insomnia symptom reduction. Disturbances of neural oscillation have implications for both neuropsychiatric health and downstream peripheral (somatic) physiology. The possibility of noninvasive optimization for neural oscillatory set points through HIRREM suggests potentially multitudinous roles for this technology. Research is currently ongoing to further explore its potential applications and mechanisms of action. PMID:23532171
Garrigue, Stephane; Gentilini, Claudio; Hofgartner, Franz; Mouton, Elisabeth; Rousseau, Anne; Clementy, Jacques
2002-06-01
The rate responsiveness of a single chamber, accelerometer-based pacemaker with an autocalibration function (Opus G VVIR pacemaker, ELA Medical) was studied with a daily life protocol developed to automatically optimize the programming of accelerometer-based sensors. This new sensor was compared with two other body activity sensors that were manually optimized patient by patient. Forty-three pacemaker recipients (mean age 71 +/- 11 years), paced > 95% of the time, underwent a daily life protocol consisting of rapid walking for 6 minutes (W), climbing upstairs for 1.5 minutes (U), and downstairs for 1.5 minutes (D), alternated by recovery phases. The results were compared with performances measured in a control population of healthy subjects and in two paced patient populations (one equipped with a Dash Intermedics VVIR pacemaker and the other equipped with a Sensolog III Pacesetter/St. Jude VVIR pacemaker). Sex distribution and mean age between paced patients and control subjects were statistically comparable. The mean heartrate achieved by all paced patients at each time sample was compared with the normograms, assigning acceleration (slope) and rate (rate) scores for exercise and recovery phases. Scores ranged from -10 (hypochronotropic) to +10 (hyperchronotropic). Zero represents exact concordance with the responses of healthy individuals, and values between -2.5 and +2.5 were considered statistically similar to normal. During W, although the overall performances of the Dash, Sensolog, and Opus G did not statistically differ from healthy controls, the scores obtained by the Opus G were significantly closer to controls than those of the two other pacemakers (P = 0.02). For U, the three sensors were hypochronotropic (P = 0.03), though the Opus G was associated with a heart rate response closer to that of healthy controls (P = 0.04). D provided similar mean heart rate scores for the Opus G and the Dash compared with healthy controls, in contrast with the
Auto-Calibration of SOL-ACES in the EUV Spectral Region
NASA Astrophysics Data System (ADS)
Schmidtke, G.; Brunner, R.; Eberhard, D.; Hofmann, A.; Klocke, U.; Knothe, M.; Konz, W.; Riedel, W.-J.; Wolf, H.
The Sol-ACES (SOLAR Auto-Calibrating EUV/UV Spectrometers) experiment is prepared to be flown with the ESA SOLAR payload to the International Space Station, as planned for the Shuttle mission E1 in August 2006. Four grazing-incidence spectrometers of planar geometry cover the wavelength range from 16-220 nm with a spectral resolution from 0.5-2.3 nm. These high-efficiency spectrometers will be re-calibrated by two three-signal ionization chambers, operated routinely with 44 band-pass filters during the mission. Re-measuring the filter transmissions with the spectrometers also allows a very accurate determination of the changing second (optical) order efficiencies of the spectrometers as well as the stray light contributions to the spectral recording in different wavelength ranges. In this context the primary requirements for measurements of high radiometric accuracy will be discussed in detail. The absorption gases of the ionization chambers are neon, xenon, and a mixture of 10 % nitric oxide and 90 % xenon. Laboratory measurements show that with this method secondary effects can be determined to a high degree, resulting in very accurate irradiance measurements, ranging from 5 to 3 % in absolute terms depending on the wavelength range.
Biton, Victor; Krauss, Gregory; Vasquez-Santana, Blanca; Bibbiani, Francesco; Mann, Allison; Perdomo, Carlos; Narurkar, Milind
2011-02-01
Efficacy and safety of adjunctive rufinamide (3,200 mg/day) was assessed in adolescents and adults with inadequately controlled partial-onset seizures receiving maintenance therapy with up to three antiepileptic drugs (AEDs). This randomized, double-blind, placebo-controlled, parallel-group, multicenter study comprised a 56-day baseline phase (BP), 12-day titration phase, and 84-day maintenance phase (MP). The primary efficacy variable was percentage change in total partial seizure frequency per 28 days (MP vs. BP). Secondary efficacy outcome measures included ≥50% responder rate and reduction in mean total partial seizure frequency during the MP. Safety and tolerability evaluation included adverse events (AEs), physical and neurologic examinations, and laboratory values. Pharmacokinetic and pharmacodynamic assessments were conducted. Three hundred fifty-seven patients were randomized: 176 to rufinamide and 181 to placebo. Patients had a median of 13.3 seizures per 28 days during BP; 86% were receiving ≥2 AEDs. For the intent-to-treat population, the median percentage reduction in total partial seizure frequency per 28 days was 23.25 for rufinamide versus 9.80 for placebo (p = 0.007). Rufinamide-treated patients were more than twice as likely to have had a ≥50% reduction in partial seizure frequency (32.5% vs. 14.3%; p < 0.001) and had a greater reduction in median total partial seizure rate per 28 days during the MP (13.2 vs. 5.2; p < 0.001). Treatment-emergent AEs occurring at ≥5% higher incidence in the rufinamide group compared with placebo were dizziness, fatigue, nausea, somnolence, and diplopia. Adjunctive treatment with rufinamide reduced total partial seizures in refractory patients. AEs reported were consistent with the known tolerability profile of rufinamide. Wiley Periodicals, Inc. © 2010 International League Against Epilepsy.
NASA Astrophysics Data System (ADS)
Vijayalekshmy, S.; Rama Iyer, S.; Beevi, Bisharathu
2015-09-01
The output power from a photovoltaic (PV) array decreases, and the array exhibits multiple power peaks, when it is subjected to partial shading (PS). The power loss in the PV array varies with the array configuration, physical location and the shading pattern. This paper compares the relative performance of a PV array consisting of a short string of three PV modules for two different configurations. The mismatch loss, shading loss, fill factor and the power loss due to failure to track the global maximum power point are analysed for a series string with bypass diodes and for a short parallel string using a MATLAB/Simulink model. The performance of the system is investigated for three different conditions of solar insolation with the same shading pattern. Results indicate that the power loss due to shading during PS is considerably greater in a series string than in a parallel string with the same number of modules.
NASA Astrophysics Data System (ADS)
Sheikhnejad, Yahya; Hosseini, Reza; Saffar Avval, Majid
2017-02-01
In this study, steady-state laminar ferroconvection through a circular horizontal tube partially filled with porous media under constant heat flux is experimentally investigated. Transverse magnetic fields were applied to the ferrofluid flow by two fixed parallel bar magnets positioned at a certain distance from the beginning of the test section. The results show a notable enhancement in heat transfer as a consequence of the partially filled porous media and the magnetic field: enhancements of up to 2.2-fold and 1.4-fold, respectively, were observed in the heat transfer coefficient. The simultaneous presence of both porous media and magnetic field can improve heat transfer up to 2.4-fold. The porous media play the major role in this configuration. The application of the magnetic field and porous media also introduces a higher pressure loss along the pipe, to which the porous media again contribute more than the magnetic field.
Kazan, Samira M; Huber, Laurentius; Flandin, Guillaume; Ivanov, Dimo; Bandettini, Peter; Weiskopf, Nikolaus
2016-11-09
The statistical power of functional MRI (fMRI) group studies is significantly hampered by high intersubject spatial and magnitude variance. We recently presented a vascular autocalibration method (VasA) to account for vascularization differences between subjects and hence improve the sensitivity in group studies. Here, we validate the novel calibration method by means of direct comparisons of VasA with more established measures of baseline venous blood volume (and indirectly vascular reactivity), the M-value. Seven healthy volunteers participated in two 7 T fMRI experiments to compare M-values with VasA estimates: (i) a hypercapnia experiment to estimate voxelwise M-value maps, and (ii) an fMRI experiment using visual stimulation to estimate voxelwise VasA maps. We show that VasA and M-value calibration maps show the same spatial profile, providing strong evidence that VasA is driven by local variations in vascular reactivity as reflected in the M-value. The agreement of vascular reactivity maps obtained with VasA when compared with M-value maps confirms empirically the hypothesis that the VasA method is an adequate tool to account for variations in fMRI response amplitudes caused by vascular reactivity differences in healthy volunteers. VasA can therefore directly account for them and increase the statistical power of group studies. The VasA toolbox is available as a statistical parametric mapping (SPM) toolbox, facilitating its general application. Magn Reson Med, 2016. © 2016 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine.
Huber, Laurentius; Flandin, Guillaume; Ivanov, Dimo; Bandettini, Peter; Weiskopf, Nikolaus
2016-01-01
Purpose: The statistical power of functional MRI (fMRI) group studies is significantly hampered by high intersubject spatial and magnitude variance. We recently presented a vascular autocalibration method (VasA) to account for vascularization differences between subjects and hence improve the sensitivity in group studies. Here, we validate the novel calibration method by means of direct comparisons of VasA with more established measures of baseline venous blood volume (and indirectly vascular reactivity), the M‐value. Methods: Seven healthy volunteers participated in two 7 T fMRI experiments to compare M‐values with VasA estimates: (i) a hypercapnia experiment to estimate voxelwise M‐value maps, and (ii) an fMRI experiment using visual stimulation to estimate voxelwise VasA maps. Results: We show that VasA and M‐value calibration maps show the same spatial profile, providing strong evidence that VasA is driven by local variations in vascular reactivity as reflected in the M‐value. Conclusion: The agreement of vascular reactivity maps obtained with VasA when compared with M‐value maps confirms empirically the hypothesis that the VasA method is an adequate tool to account for variations in fMRI response amplitudes caused by vascular reactivity differences in healthy volunteers. VasA can therefore directly account for them and increase the statistical power of group studies. The VasA toolbox is available as a statistical parametric mapping (SPM) toolbox, facilitating its general application. Magn Reson Med 78:1168–1173, 2017. © 2016 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine. PMID:27851867
Xie, Jingsi; Lai, Peng; Huang, Feng; Li, Yu; Li, Debiao
2010-05-01
Radial sampling has been demonstrated to be potentially useful in cardiac magnetic resonance imaging because it is less susceptible to motion than Cartesian sampling. Nevertheless, its capability of imaging acceleration remains limited by undersampling-induced streaking artifacts. In this study, a self-calibrated reconstruction method was developed to suppress streaking artifacts for highly accelerated parallel radial acquisitions in cardiac magnetic resonance imaging. Two- (2D) and three-dimensional (3D) radial k-space data were collected from a phantom and healthy volunteers. Images reconstructed using the proposed method and the conventional regridding method were compared based on statistical analysis on a four-point scale imaging scoring. It was demonstrated that the proposed method can effectively remove undersampling streaking artifacts and significantly improve image quality (P<.05). With the use of the proposed method, image score (1-4, 1=poor, 2=good, 3=very good, 4=excellent) was improved from 2.14 to 3.34 with the use of an undersampling factor of 4 and from 1.09 to 2.5 with the use of an undersampling factor of 8. Our study demonstrates that the proposed reconstruction method is effective for highly accelerated cardiac imaging applications using parallel radial acquisitions without calibration data.
NASA Astrophysics Data System (ADS)
Ma, Sangback
In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDEs (Partial Differential Equations) on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioners, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates compared to that of the natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with a direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using the block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by a least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i.e., the real parts of their eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices is mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The
NASA Astrophysics Data System (ADS)
Pereira, Tiago M. D.; Uitenbroek, Han
2015-02-01
The emergence of three-dimensional magneto-hydrodynamic simulations of stellar atmospheres has sparked a need for efficient radiative transfer codes to calculate detailed synthetic spectra. We present RH 1.5D, a massively parallel code based on the RH code and capable of performing Zeeman polarised multi-level non-local thermodynamical equilibrium calculations with partial frequency redistribution for an arbitrary amount of chemical species. The code calculates spectra from 3D, 2D or 1D atmospheric models on a column-by-column basis (or 1.5D). While the 1.5D approximation breaks down in the cores of very strong lines in an inhomogeneous environment, it is nevertheless suitable for a large range of scenarios and allows for faster convergence with finer control over the iteration of each simulation column. The code scales well to at least tens of thousands of CPU cores, and is publicly available. In the present work we briefly describe its inner workings, strategies for convergence optimisation, its parallelism, and some possible applications.
Parallel Imaging with Nonlinear Reconstruction using Variational Penalties
Knoll, Florian; Clason, Christian; Bredies, Kristian; Uecker, Martin; Stollberger, Rudolf
2014-01-01
A new approach based on nonlinear inversion for autocalibrated parallel imaging with arbitrary sampling patterns is presented. By extending the iteratively regularized Gauss–Newton method with variational penalties, the improved reconstruction quality obtained from joint estimation of image and coil sensitivities is combined with the superior noise suppression of total variation and total generalized variation regularization. In addition, the proposed approach can lead to enhanced removal of sampling artifacts arising from pseudorandom and radial sampling patterns. This is demonstrated for phantom and in-vivo measurements. PMID:21710612
Elger, Christian E; Stefan, Hermann; Mann, Allison; Narurkar, Milind; Sun, Yijun; Perdomo, Carlos
2010-02-01
To assess the efficacy, safety, tolerability, and pharmacokinetics of adjunctive rufinamide in adults and adolescents with inadequately controlled partial seizures receiving treatment with one to three concomitant antiepileptic drugs (AEDs). A 24-week multicenter Phase II clinical study was conducted (n = 647), comprising a 12-week prospective baseline phase and a 12-week randomized double-blind, parallel-group, five-arm (placebo and rufinamide 200, 400, 800, and 1600 mg/day) treatment phase. The linear trend of dose response for seizure frequency per 28 days in the double-blind treatment phase, the primary efficacy outcome measure, was statistically significant in favor of rufinamide (estimated slope = -0.049, P = 0.003; minimally efficacious dose, 400 mg/day). Response rates, defined as a ≥50% reduction in seizure frequency per 28 days, also revealed a significant linear trend of dose response (P = 0.0019, logistic regression analysis). Adverse events were comparable between placebo and all rufinamide groups except the 1600 mg/day group; no safety signals were observed. These results suggest that in the dose range of 400-1600 mg/day, add-on rufinamide therapy may benefit patients with inadequately controlled partial seizures and is generally well tolerated. These data also suggest that higher doses may confer additional efficacy without adversely affecting safety and tolerability.
Broch, Ole; Carbonell, Jose; Ferrando, Carlos; Metzner, Malte; Carstens, Arne; Albrecht, Martin; Gruenewald, Matthias; Höcker, Jan; Soro, Marina; Steinfath, Markus; Renner, Jochen; Bein, Berthold
2015-11-26
Less-invasive and easy to install monitoring systems for continuous estimation of cardiac index (CI) have gained increasing interest, especially in cardiac surgery patients who often exhibit abrupt haemodynamic changes. The aim of the present study was to compare the accuracy of CI by a new semi-invasive monitoring system with transpulmonary thermodilution before and after cardiopulmonary bypass (CPB). Sixty-five patients (41 Germany, 24 Spain) scheduled for elective coronary surgery were studied before and after CPB, respectively. Measurements included CI obtained by transpulmonary thermodilution (CITPTD) and autocalibrated semi-invasive pulse contour analysis (CIPFX). Percentage changes of CI were also calculated. There was only a poor correlation between CITPTD and CIPFX both before (r² = 0.34, p < 0.0001) and after (r² = 0.31, p < 0.0001) CPB, with a percentage error (PE) of 62 % and 49 %, respectively. Four quadrant plots revealed a concordance rate over 90 %, indicating an acceptable correlation of trends between CITPTD and CIPFX before (concordance: 93 %) and after (concordance: 94 %) CPB. In contrast, polar plot analysis showed poor trending before and an acceptable trending ability of changes in CI after CPB. Semi-invasive CI by autocalibrated pulse contour analysis showed a poor ability to estimate CI compared with transpulmonary thermodilution. Furthermore, the new semi-invasive device revealed an acceptable trending ability for haemodynamic changes only after CPB. ClinicalTrials.gov: NCT02312505 Date: 12.03.2012.
Mundt, Torsten; Al Jaghsi, Ahmad; Schwahn, Bernd; Hilgert, Janina; Lucas, Christian; Biffar, Reiner; Schwahn, Christian; Heinemann, Friedhelm
2016-07-30
Acceptable short-term survival rates (>90 %) of mini-implants (diameter < 3.0 mm) are only documented for mandibular overdentures. Sound data for mini-implants as strategic abutments for better retention of partial removable dental prostheses (PRDP) are not available. The purpose of this study is to test the hypothesis that immediately loaded mini-implants show more bone loss and less success than strategic mini-implants with delayed loading. In this four-center (one university hospital, three dental practices in Germany), parallel-group, controlled clinical trial, which is cluster randomized on patient level, a total of 80 partially edentulous patients with unfavourable number and distribution of remaining abutment teeth in at least one jaw will receive supplementary mini-implants to stabilize their PRDP. The mini-implants are either loaded immediately after implant placement (test group) or loaded with a delay of four months (control group). Follow-up of the patients will be performed for 36 months. The primary outcome is the radiographic bone level change at the implants. The secondary outcome is implant success as a composite variable. Tertiary outcomes include clinical and subjective outcomes (quality of life, satisfaction, chewing ability) and dental or technical complications. Strategic implants under an existing PRDP are only documented for standard-diameter implants. Mini-implants could be a minimally invasive and low-cost solution for this treatment modality. The trial was registered at Deutsches Register Klinischer Studien (German register of clinical trials) under DRKS-ID: DRKS00007589 (www.germanctr.de) on January 13th, 2015.
Tang, Hongtai; Lv, Guozhong; Fu, Jinfeng; Niu, Xihua; Li, Yeyang; Zhang, Mei; Zhang, Guoʼan; Hu, Dahai; Chen, Xiaodong; Lei, Jin; Qi, Hongyan; Xia, Zhaofan
2015-05-01
Partial-thickness burns are among the most frequently encountered types of burns, and numerous dressing materials are available for their treatment. A multicenter, open, randomized, and parallel study was undertaken to determine the efficacy and tolerability of silver sulfadiazine (SSD) compared with an absorbent foam silver dressing, Mepilex Ag, on patients aged between 5 years and 65 years with deep partial-thickness thermal burn injuries (2.5-25% total body surface area). Patients were randomly assigned to either SSD (n = 82) applied daily or a Mepilex Ag dressing (n = 71) applied every 5 days to 7 days. The treatment period was up to 4 weeks. There was no significant difference between the two treatment groups with respect to the primary end point of time to healing, which occurred in 56 (79%) of 71 patients after a median follow-up time of 15 days in the Mepilex Ag group compared with 65 (79%) of 82 patients after a median follow-up time of 16 days in the SSD group (p = 0.74). There was also no significant difference in the percentage of study burn healed. Patients in the Mepilex Ag group had 87.1% of their study burn healed (out of the total burn area) compared with 85.2% of patients in the SSD group. However, the mean total number of dressings used was significantly more in the SSD group (14.0) compared with the Mepilex Ag group (3.06, p < 0.0001). There was no significant difference in the time until skin graft was performed between the two study groups. There was no difference in healing rates between Mepilex Ag and SSD, with both products well tolerated. The longer wear time of Mepilex Ag promotes undisturbed healing and makes it easier for patients to continue with their normal lives sooner. Therapeutic study, level III.
Blazejewska, Anna I; Bhat, Himanshu; Wald, Lawrence L; Polimeni, Jonathan R
2017-05-15
Temporal signal-to-noise ratio (tSNR) is a key metric for assessing the ability to detect brain activation in fMRI data. A recent study has shown substantial variation of tSNR between multiple runs of accelerated EPI acquisitions reconstructed with the GRAPPA method using protocols commonly used for fMRI experiments. Across-run changes in the location of high-tSNR regions could lead to misinterpretation of the observed brain activation patterns, reduced sensitivity of the fMRI studies, and biased results. We compared conventional EPI autocalibration (ACS) methods with the recently-introduced FLEET ACS method, measuring their tSNR variability, as well as spatial overlap and displacement of high-tSNR clusters across runs in datasets acquired from human subjects at 7T and 3T. FLEET ACS reconstructed data had higher tSNR levels, as previously reported, as well as better temporal consistency and larger overlap of the high-tSNR clusters across runs compared with reconstructions using conventional multi-shot (ms) EPI ACS data. tSNR variability across two different runs of the same protocol using ms-EPI ACS data was about two times larger than for the protocol using FLEET ACS for acceleration factors (R) 2 and 3, and one and half times larger for R=4. The level of across-run tSNR consistency for data reconstructed with FLEET ACS was similar to within-run tSNR consistency. The displacement of high-tSNR clusters across two runs (inter-cluster distance) decreased from ∼8mm in the time-series reconstructed using conventional ms-EPI ACS data to ∼4mm for images reconstructed using FLEET ACS. However, the performance gap between conventional ms-EPI ACS and FLEET ACS narrowed with increasing parallel imaging acceleration factor. Overall, the FLEET ACS method provides a simple solution to the problem of varying tSNR across runs, and therefore helps ensure that an assumption of fMRI analysis-that tSNR is largely consistent across runs-is met for accelerated acquisitions
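As an illustration of the metric discussed in the abstract above, the following is a minimal sketch of a per-voxel tSNR calculation, assuming the common definition of tSNR as the temporal mean of a voxel's signal divided by its temporal standard deviation. The abstract does not state the exact definition or preprocessing the authors used; the function name `tsnr` and the use of the sample standard deviation are assumptions made for this example only.

```python
from statistics import mean, stdev


def tsnr(timeseries):
    """Temporal SNR of a single voxel time series.

    Assumed definition (not taken from the abstract): temporal mean
    divided by temporal sample standard deviation.
    """
    if len(timeseries) < 2:
        raise ValueError("need at least two time points")
    sd = stdev(timeseries)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("constant time series: tSNR is undefined")
    return mean(timeseries) / sd


# Hypothetical signal values for one voxel across four volumes.
voxel = [100.0, 102.0, 98.0, 100.0]
print(round(tsnr(voxel), 2))
```

Applying the same function voxel by voxel to two runs and comparing the resulting maps would give one simple way to look at across-run tSNR variability, though the study's own variability measure may differ.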
Kaufmann, Tobias; Völker, Stefan; Gunesch, Laura; Kübler, Andrea
2012-01-01
Brain–computer interfaces (BCI) based on event-related potentials (ERP) allow for selection of characters from a visually presented character-matrix and thus provide a communication channel for users with neurodegenerative disease. Although they have been a topic of research for more than 20 years and have repeatedly been proven to be a reliable communication method, BCIs are almost exclusively used in experimental settings, handled by qualified experts. This study investigates whether ERP–BCIs can be handled independently by laymen without expert support, which is indispensable for establishing BCIs in end-users' daily life situations. Furthermore, we compared the classic character-by-character text entry against a predictive text entry (PTE) that directly incorporates predictive text into the character-matrix. N = 19 BCI novices handled a user-centered ERP–BCI application on their own without expert support. The software individually adjusted classifier weights and control parameters in the background, invisible to the user (auto-calibration). All participants were able to operate the software on their own and to correctly spell a sentence twice with the auto-calibrated classifier (once with PTE, once without). Our PTE increased spelling speed and, importantly, did not reduce accuracy. In sum, this study demonstrates the feasibility of auto-calibrating ERP–BCI use, independently by laymen, and the strong benefit of integrating predictive text directly into the character-matrix. PMID:22833713
Li, Bingyi; Chen, Liang; Wei, Chunpeng; Xie, Yizhuang; Chen, He; Yu, Wenyue
2017-01-01
With the development of satellite load technology and very large scale integrated (VLSI) circuit technology, onboard real-time synthetic aperture radar (SAR) imaging systems have become a solution for allowing rapid response to disasters. A key goal of the onboard SAR imaging system design is to achieve high real-time processing performance with severe size, weight, and power consumption constraints. In this paper, we analyse the computational burden of the commonly used chirp scaling (CS) SAR imaging algorithm. To reduce the system hardware cost, we propose a partial fixed-point processing scheme. The fast Fourier transform (FFT), which is the most computation-sensitive operation in the CS algorithm, is processed with fixed-point, while other operations are processed with single precision floating-point. With the proposed fixed-point processing error propagation model, the fixed-point processing word length is determined. The fidelity and accuracy relative to conventional ground-based software processors are verified by evaluating both the point target imaging quality and the actual scene imaging quality. As a proof of concept, a field-programmable gate array-application-specific integrated circuit (FPGA-ASIC) hybrid heterogeneous parallel accelerating architecture is designed and realized. The customized fixed-point FFT is implemented using the 130 nm complementary metal oxide semiconductor (CMOS) technology as a co-processor of the Xilinx xc6vlx760t FPGA. A single processing board requires 12 s and consumes 21 W to focus a 50-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384. PMID:28672813
Yang, Chen; Li, Bingyi; Chen, Liang; Wei, Chunpeng; Xie, Yizhuang; Chen, He; Yu, Wenyue
2017-06-24
With the development of satellite load technology and very large scale integrated (VLSI) circuit technology, onboard real-time synthetic aperture radar (SAR) imaging systems have become a solution for allowing rapid response to disasters. A key goal of the onboard SAR imaging system design is to achieve high real-time processing performance with severe size, weight, and power consumption constraints. In this paper, we analyse the computational burden of the commonly used chirp scaling (CS) SAR imaging algorithm. To reduce the system hardware cost, we propose a partial fixed-point processing scheme. The fast Fourier transform (FFT), which is the most computation-sensitive operation in the CS algorithm, is processed with fixed-point, while other operations are processed with single precision floating-point. With the proposed fixed-point processing error propagation model, the fixed-point processing word length is determined. The fidelity and accuracy relative to conventional ground-based software processors are verified by evaluating both the point target imaging quality and the actual scene imaging quality. As a proof of concept, a field-programmable gate array-application-specific integrated circuit (FPGA-ASIC) hybrid heterogeneous parallel accelerating architecture is designed and realized. The customized fixed-point FFT is implemented using the 130 nm complementary metal oxide semiconductor (CMOS) technology as a co-processor of the Xilinx xc6vlx760t FPGA. A single processing board requires 12 s and consumes 21 W to focus a 50-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384.
Trimmed autocalibrating k-space estimation based on structured matrix completion.
Bydder, Mark; Rapacchi, Stanislas; Girard, Olivier; Guye, Maxime; Ranjeva, Jean-Philippe
2017-11-01
Parallel imaging allows the reconstruction of undersampled data from multiple coils. This provides a means to reject and regenerate corrupt data (e.g. from motion artefact). The purpose of this work is to approach this problem using the SAKE parallel imaging method. Parallel imaging methods typically require calibration by fully sampling the center of k-space. This is a challenge in the presence of corrupted data, since the calibration data may be corrupted which leads to an errors-in-variables problem that cannot be solved by least squares or even iteratively reweighted least squares. The SAKE method, based on matrix completion and structured low rank approximation, was modified to detect and trim these errors from the data. Simulated and actual corrupted datasets were reconstructed with SAKE, the proposed approach and a more standard reconstruction method (based on solving a linear equation) with a data rejection criterion. The proposed approach was found to reduce artefacts considerably in comparison to the other two methods. SAKE with data trimming improves on previous methods for reconstructing images from grossly corrupted data. Copyright © 2017 Elsevier Inc. All rights reserved.
Wu, Yiping; Liu, Shuguang; Li, Zhengpeng; Dahal, Devendra; Young, Claudia J.; Schmidt, Gail L.; Liu, Jinxun; Davis, Brian; Sohl, Terry L.; Werner, Jeremy M.; Oeding, Jennifer
2014-01-01
Process-oriented ecological models are frequently used for predicting potential impacts of global changes such as climate and land-cover changes, which can be useful for policy making. It is critical but challenging to automatically derive optimal parameter values at different scales, especially at regional scale, and validate the model performance. In this study, we developed an automatic calibration (auto-calibration) function for a well-established biogeochemical model, the General Ensemble Biogeochemical Modeling System (GEMS)-Erosion Deposition Carbon Model (EDCM), using a data assimilation technique (the Shuffled Complex Evolution algorithm) and a model-inversion R package, the Flexible Modeling Environment (FME). The new functionality can support multi-parameter and multi-objective auto-calibration of EDCM at both the pixel and regional levels. We also developed a post-processing procedure for GEMS to provide options to save the pixel-based or aggregated county-land cover specific parameter values for subsequent simulations. In our case study, we successfully applied the updated model (EDCM-Auto) to a single crop pixel with a corn–wheat rotation and to a large ecological region (Level II), the Central USA Plains. The evaluation results indicate that EDCM-Auto is applicable at multiple scales and is capable of handling land cover changes (e.g., crop rotations). The model also performs well in capturing the spatial pattern of grain yield production for crops and net primary production (NPP) for other ecosystems across the region, which is a good example of implementing calibration and validation of ecological models with readily available survey data (grain yield) and remote sensing data (NPP) at regional and national levels. The developed platform for auto-calibration can be readily expanded to incorporate other model inversion algorithms and potential R packages, and can also be applied to other ecological models.
NASA Astrophysics Data System (ADS)
Zhang, Y. Y.; Shao, Q. X.; Ye, A. Z.; Xing, H. T.; Xia, J.
2016-02-01
Integrated water system modeling is a feasible approach to understanding severe water crises in the world and promoting the implementation of integrated river basin management. In this study, a classic hydrological model (the time variant gain model: TVGM) was extended to an integrated water system model by coupling multiple water-related processes in hydrology, biogeochemistry, water quality, and ecology, and considering the interference of human activities. A parameter analysis tool, which included sensitivity analysis, autocalibration and model performance evaluation, was developed to improve modeling efficiency. To demonstrate the model performance, the Shaying River catchment, which is the largest highly regulated and heavily polluted tributary of the Huai River basin in China, was selected as the case study area. The model performance was evaluated on the key water-related components including runoff, water quality, diffuse pollution load (or nonpoint sources) and crop yield. Results showed that our proposed model simulated most components reasonably well. The simulated daily runoff at most regulated and less-regulated stations matched well with the observations. The average correlation coefficient and Nash-Sutcliffe efficiency were 0.85 and 0.70, respectively. Both the simulated low and high flows at most stations were improved when the dam regulation was considered. The daily ammonium-nitrogen (NH4-N) concentration was also well captured, with an average correlation coefficient of 0.67. Furthermore, the diffuse source load of NH4-N and the corn yield were reasonably simulated at the administrative region scale. This integrated water system model is expected to improve simulation performance as it is extended with more model functionalities, and to provide a scientific basis for implementing integrated river basin management.
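The two goodness-of-fit measures reported in the abstract above can be sketched as follows, assuming their standard textbook definitions: the Nash-Sutcliffe efficiency NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))² and the Pearson correlation coefficient. The function names and the sample runoff values are hypothetical and are not taken from the study.

```python
from math import sqrt


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observations are constant: NSE is undefined")
    return 1.0 - residual / spread


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily runoff values (arbitrary units), for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.2, 1.9, 3.3, 3.8]
print(nash_sutcliffe(obs, sim), pearson_r(obs, sim))
```

A simulation that merely reproduces the observed mean scores NSE = 0, which is why NSE is usually read alongside the correlation coefficient rather than instead of it.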
NASA Astrophysics Data System (ADS)
Sayehvand, Habib-Olah; Basiri Parsa, Amir
A numerical investigation of nanofluid heat and mass transfer in a channel partially filled with a porous medium in the presence of a uniform magnetic field is carried out with a new computational iterative approach known as the spectral local linearization method (SLLM). A similarity solution is used to reduce the governing system of partial differential equations to a set of nonlinear ordinary differential equations, which are then solved by SLLM; the validity of the solutions is verified against numerical results (fourth-order Runge-Kutta scheme with the shooting method). In modeling the flow in the channel, the effects of flow inertia, Brinkman friction, nanoparticle concentration and thickness of the porous region are taken into account. The results are obtained for velocity, temperature, concentration, skin friction, Nusselt number and Sherwood number. The effects of active parameters such as the viscosity parameter, Hartmann number, Darcy number, Prandtl number, Schmidt number, Eckert number, Brownian motion parameter, thermophoresis parameter and the thickness of the porous region on the hydrodynamics and on heat and mass transfer behavior are also investigated.
Carter, Shelly L.; Karanes, Chatchada; Costa, Luciano J.; Wu, Juan; Devine, Steven M.; Wingard, John R.; Aljitawi, Omar S.; Cutler, Corey S.; Jagasia, Madan H.; Ballen, Karen K.; Eapen, Mary; O'Donnell, Paul V.
2011-01-01
The Blood and Marrow Transplant Clinical Trials Network conducted 2 parallel multicenter phase 2 trials for individuals with leukemia or lymphoma and no suitable related donor. Reduced intensity conditioning (RIC) was used with either unrelated double umbilical cord blood (dUCB) or HLA-haploidentical related donor bone marrow (Haplo-marrow) transplantation. For both trials, the transplantation conditioning regimen incorporated cyclophosphamide, fludarabine, and 200 cGy of total body irradiation. The 1-year probabilities of overall and progression-free survival were 54% and 46%, respectively, after dUCB transplantation (n = 50) and 62% and 48%, respectively, after Haplo-marrow transplantation (n = 50). The day +56 cumulative incidence of neutrophil recovery was 94% after dUCB and 96% after Haplo-marrow transplantation. The 100-day cumulative incidence of grade II-IV acute GVHD was 40% after dUCB and 32% after Haplo-marrow transplantation. The 1-year cumulative incidences of nonrelapse mortality and relapse after dUCB transplantation were 24% and 31%, respectively, with corresponding results of 7% and 45%, respectively, after Haplo-marrow transplantation. These multicenter studies confirm the utility of dUCB and Haplo-marrow as alternative donor sources and set the stage for a multicenter randomized clinical trial to assess the relative efficacy of these 2 strategies. The trials are registered at www.clinicaltrials.gov under NCT00864227 (BMT CTN 0604) and NCT00849147 (BMT CTN 0603). PMID:21527516
Treveaven, P.
1989-01-01
This book presents an introduction to object-oriented, functional, and logic parallel computing on which the fifth generation of computer systems will be based. Coverage includes concepts for parallel computing languages, a parallel object-oriented system (DOOM) and its language (POOL), an object-oriented multilevel VLSI simulator using POOL, and implementation of lazy functional languages on parallel architectures.
Krogh, M.; Painter, J.; Hansen, C.
1996-10-01
Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the M.
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
NASA Astrophysics Data System (ADS)
Huberman, Bernardo A.
1989-11-01
This paper reviews three different aspects of parallel computation which are useful for physics. The first part deals with special architectures for parallel computing (SIMD and MIMD machines) and their differences, with examples of their uses. The second section discusses the speedup that can be achieved in parallel computation and the constraints generated by the issues of communication and synchrony. The third part describes computation by distributed networks of powerful workstations without global controls and the issues involved in understanding their behavior.
Nael, Kambiz; Villablanca, J Pablo; Pope, Whitney B; McNamara, Thomas O; Laub, Gerhard; Finn, J Paul
2007-02-01
To prospectively use 3.0-T breath-hold high-spatial-resolution contrast material-enhanced magnetic resonance (MR) angiography with highly accelerated parallel acquisition to image the supraaortic arteries of patients suspected of having arterial occlusive disease. Institutional review board approval and written informed consent were obtained for this HIPAA-compliant study. Eighty patients (44 men, 36 women; age range, 44-90 years) underwent contrast-enhanced MR angiography of the head and neck at 3.0 T with an eight-channel neurovascular array coil. By applying a generalized autocalibrating partially parallel acquisition algorithm with an acceleration factor of four, high-spatial-resolution (0.7 x 0.7 x 0.9 mm = 0.44-mm(3) voxels) three-dimensional contrast-enhanced MR angiography was performed during a 20-second breath hold. Two neuroradiologists evaluated vascular image quality and arterial stenoses. Interobserver variability was tested with the kappa coefficient. Quantitation of stenosis at MR angiography was compared with that at digital subtraction angiography (DSA) (n = 13) and computed tomographic (CT) angiography (n = 12) with Spearman rank correlation coefficient (R(s)). Arterial stenoses were detected with contrast-enhanced MR angiography in 208 (reader 1) and 218 (reader 2) segments, with excellent interobserver agreement (kappa = 0.80). There was a significant correlation between contrast-enhanced MR angiography and CT angiography (R(s) = 0.95, reader 1; R(s) = 0.87, reader 2) and between contrast-enhanced MR angiography and DSA (R(s) = 0.94, reader 1; R(s) = 0.92, reader 2) for the degree of stenosis. Sensitivity and specificity of contrast-enhanced MR angiography for detection of arterial stenoses greater than 50% were 94% and 98% for reader 1 and 100% and 98% for reader 2, with DSA as the standard of reference. Vascular image quality was sufficient for diagnosis or excellent for 97% of arterial segments evaluated. By using highly accelerated parallel
SPIRiT: Iterative Self-consistent Parallel Imaging Reconstruction from Arbitrary k-Space
Lustig, Michael; Pauly, John M.
2010-01-01
A new approach to autocalibrating, coil-by-coil parallel imaging reconstruction is presented. It is a generalized reconstruction framework based on self consistency. The reconstruction problem is formulated as an optimization that yields the solution most consistent with the calibration and acquisition data. The approach is general and can accurately reconstruct images from arbitrary k-space sampling patterns. The formulation can flexibly incorporate additional image priors such as off-resonance correction and regularization terms that appear in compressed sensing. Several iterative strategies to solve the posed reconstruction problem in both the image and k-space domains are presented. These are based on projection onto convex sets (POCS) and conjugate gradient (CG) algorithms. Phantom and in-vivo studies demonstrate efficient reconstructions from undersampled Cartesian and spiral trajectories. Reconstructions that include off-resonance correction and nonlinear ℓ1-wavelet regularization are also demonstrated. PMID:20665790
Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.
1995-05-01
Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.
Dinç, Erdal; Ertekin, Zehra Ceren
2016-01-01
An application of parallel factor analysis (PARAFAC) and three-way partial least squares (3W-PLS1) regression models to ultra-performance liquid chromatography-photodiode array detection (UPLC-PDA) data with co-eluted peaks in the same wavelength and time regions is described for the multicomponent quantitation of hydrochlorothiazide (HCT) and olmesartan medoxomil (OLM) in tablets. A three-way dataset of HCT and OLM in their binary mixtures containing telmisartan (IS) as an internal standard was recorded with a UPLC-PDA instrument. Firstly, the PARAFAC algorithm was applied to decompose the three-way UPLC-PDA data into chromatographic, spectral and concentration profiles to quantify the compounds of interest. Secondly, the 3W-PLS1 approach was applied to decompose a tensor consisting of the three-way UPLC-PDA data into a set of triads to build a 3W-PLS1 regression for the analysis of the same compounds in samples. For the proposed three-way analysis methods in the regression and prediction steps, the applicability and validity of the PARAFAC and 3W-PLS1 models were checked by analyzing synthetic mixture samples, inter-day and intra-day samples, and standard addition samples containing HCT and OLM. The two three-way analysis methods, PARAFAC and 3W-PLS1, were successfully applied to the quantitative estimation of the solid dosage form containing HCT and OLM. Regression and prediction results from the three-way analysis were compared with those obtained by the traditional UPLC method.
Wald, Ingo; Ize, Santiago
2015-07-28
Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
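The two-phase scheme described in this abstract can be sketched roughly as follows. This is a simplified, one-dimensional illustration under stated assumptions: the grid is a row of cells, objects are inclusive cell intervals, threads stand in for processors, and the names `split`, `portions_of` and `populate` are hypothetical rather than taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n):
    """Deal items into n lists (round-robin), one per worker."""
    return [items[k::n] for k in range(n)]


def portions_of(grid_width, n):
    """Cut cells 0..grid_width-1 into n contiguous (start, stop) ranges."""
    bounds = [grid_width * k // n for k in range(n + 1)]
    return [(bounds[k], bounds[k + 1]) for k in range(n)]


def populate(grid_width, objects, n):
    """objects: list of (name, lo, hi) with inclusive cell extents."""
    portions = portions_of(grid_width, n)

    def classify(object_set):
        # Phase 1: each worker finds which portions its objects overlap.
        hits = []
        for name, lo, hi in object_set:
            for p, (start, stop) in enumerate(portions):
                if lo < stop and hi >= start:
                    hits.append((p, name, lo, hi))
        return hits

    def fill(p):
        # Phase 2: worker p writes only into the cells of its own portion.
        start, stop = portions[p]
        cells = {c: [] for c in range(start, stop)}
        for q, name, lo, hi in all_hits:
            if q == p:
                for c in range(max(lo, start), min(hi, stop - 1) + 1):
                    cells[c].append(name)
        return cells

    with ThreadPoolExecutor(max_workers=n) as pool:
        all_hits = [h for hits in pool.map(classify, split(objects, n))
                    for h in hits]
        grid = {}
        for cells in pool.map(fill, range(n)):
            grid.update(cells)
    return grid


# Hypothetical example: 8 cells, 2 workers, 3 objects.
grid = populate(8, [("a", 0, 2), ("b", 3, 5), ("c", 6, 7)], 2)
```

Because each worker in the second phase writes only to its own portion, no two workers touch the same cell, which is the property that lets the population step proceed without locking in this sketch.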
Parallel machines: Parallel machine languages
Iannucci, R.A.
1990-01-01
This book presents a framework for understanding the tradeoffs between the conventional view and the dataflow view, with the objective of discovering the critical hardware structures which must be present in any scalable, general-purpose parallel computer to effectively tolerate latency and synchronization costs. The author presents an approach to scalable general-purpose parallel computation. Linguistic concerns, compiling issues, intermediate language issues, and hardware/technological constraints are presented as a combined approach to architectural development. This book presents the notion of a parallel machine language.
Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.
1995-09-01
In this paper the authors introduce the idea of parallel pipelining for water lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as $N = (R/r)^{\alpha}$, where $r$ and $R$ are the radii of the small and large pipes, respectively, and $\alpha = 4$ or $19/7$ when the lubricating water flow is laminar or turbulent, respectively.
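As a quick worked example of the pipe-count estimate quoted in this abstract (N equal to R/r raised to the power alpha, with alpha = 4 for laminar and 19/7 for turbulent lubricating flow), the sketch below evaluates it for a hypothetical radius ratio of 2. The function name and the input radii are assumptions made for illustration only.

```python
import math


def small_pipes_needed(big_radius, small_radius, turbulent=False):
    """Estimate N = (R/r)**alpha from the abstract; radii in the same units."""
    alpha = 19.0 / 7.0 if turbulent else 4.0
    return (big_radius / small_radius) ** alpha


# Hypothetical case: the large pipe has twice the radius of each small pipe.
n_laminar = small_pipes_needed(2.0, 1.0)                    # 2**4
n_turbulent = small_pipes_needed(2.0, 1.0, turbulent=True)  # 2**(19/7)
pipes_to_install = math.ceil(n_turbulent)                   # whole pipes
```

With these inputs the laminar estimate is 16 pipes, while the turbulent estimate is about 6.6, which rounds up to 7 whole pipes.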
Custom-fitted 16-channel bilateral breast coil for bidirectional parallel imaging.
Nnewihe, Anderson N; Grafendorfer, Thomas; Daniel, Bruce L; Calderon, Paul; Alley, Marcus T; Robb, Fraser; Hargreaves, Brian A
2011-07-01
A 16-channel receive-only, closely fitted array coil is described and tested in vivo for bilateral breast imaging at 3 T. The primary purpose of this coil is to provide high signal-to-noise ratio and parallel imaging acceleration in two directions for breast MRI. Circular coil elements (7.5-cm diameter) were placed on a closed "cup-shaped" platform, and nearest neighbor coils were decoupled through geometric overlap. Comparisons were made between the 16-channel custom coil and a commercially available 8-channel coil. SENSitivity Encoding (SENSE) parallel imaging noise amplification (g-factor) was evaluated in phantom scans. In healthy volunteers, we compared signal-to-noise ratio, parallel imaging in one and two directions, Autocalibrating Reconstruction for Cartesian sampling (ARC) g-factor, and high spatial resolution imaging. When compared with a commercially available 8-channel coil, the 16-channel custom coil shows 3.6× higher mean signal-to-noise ratio in the breast and higher quality accelerated images. In patients, the 16-channel custom coil has facilitated high-quality, high-resolution images with bidirectional acceleration of R = 6.3. Copyright © 2010 Wiley-Liss, Inc.
Saugel, Bernd; Meidert, Agnes S; Langwieser, Nicolas; Wagner, Julia Y; Fassio, Florian; Hapfelmeier, Alexander; Prechtl, Luisa M; Huber, Wolfgang; Schmid, Roland M; Gödje, Oliver
2014-08-01
We aimed to describe and evaluate an autocalibrating algorithm for determination of cardiac output (CO) based on the analysis of an arterial pressure (AP) waveform recorded using radial artery applanation tonometry (AT) in a continuous non-invasive manner. To exemplarily describe and evaluate the CO algorithm, we deliberately selected 22 intensive care unit patients with impeccable AP waveforms from a database including AP data obtained with AT (T-Line system; Tensys Medical Inc.). When recording AP data for this prospectively maintained database, we had simultaneously noted CO measurements obtained from just calibrated pulse contour analysis (PiCCO system; Pulsion Medical Systems) every minute. We applied the autocalibrating CO algorithm to the AT-derived AP waveforms and noted the computed CO values every minute during a total of 15 min of data recording per patient (3 × 5-min intervals). These 330 AT-derived CO (AT-CO) values were then statistically compared to the corresponding pulse contour CO (PC-CO) values. Mean ± standard deviation for PC-CO and AT-CO was 7.0 ± 2.0 and 6.9 ± 2.1 L/min, respectively. The coefficient of variation for PC-CO and AT-CO was 0.280 and 0.299, respectively. Bland-Altman analysis demonstrated a bias of +0.1 L/min (standard deviation 0.8 L/min; 95% limits of agreement -1.5 to 1.7 L/min, percentage error 23%). CO can be computed based on the analysis of the AP waveform recorded with AT. In the selected patients included in this pilot analysis, a percentage error of 23% indicates clinically acceptable agreement between AT-CO and PC-CO.
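The agreement statistics in this abstract can be reproduced approximately from the reported summary values. The sketch below assumes the common conventions that the 95% limits of agreement are the bias plus or minus 1.96 standard deviations of the differences, and that the percentage error is 1.96 standard deviations divided by the mean cardiac output of the two methods; the abstract does not state which definitions the authors used, and the function names are illustrative.

```python
def limits_of_agreement(bias, sd_diff, z=1.96):
    """Bland-Altman 95% limits: bias -/+ z * SD of the differences."""
    return bias - z * sd_diff, bias + z * sd_diff


def percentage_error(sd_diff, mean_co, z=1.96):
    """Percentage error (assumed definition): z * SD / mean CO, in percent."""
    return 100.0 * z * sd_diff / mean_co


# Summary values reported in the abstract (L/min).
bias, sd_diff = 0.1, 0.8
mean_co = (7.0 + 6.9) / 2  # mean of the PC-CO and AT-CO averages

lower, upper = limits_of_agreement(bias, sd_diff)
pe = percentage_error(sd_diff, mean_co)
```

Under these assumptions the limits round to -1.5 and 1.7 L/min and the percentage error to 23%, matching the figures in the abstract.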
Li, Shu; Chan, Cheong; Stockmann, Jason P; Tagare, Hemant; Adluru, Ganesh; Tam, Leo K; Galiana, Gigi; Constable, R Todd; Kozerke, Sebastian; Peters, Dana C
2015-04-01
To investigate algebraic reconstruction technique (ART) for parallel imaging reconstruction of radial data, applied to accelerated cardiac cine. A graphics processing unit (GPU)-accelerated ART reconstruction was implemented and applied to simulations, point spread functions and in 12 subjects imaged with radial cardiac cine acquisitions. Cine images were reconstructed with radial ART at multiple undersampling levels (192 Nr × Np = 96 to 16). Images were qualitatively and quantitatively analyzed for sharpness and artifacts, and compared to filtered back-projection, and conjugate gradient SENSE. Radial ART provided reduced artifacts and mainly preserved spatial resolution, for both simulations and in vivo data. Artifacts were qualitatively and quantitatively less with ART than filtered back-projection using 48, 32, and 24 Np , although filtered back-projection provided quantitatively sharper images at undersampling levels of 48-24 Np (all P < 0.05). Use of undersampled radial data for generating auto-calibrated coil-sensitivity profiles resulted in slightly reduced quality. ART was comparable to conjugate gradient SENSE. GPU-acceleration increased ART reconstruction speed 15-fold, with little impact on the images. GPU-accelerated ART is an alternative approach to image reconstruction for parallel radial MR imaging, providing reduced artifacts while mainly maintaining sharpness compared to filtered back-projection, as shown by its first application in cardiac studies. © 2014 Wiley Periodicals, Inc.
Li, Shu; Chan, Cheong; Stockmann, Jason P.; Tagare, Hemant; Adluru, Ganesh; Tam, Leo K.; Galiana, Gigi; Constable, R. Todd; Kozerke, Sebastian; Peters, Dana C.
2014-01-01
Purpose To investigate algebraic reconstruction technique (ART) for parallel imaging reconstruction of radial data, applied to accelerated cardiac cine. Methods A GPU-accelerated ART reconstruction was implemented and applied to simulations, point spread functions (PSF) and in twelve subjects imaged with radial cardiac cine acquisitions. Cine images were reconstructed with radial ART at multiple undersampling levels (192 Nr x Np = 96 to 16). Images were qualitatively and quantitatively analyzed for sharpness and artifacts, and compared to filtered back-projection (FBP), and conjugate gradient SENSE (CG SENSE). Results Radial ART provided reduced artifacts and mainly preserved spatial resolution, for both simulations and in vivo data. Artifacts were qualitatively and quantitatively less with ART than FBP using 48, 32, and 24 Np, although FBP provided quantitatively sharper images at undersampling levels of 48-24 Np (all p<0.05). Use of undersampled radial data for generating auto-calibrated coil-sensitivity profiles resulted in slightly reduced quality. ART was comparable to CG SENSE. GPU-acceleration increased ART reconstruction speed 15-fold, with little impact on the images. Conclusion GPU-accelerated ART is an alternative approach to image reconstruction for parallel radial MR imaging, providing reduced artifacts while mainly maintaining sharpness compared to FBP, as shown by its first application in cardiac studies. PMID:24753213
Parallel processing of natural language
Chang, H.O.
1986-01-01
Two types of parallel natural language processing are studied in this work: (1) the parallelism between syntactic and nonsyntactic processing and (2) the parallelism within syntactic processing. It is recognized that a syntactic category can potentially be attached to more than one node in the syntactic tree of a sentence. Even if all the attachments are syntactically well-formed, nonsyntactic factors such as semantic and pragmatic considerations may require one particular attachment. Syntactic processing must synchronize and communicate with nonsyntactic processing. Two syntactic processing algorithms are proposed for use in a parallel environment: Earley's algorithm and the LR(k) algorithm. Conditions are identified to detect syntactic ambiguity, and the algorithms are augmented accordingly. It is shown that by using nonsyntactic information during syntactic processing, backtracking can be reduced and the performance of the syntactic processor is improved. For the second type of parallelism, it is recognized that one portion of a grammar can be isolated from the rest of the grammar and be processed by a separate processor. A partial grammar of a larger grammar is defined. Parallel syntactic processing is achieved by using two processors concurrently: the main processor (mp) and the auxiliary processor (ap).
Parallel pivoting combined with parallel reduction
NASA Technical Reports Server (NTRS)
Alaghband, Gita
1987-01-01
Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
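To make the two ingredients named in this abstract concrete, the sketch below computes Markowitz counts on a sparsity pattern and greedily picks a set of mutually compatible pivots. It rests on assumptions not stated in the abstract: the Markowitz count of an entry is taken as (row nonzeros - 1) times (column nonzeros - 1), and two pivots (i, j) and (k, l) are treated as compatible when entries (i, l) and (k, j) are both zero. The numerical stability check and fill-in control of the paper are omitted, and all names are hypothetical.

```python
def markowitz_counts(pattern):
    """Map each nonzero (i, j) to (r_i - 1) * (c_j - 1)."""
    rows, cols = {}, {}
    for i, j in pattern:
        rows[i] = rows.get(i, 0) + 1
        cols[j] = cols.get(j, 0) + 1
    return {(i, j): (rows[i] - 1) * (cols[j] - 1) for i, j in pattern}


def compatible(p, q, pattern):
    """Assumed rule: pivots do not share a row/column or a coupling entry."""
    (i, j), (k, l) = p, q
    return (i != k and j != l
            and (i, l) not in pattern and (k, j) not in pattern)


def parallel_pivots(pattern):
    """Greedy pass: lowest Markowitz count first, keep compatible pivots."""
    counts = markowitz_counts(pattern)
    chosen = []
    for cand in sorted(pattern, key=lambda e: (counts[e], e)):
        if all(compatible(cand, p, pattern) for p in chosen):
            chosen.append(cand)
    return chosen


# Hypothetical 4x4 sparsity pattern given as a set of (row, column) pairs.
pattern = {(0, 0), (0, 2), (1, 1), (2, 0), (2, 2), (2, 3), (3, 3)}
pivots = parallel_pivots(pattern)
```

Pivots selected this way can be eliminated concurrently because, under the assumed rule, the update caused by one pivot does not modify the row or column of another.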
Wong, Kevin; Levi, Jessica R
2017-03-01
Evaluate the content and readability of health information regarding partial tonsillectomy. A web search was performed using the term partial tonsillectomy in Google, Yahoo!, and Bing. The first 50 websites from each search were evaluated using HONcode standards for quality and content. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease, Gunning-Fog Index, Coleman-Liau Index, Automated Readability Index (ARI), and SMOG score. The Freeman-Halton extension of Fisher's exact test was used to compare categorical differences between engines. Less than half of the websites mentioned patient eligibility criteria (43.3%), referenced peer-reviewed literature (43.3%), or provided a procedure description (46.7%). Twenty-two websites (14.7%) were unrelated to partial tonsillectomy, and over half contained advertisements (52%). These findings were consistent across search engines and search terms. The mean FKGL was 11.6 ± 0.11, Gunning-Fog Index was 15.1 ± 0.13, Coleman-Liau Index was 14.6 ± 0.11, ARI was 12.9 ± 0.13, and SMOG grade was 14.0 ± 0.1. All readability levels exceeded the abilities of the average American adult. Current online information regarding partial tonsillectomy may not provide adequate information and may be written at a level too difficult for the average adult reader.
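Two of the readability scores named in this abstract have simple closed-form definitions, sketched below from the standard published formulas. The functions take pre-counted words, sentences and syllables, since syllable counting itself is heuristic; the function names and the sample counts are illustrative assumptions and do not come from the study.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """FKGL = 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words, sentences, syllables):
    """FRE = 206.835 - 1.015 * words/sentence - 84.6 * syllables/word."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical page: 200 words, 10 sentences, 340 syllables.
grade = flesch_kincaid_grade(words=200, sentences=10, syllables=340)
ease = flesch_reading_ease(words=200, sentences=10, syllables=340)
```

For these counts the grade level is about 12.3 and the reading ease about 42.7; longer sentences and more syllables per word raise the grade level and lower the reading ease.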
Tolerant (parallel) Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2^3 is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
Special parallel processing workshop
1994-12-01
This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts dealing with parallel processing.
Iterative algorithms for large sparse linear systems on parallel computers
NASA Technical Reports Server (NTRS)
Adams, L. M.
1982-01-01
Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed, and results of this model for the algorithms are given.
Parallel rendering techniques for massively parallel visualization
Hansen, C.; Krogh, M.; Painter, J.
1995-07-01
As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, sphere, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderers use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.
Calibrationless Parallel Imaging Reconstruction Based on Structured Low-Rank Matrix Completion
Shin, Peter J.; Larson, Peder E.Z.; Ohliger, Michael A.; Elad, Michael; Pauly, John M.; Vigneron, Daniel B.; Lustig, Michael
2013-01-01
Purpose A calibrationless parallel imaging reconstruction method, termed simultaneous auto-calibrating and k-space estimation (SAKE), is presented. It is a data-driven, coil-by-coil reconstruction method that does not require a separate calibration step for estimating coil sensitivity information. Methods In SAKE, an under-sampled multi-channel dataset is structured into a single data matrix. Then the reconstruction is formulated as a structured low-rank matrix completion problem. An iterative solution that implements a projection-onto-sets algorithm with singular value thresholding is described. Results Reconstruction results are demonstrated for retrospectively and prospectively under-sampled, multi-channel Cartesian data having no calibration signals. Additionally, non-Cartesian data reconstruction is presented. Finally, improved image quality is demonstrated by combining SAKE with wavelet-based compressed sensing. Conclusion As estimation of coil sensitivity information is not needed, the proposed method could potentially benefit MR applications where acquiring accurate calibration data is limiting or not possible at all. PMID:24248734
On the parallel solution of parabolic equations
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Youcef
1989-01-01
Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
Multilist Scheduling. A New Parallel Programming Model.
1993-07-30
fluid simulation [53]; differential equation solving such as weather prediction [24, 25]; digital circuit simulation such as gate-level simulation [20]...Champaign, 1986. [53] Johnson, C. Numerical Solutions of Partial Differential Equations by the Finite Element Method. Cambridge University Press, 1987. 131...Ortega, J. and Voigt, R. Solution of Partial Differential Equations on Vector and Parallel Computers. SIAM Review, vol. 27 (1985), pp. 149-240. [73
Parallel-In-Time For Moving Meshes
Falgout, R. D.; Manteuffel, T. A.; Southworth, B.; Schroder, J. B.
2016-02-04
With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Parallel flow diffusion battery
Yeh, Hsu-Chi; Cheng, Yung-Sung
1984-08-07
A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
Parallel flow diffusion battery
Yeh, H.C.; Cheng, Y.S.
1984-01-01
A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
Fan, W.C.; Halbleib, J.A. Sr.
1996-09-01
This report provides a users' guide for parallel processing ITS on a UNIX workstation network, a shared-memory multiprocessor or a massively-parallel processor. The parallelized version of ITS is based on a master/slave model with message passing. Parallel issues such as random number generation, load balancing, and communication software are briefly discussed. Timing results for example problems are presented for demonstration purposes.
Introduction to parallel programming
Brawer, S.
1989-01-01
This book describes parallel programming and all the basic concepts illustrated by examples in a simplified FORTRAN. Concepts covered include: The parallel programming model; The creation of multiple processes; Memory sharing; Scheduling; Data dependencies. In addition, a number of parallelized applications are presented, including a discrete-time, discrete-event simulator, numerical integration, Gaussian elimination, and parallelized versions of the traveling salesman problem and the exploration of a maze.
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Research in parallel computing
NASA Technical Reports Server (NTRS)
Ortega, James M.; Henderson, Charles
1994-01-01
This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.
Parallel algorithm development
Adams, T.F.
1996-06-01
Rapid changes in parallel computing technology are causing significant changes in the strategies being used for parallel algorithm development. One approach is simply to write computer code in a standard language like FORTRAN 77 or with the expectation that the compiler will produce executable code that will run in parallel. The alternatives are: (1) to build explicit message passing directly into the source code; or (2) to write source code without explicit reference to message passing or parallelism, but use a general communications library to provide efficient parallel execution. Application of these strategies is illustrated with examples of codes currently under development.
Parallel Atomistic Simulations
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Parallel Adaptive Mesh Refinement
Diachin, L; Hornung, R; Plassmann, P; Wissink, A
2005-03-04
As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of the fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of an impulsively-sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35]. Note the
Holmes, James H; Beatty, Philip J; Rowley, Howard A; Li, Zhiqiang; Gaddipati, Ajeetkumar; Zhao, Xiaoli; Busse, Reed F; Brittain, Jean H
2012-12-01
Patient motion is a common challenge in the clinical setting, and a fast spin echo longitudinal relaxation time fluid attenuating inversion recovery imaging method with motion correction would be highly desirable. The motion correction provided by transverse relaxation time- and diffusion-weighted periodically rotated overlapping parallel lines with enhanced reconstruction methods has seen significant clinical adoption. However, periodically rotated overlapping parallel lines with enhanced reconstruction with fast spin echo longitudinal relaxation time fluid attenuating inversion recovery-weighting has proved challenging, since motion correction requires wide blades that are difficult to acquire while also maintaining short echo train lengths that are optimal for longitudinal relaxation time fluid attenuating inversion recovery-weighting. Parallel imaging provides an opportunity to increase the effective blade width for a given echo train length. Coil-by-coil data-driven autocalibrated parallel imaging methods provide greater robustness in the event of motion compared to techniques relying on accurate coil sensitivity maps. However, conventional internally calibrated data-driven parallel imaging methods limit the effective acceleration possible for each blade. We present a method to share a single calibration dataset over all imaging blades on a slice-by-slice basis using the APPEAR non-Cartesian parallel imaging method, providing an effective blade width increase of 2.45× and enabling robust motion correction. Results comparing the proposed technique to conventional Cartesian and periodically rotated overlapping parallel lines with enhanced reconstruction methods demonstrate a significant improvement during subject motion, while maintaining high image quality when no motion is present, in normal and clinical volunteers. Copyright © 2012 Wiley Periodicals, Inc.
Visualization and Tracking of Parallel CFD Simulations
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kremenetsky, Mark
1995-01-01
We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.
Parallel digital forensics infrastructure.
Liebrock, Lorie M.; Duggan, David Patrick
2009-10-01
This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the implementation of the parallel digital forensics (PDF) infrastructure architecture and implementation.
Languages for parallel architectures
Bakker, J.W.
1989-01-01
This book presents mathematical methods for modelling parallel computer architectures, based on the results of ESPRIT's project 415 on computer languages for parallel architectures. Presented are investigations incorporating a wide variety of programming styles, including functional, logic, and object-oriented paradigms. Topics covered include Philips's parallel object-oriented language POOL, lazy-functional languages, the languages IDEAL, K-LEAF, FP2, and Petri-net semantics for the AADL language.
Introduction to Parallel Computing
1992-05-01
Topology C, Ada, C++, Data-parallel FORTRAN, 2D mesh of node boards, each node FORTRAN-90 (late 1992) board has 1 application processor Development Tools ...parallel machines become the wave of the present, tools are increasingly needed to assist programmers in creating parallel tasks and coordinating...their activities. Linda was designed to be such a tool. Linda was designed with three important goals in mind: to be portable, efficient, and easy to use
Parallel Wolff Cluster Algorithms
NASA Astrophysics Data System (ADS)
Bae, S.; Ko, S. H.; Coddington, P. D.
The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.
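For orientation, the serial single-cluster update that the abstract refers to can be sketched as follows. This is a minimal illustration for a 2D Ising model on a periodic square lattice, not the parallel implementations of the paper; the function name `wolff_update`, the lattice representation, and the coupling `J` are assumptions made here. The irregular, data-dependent growth of the cluster through the `stack` is what makes the method awkward to distribute over processors.

```python
import math
import random

def wolff_update(spins, beta, J=1.0, rng=random):
    """One Wolff single-cluster update on an L x L periodic Ising lattice.

    `spins` is a list of lists holding +1 or -1 and is modified in place.
    Returns the number of spins flipped (the cluster size).
    Hypothetical helper written for illustration only.
    """
    L = len(spins)
    # Probability of adding an aligned neighbour to the cluster.
    p_add = 1.0 - math.exp(-2.0 * beta * J)
    i, j = rng.randrange(L), rng.randrange(L)
    seed_spin = spins[i][j]
    spins[i][j] = -seed_spin      # flipping on entry also marks the site as visited
    stack = [(i, j)]
    size = 1
    while stack:
        x, y = stack.pop()
        neighbours = [((x + 1) % L, y), ((x - 1) % L, y),
                      (x, (y + 1) % L), (x, (y - 1) % L)]
        for nx, ny in neighbours:
            # Only still-unflipped spins aligned with the seed can join.
            if spins[nx][ny] == seed_spin and rng.random() < p_add:
                spins[nx][ny] = -seed_spin
                stack.append((nx, ny))
                size += 1
    return size
```

At `beta = 0` the add probability is zero and exactly one spin flips; as `beta` grows, clusters on an ordered lattice approach the full system size, which illustrates the irregular cluster sizes mentioned in the abstract.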
Application Portable Parallel Library
NASA Technical Reports Server (NTRS)
Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott
1995-01-01
Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan
1994-01-01
A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
Parallel Algorithms and Patterns
Robey, Robert W.
2016-06-16
This is a PowerPoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. The topic really deserves its own detailed discussion, which Gabe Rockefeller would like to develop.
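Two of the patterns named above, reductions and prefix scans, can be illustrated with a small sketch. The code below runs serially and only mimics the structure a parallel version would have: the per-chunk work is independent and could be handed to separate workers, while a short second phase combines the chunk totals. The function names and the chunking scheme are assumptions for illustration and are not taken from the presentation.

```python
from itertools import accumulate

def chunked(data, nchunks):
    """Split data into nchunks contiguous pieces of near-equal length."""
    n = len(data)
    bounds = [n * k // nchunks for k in range(nchunks + 1)]
    return [data[bounds[k]:bounds[k + 1]] for k in range(nchunks)]

def block_reduce(data, nchunks=4):
    """Sum reduction: independent partial sums, then one combine step."""
    partials = [sum(chunk) for chunk in chunked(data, nchunks)]
    return sum(partials)

def block_inclusive_scan(data, nchunks=4):
    """Inclusive prefix sum built from per-chunk scans plus chunk offsets."""
    # Phase 1: each chunk is scanned on its own (independent work).
    local = [list(accumulate(chunk)) for chunk in chunked(data, nchunks)]
    # Phase 2: exclusive scan over the chunk totals gives each chunk's offset.
    totals = [scan[-1] if scan else 0 for scan in local]
    offsets = [0] + list(accumulate(totals))[:-1]
    # Phase 3: add the offset to every element of the chunk (independent work).
    return [value + offset
            for scan, offset in zip(local, offsets)
            for value in scan]
```

For example, `block_inclusive_scan([3, 1, 4, 1, 5, 9, 2, 6], nchunks=3)` gives the same result as a plain running sum over the whole list.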
Parallel preconditioning techniques for sparse CG solvers
Basermann, A.; Reichel, B.; Schelthoff, C.
1996-12-31
Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques of the CG method. In particular for very ill-conditioned matrices, sophisticated preconditioners are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the iterations of the simply diagonally scaled CG and are shown to be well suited for massively parallel machines.
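The baseline mentioned in this abstract, diagonally scaled CG, can be sketched in a few lines. The version below is a minimal serial illustration using dense Python lists and assuming a symmetric positive definite matrix; it does not reproduce the polynomial or incomplete Cholesky preconditioners studied by the authors, and the function names are hypothetical. In a parallel setting the matrix-vector product and the dot products are the steps that would be distributed.

```python
def matvec(A, x):
    """Dense matrix-vector product (a sparse format would be used in practice)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """CG with diagonal (Jacobi) scaling for a symmetric positive definite A.

    Returns (x, iterations). Illustrative sketch only.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    if rz == 0.0:                                 # b is zero: x = 0 already solves it
        return x, 0
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

As a usage example, `jacobi_pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` returns an approximation to the exact solution (1/11, 7/11).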
Parallel and Distributed Computing.
1986-12-12
program was devoted to parallel and distributed computing. Support for this part of the program was obtained from the present Army contract and a...Umesh Vazirani. A workshop on parallel and distributed computing was held from May 19 to May 23, 1986 and drew 141 participants. Keywords: Mathematical programming; Protocols; Randomized algorithms. (Author)
Weening, J.S.
1988-05-01
CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.
ERIC Educational Resources Information Center
Mugleston, William F.
2000-01-01
Believes that by focusing on the recurrent situations and problems, or parallels, throughout history, students will understand the relevance of history to their own times and lives. Provides suggestions for parallels in history that may be introduced within lectures or as a means to class discussions. (CMK)
Not Available
1991-10-23
An account of the Caltech Concurrent Computation Program (C^3P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C^3P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C^3P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Massively parallel mathematical sieves
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
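To make the idea of dividing a sieve among processors concrete, here is a minimal serial sketch. It uses a contiguous block decomposition, which is simpler than the scattered decomposition examined in the abstract and is swapped in here purely for illustration: every block needs only the primes up to the square root of the limit and can then be sieved independently of the others. All function names are hypothetical.

```python
import math

def base_primes(limit):
    """Plain Sieve of Eratosthenes for the primes <= limit."""
    if limit < 2:
        return []
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]

def sieve_block(lo, hi, primes):
    """Primes in [lo, hi), given every prime <= sqrt(hi - 1)."""
    flags = [True] * (hi - lo)
    for p in primes:
        # First multiple of p inside the block, never below p * p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            flags[m - lo] = False
    return [lo + i for i, is_prime in enumerate(flags) if is_prime]

def blocked_sieve(n, nblocks=4):
    """Primes <= n, sieved block by block (each block is independent work)."""
    if n < 2:
        return []
    primes = base_primes(math.isqrt(n))
    # Split the range [2, n + 1) into nblocks contiguous blocks.
    bounds = [2 + (n - 1) * k // nblocks for k in range(nblocks + 1)]
    result = []
    for k in range(nblocks):
        result.extend(sieve_block(bounds[k], bounds[k + 1], primes))
    return result
```

For scale, a speedup of 800 on 1,024 processing elements corresponds to a parallel efficiency of about 78%, and a speedup of 980 to about 96%.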
Totally parallel multilevel algorithms
NASA Technical Reports Server (NTRS)
Frederickson, Paul O.
1988-01-01
Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
... brachytherapy; Accelerated partial breast irradiation - brachytherapy; Partial breast radiation therapy - brachytherapy; Permanent breast seed implant; PBSI; Low-dose radiotherapy - breast; High-dose radiotherapy - breast; Electronic balloon ...
Parallel adaptive wavelet collocation method for PDEs
Nejadmalayeri, Alireza; Vezolainen, Alexei; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048^3 using as many as 2048 CPU cores.
Parallel adaptive wavelet collocation method for PDEs
NASA Astrophysics Data System (ADS)
Nejadmalayeri, Alireza; Vezolainen, Alexei; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048^3 using as many as 2048 CPU cores.
Bilingual parallel programming
Foster, I.; Overbeek, R.
1990-01-01
Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seems to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.
A parallel Lanczos method for symmetric generalized eigenvalue problems
Wu, K.; Simon, H.D.
1997-12-01
The Lanczos algorithm is a very effective method for finding extreme eigenvalues of symmetric matrices. It requires fewer arithmetic operations than similar algorithms, such as the Arnoldi method. In this paper, the authors present their parallel version of the Lanczos method for the symmetric generalized eigenvalue problem, PLANSO. PLANSO is based on a sequential package called LANSO, which implements the Lanczos algorithm with partial re-orthogonalization. It is portable to all parallel machines that support MPI and easy to interface with most parallel computing packages. Through numerical experiments, they demonstrate that it achieves similar parallel efficiency to PARPACK, but uses considerably less time.
Some fast elliptic solvers on parallel architectures and their complexities
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Youcef
1989-01-01
The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm, which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
Some fast elliptic solvers on parallel architectures and their complexities
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Y.
1989-01-01
The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm, which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. In this paper, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
ERIC Educational Resources Information Center
Rogers, Pat
1972-01-01
Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclid's parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)
NASA Astrophysics Data System (ADS)
2014-10-01
Adam Nelson and Stuart Warriner, from the University of Leeds, talk with Nature Chemistry about their work to develop viable synthetic strategies for preparing new chemical structures in parallel with the identification of desirable biological activity.
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).
Foster, I.; Tuecke, S.
1991-09-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, a set of tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory at info.mcs.anl.gov.
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCPs running in parallel provide high bandwidth
Revisiting and parallelizing SHAKE
NASA Astrophysics Data System (ADS)
Weinbach, Yael; Elber, Ron
2005-10-01
An algorithm is presented for running SHAKE in parallel. SHAKE is a widely used approach to compute molecular dynamics trajectories with constraints. An essential step in SHAKE is the solution of a sparse linear problem of the type Ax = b, where x is a vector of unknowns. Conjugate gradient minimization (that can be done in parallel) replaces the widely used iteration process that is inherently serial. Numerical examples present good load balancing and are limited only by communication time.
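The abstract above describes replacing a serial iteration with conjugate gradient for a linear problem Ax = b. The following is only a minimal sketch of a textbook conjugate gradient solver, not the authors' parallel implementation. It assumes A is symmetric positive definite and stores it as dense Python lists for brevity; the function and parameter names (`conjugate_gradient`, `tol`, `max_iter`) are illustrative. The matrix-vector product and the dot products are the steps that could be distributed across processors.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; each row is independent work."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (assumption)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))
```

A production code would use a sparse matrix format, since the abstract states that the SHAKE system is sparse.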
NASA Technical Reports Server (NTRS)
Reif, John H.
1987-01-01
A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
A 2D MTF approach to evaluate and guide dynamic imaging developments.
Chao, Tzu-Cheng; Chung, Hsiao-Wen; Hoge, W Scott; Madore, Bruno
2010-02-01
As the number and complexity of partially sampled dynamic imaging methods continue to increase, reliable strategies to evaluate performance may prove most useful. In the present work, an analytical framework to evaluate given reconstruction methods is presented. A perturbation algorithm allows the proposed evaluation scheme to perform robustly without requiring knowledge about the inner workings of the method being evaluated. A main output of the evaluation process consists of a two-dimensional modulation transfer function, an easy-to-interpret visual rendering of a method's ability to capture all combinations of spatial and temporal frequencies. Approaches to evaluate noise properties and artifact content at all spatial and temporal frequencies are also proposed. One fully sampled phantom and three fully sampled cardiac cine datasets were subsampled (R = 4 and 8) and reconstructed with the different methods tested here. A hybrid method, which combines the main advantageous features observed in our assessments, was proposed and tested in a cardiac cine application, with acceleration factors of 3.5 and 6.3 (skip factors of 4 and 8, respectively). This approach combines features from methods such as k-t sensitivity encoding, unaliasing by Fourier encoding the overlaps in the temporal dimension-sensitivity encoding, generalized autocalibrating partially parallel acquisition, sensitivity profiles from an array of coils for encoding and reconstruction in parallel, self, hybrid referencing with unaliasing by Fourier encoding the overlaps in the temporal dimension and generalized autocalibrating partially parallel acquisition, and generalized autocalibrating partially parallel acquisition-enhanced sensitivity maps for sensitivity encoding reconstructions.
Sublattice parallel replica dynamics
NASA Astrophysics Data System (ADS)
Martínez, Enrique; Uberuaga, Blas P.; Voter, Arthur F.
2014-06-01
Exascale computing presents a challenge for the scientific community as new algorithms must be developed to take full advantage of the new computing paradigm. Atomistic simulation methods that offer full fidelity to the underlying potential, i.e., molecular dynamics (MD) and parallel replica dynamics, fail to use the whole machine speedup, leaving a region in time and sample size space that is unattainable with current algorithms. In this paper, we present an extension of the parallel replica dynamics algorithm [A. F. Voter, Phys. Rev. B 57, R13985 (1998), 10.1103/PhysRevB.57.R13985] by combining it with the synchronous sublattice approach of Shim and Amar [Y. Shim and J. G. Amar, Phys. Rev. B 71, 125432 (2005), 10.1103/PhysRevB.71.125432], thereby exploiting event locality to improve the algorithm scalability. This algorithm is based on a domain decomposition in which events happen independently in different regions in the sample. We develop an analytical expression for the speedup given by this sublattice parallel replica dynamics algorithm and compare it with parallel MD and traditional parallel replica dynamics. We demonstrate how this algorithm, which introduces a slight additional approximation of event locality, enables the study of physical systems unreachable with traditional methodologies and promises to better utilize the resources of current high performance and future exascale computers.
Parallel time integration software
2014-07-01
This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.
Parallel architectures for vision
Maresca, M.; Lavin, M.A.; Li, H.
1988-08-01
Vision computing involves the execution of a large number of operations on large sets of structured data. Sequential computers cannot achieve the speed required by most of the current applications and therefore parallel architectural solutions have to be explored. In this paper the authors examine the options that drive the design of a vision oriented computer, starting with the analysis of the basic vision computation and communication requirements. They briefly review the classical taxonomy for parallel computers, based on the multiplicity of the instruction and data stream, and apply a recently proposed criterion, the degree of autonomy of each processor, to further classify fine-grain SIMD massively parallel computers. They identify three types of processor autonomy, namely operation autonomy, addressing autonomy, and connection autonomy. For each type they give the basic definitions and show some examples. They focus on the concept of connection autonomy, which they believe is a key point in the development of massively parallel architectures for vision. They show two examples of parallel computers featuring different types of connection autonomy - the Connection Machine and the Polymorphic-Torus - and compare their cost and benefit.
Parallel algorithms for the spectral transform method
Foster, I.T.; Worley, P.H.
1994-04-01
The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, we describe these different parallel algorithms and report on computational experiments that we have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. We focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional FFTs and other parallel transforms.
Parallel algorithms for the spectral transform method
Foster, I.T.; Worley, P.H.
1997-05-01
The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, the authors describe these different parallel algorithms and report on computational experiments that they have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. The authors focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but they also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional fast Fourier transforms (FFTs) and other parallel transforms.
A Minimal Solution to Radial Distortion Autocalibration.
Kukelova, Zuzana; Pajdla, Tomas
2011-12-01
Simultaneous estimation of radial distortion, epipolar geometry, and relative camera pose can be formulated as a minimal problem and solved from a minimal number of image points. Finding the solution to this problem leads to solving a system of algebraic equations. In this paper, we provide two different solutions to the problem of estimating radial distortion and epipolar geometry from eight point correspondences in two images. Unlike previous algorithms which were able to solve the problem from nine correspondences only, we enforce the determinant of the fundamental matrix be zero. This leads to a system of eight quadratic and one cubic equation in nine variables. We first simplify this system by eliminating six of these variables and then solve the system by two alternative techniques. The first one is based on the Gröbner basis method and the second one on the polynomial eigenvalue computation. We demonstrate that our solutions are efficient, robust, and practical by experiments on synthetic and real data.
Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A
2014-05-20
An optical sampler includes first and second 1×n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.
NASA Technical Reports Server (NTRS)
Vranish, John M. (Inventor)
2010-01-01
A partial gear bearing including an upper half, comprising peak partial teeth, and a lower, or bottom, half, comprising valley partial teeth. The upper half also has an integrated roller section between each of the peak partial teeth with a radius equal to the gear pitch radius of the radially outwardly extending peak partial teeth. Conversely, the lower half has an integrated roller section between each of the valley half teeth with a radius also equal to the gear pitch radius of the peak partial teeth. The valley partial teeth extend radially inwardly from its roller section. The peak and valley partial teeth are exactly out of phase with each other, as are the roller sections of the upper and lower halves. Essentially, the end roller bearing of the typical gear bearing has been integrated into the normal gear tooth pattern.
... Jacksonian seizure; Seizure - partial (focal); Temporal lobe seizure; Epilepsy - partial seizures ... Abou-Khalil BW, Gallagher MJ, Macdonald RL. Epilepsies. In: Daroff ... Practice . 7th ed. Philadelphia, PA: Elsevier; 2016:chap 101. ...
Parallel architectures for iterative methods on adaptive, block structured grids
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1983-01-01
A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may not be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Khabibrakhmanov, I.K.; Galeev, A.A.; Galinsky, V.L.
1993-02-01
A collisionless parallel shock model is presented which is based on solitary-type solutions of the modified derivative nonlinear Schrödinger equation (MDNLS) for parallel Alfvén waves. We generalize the standard derivative nonlinear Schrödinger equation in order to include the possible anisotropy of the plasma distribution function and higher-order Korteweg-de Vries type dispersion. Stationary solutions of MDNLS are discussed. The new mechanism of ion reflection from the magnetic mirror of the parallel shock structure, which can be called 'adiabatic', is the natural and essential feature of the parallel shock that introduces the irreversible properties into the nonlinear wave structure and may significantly contribute to the plasma heating upstream as well as downstream of the shock. The anisotropic nature of 'adiabatic' reflections leads to the asymmetric particle distribution in the upstream as well as in the downstream regions of the shock. As a result, nonzero heat flux appears near the front of the shock. It is shown that this causes the stochastic behavior of the nonlinear waves which can significantly contribute to the shock thermalization. The number of adiabatically reflected ions defines the threshold conditions of the fire-hose and mirror-type instabilities in the downstream and upstream regions and thus determines a parameter region in which the described laminar parallel shock structure can exist. 29 refs., 4 figs.
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Among the highly parallel computing architectures required for advanced scientific computation, those designated 'MIMD' and 'SIMD' have yielded the best results to date. The present development status evaluation of such architectures shows neither to have attained a decisive advantage in the treatment of most near-homogeneous problems; in the cases of problems involving numerous dissimilar parts, however, such currently speculative architectures as 'neural networks' or 'data flow' machines may be entailed. Data flow computers are the most practical form of MIMD fine-grained parallel computers yet conceived; they automatically solve the problem of assigning virtual processors to the real processors in the machine.
NASA Astrophysics Data System (ADS)
Ji, Jeong-Young; Lee, Hankyu Q.; Held, Eric D.
2017-02-01
Ion parallel closures are obtained for arbitrary atomic weights and charge numbers. For arbitrary collisionality, the heat flow and viscosity are expressed as kernel-weighted integrals of the temperature and flow-velocity gradients. Simple, fitted kernel functions are obtained from the 1600 parallel moment solution and the asymptotic behavior in the collisionless limit. The fitted kernel parameters are tabulated for various temperature ratios of ions to electrons. The closures can be used conveniently without solving the kinetic equation or higher order moment equations in closing ion fluid equations.
Kok, J.
1988-01-01
To the human programmer the ease of coding distributed computing is highly dependent on the suitability of the employed programming language. But with a particular language it is also important whether the possibilities of one or more parallel architectures can efficiently be addressed by available language constructs. In this paper the possibilities are discussed of the high-level language Ada and in particular of its tasking concept as a descriptional tool for the design and implementation of numerical and other algorithms that allow execution of parts in parallel. Language tools are explained and their use for common applications is shown. Conclusions are drawn about the usefulness of several Ada concepts.
Bailey, David H.
2009-11-15
The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage
Speeding up parallel processing
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
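To make the contrast between fixed-size and scalable problems concrete, here is a small sketch of the two standard speedup formulas. Amdahl's law is named in the abstract; the scaled-speedup formula (commonly attributed to Gustafson) is not named there and is included as an assumption about what "scalable problems" refers to. The function names and the 0.1% serial fraction are illustrative and are not figures from the Sandia work.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Fixed-size speedup: S = 1 / (s + (1 - s) / N)."""
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / n_procs)


def scaled_speedup(serial_fraction, n_procs):
    """Scaled (Gustafson-style) speedup: S = N - s * (N - 1).

    Here s is the serial fraction measured on the parallel run,
    and the problem size grows with the number of processors.
    """
    s = serial_fraction
    return n_procs - s * (n_procs - 1)


if __name__ == "__main__":
    n = 1024
    s = 0.001  # hypothetical serial fraction, for illustration only
    print("fixed-size speedup:", amdahl_speedup(s, n))
    print("scaled speedup:   ", scaled_speedup(s, n))
```

With a small serial fraction, the fixed-size formula saturates well below the processor count, while the scaled formula stays close to it, which is consistent in spirit with the two sets of results the abstract reports.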
Shumaker, Dana E.; Steefel, Carl I.
2016-06-21
The code CRUNCH_PARALLEL is a parallel version of the CRUNCH code. CRUNCH code version 2.0 was previously released by LLNL (UCRL-CODE-200063). Crunch is a general purpose reactive transport code developed by Carl Steefel and Yabusaki (Steefel and Yabusaki 1996). The code handles non-isothermal transport and reaction in one, two, and three dimensions. The reaction algorithm is generic in form, handling an arbitrary number of aqueous and surface complexation reactions as well as mineral dissolution/precipitation. A standardized database is used containing thermodynamic and kinetic data. The code includes advective, dispersive, and diffusive transport.
Adaptive parallel logic networks
NASA Technical Reports Server (NTRS)
Martinez, Tony R.; Vidal, Jacques J.
1988-01-01
Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Massively parallel processor computer
NASA Technical Reports Server (NTRS)
Fung, L. W. (Inventor)
1983-01-01
An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
Wang, Lin-Wang
2004-10-21
This is a total energy electronic structure code using the Local Density Approximation (LDA) of density functional theory. It uses plane waves as the wave function basis set. It can use both norm-conserving pseudopotentials and ultrasoft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MPI.
Parallel hierarchical radiosity rendering
Carter, Michael
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Foster, I.; Tuecke, S.
1993-01-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.
Mühldorfer-Fodor, M; Hohendorff, B; Prommersberger, K-J; van Schoonhoven, J
2011-04-01
For shortening osteotomy, two exactly parallel osteotomies are needed to assure a congruent adaption of the shortened bone after segment resection. This is required for regular bone healing. In addition, it is difficult to shorten a bone to a precise distance using an oblique segment resection. A mobile spacer between two saw blades keeps the blades exactly parallel at a fixed distance during an osteotomy cut. The parallel saw blades from Synthes® are designed for 2, 2.5, 3, 4, and 5 mm shortening distances. Two types of blades are available (for transverse or oblique osteotomies) to assure precise shortening. Preoperatively, the desired type of osteotomy (transverse or oblique) and the shortening distance have to be determined. Then, the appropriate parallel saw blade is chosen, which is compatible with the Synthes® Colibri with an oscillating saw attachment. During the osteotomy cut, the spacer should be kept as close to the bone as possible. Excessive force that may deform the blades should be avoided. Before manipulating the bone ends, it is important to determine that the bone is completely dissected by both saw blades to prevent fracturing of the corticalis with bony spurs. The shortening osteotomy is mainly fixated by plate osteosynthesis. For compression of the bone ends, the screws should be placed eccentrically in the plate holes. For an oblique osteotomy, an additional lag screw should be used.
ERIC Educational Resources Information Center
Friedlander, Alex; And Others
1982-01-01
Several methods of numerical mappings other than the usual cartesian coordinate system are considered. Some examples using parallel axes representation, which are seen to lead to aesthetically pleasing or interesting configurations, are presented. Exercises with alternative representations can stimulate pupil imagination and exploration in…
Parallel Dislocation Simulator
2006-10-30
ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.
Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan
2010-01-01
We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N^2) time. The parallel time complexity estimates for our algorithms are O(N/n_p) for uniform point distributions and O((N/n_p) log(N/n_p) + n_p log n_p) for non-uniform distributions using n_p CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.
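For orientation, the direct O(N^2) evaluation that fast algorithms of this kind are meant to replace can be sketched as follows. This is a minimal one-dimensional illustration and not the authors' method; the function name `direct_gauss_sum`, the bandwidth parameter `delta`, and the kernel form exp(-(x - y)^2 / delta) are assumptions made for the example.

```python
import math

def direct_gauss_sum(targets, sources, weights, delta):
    """Evaluate G(x_i) = sum_j w_j * exp(-(x_i - y_j)**2 / delta) directly.

    The nested loops cost O(len(targets) * len(sources)) operations,
    i.e. O(N^2) when both point sets contain N points.
    The names and the kernel scaling are assumptions for illustration.
    """
    result = []
    for x in targets:
        total = 0.0
        for y, w in zip(sources, weights):
            total += w * math.exp(-((x - y) ** 2) / delta)
        result.append(total)
    return result
```

Doubling N quadruples the work of this baseline, which is why a linear or near-linear parallel algorithm matters at the problem sizes quoted in the abstract.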
Progress in parallelizing XOOPIC
NASA Astrophysics Data System (ADS)
Mardahl, Peter; Verboncoeur, J. P.
1997-11-01
XOOPIC (Object-Oriented Particle-in-Cell code for X11-based Unix workstations) is presently a serial 2-D 3v particle-in-cell plasma simulation (J.P. Verboncoeur, A.B. Langdon, and N.T. Gladd, "An object-oriented electromagnetic PIC code," Computer Physics Communications 87 (1995) 199-211). The present effort focuses on using parallel and distributed processing to optimize the simulation for large problems. The benefits include increased capacity for memory-intensive problems and improved performance for processor-intensive problems. The MPI library is used to enable the parallel version to be easily ported to massively parallel, SMP, and distributed computers. The philosophy employed here is to spatially decompose the system into computational regions separated by 'virtual boundaries', objects which contain the local data and algorithms to perform the local field solve and particle communication between regions. This implementation will reduce the changes that parallelization requires in the rest of the program. Specific implementation details such as the hiding of communication latency behind local computation will also be discussed.
Parallel hierarchical global illumination
Snell, Quinn O.
1997-10-08
Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
High performance parallel architectures
Anderson, R.E.
1989-09-01
In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.
Parallel Multigrid Equation Solver
Adams, Mark
2001-09-07
Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved linear elasticity problems with up to 76 million degrees of freedom on the ASCI Blue Pacific and ASCI Red machines.
Parallel multilevel preconditioners
Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.
1989-01-01
In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.
Homology, convergence and parallelism.
Ghiselin, Michael T
2016-01-05
Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. © 2015 The Author(s).
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Parallel Anisotropic Tetrahedral Adaptation
NASA Technical Reports Server (NTRS)
Park, Michael A.; Darmofal, David L.
2008-01-01
An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000-to-1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.
Homology, convergence and parallelism
Ghiselin, Michael T.
2016-01-01
Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721
Parallel Subconvolution Filtering Architectures
NASA Technical Reports Server (NTRS)
Gray, Andrew A.
2003-01-01
These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete-Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than by the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
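The basic DFT-IDFT overlap-and-save method named above can be sketched in a few lines. This is only the textbook building block, not the sub-convolution architecture of the report; the function names `dft` and `overlap_save`, the naive O(n^2) transform, and the parameter `block_size` are assumptions chosen to keep the example self-contained.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) DFT (or inverse DFT); adequate for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def overlap_save(signal, taps, block_size):
    """FIR-filter `signal` with `taps` using DFT blocks of length `block_size`."""
    m = len(taps)
    step = block_size - (m - 1)              # new output samples per block
    if step <= 0:
        raise ValueError("block_size must exceed len(taps) - 1")
    h_freq = dft(list(taps) + [0.0] * (block_size - m))
    n_out = len(signal)
    n_blocks = -(-n_out // step)             # ceiling division
    # Prepend m-1 zeros (filter history) and zero-pad the tail to a full block.
    padded = [0.0] * (m - 1) + list(signal) + [0.0] * (n_blocks * step - n_out)
    output = []
    for start in range(0, n_blocks * step, step):
        spectrum = dft(padded[start:start + block_size])
        y = dft([a * b for a, b in zip(spectrum, h_freq)], inverse=True)
        # The first m-1 samples of each block are circularly aliased: discard.
        output.extend(v.real for v in y[m - 1:])
    return output[:n_out]
```

Each block overlaps the previous one by `len(taps) - 1` samples, so the retained samples equal the linear convolution. In this plain form the block size must exceed the filter length, which is the restriction the sub-convolution approach described above is designed to remove.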
Adapting implicit methods to parallel processors
Reeves, L.; McMillin, B.; Okunbor, D.; Riggins, D.
1994-12-31
When numerically solving many types of partial differential equations, it is advantageous to use implicit methods because of their better stability and more flexible parameter choice (e.g., larger time steps). However, since implicit methods usually require simultaneous knowledge of the entire computational domain, these methods are difficult to implement directly on distributed memory parallel processors. This leads to infrequent use of implicit methods on parallel/distributed systems. The usual implementation of implicit methods is inefficient due to the nature of parallel systems, where it is common to take the computational domain and distribute the grid points over the processors so as to maintain a relatively even workload per processor. This creates a problem at the locations in the domain where adjacent points are not on the same processor. In order for the values at these points to be calculated, messages have to be exchanged between the corresponding processors. Without special adaptation, this will result in idle processors during part of the computation, and the more processors are idle, the lower the effective speed improvement gained by using a parallel processor.
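The even distribution of grid points, and the locations where adjacent points end up on different processors, can be made concrete with a small sketch. It assumes a one-dimensional grid and contiguous blocks; the function names `partition` and `interface_pairs` are hypothetical and are not taken from the paper.

```python
def partition(n_points, n_procs):
    """Split grid indices 0..n_points-1 into contiguous, nearly equal blocks."""
    if n_points < n_procs:
        raise ValueError("need at least one grid point per processor")
    base, extra = divmod(n_points, n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)   # first `extra` ranks get one more
        blocks.append(range(start, start + size))
        start += size
    return blocks

def interface_pairs(blocks):
    """List (left_rank, right_rank, left_index, right_index) for every cut.

    Each tuple marks two adjacent grid points owned by different processors,
    which is where a message exchange would be required.
    """
    return [
        (rank, rank + 1, blocks[rank][-1], blocks[rank + 1][0])
        for rank in range(len(blocks) - 1)
    ]
```

For 10 points on 3 processors this yields blocks of 4, 3, and 3 points and two cuts, so every solver step that couples neighbouring points needs communication across those two interfaces.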
Xyce parallel electronic simulator.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list exhaustively (to the extent possible) the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Parallel Computing in Optimization.
1984-10-01
include: Heller [1978] and Sameh [1977] (surveys of algorithms), Duff [1983], Fong and Jordan [1977], Jordan [1979], and Rodrigue [1982] (all mainly...constrained concave function by partition of feasible domain", Mathematics of Operations Research 8, pp. A. Sameh [1977], "Numerical parallel algorithms...a survey", in High Speed Computer and Algorithm Organization, D. Kuck, D. Lawrie, and A. Sameh, eds., Academic Press, pp. 207-228. 1,. J. Siegel
2013-09-01
Center Paul R. Eller, Jing-Ru C. Cheng, Aaron R. Byrd, Charles W. Downer, and Nawa Pradhan September 2013 Approved for public release...Program ERDC TR-13-8 September 2013 Development of Parallel GSSHA Paul R. Eller and Jing-Ru C. Cheng Information Technology Laboratory US Army Engineer...5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) Paul Eller, Ruth Cheng, Aaron Byrd, Chuck Downer, and Nawa Pradhan 5d. PROJECT NUMBER
Parallel unstructured grid generation
NASA Technical Reports Server (NTRS)
Loehner, Rainald; Camberos, Jose; Merriam, Marshal
1991-01-01
A parallel unstructured grid generation algorithm is presented and implemented on the Hypercube. Different processor hierarchies are discussed, and the appropriate hierarchies for mesh generation and mesh smoothing are selected. A domain-splitting algorithm for unstructured grids which tries to minimize the surface-to-volume ratio of each subdomain is described. This splitting algorithm is employed both for grid generation and grid smoothing. Results obtained on the Hypercube demonstrate the effectiveness of the algorithms developed.
Implementation of Parallel Algorithms
1993-06-30
their social relations or to achieve some goals. For example, we define a pair-wise force law of repulsion and attraction for a group of identical...quantization based compression schemes. Photo-refractive crystals, which provide high density recording in real time, are used as our holographic media. The...of Parallel Algorithms (J. Reif, ed.). Kluwer Academic Publishers, 1993. (4) "A Dynamic Separator Algorithm", D. Armon and J. Reif. To appear in
Extendability of parallel sections in vector bundles
NASA Astrophysics Data System (ADS)
Kirschner, Tim
2016-01-01
I address the following question: Given a differentiable manifold M, what are the open subsets U of M such that, for all vector bundles E over M and all linear connections ∇ on E, any ∇-parallel section in E defined on U extends to a ∇-parallel section in E defined on M? For simply connected manifolds M (among others) I describe the entirety of all such sets U which are, in addition, the complement of a C^1 submanifold, boundary allowed, of M. This delivers a partial positive answer to a problem posed by Antonio J. Di Scala and Gianni Manno (2014). Furthermore, in case M is an open submanifold of R^n, n ≥ 2, I prove that the complement of U in M, not required to be a submanifold now, can have arbitrarily large n-dimensional Lebesgue measure.
Trajectory optimization using parallel shooting method on parallel computer
Wirthman, D.J.; Park, S.Y.; Vadali, S.R.
1995-03-01
The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.
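To put the reported figures in perspective, the parallel efficiency implied by a speedup of about 7 on 16 processors can be worked out directly. The sketch below is arithmetic on the numbers quoted in the abstract only; the function names are hypothetical, and the Karp-Flatt serial-fraction estimate is a standard diagnostic added here for illustration, not something the authors report.

```python
def parallel_efficiency(speedup, n_procs):
    """Efficiency = speedup / number of processors."""
    return speedup / n_procs

def karp_flatt(speedup, n_procs):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)

# Figures from the abstract: speedup of nearly 7 on 16 processors.
efficiency = parallel_efficiency(7, 16)      # 0.4375, i.e. roughly 44 %
serial_fraction = karp_flatt(7, 16)          # about 0.086
```

An efficiency below one half is consistent with the abstract's suggestion that further gains would have to come from an additional level of parallelism, such as parallelizing in the state domain.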
Resistor Combinations for Parallel Circuits.
ERIC Educational Resources Information Center
McTernan, James P.
1978-01-01
To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
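A table of this kind can be generated mechanically. The sketch below enumerates pairs of whole-number resistors whose parallel combination, 1/R = 1/R1 + 1/R2, is also a whole number; the function names and the search limit are assumptions for illustration and are not taken from the article.

```python
from fractions import Fraction

def parallel(*resistances):
    """Total resistance of resistors in parallel, as an exact fraction."""
    return 1 / sum(Fraction(1, r) for r in resistances)

def whole_number_pairs(max_ohms):
    """List (R1, R2, R_total) with R1 <= R2 <= max_ohms and integer R_total."""
    pairs = []
    for r1 in range(1, max_ohms + 1):
        for r2 in range(r1, max_ohms + 1):
            total = parallel(r1, r2)
            if total.denominator == 1:       # exact whole number of ohms
                pairs.append((r1, r2, int(total)))
    return pairs
```

Using exact fractions rather than floating point avoids rounding artefacts when testing for a whole number; for example, 3 Ω in parallel with 6 Ω gives exactly 2 Ω.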
Status of TRANSP Parallel Services
NASA Astrophysics Data System (ADS)
Indireshkumar, K.; Andre, Robert; McCune, Douglas; Randerson, Lewis
2006-10-01
The PPPL TRANSP code suite has been used successfully over many years to carry out time dependent simulations of tokamak plasmas. However, accurately modeling certain phenomena such as RF heating and fast ion behavior using TRANSP requires extensive computational power and will benefit from parallelization. Parallelizing all of TRANSP is not required and parts will run sequentially while other parts run parallelized. To efficiently use a site's parallel services, the parallelized TRANSP modules are deployed to a shared "parallel service" on a separate cluster. The PPPL Monte Carlo fast ion module NUBEAM and the MIT RF module TORIC are the first TRANSP modules to be so deployed. This poster will show the performance scaling of these modules within the parallel server. Communications between the serial client and the parallel server will be described in detail, and measurements of startup and communications overhead will be shown. Physics modeling benefits for TRANSP users will be assessed.
Asynchronous interpretation of parallel microprograms
Bandman, O.L.
1984-03-01
In this article, the authors demonstrate how to pass from a given synchronous interpretation of a parallel microprogram to an equivalent asynchronous interpretation, and investigate the cost associated with the rejection of external synchronization in parallel microprogram structures.
The Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.
Evidence for parallel elongated structures in the mesosphere
NASA Technical Reports Server (NTRS)
Adams, G. W.; Brosnahan, J. W.; Walden, D. C.
1983-01-01
The physical cause of partial reflection from the mesosphere is of interest. Data are presented from an image-forming radar at Brighton, Colorado, that suggest that some of the radar scattering is caused by parallel elongated structures lying almost directly overhead. Possible physical sources for such structures include gravity waves and roll vortices.
The Structure of Parallel Algorithms.
1979-08-01
parallel architectures and parallel algorithms see [Anderson and Jensen 75, Stone 75, Kung 76, Enslow 77, Kuck 77, Ramamoorthy and Li 77, Sameh 77, Heller...the Routing Time on a Parallel Computer with a Fixed Interconnection Network, In Kuck, D. J., Lawrie, D.H. and Sameh, A.H., editor, High Speed...Letters 5(4):107-112, October 1976. [Sameh 77] Sameh, A.H. Numerical Parallel Algorithms -- A Survey. In High Speed Computer and Algorithm Organization
Parallel Debugging Using Graphical Views
1988-03-01
Voyeur, a prototype system for creating graphical views of parallel programs, provides a cost-effective way to construct such views for any parallel...programming system. We illustrate Voyeur by discussing four views created for debugging Poker programs. One is a general trace facility for any Poker...Graphical views are essential for debugging parallel programs because of the large quantity of state information contained in parallel programs. Voyeur
Parallel Pascal - An extended Pascal for parallel computers
NASA Technical Reports Server (NTRS)
Reeves, A. P.
1984-01-01
Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.
Shendure, Jay; Fields, Stanley
2016-06-01
Human genetics has historically depended on the identification of individuals whose natural genetic variation underlies an observable trait or disease risk. Here we argue that new technologies now augment this historical approach by allowing the use of massively parallel assays in model systems to measure the functional effects of genetic variation in many human genes. These studies will help to establish the disease risk of both observed and potential genetic variants and to overcome the problem of "variants of uncertain significance." Copyright © 2016 by the Genetics Society of America.
Parallel Eclipse Project Checkout
NASA Technical Reports Server (NTRS)
Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.
2011-01-01
Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (XML file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to check out the plug-ins, this program performs the plug-in checkouts in parallel. After the feature is parsed, a checkout request is queued for each plug-in in the feature. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to check out now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer's home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer's home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. The Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any
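The pattern described above, parsing a feature description and handing one checkout request per plug-in to a thread pool of configurable size, can be sketched as follows. This is a Python illustration of the idea, not PEPC's code: the function names, the assumption that plug-ins appear as `plugin` elements with an `id` attribute, and the caller-supplied `checkout` callable are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

def plugin_ids(feature_xml):
    """Extract plug-in identifiers from a feature description string.

    Assumes (for illustration) <plugin id="..."/> elements in the XML.
    """
    root = ET.fromstring(feature_xml)
    return [element.get("id") for element in root.iter("plugin")]

def checkout_all(ids, checkout, max_workers=4):
    """Run `checkout(plugin_id)` for every id on a pool of worker threads.

    `checkout` is a hypothetical callable supplied by the caller; results
    are returned in the same order as `ids`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(checkout, ids))
```

Because checkouts are dominated by network waits rather than computation, threads are a reasonable fit here, and `max_workers` plays the role of the configurable thread count mentioned in the abstract.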
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are most effective. The architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.
Roo: A parallel theorem prover
Lusk, E.L.; McCune, W.W.; Slaney, J.K.
1991-11-01
We describe a parallel theorem prover based on the Argonne theorem-proving system OTTER. The parallel system, called Roo, runs on shared-memory multiprocessors such as the Sequent Symmetry. We explain the parallel algorithm used and give performance results that demonstrate near-linear speedups on large problems.
CSM parallel structural methods research
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1989-01-01
Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
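The role of the discrete-event simulator, advancing a virtual clock from one timestamped event to the next while modelling network delay, can be illustrated with a minimal serial sketch. It is not LAPSE and does not execute application code directly; the class `Simulator`, the function `ping_pong`, and the fixed `latency` and `compute` parameters are assumptions for the example.

```python
import heapq

class Simulator:
    """Minimal discrete-event core: a time-ordered queue of pending events."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = 0      # tie-breaker: equal timestamps pop in FIFO order

    def schedule(self, delay, handler, *args):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler, args))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(*args)

def ping_pong(n_round_trips, latency, compute):
    """Model two ranks exchanging messages; return (arrival_time, rank) pairs."""
    sim = Simulator()
    log = []

    def receive(rank, remaining):
        log.append((sim.now, rank))
        if remaining > 0:
            # The receiver computes, then replies; the reply arrives after
            # the modelled network latency.
            sim.schedule(compute + latency, receive, 1 - rank, remaining - 1)

    sim.schedule(latency, receive, 1, 2 * n_round_trips - 1)
    sim.run()
    return log
```

In a direct-execution simulator the `compute` interval would come from timing the real application code rather than from a fixed parameter; only the machine-dependent parts, such as message latency, are modelled.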
Benchmarking massively parallel architectures
Lubeck, O.; Moore, J.; Simmons, M.; Wasserman, H.
1993-01-01
The purpose of this paper is to summarize some initial experiences related to measuring the performance of massively parallel processors (MPPs) at Los Alamos National Laboratory (LANL). Actually, the range of MPP architectures the authors have used is rather limited, being confined mostly to the Thinking Machines Corporation (TMC) Connection Machine CM-2 and CM-5. Some very preliminary work has been carried out on the Kendall Square KSR-1, and efforts related to other machines, such as the Intel Paragon and the soon-to-be-released CRAY T3D, are planned. This paper will concentrate on methodology rather than discuss specific architectural strengths and weaknesses; the latter is expected to be the subject of future reports. MPP benchmarking is a field in critical need of structure and definition. As the authors have stated previously, such machines have enormous potential, and there is certainly a dire need for orders of magnitude more computational power than current supercomputers provide. However, performance reports for MPPs must emphasize actual sustainable performance from real applications in a careful, responsible manner. Such has not always been the case. A recent paper has described in some detail the problem of potentially misleading performance reporting in the parallel scientific computing field. Thus, in this paper, the authors briefly offer a few general ideas on MPP performance analysis.
Parallelizing quantum circuit synthesis
NASA Astrophysics Data System (ADS)
Di Matteo, Olivia; Mosca, Michele
2016-03-01
Quantum circuit synthesis is the process in which an arbitrary unitary operation is decomposed into a sequence of gates from a universal set, typically one which a quantum computer can implement both efficiently and fault-tolerantly. As physical implementations of quantum computers improve, the need is growing for tools that can effectively synthesize components of the circuits and algorithms they will run. Existing algorithms for exact, multi-qubit circuit synthesis scale exponentially in the number of qubits and circuit depth, leaving synthesis intractable for circuits on more than a handful of qubits. Even modest improvements in circuit synthesis procedures may lead to significant advances, pushing forward the boundaries not only of the size of solvable circuit synthesis problems, but also of what can be realized physically as a result of having more efficient circuits. We present a method for quantum circuit synthesis using deterministic walks. Also termed pseudorandom walks, these are walks in which, once a starting point is chosen, the path is completely determined. We apply our method to construct a parallel framework for circuit synthesis, and implement one such version performing optimal T-count synthesis over the Clifford+T gate set. We use our software to present examples where parallelization offers a significant speedup in runtime, as well as to directly confirm that the 4-qubit 1-bit full adder has optimal T-count 7 and T-depth 3.
Parallel Eigenvalue extraction
NASA Technical Reports Server (NTRS)
Akl, Fred A.
1989-01-01
A new numerical algorithm for the solution of large-order eigenproblems typically encountered in linear elastic finite element systems is presented. The architecture of parallel processing is utilized in the algorithm to achieve increased speed and efficiency of calculations. The algorithm is based on the frontal technique for the solution of linear simultaneous equations and the modified subspace eigenanalysis method for the solution of the eigenproblem. Assembly, elimination and back-substitution of degrees of freedom are performed concurrently, using a number of fronts. All fronts converge to and diverge from a predefined global front during elimination and back-substitution, respectively. In the meantime, reduction of the stiffness and mass matrices required by the modified subspace method can be completed during the convergence/divergence cycle and an estimate of the required eigenpairs obtained. Successive cycles of convergence and divergence are repeated until the desired accuracy of calculations is achieved. The advantages of this new algorithm in parallel computer architecture are discussed.
Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G
2007-04-11
The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.
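The figures quoted in the abstract above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the "about 20%" fraction is taken at face value as 0.20, so the implied peak is approximate.

```python
# Back-of-the-envelope check of the quoted LQCD figures.
# Assumption: "about 20% of peak" is treated as exactly 0.20.

def implied_peak_tflops(sustained_tflops, fraction_of_peak):
    """Peak rate implied by a sustained rate and a sustained/peak fraction."""
    return sustained_tflops / fraction_of_peak

def per_cpu_gflops(sustained_tflops, cpus):
    """Average sustained rate per CPU, in GFlop/s (1 TFlop/s = 1000 GFlop/s)."""
    return sustained_tflops * 1e3 / cpus

peak = implied_peak_tflops(70.5, 0.20)   # roughly 350 TFlop/s
per_cpu = per_cpu_gflops(70.5, 131072)   # roughly 0.54 GFlop/s per CPU
print(f"implied peak: {peak:.1f} TFlop/s, sustained per CPU: {per_cpu:.3f} GFlop/s")
```

Because the reported speedup is perfect, the per-CPU sustained rate would be the same at smaller partition sizes; only the total scales with the CPU count.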
Parallel ptychographic reconstruction
Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris
2014-01-01
Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174
Parallel ptychographic reconstruction
Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris
2014-12-19
Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.
Applied Parallel Metadata Indexing
Jacobi, Michael R
2012-08-01
The GPFS Archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and search the archive using standard and user-defined metadata, while maintaining security. Last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interface for the search tool. To meet security requirements, each database table is associated with a single user, stores only records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.
NASA Astrophysics Data System (ADS)
Olmedo, Oscar; Zhang, J.
2010-05-01
Flux ropes are now generally accepted to be the magnetic configuration of Coronal Mass Ejections (CMEs), which may be formed prior to or during solar eruptions. In this study, we model the flux rope as a current-carrying partial torus loop with its two footpoints anchored in the photosphere, and investigate its instability in the context of the torus instability (TI). Previous studies on TI have focused on the configuration of a circular torus and revealed the existence of a critical decay index. Our study reveals that the critical index is a function of the fractional number of the partial torus, defined by the ratio between the arc length of the partial torus above the photosphere and the circumference of a circular torus of equal radius. We refer to this finding as the partial torus instability (PTI). It is found that a partial torus with a smaller fractional number has a smaller critical index, thus requiring a more gradually decreasing magnetic field to stabilize the flux rope. On the other hand, the partial torus with a larger fractional number has a larger critical index. In the limit of a circular torus when the fractional number approaches one, the critical index goes to a maximum value that depends on the distribution of the external magnetic field. We demonstrate that the partial torus instability helps us to understand the confinement, growth, and eventual eruption of a flux rope CME.
A systolic array parallelizing compiler
Tseng, P.S.
1990-01-01
This book presents a completely new approach to the problem of systolic array parallelizing compilers. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler which can generate efficient parallel code for complete LINPACK routines. This book begins by analyzing the architectural strength of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. Complete listings of the source program and compiler-generated parallel code are given to clarify the overall picture of the compiler. The book concludes that a systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.
DeHart, Mark D; Williams, Mark L; Bowman, Stephen M
2010-01-01
The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement
Twisted partially pure spinors
NASA Astrophysics Data System (ADS)
Herrera, Rafael; Tellez, Ivan
2016-08-01
Motivated by the relationship between orthogonal complex structures and pure spinors, we define twisted partially pure spinors in order to characterize spinorially subspaces of Euclidean space endowed with a complex structure.
Parallel Polarization State Generation
NASA Astrophysics Data System (ADS)
She, Alan; Capasso, Federico
2016-05-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
Parallel Polarization State Generation.
She, Alan; Capasso, Federico
2016-05-17
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
Toward Parallel Document Clustering
Mogill, Jace A.; Haglin, David J.
2011-09-01
A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a distance metric that obeys the triangle inequality; the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program, and initial performance results of an end-to-end document processing workflow are reported.
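The way a triangle inequality can save distance calculations may be easier to see in a small example. The sketch below is not the Anchors Hierarchy itself; it is a generic nearest-pivot assignment that skips a pivot whenever the inequality proves it cannot be closer. The function names, the Euclidean metric, and the 2-D toy points are all assumptions made for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance, used here as a stand-in metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign_to_pivots(points, pivots):
    """Assign each point to its nearest pivot, pruning with the triangle inequality.

    If d(best, j) >= 2 * d(x, best), then
    d(x, j) >= d(best, j) - d(x, best) >= d(x, best),
    so pivot j cannot be strictly closer and its distance is never computed.
    Returns the assignments and the number of point-to-pivot distances computed.
    """
    k = len(pivots)
    # Pivot-to-pivot distances are computed once and reused for every point.
    pivot_dist = [[euclidean(p, q) for q in pivots] for p in pivots]
    assignment, computed = [], 0
    for x in points:
        best, best_d = 0, euclidean(x, pivots[0])
        computed += 1
        for j in range(1, k):
            if pivot_dist[best][j] >= 2 * best_d:
                continue  # pruned without computing d(x, pivot j)
            dj = euclidean(x, pivots[j])
            computed += 1
            if dj < best_d:
                best, best_d = j, dj
        assignment.append(best)
    return assignment, computed

pivots = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0), (0.5, 0.0)]
assignment, computed = assign_to_pivots(points, pivots)
print(assignment, computed, "of", len(points) * len(pivots), "possible distances")
```

Each point is handled independently once the pivot-to-pivot table exists, which is one place where parallelism could be introduced in a scheme of this kind.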
Parallel tridiagonal equation solvers
NASA Technical Reports Server (NTRS)
Stone, H. S.
1974-01-01
Three parallel algorithms were compared for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC 4 and CDC STAR. For array computers similar to ILLIAC 4, cyclic odd-even reduction has the least operation count for highly structured sets of equations, and recursive doubling has the least count for relatively unstructured sets of equations. Since the difference in operation counts for these two algorithms is not substantial, their relative running times may be more related to overhead operations, which are not measured in this paper. The third algorithm, based on Buneman's Poisson solver, has more arithmetic operations than the others, and appears to be the least favorable. For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.
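For readers unfamiliar with cyclic odd-even reduction, a minimal serial sketch follows. It is an illustration under stated assumptions, not the formulation analyzed in the paper: the function name and list-based interface are hypothetical, the system is written as a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] with a[0] = c[n-1] = 0, and no pivoting is done, so the diagonal entries used as divisors must be nonzero (as in diagonally dominant systems).

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic odd-even reduction (serial sketch).

    a: sub-diagonal (a[0] == 0), b: diagonal, c: super-diagonal (c[-1] == 0),
    d: right-hand side. Returns the solution as a list.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: eliminate even-indexed unknowns from every odd-indexed equation.
    # Each odd equation is updated independently of the others, which is the
    # source of parallelism on an array or pipeline machine.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        na = -alpha * a[i - 1]
        nb = b[i] - alpha * c[i - 1]
        nd = d[i] - alpha * d[i - 1]
        nc = 0.0
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            nb -= gamma * a[i + 1]
            nc = -gamma * c[i + 1]
            nd -= gamma * d[i + 1]
        ra.append(na)
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)

    # The odd-indexed unknowns satisfy a tridiagonal system of about half the size.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: each even-indexed unknown follows from its neighbours,
    # again independently of the other even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x

# Usage: the (-1, 2, -1) system whose exact solution is 1, 2, ..., 7.
n = 7
a = [0.0] + [-1.0] * (n - 1)
b = [2.0] * n
c = [-1.0] * (n - 1) + [0.0]
d = [0.0] * (n - 1) + [8.0]
print(cyclic_reduction(a, b, c, d))
```

The recursion halves the system at each level, so roughly log2(n) reduction levels are needed; the operation-count comparison with recursive doubling discussed in the abstract is not reproduced here.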
Parallel Polarization State Generation
She, Alan; Capasso, Federico
2016-01-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813
Partially coherent nonparaxial beams.
Duan, Kailiang; Lü, Baida
2004-04-15
The concept of a partially coherent nonparaxial beam is proposed. A closed-form expression for the propagation of nonparaxial Gaussian Schell model (GSM) beams in free space is derived and applied to study the propagation properties of nonparaxial GSM beams. It is shown that for partially coherent nonparaxial beams a new parameter f(sigma) has to be introduced, which together with the parameter f, determines the beam nonparaxiality.
Parallel imaging microfluidic cytometer.
Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching
2011-01-01
By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take.
Olmedo, Oscar; Zhang Jie
2010-07-20
Flux ropes are now generally accepted to be the magnetic configuration of coronal mass ejections (CMEs), which may be formed prior to or during solar eruptions. In this study, we model the flux rope as a current-carrying partial torus loop with its two footpoints anchored in the photosphere, and investigate its stability in the context of the torus instability (TI). Previous studies on TI have focused on the configuration of a circular torus and revealed the existence of a critical decay index of the overlying constraining magnetic field. Our study reveals that the critical index is a function of the fractional number of the partial torus, defined by the ratio between the arc length of the partial torus above the photosphere and the circumference of a circular torus of equal radius. We refer to this finding as the partial torus instability (PTI). It is found that a partial torus with a smaller fractional number has a smaller critical index, thus requiring a more gradually decreasing magnetic field to stabilize the flux rope. On the other hand, a partial torus with a larger fractional number has a larger critical index. In the limit of a circular torus when the fractional number approaches 1, the critical index goes to a maximum value. We demonstrate that the PTI helps us to understand the confinement, growth, and eventual eruption of a flux-rope CME.
NASA Astrophysics Data System (ADS)
Olmedo, Oscar; Zhang, Jie
2010-07-01
Flux ropes are now generally accepted to be the magnetic configuration of coronal mass ejections (CMEs), which may be formed prior to or during solar eruptions. In this study, we model the flux rope as a current-carrying partial torus loop with its two footpoints anchored in the photosphere, and investigate its stability in the context of the torus instability (TI). Previous studies on TI have focused on the configuration of a circular torus and revealed the existence of a critical decay index of the overlying constraining magnetic field. Our study reveals that the critical index is a function of the fractional number of the partial torus, defined by the ratio between the arc length of the partial torus above the photosphere and the circumference of a circular torus of equal radius. We refer to this finding as the partial torus instability (PTI). It is found that a partial torus with a smaller fractional number has a smaller critical index, thus requiring a more gradually decreasing magnetic field to stabilize the flux rope. On the other hand, a partial torus with a larger fractional number has a larger critical index. In the limit of a circular torus when the fractional number approaches 1, the critical index goes to a maximum value. We demonstrate that the PTI helps us to understand the confinement, growth, and eventual eruption of a flux-rope CME.
A parallel programming environment supporting multiple data-parallel modules
Seevers, B.K.; Quinn, M.J.; Hatcher, P.J.
1992-10-01
We describe a system that allows programmers to take advantage of both control and data parallelism through multiple intercommunicating data-parallel modules. This programming environment extends C-type stream I/O to include intermodule communication channels. The programmer writes each module as a separate data-parallel program, then develops a channel linker specification describing how to connect the modules together. A channel linker we have developed loads the separate modules on the parallel machine and binds the communication channels together as specified. We present performance data demonstrating that a mixed control- and data-parallel solution can yield better performance than a strictly data-parallel solution. The system described currently runs on the Intel iWarp multicomputer.
Combinatorial parallel and scientific computing.
Pinar, Ali; Hendrickson, Bruce Alan
2005-04-01
Combinatorial algorithms have long played a pivotal enabling role in many applications of parallel computing. Graph algorithms in particular arise in load balancing, scheduling, mapping and many other aspects of the parallelization of irregular applications. These are still active research areas, mostly due to evolving computational techniques and rapidly changing computational platforms. But the relationship between parallel computing and discrete algorithms is much richer than the mere use of graph algorithms to support the parallelization of traditional scientific computations. Important, emerging areas of science are fundamentally discrete, and they are increasingly reliant on the power of parallel computing. Examples include computational biology, scientific data mining, and network analysis. These applications are changing the relationship between discrete algorithms and parallel computing. In addition to their traditional role as enablers of high performance, combinatorial algorithms are now customers for parallel computing. New parallelization techniques for combinatorial algorithms need to be developed to support these nontraditional scientific approaches. This chapter will describe some of the many areas of intersection between discrete algorithms and parallel scientific computing. Due to space limitations, this chapter is not a comprehensive survey, but rather an introduction to a diverse set of techniques and applications with a particular emphasis on work presented at the Eleventh SIAM Conference on Parallel Processing for Scientific Computing. Some topics highly relevant to this chapter (e.g. load balancing) are addressed elsewhere in this book, and so we will not discuss them here.
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Lau, Sonie
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Parallel processor engine model program
NASA Technical Reports Server (NTRS)
Mclaughlin, P.
1984-01-01
The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.
Trajectories in parallel optics.
Klapp, Iftach; Sochen, Nir; Mendlovic, David
2011-10-01
In our previous work we showed the ability to improve the optical system's matrix condition by optical design, thereby improving its robustness to noise. It was shown that by using singular value decomposition, a target point-spread function (PSF) matrix can be defined for an auxiliary optical system, which works parallel to the original system to achieve such an improvement. In this paper, after briefly introducing the all optics implementation of the auxiliary system, we show a method to decompose the target PSF matrix. This is done through a series of shifted responses of auxiliary optics (named trajectories), where a complicated hardware filter is replaced by postprocessing. This process manipulates the pixel confined PSF response of simple auxiliary optics, which in turn creates an auxiliary system with the required PSF matrix. This method is simulated on two space variant systems and reduces their system condition number from 18,598 to 197 and from 87,640 to 5.75, respectively. We perform a study of the latter result and show significant improvement in image restoration performance, in comparison to a system without auxiliary optics and to other previously suggested hybrid solutions. Image restoration results show that in a range of low signal-to-noise ratio values, the trajectories method gives a significant advantage over alternative approaches. A third space invariant study case is explored only briefly, and we present a significant improvement in the matrix condition number from 1.9160e+013 to 34,526.
High Performance Parallel Architectures
NASA Technical Reports Server (NTRS)
El-Ghazawi, Tarek; Kaewpijit, Sinthop
1998-01-01
Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments that can collect observations at hundreds of bands have become operational. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension Reduction is a spectral transformation, aimed at concentrating the vital information and discarding redundant data. One such transformation, which is widely used in remote sensing, is the Principal Components Analysis (PCA). This report summarizes our progress on the development of a parallel PCA and its implementation on two Beowulf cluster configurations: one with a fast Ethernet switch and the other with a Myrinet interconnection. Details of the implementation and performance results, for typical sets of multispectral and hyperspectral NASA remote sensing data, are presented and analyzed based on the algorithm requirements and the underlying machine configuration. It will be shown that the PCA application is quite challenging and hard to scale on Ethernet-based clusters. However, the measurements also show that a high-performance interconnection network, such as Myrinet, better matches the high communication demand of PCA and can lead to a more efficient PCA execution.
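One common way to split a PCA across workers is to let each worker accumulate band sums and band cross-products for its own block of pixels, then merge those partial results into a single covariance matrix. The sketch below illustrates that idea only; it is not taken from the report. All names are hypothetical, the blocks are processed serially here, and a plain power iteration stands in for a full eigensolver, so only the leading component is recovered.

```python
import math

def partial_sums(chunk):
    """Per-worker step: pixel count, band sums, and band cross-product sums."""
    nb = len(chunk[0])
    s = [0.0] * nb
    ss = [[0.0] * nb for _ in range(nb)]
    for pixel in chunk:
        for i in range(nb):
            s[i] += pixel[i]
            for j in range(nb):
                ss[i][j] += pixel[i] * pixel[j]
    return len(chunk), s, ss

def combine(parts):
    """Reduction step: merge partial results into the mean and covariance."""
    n = sum(p[0] for p in parts)
    nb = len(parts[0][1])
    s = [sum(p[1][i] for p in parts) for i in range(nb)]
    ss = [[sum(p[2][i][j] for p in parts) for j in range(nb)] for i in range(nb)]
    mean = [v / n for v in s]
    # Population covariance: E[x_i x_j] - E[x_i] E[x_j].
    cov = [[ss[i][j] / n - mean[i] * mean[j] for j in range(nb)] for i in range(nb)]
    return mean, cov

def leading_component(cov, iters=200):
    """Leading eigenvector of cov by power iteration (assumes cov is not all zeros)."""
    nb = len(cov)
    v = [1.0] * nb
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(nb)) for i in range(nb)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy 3-band "image": every pixel lies along the direction (1, 2, -1).
pixels = [(float(t), 2.0 * t, -1.0 * t) for t in range(10)]
chunks = [pixels[0:4], pixels[4:7], pixels[7:10]]  # one block per hypothetical worker
mean, cov = combine([partial_sums(ch) for ch in chunks])
print(leading_component(cov))
```

In this arrangement the merge step is where data must cross the network, since each worker contributes a matrix whose size grows with the square of the number of bands.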
Parallel Programming in the Age of Ubiquitous Parallelism
NASA Astrophysics Data System (ADS)
Pingali, Keshav
2014-04-01
Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs
Methanol partial oxidation reformer
Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael
1999-01-01
A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.
Methanol partial oxidation reformer
Ahmed, S.; Kumar, R.; Krumpelt, M.
1999-08-17
A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.
Methanol partial oxidation reformer
Ahmed, S.; Kumar, R.; Krumpelt, M.
1999-08-24
A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.
Methanol partial oxidation reformer
Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael
2001-01-01
A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.
Oxygen partial pressure sensor
Dees, Dennis W.
1994-01-01
A method for detecting oxygen partial pressure and an oxygen partial pressure sensor are provided. The method for measuring oxygen partial pressure includes contacting oxygen to a solid oxide electrolyte and measuring the subsequent change in electrical conductivity of the solid oxide electrolyte. A solid oxide electrolyte is utilized that contacts both a porous electrode and a nonporous electrode. The electrical conductivity of the solid oxide electrolyte is affected when oxygen from an exhaust stream permeates through the porous electrode to establish an equilibrium of oxygen anions in the electrolyte, thereby displacing electrons throughout the electrolyte to form an electron gradient. By adapting the two electrodes to sense a voltage potential between them, the change in electrolyte conductivity due to oxygen presence can be measured.
Oxygen partial pressure sensor
Dees, D.W.
1994-09-06
A method for detecting oxygen partial pressure and an oxygen partial pressure sensor are provided. The method for measuring oxygen partial pressure includes contacting oxygen to a solid oxide electrolyte and measuring the subsequent change in electrical conductivity of the solid oxide electrolyte. A solid oxide electrolyte is utilized that contacts both a porous electrode and a nonporous electrode. The electrical conductivity of the solid oxide electrolyte is affected when oxygen from an exhaust stream permeates through the porous electrode to establish an equilibrium of oxygen anions in the electrolyte, thereby displacing electrons throughout the electrolyte to form an electron gradient. By adapting the two electrodes to sense a voltage potential between them, the change in electrolyte conductivity due to oxygen presence can be measured. 1 fig.
Partial Arc Curvilinear Direct Drive Servomotor
NASA Technical Reports Server (NTRS)
Sun, Xiuhong (Inventor)
2014-01-01
A partial arc servomotor assembly having a curvilinear U-channel with two parallel rare earth permanent magnet plates facing each other and a pivoted ironless three phase coil armature winding moves between the plates. An encoder read head is fixed to a mounting plate above the coil armature winding and a curvilinear encoder scale is curved to be co-axis with the curvilinear U-channel permanent magnet track formed by the permanent magnet plates. Driven by a set of miniaturized power electronics devices closely looped with a positioning feedback encoder, the angular position and velocity of the pivoted payload is programmable and precisely controlled.
Partially strong WW scattering
Cheung Kingman; Chiang Chengwei; Yuan Tzuchiang
2008-09-01
What if only a light Higgs boson is discovered at the CERN LHC? Conventional wisdom tells us that the scattering of longitudinal weak gauge bosons would not grow strong at high energies. However, this is generally not true. In some composite models or general two-Higgs-doublet models, the presence of a light Higgs boson does not guarantee complete unitarization of the WW scattering. After partial unitarization by the light Higgs boson, the WW scattering becomes strongly interacting until it hits one or more heavier Higgs bosons or other strong dynamics. We analyze how LHC experiments can reveal this interesting possibility of partially strong WW scattering.
Partially orthogonal resonators for magnetic resonance imaging
NASA Astrophysics Data System (ADS)
Chacon-Caldera, Jorge; Malzacher, Matthias; Schad, Lothar R.
2017-02-01
Resonators for signal reception in magnetic resonance are traditionally planar to restrict coil material and avoid coil losses. Here, we present a novel concept to model resonators partially in a plane with maximum sensitivity to the magnetic resonance signal and partially in an orthogonal plane with reduced signal sensitivity. Thus, properties of individual elements in coil arrays can be modified to optimize physical planar space and increase the sensitivity of the overall array. A particular case of the concept is implemented to decrease H-field destructive interferences in planar concentric in-phase arrays. An increase in signal to noise ratio of approximately 20% was achieved with two resonators placed over approximately the same planar area compared to common approaches at a target depth of 10 cm at 3 Tesla. Improved parallel imaging performance of this configuration is also demonstrated. The concept can be further used to increase coil density.
Partially orthogonal resonators for magnetic resonance imaging
Chacon-Caldera, Jorge; Malzacher, Matthias; Schad, Lothar R.
2017-01-01
Resonators for signal reception in magnetic resonance are traditionally planar to restrict coil material and avoid coil losses. Here, we present a novel concept to model resonators partially in a plane with maximum sensitivity to the magnetic resonance signal and partially in an orthogonal plane with reduced signal sensitivity. Thus, properties of individual elements in coil arrays can be modified to optimize physical planar space and increase the sensitivity of the overall array. A particular case of the concept is implemented to decrease H-field destructive interferences in planar concentric in-phase arrays. An increase in signal to noise ratio of approximately 20% was achieved with two resonators placed over approximately the same planar area compared to common approaches at a target depth of 10 cm at 3 Tesla. Improved parallel imaging performance of this configuration is also demonstrated. The concept can be further used to increase coil density. PMID:28186135
Partially orthogonal resonators for magnetic resonance imaging.
Chacon-Caldera, Jorge; Malzacher, Matthias; Schad, Lothar R
2017-02-10
Resonators for signal reception in magnetic resonance are traditionally planar to restrict coil material and avoid coil losses. Here, we present a novel concept to model resonators partially in a plane with maximum sensitivity to the magnetic resonance signal and partially in an orthogonal plane with reduced signal sensitivity. Thus, properties of individual elements in coil arrays can be modified to optimize physical planar space and increase the sensitivity of the overall array. A particular case of the concept is implemented to decrease H-field destructive interferences in planar concentric in-phase arrays. An increase in signal to noise ratio of approximately 20% was achieved with two resonators placed over approximately the same planar area compared to common approaches at a target depth of 10 cm at 3 Tesla. Improved parallel imaging performance of this configuration is also demonstrated. The concept can be further used to increase coil density.
Parallel Computational Protein Design
Zhou, Yichao; Donald, Bruce R.; Zeng, Jianyang
2016-01-01
Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that is guaranteed to find the global minimum energy solution (GMEC) combines dead-end elimination (DEE) with the A* tree search algorithm. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computational bottleneck of a large-scale computational protein design process. To address this issue, we extend the OSPREY program, previously developed in the Donald lab [1], with a new module that implements a GPU-based massively parallel A* algorithm to improve the protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide speedups of up to four orders of magnitude in large protein design cases, with a small memory overhead compared to the traditional A* search algorithm implementation, while still guaranteeing optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle problems in which the conformation space is too large and the global optimal solution could not previously be computed. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with state-of-the-art rotamer pruning algorithms such as iMinDEE [2] and DEEPer [3] to also consider continuous backbone and side-chain flexibility. PMID:27914056
Net radiation method for enclosure systems involving partially transparent walls
NASA Technical Reports Server (NTRS)
Siegel, R.
1973-01-01
The net radiation method is developed for analyzing radiation heat transfer in enclosure systems involving partially transparent walls. One such system is an enclosure with windows in it. The conventional net radiation method was developed for enclosures having opaque walls. If a partially transparent wall is present, it will permit radiation to enter and leave the enclosure. The net radiation equations are developed here for gray and semigray enclosures with one or more windows. Another system of interest, such as in a flat plate solar collector, consists of a series of parallel transparent layers. The transmission characteristics of such window systems are obtained by the net radiation method, and the technique appears to be more convenient than the ray tracing method which has been used in the past. Relations are developed for windows consisting of any number of parallel layers having differing absorption coefficients and differing surface reflectivities, and for systems composed of parallel transmitting layers and opaque plates.
Sequential and Parallel Matrix Computations.
1985-11-01
Theory" published by the American Math Society. (C) Jointly with A. Sameh of University of Illinois, a parallel algorithm for the single-input pole...an M.Sc. thesis at Northern Illinois University by Ava Chun and, the results were compared with parallel Q-R algorithm of Sameh and Kuck and the
Parallel pseudospectral domain decomposition techniques
NASA Technical Reports Server (NTRS)
Gottlieb, David; Hirsh, Richard S.
1988-01-01
The influence of interface boundary conditions on the ability to parallelize pseudospectral multidomain algorithms is investigated. Using the properties of spectral expansions, a novel parallel two domain procedure is generalized to an arbitrary number of domains each of which can be solved on a separate processor. This interface boundary condition considerably simplifies influence matrix techniques.
Parallel pseudospectral domain decomposition techniques
NASA Technical Reports Server (NTRS)
Gottlieb, David; Hirsch, Richard S.
1989-01-01
The influence of interface boundary conditions on the ability to parallelize pseudospectral multidomain algorithms is investigated. Using the properties of spectral expansions, a novel parallel two domain procedure is generalized to an arbitrary number of domains each of which can be solved on a separate processor. This interface boundary condition considerably simplifies influence matrix techniques.
A Parallel Particle Swarm Optimizer
2003-01-01
by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based...concurrent computation. The parallelization of the Particle Swarm Optimization (PSO) algorithm is detailed and its performance and characteristics demonstrated for the biomechanical system identification problem as example.
NASA Technical Reports Server (NTRS)
Capps, Stephen; Lorandos, Jason; Akhidime, Eval; Bunch, Michael; Lund, Denise; Moore, Nathan; Murakawa, Kiosuke
1989-01-01
The purpose of this study is to investigate comprehensive design requirements associated with designing habitats for humans in a partial gravity environment, then to apply them to a lunar base design. Other potential sites for application include planetary surfaces such as Mars, variable-gravity research facilities, and a rotating spacecraft. Design requirements for partial gravity environments include locomotion changes in less than normal earth gravity; facility design issues, such as interior configuration, module diameter, and geometry; and volumetric requirements based on the previous as well as psychological issues involved in prolonged isolation. For application to a lunar base, it is necessary to study the exterior architecture and configuration to ensure optimum circulation patterns while providing dual egress; radiation protection issues are addressed to provide a safe and healthy environment for the crew; and finally, the overall site is studied to locate all associated facilities in context with the habitat. Mission planning is not the purpose of this study; therefore, a Lockheed scenario is used as an outline for the lunar base application, which is then modified to meet the project needs. The goal of this report is to formulate facts on human reactions to partial gravity environments, derive design requirements based on these facts, and apply the requirements to a partial gravity situation which, for this study, was a lunar base.
Hashmi, Syed; Walter, John; Smith, Wendy; Latis, Sergios
2004-01-01
Swallowed or inhaled partial dentures can present a diagnostic challenge. Three new cases are described, one of them near-fatal because of vascular erosion and haemorrhage. The published work points to the importance of good design and proper maintenance. The key to early recognition is awareness of the hazard by denture-wearers, carers and clinicians. PMID:14749401
Dilemmas of partial cooperation.
Stark, Hans-Ulrich
2010-08-01
Related to the often applied cooperation models of social dilemmas, we deal with scenarios in which defection dominates cooperation, but an intermediate fraction of cooperators, that is, "partial cooperation," would maximize the overall performance of a group of individuals. Of course, such a solution comes at the expense of cooperators that do not profit from the overall maximum. However, because there are mechanisms accounting for mutual benefits after repeated interactions or through evolutionary mechanisms, such situations can constitute "dilemmas" of partial cooperation. Among the 12 ordinally distinct, symmetrical 2 x 2 games, three (barely considered) variants are correspondents of such dilemmas. Whereas some previous studies investigated particular instances of such games, we here provide the unifying framework and concisely relate it to the broad literature on cooperation in social dilemmas. Complementing our argumentation, we study the evolution of partial cooperation by deriving the respective conditions under which coexistence of cooperators and defectors, that is, partial cooperation, can be a stable outcome of evolutionary dynamics in these scenarios. Finally, we discuss the relevance of such models for research on the large biodiversity and variation in cooperative efforts both in biological and social systems.
NASA Technical Reports Server (NTRS)
Title, A. M. (Inventor)
1978-01-01
A birefringent filter module comprises, in seriatim: (1) an entrance polarizer, (2) a first birefringent crystal responsive to optical energy exiting the entrance polarizer, (3) a partial polarizer responsive to optical energy exiting the first polarizer, (4) a second birefringent crystal responsive to optical energy exiting the partial polarizer, and (5) an exit polarizer. The first and second birefringent crystals have fast axes disposed ±45 deg from the high transmissivity direction of the partial polarizer. Preferably, the second crystal has a length 1/2 that of the first crystal and the high transmissivity direction of the partial polarizer is nine times as great as the low transmissivity direction. To provide tuning, the polarizations of the energy entering the first crystal and leaving the second crystal are varied by either rotating the entrance and exit polarizers, or by sandwiching the entrance and exit polarizers between pairs of half wave plates that are rotated relative to the polarizers. A plurality of the filter modules may be cascaded.
Partial wave analysis using graphics processing units
NASA Astrophysics Data System (ADS)
Berger, Niklaus; Beijiang, Liu; Jike, Wang
2010-04-01
Partial wave analysis is an important tool for determining resonance properties in hadron spectroscopy. For large data samples however, the un-binned likelihood fits employed are computationally very expensive. At the Beijing Spectrometer (BES) III experiment, an increase in statistics compared to earlier experiments of up to two orders of magnitude is expected. In order to allow for a timely analysis of these datasets, additional computing power with short turnover times has to be made available. It turns out that graphics processing units (GPUs) originally developed for 3D computer games have an architecture of massively parallel single instruction multiple data floating point units that is almost ideally suited for the algorithms employed in partial wave analysis. We have implemented a framework for tensor manipulation and partial wave fits called GPUPWA. The user writes a program in pure C++ whilst the GPUPWA classes handle computations on the GPU, memory transfers, caching and other technical details. In conjunction with a recent graphics processor, the framework provides a speed-up of the partial wave fit by more than two orders of magnitude compared to legacy FORTRAN code.
Parallel contingency statistics with Titan.
Thompson, David C.; Pebay, Philippe Pierre
2009-09-01
This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09], which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engine is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevents this new engine from exhibiting optimal parallel speed-up as the aforementioned engines do. This report therefore discusses the design trade-offs we made and studies performance with up to 200 processors.
The Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Problem size, parallel architecture and optimal speedup
NASA Technical Reports Server (NTRS)
Nicol, David M.; Willard, Frank H.
1987-01-01
The communication and synchronization overhead inherent in parallel processing can lead to situations where adding processors to the solution method actually increases execution time. Problem type, problem size, and architecture type all affect the optimal number of processors to employ. The numerical solution of an elliptic partial differential equation is examined in order to study the relationship between problem size and architecture. The equation's domain is discretized into n^2 grid points, which are divided into partitions and mapped onto the individual processor memories. The relationships between grid size, stencil type, partitioning strategy, processor execution time, and communication network type are analytically quantified. In so doing, the optimal number of processors to assign to the solution is determined, and the following are identified: (1) the smallest grid size which fully benefits from using all available processors, (2) the leverage on performance given by increasing processor speed or communication network speed, and (3) the suitability of various architectures for large numerical problems.
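The effect described in this abstract, where adding processors can increase execution time, can be illustrated with a toy cost model. The sketch below is not the analysis from the report: it assumes a strip partitioning of the n x n grid, a shared bus that serializes all boundary exchanges, and made-up cost constants `t_calc` and `t_comm`. All function and parameter names are hypothetical.

```python
def exec_time(n, p, t_calc=1.0, t_comm=10.0):
    """Modeled time for one iteration on p processors (assumed model).

    Compute term: each processor updates n*n/p grid points.
    Communication term: each processor exchanges about 2*n boundary values,
    and a shared bus handles the p exchanges one after another.
    """
    return (n * n / p) * t_calc + 2 * n * p * t_comm


def best_processor_count(n, max_p, t_calc=1.0, t_comm=10.0):
    """Brute-force search for the p in 1..max_p with the smallest modeled time."""
    return min(range(1, max_p + 1),
               key=lambda p: exec_time(n, p, t_calc, t_comm))


if __name__ == "__main__":
    n = 256
    for p in (1, 2, 4, 8, 16, 64):
        print(p, exec_time(n, p))
    print("best p:", best_processor_count(n, 64))
```

Under these assumed constants the modeled time falls at first and then rises as the serialized communication term dominates, so the best processor count is far below the number available. A different network model (for example a mesh, where exchanges proceed concurrently) would change the communication term and therefore the optimum.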
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Lau, Sonie; Yan, Jerry C.
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Parallel NPARC: Implementation and Performance
NASA Technical Reports Server (NTRS)
Townsend, S. E.
1996-01-01
Version 3 of the NPARC Navier-Stokes code includes support for large-grain (block level) parallelism using explicit message passing between a heterogeneous collection of computers. This capability has the potential for significant performance gains, depending upon the block data distribution. The parallel implementation uses a master/worker arrangement of processes. The master process assigns blocks to workers, controls worker actions, and provides remote file access for the workers. The processes communicate via explicit message passing using an interface library which provides portability to a number of message passing libraries, such as PVM (Parallel Virtual Machine). A Bourne shell script is used to simplify the task of selecting hosts, starting processes, retrieving remote files, and terminating a computation. This script also provides a simple form of fault tolerance. An analysis of the computational performance of NPARC is presented, using data sets from an F/A-18 inlet study and a Rocket Based Combined Cycle Engine analysis. Parallel speedup and overall computational efficiency were obtained for various NPARC run parameters on a cluster of IBM RS6000 workstations. The data show that although NPARC performance compares favorably with the estimated potential parallelism, typical data sets used with previous versions of NPARC will often need to be reblocked for optimum parallel performance. In one of the cases studied, reblocking increased peak parallel speedup from 3.2 to 11.8.
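The dependence of speedup on block distribution noted in this abstract can be sketched with a simple load-balance bound. The example below is an illustration only and does not reproduce NPARC's master process: it assumes that the work per block is proportional to a single size number, that workers are identical, and that communication is free. The greedy largest-first assignment and all names are hypothetical choices.

```python
import heapq


def assign_blocks(block_sizes, n_workers):
    """Greedy largest-first assignment of blocks to workers.

    Returns the sorted list of per-worker loads. Each block goes to the
    worker that currently has the least work.
    """
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    for size in sorted(block_sizes, reverse=True):
        load, worker = heapq.heappop(heap)
        heapq.heappush(heap, (load + size, worker))
    return sorted(load for load, _ in heap)


def speedup_bound(block_sizes, n_workers):
    """Ideal speedup: total work divided by the most heavily loaded worker."""
    return sum(block_sizes) / max(assign_blocks(block_sizes, n_workers))


if __name__ == "__main__":
    original = [90, 10, 10, 10]            # one dominant block
    reblocked = [30, 30, 30, 10, 10, 10]   # dominant block split in three
    print(speedup_bound(original, 4))
    print(speedup_bound(reblocked, 4))
```

With one dominant block, the worker holding it bounds the speedup no matter how many workers are available; splitting that block lets the same four workers share the load evenly. The block sizes here are invented for illustration and are not taken from the F/A-18 or RBCC data sets.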
Parallel processing for control applications
Telford, J. W.
2001-01-01
Parallel processing has been a topic of discussion in computer science circles for decades. Using more than one computer to control a process has many advantages that compensate for the additional cost. Initially, multiple computers were used to attain higher speeds. A single CPU could not perform all of the operations necessary for real-time operation. As technology progressed and CPUs became faster, the speed issue became less significant. The additional processing capabilities, however, continue to make high speeds an attractive element of parallel processing. Another reason for multiple processors is reliability. For the purpose of this discussion, reliability and robustness will be the focal point. Most contemporary conceptions of parallel processing include visions of hundreds of single computers networked to provide 'computing power'. Indeed, our own teraflop machines are built from large numbers of computers configured in a network (and thus limited by the network). There are many approaches to parallel configurations, and this presentation offers something slightly different from the contemporary networked model. In the world of embedded computers, which is a pervasive force in contemporary computer controls, there are many single chip computers available. If one backs away from the PC based parallel computing model and considers the possibilities of a parallel control device based on multiple single chip computers, a new area of possibilities becomes apparent. This study will look at the use of multiple single chip computers in a parallel configuration with emphasis placed on maximum reliability.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens; Inglett, Todd Alan
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
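The idea of storing only the blocks that differ from a template checkpoint can be sketched as follows. This is a simplified illustration, not the patented method: it assumes fixed-size blocks compared position by position with SHA-256 digests, whereas the rsync protocol uses rolling checksums to detect shifted data, and it omits the network broadcast and compression steps. The block size and all function names are hypothetical.

```python
import hashlib

BLOCK = 4  # deliberately tiny block size so the example is easy to follow


def split_blocks(data, size=BLOCK):
    """Split a bytes object into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(block):
    """Checksum of one block (SHA-256 here; an assumption of this sketch)."""
    return hashlib.sha256(block).hexdigest()


def delta_against_template(node_data, template):
    """Return (changed blocks by index, total length) for one node."""
    template_sums = [checksum(b) for b in split_blocks(template)]
    delta = {}
    for i, block in enumerate(split_blocks(node_data)):
        if i >= len(template_sums) or checksum(block) != template_sums[i]:
            delta[i] = block  # only blocks that differ are kept
    return delta, len(node_data)


def restore(template, delta, length):
    """Rebuild a node's checkpoint from the template plus its delta."""
    n_blocks = (length + BLOCK - 1) // BLOCK
    blocks = split_blocks(template)[:n_blocks]
    blocks += [b""] * (n_blocks - len(blocks))
    for i, block in delta.items():
        blocks[i] = block
    return b"".join(blocks)[:length]


if __name__ == "__main__":
    template = b"AAAABBBBCCCCDDDD"
    node = b"AAAABxBBCCCCDDDD"
    delta, length = delta_against_template(node, template)
    print(delta)
    print(restore(template, delta, length) == node)
```

In this example only one of the four blocks has to be stored for the node, which is the source of the savings the abstract describes when many nodes hold nearly identical state.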
EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS
F. PETRINI; W. FENG
1999-09-01
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.
Parallel integer sorting with medium and fine-scale parallelism
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Partially coherent ultrafast spectrography
Bourassin-Bouchet, C.; Couprie, M.-E.
2015-01-01
Modern ultrafast metrology relies on the postulate that the pulse to be measured is fully coherent, that is, that it can be completely described by its spectrum and spectral phase. However, synthesizing fully coherent pulses is not always possible in practice, especially in the domain of emerging ultrashort X-ray sources where temporal metrology is strongly needed. Here we demonstrate how frequency-resolved optical gating (FROG), the first and one of the most widespread techniques for pulse characterization, can be adapted to measure partially coherent pulses even down to the attosecond timescale. No modification of experimental apparatuses is required; only the processing of the measurement changes. To do so, we take our inspiration from other branches of physics where partial coherence is routinely dealt with, such as quantum optics and coherent diffractive imaging. This will have important and immediate applications, such as enabling the measurement of X-ray free-electron laser pulses despite timing jitter. PMID:25744080
Laparoscopic partial splenic resection.
Uranüs, S; Pfeifer, J; Schauer, C; Kronberger, L; Rabl, H; Ranftl, G; Hauser, H; Bahadori, K
1995-04-01
Twenty domestic pigs with an average weight of 30 kg were subjected to laparoscopic partial splenic resection with the aim of determining the feasibility, reliability, and safety of this procedure. Unlike the human spleen, the pig spleen is perpendicular to the body's long axis, and it is long and slender. The parenchyma was severed through the middle third, where the organ is thickest. An 18-mm trocar with a 60-mm Endopath linear cutter was used for the resection. The tissue was removed with a 33-mm trocar. The operation was successfully concluded in all animals. No capsule tears occurred as a result of applying the stapler. Optimal hemostasis was achieved on the resected edges in all animals. Although these findings cannot be extended to human surgery without reservations, we suggest that diagnostic partial resection and minor cyst resections are ideal initial indications for this minimally invasive approach.
Partially coherent ultrafast spectrography
NASA Astrophysics Data System (ADS)
Bourassin-Bouchet, C.; Couprie, M.-E.
2015-03-01
Modern ultrafast metrology relies on the postulate that the pulse to be measured is fully coherent, that is, that it can be completely described by its spectrum and spectral phase. However, synthesizing fully coherent pulses is not always possible in practice, especially in the domain of emerging ultrashort X-ray sources where temporal metrology is strongly needed. Here we demonstrate how frequency-resolved optical gating (FROG), the first and one of the most widespread techniques for pulse characterization, can be adapted to measure partially coherent pulses even down to the attosecond timescale. No modification of experimental apparatuses is required; only the processing of the measurement changes. To do so, we take our inspiration from other branches of physics where partial coherence is routinely dealt with, such as quantum optics and coherent diffractive imaging. This will have important and immediate applications, such as enabling the measurement of X-ray free-electron laser pulses despite timing jitter.
Hierarchical partial order ranking.
Carlsen, Lars
2008-09-01
Assessing the potential impact on environmental and human health from the production and use of chemicals or from polluted sites involves a multi-criteria evaluation scheme. A priori, several parameters must be addressed, e.g., production tonnage, specific release scenarios, geographical and site-specific factors, in addition to various substance-dependent parameters. Further, socio-economic factors may be taken into consideration. The number of parameters to be included may well appear to be prohibitive for developing a sensible model. The study introduces hierarchical partial order ranking (HPOR), which remedies this problem. By HPOR the original parameters are initially grouped based on their mutual connection, and a set of meta-descriptors is derived representing the ranking corresponding to the single groups of descriptors, respectively. A second partial order ranking is carried out based on the meta-descriptors, the final ranking being disclosed through average ranks. An illustrative example on the prioritization of polluted sites is given.
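A single partial order ranking step with average ranks can be sketched as below. This is an illustration under stated assumptions, not the HPOR procedure itself: it assumes that higher descriptor values mean higher priority, that the average rank is taken over all linear extensions of the partial order, and it enumerates them by brute force (feasible only for a handful of objects). The grouping into meta-descriptors is not reproduced, and the site names and values are hypothetical.

```python
from itertools import permutations


def dominates(a, b):
    """a ranks above b if a >= b in every descriptor and a > b in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def average_ranks(objects):
    """Average rank (1 = top) of each object over all linear extensions.

    `objects` maps a name to a tuple of descriptor values. An ordering is a
    linear extension if every dominating object precedes the objects it
    dominates; incomparable objects may appear in either order.
    """
    names = list(objects)
    totals = dict.fromkeys(names, 0)
    count = 0
    for order in permutations(names):
        position = {name: i for i, name in enumerate(order)}
        consistent = all(
            position[a] < position[b]
            for a in names
            for b in names
            if dominates(objects[a], objects[b])
        )
        if consistent:
            count += 1
            for name in names:
                totals[name] += position[name] + 1
    return {name: totals[name] / count for name in names}


# Hypothetical polluted sites described by two descriptors each.
sites = {"A": (3, 3), "B": (2, 1), "C": (1, 2), "D": (0, 0)}
ranks = average_ranks(sites)
```

Here sites B and C are incomparable (each exceeds the other in one descriptor), so they share the same average rank, which is the kind of tie a plain weighted score would hide.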
Partially integrated exhaust manifold
Hayman, Alan W; Baker, Rodney E
2015-01-20
A partially integrated manifold assembly is disclosed which improves performance, reduces cost and provides efficient packaging of engine components. The partially integrated manifold assembly includes a first leg extending from a first port and terminating at a mounting flange for an exhaust gas control valve. Multiple additional legs (depending on the total number of cylinders) are integrally formed with the cylinder head assembly and extend from the ports of the associated cylinder and terminate at an exit port flange. These additional legs are longer than the first leg such that the exit port flange is spaced apart from the mounting flange. This configuration provides increased packaging space adjacent the first leg for any valving that may be required to control the direction and destination of exhaust flow in recirculation to an EGR valve or downstream to a catalytic converter.
Activated partial thromboplastin time.
Ignjatovic, Vera
2013-01-01
Activated partial thromboplastin time (APTT) is a commonly used coagulation assay that is easy to perform, is affordable, and is therefore performed in most coagulation laboratories, both clinical and research, worldwide. The APTT is based on the principle that in citrated plasma, the addition of a platelet substitute, factor XII activator, and CaCl2 allows for formation of a stable clot. The time required for the formation of a stable clot is recorded in seconds and represents the actual APTT result.
Parallel, Implicit, Finite Element Solver
NASA Astrophysics Data System (ADS)
Lowrie, Weston; Shumlak, Uri; Meier, Eric; Marklin, George
2007-11-01
A parallel, implicit, finite element solver is described for solutions to the ideal MHD equations and the Pseudo-1D Euler equations. The solver uses the conservative flux source form of the equations. This helps simplify the discretization of the finite element method by keeping the specification of the physics separate. An implicit time advance is used to allow sufficiently large time steps. The Portable Extensible Toolkit for Scientific Computation (PETSc) is implemented for parallel matrix solvers and parallel data structures. Results for several test cases are described as well as accuracy of the method.
Multigrid on massively parallel architectures
Falgout, R D; Jones, J E
1999-09-17
The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.
Parallel Architecture For Robotics Computation
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1990-01-01
Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.
IOPA: I/O-aware parallelism adaption for parallel programs
Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei
2017-01-01
With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
Conjugate Gradients Parallelized on the Hypercube
NASA Astrophysics Data System (ADS)
Basermann, Achim
For the solution of discretized ordinary or partial differential equations it is necessary to solve systems of equations with coefficient matrices of different sparsity patterns, depending on the discretization method; using the finite element method (FE) results in largely unstructured systems of equations. A frequently used iterative solver for systems of equations is the method of conjugate gradients (CG) with different preconditioners. On a multiprocessor system with distributed memory, in particular the data distribution and the communication scheme depending on the used data structure are of greatest importance for the efficient execution of this method. Here, a data distribution and a communication scheme are presented which are based on the analysis of the column indices of the non-zero matrix elements. The performance of the developed parallel CG-method was measured on the distributed-memory-system INTEL iPSC/860 of the Research Centre Jülich with systems of equations from FE-models. The parallel CG-algorithm has been shown to be well suited for both regular and irregular discretization meshes, i.e. for coefficient matrices of very different sparsity patterns.
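For orientation, the textbook conjugate gradient iteration is sketched below in serial, unpreconditioned form with dense pure-Python lists. This is not the parallel implementation from the abstract; the function names are illustrative. In a distributed-memory version, the matrix-vector product and the dot products are the steps that require communication, which is why the data distribution matters.

```python
def matvec(A, x):
    """Dense matrix-vector product (a sparse format would be used in practice)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product; a global reduction in a distributed setting."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # direction update
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```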
Parallel Element Agglomeration Algebraic Multigrid and Upscaling Library
2015-02-19
ParFELAG is a parallel distributed memory C++ library for numerical upscaling of finite element discretizations. It provides optimal complexity algorithms to build multilevel hierarchies and solvers that can be used for solving a wide class of partial differential equations (elliptic, hyperbolic, saddle point problems) on general unstructured meshes (under the assumption that the topology of the agglomerated entities is correct). Additionally, a novel multilevel solver for saddle point problems with divergence constraint is implemented.
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations
NASA Technical Reports Server (NTRS)
Chrisochoides, Nikos
1995-01-01
We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDEs) on multiprocessors. Multithreading is used as a means of exploring concurrency at the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDEs. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code reuse difficult, and increases software complexity.
Automating the parallel processing of fluid and structural dynamics calculations
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.; Cole, Gary L.
1987-01-01
The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.
Solving unstructured grid problems on massively parallel computers
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1990-01-01
A highly parallel graph mapping technique that enables one to efficiently solve unstructured grid problems on massively parallel computers is presented. Many implicit and explicit methods for solving discretized partial differential equations require each point in the discretization to exchange data with its neighboring points every time step or iteration. The cost of this communication can negate the high performance promised by massively parallel computing. To eliminate this bottleneck, the graph of the irregular problem is mapped into the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. It is shown that using the heuristic mapping algorithm significantly reduces the communication time compared to a naive assignment of processes to processors.
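The objective being minimized, the sum of the distances that messages travel, can be sketched as below. The abstract does not name the interconnection topology or describe the heuristic, so this example assumes a hypercube (where the hop count between two nodes is the Hamming distance of their ids) and only evaluates the cost of a given mapping; the function names are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Hop count between hypercube nodes a and b (ids differ in this many bits)."""
    return bin(a ^ b).count("1")


def communication_cost(edges, mapping):
    """Sum of hop distances over all edges of the problem graph.

    `edges` lists pairs of grid points that exchange data each iteration;
    `mapping` assigns each grid point to a processor id.
    """
    return sum(hamming_distance(mapping[u], mapping[v]) for u, v in edges)


# A ring of four grid points mapped onto a 2-dimensional hypercube (4 processors).
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
naive_mapping = {0: 0b00, 1: 0b01, 2: 0b10, 3: 0b11}  # point i on processor i
gray_mapping = {0: 0b00, 1: 0b01, 2: 0b11, 3: 0b10}   # Gray-code ordering

naive_cost = communication_cost(ring, naive_mapping)
gray_cost = communication_cost(ring, gray_mapping)
```

With the Gray-code ordering every neighbor exchange is a single hop, whereas the naive assignment forces two of the four exchanges across two hops.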
Appendix E: Parallel Pascal development system
NASA Technical Reports Server (NTRS)
1985-01-01
The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays which have dimensions larger than the hardware. Programs can be conveniently tested with small arrays on the conventional computer before attempting to run on a parallel system.
"Feeling" Series and Parallel Resistances.
ERIC Educational Resources Information Center
Morse, Robert A.
1993-01-01
Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)
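The combination rules that the straw activity is meant to make tangible are the standard ones: resistances in series add, and for resistances in parallel the reciprocals add. A minimal sketch, with illustrative function names:

```python
def series(*resistances):
    """Equivalent resistance of resistors in series: R = R1 + R2 + ..."""
    return sum(resistances)


def parallel(*resistances):
    """Equivalent resistance of resistors in parallel: 1/R = 1/R1 + 1/R2 + ..."""
    return 1.0 / sum(1.0 / r for r in resistances)
```

For example, two 100-ohm resistors give 200 ohms in series but 50 ohms in parallel, so adding a parallel path always lowers the total resistance.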
Distinguishing serial and parallel parsing.
Gibson, E; Pearlmutter, N J
2000-03-01
This paper discusses ways of determining whether the human parser is serial, maintaining at most one structural interpretation at each parse state, or whether it is parallel, maintaining more than one structural interpretation in at least some circumstances. We make four points. The first two counter claims made by Lewis (2000): (1) that the availability of alternative structures should not vary as a function of the disambiguating material in some ranked parallel models; and (2) that parallel models predict a slowdown during the ambiguous region for more syntactically ambiguous structures. Our other points concern potential methods for seeking experimental evidence relevant to the serial/parallel question. We discuss effects of the plausibility of a secondary structure in the ambiguous region (Pearlmutter & Mendelsohn, 1999) and suggest examining the distribution of reaction times in the disambiguating region.
Demonstrating Forces between Parallel Wires.
ERIC Educational Resources Information Center
Baker, Blane
2000-01-01
Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)
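The size of the effect can be estimated from the standard result for two long, straight, parallel conductors: the force per unit length is mu_0 * I1 * I2 / (2 * pi * d). The sketch below uses a sign convention chosen here for illustration (positive means attraction, which occurs when the currents flow in the same direction); the currents and spacing in any real demonstration are not given in the abstract.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability in T*m/A (approximate value)


def force_per_length(i1: float, i2: float, d: float) -> float:
    """Signed force per unit length (N/m) between two long parallel wires.

    i1, i2: currents in amperes, with sign indicating direction.
    d: separation in metres.
    Positive result: attraction (currents in the same direction).
    Negative result: repulsion (currents in opposite directions).
    """
    if d <= 0:
        raise ValueError("separation must be positive")
    return MU_0 * i1 * i2 / (2 * math.pi * d)
```

Because the force scales with the product of the currents and inversely with the separation, a high-current supply and closely spaced wires are what make the deflection visible.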
Parallel programming of industrial applications
Heroux, M; Koniges, A; Simon, H
1998-07-21
In the introductory material, we overview the typical MPP environment for real application computing and the special tools available such as parallel debuggers and performance analyzers. Next, we draw from a series of real applications codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from these applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled: "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN 1-55860-54).
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Parallel hierarchical method in networks
NASA Astrophysics Data System (ADS)
Malinochka, Olha; Tymchenko, Leonid
2007-09-01
This method of parallel-hierarchical Q-transformation offers a new approach to the creation of a computing medium, parallel-hierarchical (PH) networks, investigated in the form of a model of a neurolike data-processing scheme [1-5]. The approach has a number of advantages compared with other methods of forming neurolike media (for example, the already known methods of forming artificial neural networks). The main advantage of the approach is the use of multilevel parallel interaction dynamics of information signals at different hierarchy levels of computer networks, which makes it possible to use such known natural features of the organization of computations as: the topographic nature of mapping, simultaneity (parallelism) of signal operation, inlaid cortex, structure, rough hierarchy of the cortex, and a spatially correlated in time mechanism of perception and training [5].
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Debugging in a parallel environment
Wasserman, H.J.; Griffin, J.H.
1985-01-01
This paper describes the preliminary results of a project investigating approaches to dynamic debugging in parallel processing systems. Debugging programs in a multiprocessing environment is particularly difficult because of potential errors in synchronization of tasks, data dependencies, sharing of data among tasks, and irreproducibility of specific machine instruction sequences from one job to the next. The basic methodology involved in predicate-based debuggers is given as well as other desirable features of dynamic parallel debugging. 13 refs.
Parallel Algorithms for Image Analysis.
1982-06-01
Title: Parallel Algorithms for Image Analysis. Type of report: Technical. Performing organization report number: TR-1180. Author: Azriel Rosenfeld. Contract or grant number: AFOSR-77-3271. Key words: image processing; image analysis; parallel processing; cellular computers.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.
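The terms speed-up and parallel efficiency used in the abstract are conventionally defined as S = T1 / Tp and E = S / p for p processors. The sketch below applies those definitions to made-up timings that merely mimic a plateau beyond 16 processors; the numbers are hypothetical and are not measurements from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speed-up S = T1 / Tp."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_processors: int) -> float:
    """Parallel efficiency E = S / p; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / n_processors


# Hypothetical wall-clock times in seconds, keyed by processor count
# (illustrative only, NOT data from the study).
timings = {1: 1600.0, 16: 110.0, 32: 105.0, 64: 104.0}

for n, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, n)
    print(f"{n:3d} processors: speed-up {s:5.1f}, efficiency {e:4.2f}")
```

When adding processors no longer reduces the run time, the speed-up stays flat and the efficiency falls roughly in proportion to the processor count, which is the pattern the abstract reports for branch swapping beyond 16 processors.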
Architectures for reasoning in parallel
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.
1989-01-01
The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped-down version of EMYCIN.
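The future construct has a rough analogue in Python's standard `concurrent.futures` module, sketched below. This is only an analogy, not Multilisp and not code from Backpac or Datapac: `prove_goal`, `rule_fires`, and the fact names are hypothetical stand-ins, and Python threads do not give true CPU parallelism for pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor


def prove_goal(goal: str) -> bool:
    """Stand-in for evaluating one rule premise (hypothetical example logic)."""
    known_facts = {"has_fever", "has_cough"}
    return goal in known_facts


def rule_fires(premises, pool) -> bool:
    """Check all premises of a rule, each as its own task.

    submit() returns a Future immediately, so the premise checks run as
    tasks alongside the caller; result() blocks only when the value is
    actually needed.
    """
    futures = [pool.submit(prove_goal, p) for p in premises]
    return all(f.result() for f in futures)


with ThreadPoolExecutor(max_workers=4) as pool:
    fired = rule_fires(["has_fever", "has_cough"], pool)
```

As in the description above, wrapping a call in a future turns it into a task that proceeds in parallel with the code that spawned it, and the spawning code synchronizes only when it asks for the result.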
Efficiency of parallel direct optimization.
Janies, D A; Wheeler, W C
2001-03-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size.
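The terms speed-up and parallel efficiency used in the abstract above have standard definitions: speed-up on p processors is T(1)/T(p), and efficiency is speed-up divided by p. The short sketch below computes both. The timings are invented purely for illustration and are not taken from the paper; they are chosen only to mimic a plateau beyond 16 processors.

```python
def speedup(t_serial, t_parallel):
    """Speed-up S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E(p) = S(p) / p; 1.0 means ideal scaling."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical wall-clock times in seconds, keyed by processor count.
timings = {1: 3200.0, 16: 250.0, 32: 240.0, 64: 238.0}

if __name__ == "__main__":
    for p in sorted(timings):
        s = speedup(timings[1], timings[p])
        e = efficiency(timings[1], timings[p], p)
        print(f"p={p:3d}  speed-up={s:6.2f}  efficiency={e:5.2f}")
```

With numbers like these, speed-up barely moves past 16 processors while efficiency roughly halves with each doubling, which is the pattern the abstract describes for branch swapping.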
Partial Southwest Elevation Mill #5 West (Part 3), Partial ...
Partial Southwest Elevation - Mill #5 West (Part 3), Partial Southwest Elevation - Mill #5 West (with Section of Courtyard) (Parts 1 & 2) - Boott Cotton Mills, John Street at Merrimack River, Lowell, Middlesex County, MA
Paternalism and partial autonomy.
O'Neill, O
1984-01-01
A contrast is often drawn between standard adult capacities for autonomy, which allow informed consent to be given or withheld, and patients' reduced capacities, which demand paternalistic treatment. But patients may not be radically different from the rest of us, in that all human capacities for autonomous action are limited. An adequate account of paternalism and the role that consent and respect for persons can play in medical and other practice has to be developed within an ethical theory that does not impose an idealised picture of unlimited autonomy but allows for the variable and partial character of actual human autonomy. PMID:6520849
Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Experts' Understanding of Partial Derivatives Using the Partial Derivative Machine
ERIC Educational Resources Information Center
Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.
2015-01-01
Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of…
Parallel Implicit Algorithms for CFD
NASA Technical Reports Server (NTRS)
Keyes, David E.
1998-01-01
The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.
The economics of parallel trade.
Danzon, P M
1998-03-01
The potential for parallel trade in the European Union (EU) has grown with the accession of low price countries and the harmonisation of registration requirements. Parallel trade implies a conflict between the principle of autonomy of member states to set their own pharmaceutical prices, the principle of free trade and the industrial policy goal of promoting innovative research and development (R&D). Parallel trade in pharmaceuticals does not yield the normal efficiency gains from trade because countries achieve low pharmaceutical prices by aggressive regulation, not through superior efficiency. In fact, parallel trade reduces economic welfare by undermining price differentials between markets. Pharmaceutical R&D is a global joint cost of serving all consumers worldwide; it accounts for roughly 30% of total costs. Optimal (welfare maximising) pricing to cover joint costs (Ramsey pricing) requires setting different prices in different markets, based on inverse demand elasticities. By contrast, parallel trade and regulation based on international price comparisons tend to force price convergence across markets. In response, manufacturers attempt to set a uniform 'euro' price. The primary losers from 'euro' pricing will be consumers in low income countries who will face higher prices or loss of access to new drugs. In the long run, even higher income countries are likely to be worse off with uniform prices, because fewer drugs will be developed. One policy option to preserve price differentials is to exempt on-patent products from parallel trade. An alternative is confidential contracting between individual manufacturers and governments to provide country-specific ex post discounts from the single 'euro' wholesale price, similar to rebates used by managed care in the US. This would preserve differentials in transactions prices even if parallel trade forces convergence of wholesale prices.
Unilateral removable partial dentures.
Goodall, W A; Greer, A C; Martin, N
2017-01-27
Removable partial dentures (RPDs) are widely used to replace missing teeth in order to restore both function and aesthetics for the partially dentate patient. Conventional RPD design is frequently bilateral and consists of a major connector that bridges both sides of the arch. Some patients cannot and will not tolerate such an extensive appliance. For these patients, bridgework may not be a predictable option and it is not always possible to provide implant-retained restorations. This article presents unilateral RPDs as a potential treatment modality for such patients and explores indications and contraindications for their use, including factors relating to patient history, clinical presentation and patient wishes. Through case examples, design, material and fabrication considerations will be discussed. While their use is not widespread, there are a number of patients who benefit from the provision of unilateral RPDs. They are a useful treatment to have in the clinician's armamentarium, but a highly-skilled dental team and a specific patient presentation is required in order for them to be a reasonable and predictable prosthetic option.
Is Titan Partially Differentiated?
NASA Astrophysics Data System (ADS)
Mitri, G.; Pappalardo, R. T.; Stevenson, D. J.
2009-12-01
The recent measurement of the gravity coefficients from the Radio Doppler data of the Cassini spacecraft has improved our knowledge of the interior structure of Titan (Rappaport et al. 2008 AGU, P21A-1343). The measured gravity field of Titan is dominated by near hydrostatic quadrupole components. We have used the measured gravitational coefficients, thermal models and the hydrostatic equilibrium theory to derive Titan's interior structure. The axial moment of inertia gives us an indication of the degree of the interior differentiation. The inferred axial moment of inertia, calculated using the quadrupole gravitational coefficients and the Radau-Darwin approximation, indicates that Titan is partially differentiated. If Titan is partially differentiated then the interior must avoid melting of the ice during its evolution. This suggests a relatively late formation of Titan to avoid the presence of short-lived radioisotopes (Al-26). This also suggests the onset of convection after accretion to efficiently remove the heat from the interior. The outer layer is likely composed mainly of water in solid phase. Thermal modeling indicates that water could be present also in liquid phase forming a subsurface ocean between an outer ice I shell and a high pressure ice layer. Acknowledgments: This work was conducted at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.
Furnace brazing under partial vacuum
NASA Technical Reports Server (NTRS)
Mckown, R. D.
1979-01-01
Brazing furnace utilizing partial-vacuum technique reduces tooling requirements and produces better bond. Benefit is that partial vacuum helps to dissociate metal oxides that inhibit metal flow and eliminates heavy tooling required to hold parts together during brazing.
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)
NASA Technical Reports Server (NTRS)
Straeter, T. A.; Markos, A. T.
1975-01-01
A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
Computing contingency statistics in parallel.
Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre
2010-09-01
Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
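The map-reduce structure described in the abstract above can be sketched in a few lines. This is a minimal single-process illustration, not the authors' implementation: the function names (`local_table`, `merge`, `pmi`) and the toy partitions are assumptions. Each partition builds a local contingency table, tables are merged by adding counts, and point-wise mutual information is derived from the merged table.

```python
from collections import Counter
from functools import reduce
from math import log2


def local_table(pairs):
    """Map step: count (x, y) co-occurrences within one data partition."""
    return Counter(pairs)


def merge(a, b):
    """Reduce step: two contingency tables combine by adding their counts."""
    return a + b


def pmi(table, x, y):
    """Point-wise mutual information log2(p(x,y) / (p(x) p(y))).

    Assumes the pair (x, y) was observed at least once.
    """
    n = sum(table.values())
    p_xy = table[(x, y)] / n
    p_x = sum(c for (xi, _), c in table.items() if xi == x) / n
    p_y = sum(c for (_, yi), c in table.items() if yi == y) / n
    return log2(p_xy / (p_x * p_y))


# Two hypothetical partitions, as if held by two different processes.
partitions = [
    [("a", 0), ("a", 0), ("b", 1)],
    [("b", 1), ("a", 1), ("b", 0)],
]
table = reduce(merge, map(local_table, partitions))

if __name__ == "__main__":
    print(dict(table))
    print(pmi(table, "a", 0))
```

The merged table has one entry per distinct (x, y) pair, so the data exchanged in the reduce step grows with the number of distinct pairs. That is the communication cost the abstract contrasts with moment-based statistics, whose merge payload has fixed size.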
Parallelizing Timed Petri Net simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1993-01-01
The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
Parallel, semiparallel, and serial processing of visual hyperacuity
NASA Astrophysics Data System (ADS)
Fahle, Manfred W.
1990-10-01
Humans can discriminate between certain elementary stimulus features in parallel, i.e., simultaneously over the visual field. I present evidence that, in man, vernier misalignments in the hyperacuity-range, i.e., below the photoreceptor diameter, can also be detected in parallel. This indicates that the visual system performs some form of spatial interpolation beyond the photoreceptor spacing simultaneously over the visual field. Vernier offsets are detected in parallel even when orientation cues are masked: deviation from straightness is an elementary feature of visual perception. However, the identification process, that classifies each vernier in a stimulus as being offset to the right (versus to the left) is serial and has to scan the visual field sequentially if orientation cues are masked. Therefore, reaction times and thresholds in vernier acuity tasks increase with the number of verniers presented simultaneously if classification of different features is required. Furthermore, when approaching vernier threshold, simple vernier detection is no longer parallel but becomes partially serial, or semi-parallel.
Electrical conductivity anisotropy of partially molten peridotite under shear deformation
NASA Astrophysics Data System (ADS)
Zhang, B.; Yoshino, T.; Yamazaki, D.; Manthilake, G. M.; Katsura, T.
2013-12-01
Recent ocean bottom magnetotelluric investigations have revealed a high-conductivity layer (HCL) with high anisotropy characterized by higher conductivity values in the direction parallel to the plate motion beneath the southern East Pacific Rise (Evans et al., 2005) and beneath the edge of the Cocos plate at the Middle America trench offshore of Nicaragua (Naif et al., 2013). These geophysical observations have been attributed to either hydration (water) of mantle minerals or the presence of partial melt. Currently, aligned partial melt has been regarded as the most preferable candidate for explaining the conductivity anisotropy because of the implausibility of proton conduction (Yoshino et al., 2006). In this study, we report development of the conductivity anisotropy between parallel and normal to shear direction on the shear plane in partial molten peridotite as a function of time and shear strain. Starting samples were pre-synthesized partial molten peridotite, showing homogeneous melt distribution. The partially molten peridotite samples were deformed in simple shear geometry at 1 GPa and 1723 K in a DIA-type apparatus with uniaxial deformation facility. Conductivity difference between parallel and normal to shear direction reached one order, which is equivalent to that observed beneath asthenosphere. In contrast, such anisotropic behavior was not found in the melt-free samples, suggesting that development of the conductivity anisotropy was generated under shear stress. Microstructure of the deformed partial molten peridotite shows partial melt tends to preferentially locate grain boundaries parallel to shear direction, and forms continuously thin melt layer sub-parallel to the shear direction, whereas apparently isolated distribution was observed on the section perpendicular to the shear direction. The resultant melt morphology can be approximated by tube like geometry parallel to the shear direction. This observation suggests that the development of
Partially segmented deformable mirror
Bliss, E.S.; Smith, J.R.; Salmon, J.T.; Monjes, J.A.
1991-05-21
A partially segmented deformable mirror is formed with a mirror plate having a smooth and continuous front surface and a plurality of actuators to its back surface. The back surface is divided into triangular areas which are mutually separated by grooves. The grooves are deep enough to make the plate deformable and the actuators for displacing the mirror plate in the direction normal to its surface are inserted in the grooves at the vertices of the triangular areas. Each actuator includes a transducer supported by a receptacle with outer shells having outer surfaces. The vertices have inner walls which are approximately perpendicular to the mirror surface and make planar contacts with the outer surfaces of the outer shells. The adhesive which is used on these contact surfaces tends to contract when it dries but the outer shells can bend and serve to minimize the tendency of the mirror to warp. 5 figures.
Partially segmented deformable mirror
Bliss, Erlan S.; Smith, James R.; Salmon, J. Thaddeus; Monjes, Julio A.
1991-01-01
A partially segmented deformable mirror is formed with a mirror plate having a smooth and continuous front surface and a plurality of actuators to its back surface. The back surface is divided into triangular areas which are mutually separated by grooves. The grooves are deep enough to make the plate deformable and the actuators for displacing the mirror plate in the direction normal to its surface are inserted in the grooves at the vertices of the triangular areas. Each actuator includes a transducer supported by a receptacle with outer shells having outer surfaces. The vertices have inner walls which are approximately perpendicular to the mirror surface and make planar contacts with the outer surfaces of the outer shells. The adhesive which is used on these contact surfaces tends to contract when it dries but the outer shells can bend and serve to minimize the tendency of the mirror to warp.
Krumpelt, Michael; Ahmed, Shabbir; Kumar, Romesh; Doshi, Rajiv
2000-01-01
A two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion. The dehydrogenation portion is a group VIII metal and the oxide-ion conducting portion is selected from a ceramic oxide crystallizing in the fluorite or perovskite structure. There is also disclosed a method of forming a hydrogen rich gas from a source of hydrocarbon fuel in which the hydrocarbon fuel contacts a two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion at a temperature not less than about 400 °C for a time sufficient to generate the hydrogen rich gas while maintaining CO content less than about 5 volume percent. There is also disclosed a method of forming partially oxidized hydrocarbons from ethanes in which ethane gas contacts a two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion for a time and at a temperature sufficient to form an oxide.
Visualizing Parallel Computer System Performance
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.
1988-01-01
Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkably adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized. Large, complex parallel systems pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eliding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.
Massively parallel MRI detector arrays.
Keil, Boris; Wald, Lawrence L
2013-04-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called "ultimate" SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. Copyright © 2013 Elsevier Inc. All rights reserved.
Features in Continuous Parallel Coordinates.
Lehmann, Dirk J; Theisel, Holger
2011-12-01
Continuous Parallel Coordinates (CPC) are a contemporary visualization technique in order to combine several scalar fields, given over a common domain. They facilitate a continuous view for parallel coordinates by considering a smooth scalar field instead of a finite number of straight lines. We show that there are feature curves in CPC which appear to be the dominant structures of a CPC. We present methods to extract and classify them and demonstrate their usefulness to enhance the visualization of CPCs. In particular, we show that these feature curves are related to discontinuities in Continuous Scatterplots (CSP). We show this by exploiting a curve-curve duality between parallel and Cartesian coordinates, which is a generalization of the well-known point-line duality. Furthermore, we illustrate the theoretical considerations. Concluding, we discuss relations and aspects of the CPC's/CSP's features concerning the data analysis.
Parallel integrated frame synchronizer chip
NASA Technical Reports Server (NTRS)
Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)
2000-01-01
A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.
PARAVT: Parallel Voronoi tessellation code
NASA Astrophysics Data System (ADS)
González, R. E.
2016-10-01
In this study, we present a new open source code for massive parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes neighbors list, Voronoi density, Voronoi cell volume, density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
Parallel Adaptive Mesh Refinement Library
NASA Technical Reports Server (NTRS)
Mac-Neice, Peter; Olson, Kevin
2005-01-01
Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.
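The tree of sub-grid blocks described in the abstract above can be sketched in two dimensions as a quad-tree. PARAMESH itself is a Fortran 90 package; the Python below is only an illustration of the data structure, and the class `Block`, the function `refine_where`, and the refinement criterion are hypothetical names that do not correspond to PARAMESH routines. The per-block Cartesian mesh of cells is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One square grid block: lower-left corner, edge length, refinement level."""
    x0: float
    y0: float
    size: float
    level: int = 0
    children: list = field(default_factory=list)

    def refine(self):
        """Split this block into four children of half the edge length."""
        half = self.size / 2
        self.children = [
            Block(self.x0 + i * half, self.y0 + j * half, half, self.level + 1)
            for j in (0, 1)
            for i in (0, 1)
        ]

    def leaves(self):
        """Return the blocks that currently cover the domain (tree leaves)."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def refine_where(block, needs_refinement, max_level):
    """Recursively refine every block the criterion flags, up to max_level."""
    if block.level < max_level and needs_refinement(block):
        block.refine()
        for child in block.children:
            refine_where(child, needs_refinement, max_level)


def touches_origin(block):
    """Hypothetical criterion: demand resolution near the corner (0, 0)."""
    return block.x0 == 0.0 and block.y0 == 0.0


root = Block(0.0, 0.0, 1.0)
refine_where(root, touches_origin, max_level=3)

if __name__ == "__main__":
    for leaf in root.leaves():
        print(leaf.level, leaf.x0, leaf.y0, leaf.size)
```

In a parallel setting the leaves would be distributed across processes; the oct-tree used in three dimensions follows the same pattern with eight children per block.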
Massively Parallel MRI Detector Arrays
Keil, Boris; Wald, Lawrence L
2013-01-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758
Fast data parallel polygon rendering
Ortega, F.A.; Hansen, C.D.
1993-09-01
This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted for applications which require very fast polygon rendering for extremely large sets of polygons such as are found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.
Hybrid parallel programming with MPI and Unified Parallel C.
Dinan, J.; Balaji, P.; Lusk, E.; Sadayappan, P.; Thakur, R.; Mathematics and Computer Science; The Ohio State Univ.
2010-01-01
The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.
Parallel algorithms for mapping pipelined and parallel computations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
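To make the mapping problem concrete, the sketch below assigns a pipeline of m modules to at most n processors in contiguous groups so that the most heavily loaded processor (the bottleneck) is as light as possible. It is not the algorithm of the report: it is a simple binary search over the bottleneck value with a greedy feasibility probe, it assumes positive integer module weights, and all function names are hypothetical.

```python
from typing import List, Sequence


def fits(weights: Sequence[int], n_processors: int, bottleneck: int) -> bool:
    """Greedy probe: can the chain be cut into at most n_processors
    contiguous groups, each with total load <= bottleneck?"""
    groups, load = 1, 0
    for w in weights:
        if w > bottleneck:
            return False
        if load + w > bottleneck:
            groups += 1  # start a new group on the next processor
            load = w
        else:
            load += w
    return groups <= n_processors


def min_bottleneck(weights: Sequence[int], n_processors: int) -> int:
    """Smallest feasible bottleneck, found by binary search on its value."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(weights, n_processors, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def assign(weights: Sequence[int], n_processors: int) -> List[List[int]]:
    """Contiguous groups of module weights, one group per processor used."""
    limit = min_bottleneck(weights, n_processors)
    groups: List[List[int]] = [[]]
    load = 0
    for w in weights:
        if groups[-1] and load + w > limit:
            groups.append([])
            load = 0
        groups[-1].append(w)
        load += w
    return groups


modules = [4, 2, 7, 1, 3, 5, 2]  # hypothetical per-module execution costs
mapping = assign(modules, n_processors=3)
print(min_bottleneck(modules, 3), mapping)
```

The probe runs in O(m) time and the search takes a number of probes logarithmic in the total weight, so this sketch trades the exact complexity bounds quoted in the abstract for brevity.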
Gang scheduling a parallel machine
Gorda, B.C.; Brooks, E.D. III.
1991-03-01
Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processors. User programs and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.
Gang scheduling a parallel machine
Gorda, B.C.; Brooks, E.D. III.
1991-12-01
Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processes. User programs and their gangs of processes are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory.
Medipix2 parallel readout system
NASA Astrophysics Data System (ADS)
Fanti, V.; Marzeddu, R.; Randaccio, P.
2003-08-01
A fast parallel readout system based on a PCI board has been developed in the framework of the Medipix collaboration. The readout electronics consists of two boards: the motherboard directly interfacing the Medipix2 chip, and the PCI board with digital I/O ports 32 bits wide. The device driver and readout software have been developed at low level in Assembler to allow fast data transfer and image reconstruction. The parallel readout permits a transfer rate up to 64 Mbytes/s. http://medipix.web.cern.ch/MEDIPIX/
Parallelization of the SIR code
NASA Astrophysics Data System (ADS)
Thonhofer, S.; Bellot Rubio, L. R.; Utz, D.; Jurčak, J.; Hanslmeier, A.; Piantschitsch, I.; Pauritsch, J.; Lemmerer, B.; Guttenbrunner, S.
A high-resolution 3-dimensional model of the photospheric magnetic field is essential for the investigation of small-scale solar magnetic phenomena. The SIR code is an advanced Stokes-inversion code that deduces physical quantities, e.g. magnetic field vector, temperature, and LOS velocity, from spectropolarimetric data. We extended this code so that it can read large data sets directly and invert the pixels in parallel. This parallelization makes it feasible to apply the code directly to extensive data sets. In addition, we added the option of using different initial model atmospheres for the inversion, which enhances the quality of the results.
The Complexity of Parallel Algorithms,
1985-11-01
Much of this work was done in collaboration with my advisor, Ernst Mayr. He was also supported in part by ONR contract N00014-85-C-0731. Table...Helmbold and Mayr in their algorithm to compute an optimal two processor schedule [HM2]. One of the promising developments in parallel algorithms is that...lem can be solved by a fast parallel algorithm if the numbers are small. Helmbold and Mayr [HM1] have shown that, if the job times are
Robot-assisted partial nephrectomy: Superiority over laparoscopic partial nephrectomy.
Shiroki, Ryoichi; Fukami, Naohiko; Fukaya, Kosuke; Kusaka, Mamoru; Natsume, Takahiro; Ichihara, Takashi; Toyama, Hiroshi
2016-02-01
Nephron-sparing surgery has been proven to positively impact the postoperative quality of life for the treatment of small renal tumors, possibly leading to functional improvements. Laparoscopic partial nephrectomy is still one of the most demanding procedures in urological surgery. Laparoscopic partial nephrectomy sometimes results in extended warm ischemic time and severe complications, such as open conversion, postoperative hemorrhage and urine leakage. Robot-assisted partial nephrectomy brings the advantages of the da Vinci Surgical System, namely 3-D vision and greater freedom of movement of the surgical instruments, to laparoscopic partial nephrectomy. The introduction of the da Vinci Surgical System made nephron-sparing surgery, specifically robot-assisted partial nephrectomy, safe with promising results, leading to the shortening of warm ischemic time and a reduction in perioperative complications. Even for complex and challenging tumors, robotic assistance is expected to provide the benefit of minimally-invasive surgery with safe and satisfactory renal function. Warm ischemic time is the modifiable factor during robot-assisted partial nephrectomy that affects postoperative kidney function. We analyzed the predictive factors for extended warm ischemic time from our robot-assisted partial nephrectomy series. The surface area of the tumor attached to the kidney parenchyma was shown to significantly affect the extended warm ischemic time during robot-assisted partial nephrectomy. In cases with a tumor-attached surface area of more than 15 cm^2, we should consider switching from robot-assisted partial nephrectomy to open partial nephrectomy under cold ischemia if it is imperative. In Japan, a nationwide prospective study has been carried out to show the superiority of robot-assisted partial nephrectomy to laparoscopic partial nephrectomy in improving warm ischemic time and complications. By facilitating robotic technology, robot-assisted partial nephrectomy
Parallel multiscale simulations of a brain aneurysm.
Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2012-01-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Parallel multiscale simulations of a brain aneurysm
NASA Astrophysics Data System (ADS)
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical
Tutorial: Parallel Simulation on Supercomputers
Perumalla, Kalyan S
2012-01-01
This tutorial introduces typical hardware and software characteristics of extant and emerging supercomputing platforms, and presents issues and solutions in executing large-scale parallel discrete event simulation scenarios on such high performance computing systems. Covered topics include synchronization, model organization, example applications, and observed performance from illustrative large-scale runs.
Sequential and Parallel Matrix Computations.
1984-10-01
value decomposition and least squares solutions, Numer. Math. 14 (1970), 403-420. 22. J. Greer and A. Sameh, On certain parallel Toeplitz linear system...Zur Stabilitätsfrage bei Matrizen-Eigenwert-Problemen, Z. Angew. Math. Phys. (1956), 473-500. 36. D. L. Slotnick and A. H. Sameh, Numerical calculation
Parallel Algorithms for PDE Solvers
1988-07-15
This report lists all 39 scientific publications, theses, technical reports, and conference presentations supported by the grant AFOSR 84-0385. The principal focus of the results is on 1) The Collocation Method: New versions developed for parallel machines, new results on the convergence and new
Fast, Massively Parallel Data Processors
NASA Technical Reports Server (NTRS)
Heaton, Robert A.; Blevins, Donald W.; Davis, ED
1994-01-01
Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Optical Interferometric Parallel Data Processor
NASA Technical Reports Server (NTRS)
Breckinridge, J. B.
1987-01-01
Image data processed faster than in present electronic systems. Optical parallel-processing system effectively calculates two-dimensional Fourier transforms in time required by light to travel from plane 1 to plane 8. Coherence interferometer at plane 4 splits light into parts that form double image at plane 6 if projection screen placed there.
[Falsified medicines in parallel trade].
Muckenfuß, Heide
2017-09-13
The number of falsified medicines on the German market has distinctly increased over the past few years. In particular, stolen pharmaceutical products, a form of falsified medicines, have increasingly been introduced into the legal supply chain via parallel trading. The reasons why parallel trading serves as a gateway for falsified medicines are most likely the complex supply chains and routes of transport. It is hardly possible for national authorities to trace the history of a medicinal product that was bought and sold by several intermediaries in different EU member states. In addition, the heterogeneous outward appearance of imported and relabelled pharmaceutical products facilitates the introduction of illegal products onto the market. Official batch release at the Paul-Ehrlich-Institut offers the possibility of checking some aspects that might provide an indication of a falsified medicine. In some circumstances, this may allow the identification of falsified medicines before they come onto the German market. However, this control is only possible for biomedicinal products that have not received a waiver regarding official batch release. For improved control of parallel trade, better networking among the EU member states would be beneficial. European-wide regulations, e. g., for disclosure of the complete supply chain, would help to minimise the risks of parallel trading and hinder the marketing of falsified medicines.
Parallel distributed computing using Python
NASA Astrophysics Data System (ADS)
Dalcin, Lisandro D.; Paz, Rodrigo R.; Kler, Pablo A.; Cosimo, Alejandro
2011-09-01
This work presents two software components aimed at reducing the cost of accessing high-performance parallel computing resources within a Python programming environment: MPI for Python and PETSc for Python. MPI for Python is a general-purpose Python package that provides bindings for the Message Passing Interface (MPI) standard using any back-end MPI implementation. Its facilities allow parallel Python programs to easily exploit multiple processors using the message passing paradigm. PETSc for Python provides access to the Portable, Extensible Toolkit for Scientific Computation (PETSc) libraries. Its facilities allow sequential and parallel Python applications to exploit state-of-the-art algorithms and data structures readily available in PETSc for the solution of large-scale problems in science and engineering. MPI for Python and PETSc for Python are fully integrated with PETSc-FEM, an MPI- and PETSc-based parallel, multiphysics, finite element code developed at the CIMEC laboratory. This software infrastructure supports research activities related to the simulation of fluid flows, with applications ranging from the design of microfluidic devices for biochemical analysis to the modeling of large-scale stream/aquifer interactions.
Parallel coprocessors speed graphics system
Mcewan, C.
1983-05-26
Up to five parallel coprocessors, a pipelined architecture and display-list data structures combine to create Ramtek Corporation's fast, modular/raster graphics system, which is upgradable with software. It is stated that the system meets the needs of most CAD/CAM and simulation graphics applications. A 32-bit Vmebus structure is used.
Matpar: Parallel Extensions for MATLAB
NASA Technical Reports Server (NTRS)
Springer, P. L.
1998-01-01
Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.
Parallel, Distributed Scripting with Python
Miller, P J
2002-05-24
Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc. These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.
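The password-checker example lends itself to a short sketch of "distribute the information and co-ordinate the work". The code below is an illustration only, not the system described in the abstract: it uses the Python standard library rather than MPI, SHA-256 stands in for whatever password hashing scheme is actually in use, the 25,000-word dictionary is synthetic, and every function name is hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence, Tuple


def digest(word: str) -> str:
    # Stand-in for the real password hashing scheme (an assumption).
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def search_chunk(task: Tuple[str, Sequence[str]]) -> Optional[str]:
    """Check one worker's share of the dictionary against the target hash."""
    target, chunk = task
    for word in chunk:
        if digest(word) == target:
            return word
    return None


def split(words: Sequence[str], n_chunks: int) -> List[Sequence[str]]:
    """Distribute the dictionary: worker i gets every n_chunks-th word."""
    return [words[i::n_chunks] for i in range(n_chunks)]


def find_password(target: str, words: Sequence[str], n_workers: int = 4,
                  executor_cls=ThreadPoolExecutor) -> Optional[str]:
    """Hand one chunk to each worker and collect the first match, if any."""
    tasks = [(target, chunk) for chunk in split(words, n_workers)]
    with executor_cls(max_workers=n_workers) as pool:
        for hit in pool.map(search_chunk, tasks):
            if hit is not None:
                return hit
    return None


dictionary = [f"word{i}" for i in range(25_000)]  # synthetic word list
found = find_password(digest("word12345"), dictionary)
print(found)
```

Threads keep the sketch self-contained, but in CPython they will not speed up this CPU-bound loop. A process pool (for example concurrent.futures.ProcessPoolExecutor passed as executor_cls from a script guarded by `if __name__ == "__main__":`) or an MPI-based layer such as the one the abstract argues for would be needed for real parallel speedup.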
File concepts for parallel I/O
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1989-01-01
The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, and organizations using multiple storage devices are suggested. Problem areas are also identified and discussed.
PALM: a Parallel Dynamic Coupler
NASA Astrophysics Data System (ADS)
Thevenin, A.; Morel, T.
2008-12-01
In order to efficiently represent complex systems, numerical modeling has to rely on many physical models at a time: an ocean model coupled with an atmospheric model is at the basis of climate modeling. The continuity of the solution is guaranteed only if these models can constantly exchange information. PALM is a coupler allowing the concurrent execution and intercommunication of programs that were not specifically designed for it. With PALM, the dynamic coupling approach is introduced: a coupled component can be launched at any moment during the simulation and can release the computer's resources upon termination. In order to exploit the computer's capabilities as fully as possible, the PALM coupler handles two levels of parallelism. The first level concerns the components themselves. While managing the resources, PALM allocates the number of processes required by each coupled component. These components can be parallel programs based on domain decomposition with MPI or applications multithreaded with OpenMP. The second level is task parallelism: one can define a coupling algorithm allowing two or more programs to be executed in parallel. PALM applications are implemented via a Graphical User Interface called PrePALM. In this GUI, the programmer first defines the coupling algorithm and then describes the actual communications between the models. PALM offers very high flexibility for testing different coupling techniques and for reaching the best load balance on a high performance computer. Transforming computationally independent code into a coupled component is almost straightforward. The other qualities of PALM are its easy set-up, its flexibility, its performance, the simple updates and evolutions of the coupled application and the many side services and functions that it offers.
Partial disassembly of peroxisomes
1985-01-01
Rat liver peroxisomes were subjected to a variety of procedures intended to partially disassemble or damage them; the effects were analyzed by recentrifugation into sucrose gradients, enzyme analyses, electron microscopy, and SDS PAGE. Freezing and thawing or mild sonication released some matrix proteins and produced apparently intact peroxisomal "ghosts" with crystalloid cores and some fuzzy fibrillar content. Vigorous sonication broke open the peroxisomes but the membranes remained associated with cores and fibrillar and amorphous matrix material. The density of both ghosts and more severely damaged peroxisomes was approximately 1.23. Pyrophosphate (pH 9) treatment solubilized the fibrillar content, yielding ghosts that were empty except for cores. Some matrix proteins such as catalase and thiolase readily leak from peroxisomes. Other proteins were identified that remain in mechanically damaged peroxisomes but are neither core nor membrane proteins because they can be released by pyrophosphate treatment. These constitute a class of poorly soluble matrix proteins that appear to correspond to the fibrillar material observed morphologically. All of the peroxisomal beta-oxidation enzymes are located in the matrix, but they vary greatly in how easily they leak out. Palmitoyl coenzyme A synthetase is in the membrane, based on its co-distribution with the 22-kilodalton integral membrane polypeptide. PMID:2989301
NASA Astrophysics Data System (ADS)
The evolution of magmas is a topic of considerable importance in geology and geophysics because it affects volcanology, igneous petrology, geothermal energy sources, mantle convection, and the thermal and chemical evolution of the earth. The dynamics and evolution of magmas are strongly affected by the presence of solid crystals that occur either in suspension in liquid or as a rigid porous matrix through which liquid magma can percolate. Such systems are physically complex and difficult to model mathematically. Similar physical situations are encountered by metallurgists who study the solidification of molten alloys, and applied mathematicians have long been interested in such moving boundary problems. Clearly, it would be of mutual benefit to bring together scientists, engineers, and mathematicians with a common interest in such systems. Such a meeting is being organized as a North Atlantic Treaty Organization (NATO) Advanced Research Workshop on the Structure and Dynamics of Partially Solidified Systems, to be held at Stanford University's Fallen Leaf Lodge at Tahoe, Calif., May 12-16, 1986. The invited speakers and their topics are
Removable partial denture occlusion.
Ivanhoe, John R; Plummer, Kevin D
2004-07-01
No single occlusal morphology, scheme, or material will successfully treat all patients. Many patients have been treated, both successfully and unsuccessfully, using widely varying theories of occlusion, choices of posterior tooth form, and restorative materials. Therefore, experience has demonstrated that there is no one right way to restore the occlusion of all patients. Partially edentulous patients have many and varied needs. Clinicians must understand the healthy physiologic gnathostomatic system and properly diagnose what is or may become pathologic. Henderson [3] stated that the occlusion of the successfully treated patient allows the masticating mechanism to carry out its physiologic functions while the temporomandibular joints, the neuromuscular mechanism, the teeth and their supporting structures remain in a good state of health. Skills in diagnosis and treatment planning are of utmost importance in treating these patients, for whom the clinician's goals are not only an esthetic and functional restoration but also a lasting harmonious state. Perhaps this was best stated by DeVan [55] more than 60 years ago in his often-quoted objective. "The patient's fundamental need is the continued meticulous restoration of what is missing, since what is lost is in a sense irretrievably lost." Because it is clear that there is no one method, no one occlusal scheme, or one material that guarantees success for all patients, recommendations for consideration when establishing or reestablishing occlusal schemes have been presented. These recommendations must be used in conjunction with other diagnostic and technical skills.
[Post traumatic partial seizures].
Carvajal, P; Almárcegui, C; Pablo, M J; Peralta, P; Bernal, M; Valdizán, J R
Post-traumatic epilepsy represents 4% of the prevalence of the disorder and is one of the sequelae that is most difficult to prevent. Risk factors have been described to predict the appearance of seizures. A seven-year-old boy with a severe head injury was admitted to the Intensive Care Unit. On neuroimaging studies there were multiple foci of contusion, mainly in the left hemisphere, and blood in the III and IV ventricles and frontal horn of the left lateral ventricle. The patient had severe sequelae of head injury with a right spastic hemiplegia and hemiparesis with hypertonia of the left side, together with complete blindness of both eyes due to bilateral atrophy of the optic nerve. Serial EEGs were recorded, in which a recording showed alternating periods of hypervoltage graphoelements superimposed on a trace of very low voltage, with continuous activity of low voltage and low frequency. There were no graphoelements with acute morphology. However, the patient had a first partial seizure a year and a half after his head injury. On the EEG an epileptogenic focus was identified in the left hemisphere. Within two years of his head injury he had seven seizures. He had not received prophylactic antiepileptic treatment after the head injury. We report a case of epilepsy secondary to a head injury, in which the first seizure occurred one and a half years after injury. In view of the risk factors, we discuss whether prophylactic antiepileptic treatment might have been beneficial.
[A HPF application to parallelize a 2-D PDE model].
Contreras, Xiómara; Hernández, Emilio
2003-01-01
Many practical numerical applications require a parallel implementation in order to obtain a satisfactory response in a reasonable amount of time. This work presents a parallel implementation of an explicit finite difference (FD) scheme proposed by Kelly et al. to solve the Partial Differential Equation (PDE / EDDP) of the Wave Propagation problem in an elastic, homogeneous or heterogeneous, two-dimensional medium. High-Performance-Fortran (HPF) is used for this purpose. This report gives timing measurements on a PC-Cluster using 1, 2, and 4 processors with different data grid sizes. In addition, a comparative test is included in which the cluster was initially connected using a Fast-Ethernet card, and then connected by a Myrinet card, using a grid size of 2500 x 2500 in both cases. The execution time achieved with two processors was highly satisfactory for all cases. Under analogous conditions, the performance obtained with a Myrinet interconnection was better than that obtained with a Fast-Ethernet interconnection. The scheme mentioned above has shown excellent numerical results, as can be seen in the images included in this work. Key words: Partial differential equation, wave equation, explicit finite differences scheme, parallel scheme.
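The kind of explicit update described in this abstract can be illustrated with a minimal serial sketch. It is not the Kelly et al. elastic scheme and not the HPF code: it assumes the simpler scalar wave equation u_tt = c^2 (u_xx + u_yy) with constant c, a second-order leapfrog stencil, and zero boundary values. The names `step_wave` and `simulate` are hypothetical.

```python
def step_wave(u_prev, u_curr, c, dt, h):
    """Advance the scalar 2-D wave equation by one leapfrog time step.

    u_prev and u_curr are grids (lists of lists) at times t - dt and t.
    Boundary values are held at zero (an assumption of this sketch).
    """
    n, m = len(u_curr), len(u_curr[0])
    r2 = (c * dt / h) ** 2  # squared Courant number
    u_next = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Five-point discrete Laplacian (times h**2).
            lap = (u_curr[i + 1][j] + u_curr[i - 1][j]
                   + u_curr[i][j + 1] + u_curr[i][j - 1]
                   - 4.0 * u_curr[i][j])
            u_next[i][j] = 2.0 * u_curr[i][j] - u_prev[i][j] + r2 * lap
    return u_next


def simulate(n=21, steps=20, c=1.0, h=1.0, dt=0.5):
    # This stencil is stable only if c * dt / h <= 1 / sqrt(2).
    assert c * dt / h <= 2 ** -0.5
    u_prev = [[0.0] * n for _ in range(n)]
    u_prev[n // 2][n // 2] = 1.0         # point disturbance at the centre
    u_curr = [row[:] for row in u_prev]  # zero initial velocity (first-order start)
    for _ in range(steps):
        u_prev, u_curr = u_curr, step_wave(u_prev, u_curr, c, dt, h)
    return u_curr
```

Each interior point depends only on its four neighbours at the previous time level, which is why such schemes lend themselves to the data-parallel array distribution that HPF provides.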
Vandewalle, S.
1994-12-31
Time-stepping methods for parabolic partial differential equations are essentially sequential. This prohibits the use of massively parallel computers unless the problem on each time-level is very large. This observation has led to the development of algorithms that operate on more than one time-level simultaneously; that is to say, on grids extending in space and in time. The so-called parabolic multigrid methods solve the time-dependent parabolic PDE as if it were a stationary PDE discretized on a space-time grid. The author has investigated the use of multigrid waveform relaxation, an algorithm developed by Lubich and Ostermann. The algorithm is based on a multigrid acceleration of waveform relaxation, a highly concurrent technique for solving large systems of ordinary differential equations. Another method of this class is the time-parallel multigrid method. This method was developed by Hackbusch and was recently the subject of further study by Horton. It extends the elliptic multigrid idea to the set of equations that is derived by discretizing a parabolic problem in space and in time.
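A rough sketch of plain Jacobi waveform relaxation, the building block that the multigrid variant accelerates, may help make the idea concrete. It is not the Lubich and Ostermann algorithm itself: there is no multigrid acceleration, and it assumes a 1-D heat equation semi-discretized with central differences, zero Dirichlet boundaries, and backward Euler in time. The function names are hypothetical.

```python
def waveform_relaxation(u0, h, dt, nsteps, sweeps):
    """Jacobi waveform relaxation for u_i' = (u_{i-1} - 2 u_i + u_{i+1}) / h**2.

    Each sweep integrates every unknown over the whole time window, taking
    its neighbours' waveforms from the previous sweep, so the unknowns are
    independent of one another within a sweep.
    """
    n = len(u0)
    a = dt / h ** 2
    # waves[k][i] is unknown i at time level k; initial guess: constant in time.
    waves = [list(u0) for _ in range(nsteps + 1)]
    for _ in range(sweeps):
        new = [list(u0)] + [[0.0] * n for _ in range(nsteps)]
        for i in range(n):  # independent across i, hence parallelisable
            for k in range(1, nsteps + 1):
                left = waves[k][i - 1] if i > 0 else 0.0
                right = waves[k][i + 1] if i < n - 1 else 0.0
                # Backward Euler: (1 + 2a) u_i^k = u_i^{k-1} + a (left + right)
                new[k][i] = (new[k - 1][i] + a * (left + right)) / (1.0 + 2.0 * a)
        waves = new
    return waves


def be_residual(waves, h, dt):
    """Largest violation of the fully coupled backward Euler equations."""
    a = dt / h ** 2
    n = len(waves[0])
    worst = 0.0
    for k in range(1, len(waves)):
        for i in range(n):
            left = waves[k][i - 1] if i > 0 else 0.0
            right = waves[k][i + 1] if i < n - 1 else 0.0
            r = (1.0 + 2.0 * a) * waves[k][i] - waves[k - 1][i] - a * (left + right)
            worst = max(worst, abs(r))
    return worst
```

The iteration works on all time levels at once, and its fixed point satisfies the coupled time-stepping equations, which `be_residual` checks.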
Partial breast radiation therapy - external beam
Carcinoma of the breast - partial radiation therapy; Partial external beam radiation - breast; Intensity-modulated radiation therapy - breast cancer; IMRT - breast cancer WBRT; Adjuvant partial breast - IMRT; APBI - ...
Partial proximal tibia fractures
Raschke, Michael J.; Kittl, Christoph; Domnick, Christoph
2017-01-01
Partial tibial plateau fractures may occur as a consequence of either valgus or varus trauma combined with a rotational and axial compression component. High-energy trauma may result in a more complex and multi-fragmented fracture pattern, which occurs predominantly in young people. Conversely, a low-energy mechanism may lead to a pure depression fracture in the older population with weaker bone density. Pre-operative classification of these fractures, by Müller AO, Schatzker or novel CT-based methods, helps to understand the fracture pattern and choose the surgical approach and treatment strategy in accordance with estimated bone mineral density and the individual history of each patient. Non-operative treatment may be considered for non-displaced intra-articular fractures of the lateral tibial condyle. Intra-articular joint displacement ⩾ 2 mm, open fractures or fractures of the medial condyle should be reduced and fixed operatively. Autologous, allogenic and synthetic bone substitutes can be used to fill bone defects. A variety of minimally invasive approaches, temporary osteotomies and novel techniques (e.g. arthroscopically assisted reduction or ‘jail-type’ screw osteosynthesis) offer a range of choices for the individual and are potentially less invasive treatments. Rehabilitation protocols should be carefully planned according to the degree of stability achieved by internal fixation, bone mineral density and other patient-specific factors (age, compliance, mobility). To avoid stiffness, early functional mobilisation plays a major role in rehabilitation. In the elderly, low-energy trauma and impression fractures are indicators for the further screening and treatment of osteoporosis. Cite this article: EFORT Open Rev 2017;2. DOI: 10.1302/2058-5241.2.160067. Originally published online at www.efortopenreviews.org PMID:28630761
Removable partial dentures without rests.
Meinig, D A
1994-04-01
Ever since Bonwill recommended the use of rests on removable partial dentures in 1899, rests have been universally considered inviolate and have gone unchallenged and untested. The author claims that removable partial dentures without rests may not cause the adverse conditions usually predicted, such as gingival stripping, gingival inflammation, mutilated residual ridges, or extensive and rapid resorption of the alveolar ridges. In removable partial dentures made by the author for several patients, the residual ridge remained stable and in physiologic equilibrium when rests were not used. A history of the long-term effect on patients wearing partial dentures with and without rests is presented.
Trigonometric Integrals via Partial Fractions
ERIC Educational Resources Information Center
Chen, H.; Fulford, M.
2005-01-01
Parametric differentiation is used to derive the partial fractions decompositions of certain rational functions. Those decompositions enable us to integrate some new combinations of trigonometric functions.
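A simple illustration of the technique follows. It is not taken from the article and assumes distinct parameters a ≠ b. Start from the elementary decomposition

```latex
\frac{1}{(x+a)(x+b)} = \frac{1}{b-a}\left(\frac{1}{x+a} - \frac{1}{x+b}\right).
```

Differentiating both sides with respect to the parameter a and multiplying by -1 gives the decomposition of a function with a repeated factor, without solving for any undetermined coefficients:

```latex
\frac{1}{(x+a)^{2}(x+b)} = \frac{1}{(b-a)(x+a)^{2}} - \frac{1}{(b-a)^{2}}\left(\frac{1}{x+a} - \frac{1}{x+b}\right).
```

As a check, setting a = 0, b = 1, x = 1 gives 1/2 on both sides.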
Bridging the gap between parallel file systems and local file systems : a case study with PVFS.
Gu, P.; Wang, J.; Ross, R.; Mathematics and Computer Science; Univ. of Central Florida
2008-09-01
Parallel I/O plays an increasingly important role in today's data intensive computing applications. While much attention has been paid to parallel read performance, most of this work has focused on the parallel file system, middleware, or application layers, ignoring the potential for improvement through more effective use of local storage. In this paper, we present the design and implementation of segment-structured on-disk data grouping and prefetching (SOGP), a technique that leverages additional local storage to boost the local data read performance for parallel file systems, especially for those applications with partially overlapped access patterns. Parallel virtual file system (PVFS) is chosen as an example. Our experiments show that an SOGP-enhanced PVFS prototype system can outperform a traditional Linux-Ext3-based PVFS for many applications and benchmarks, in some tests by as much as 230% in terms of I/O bandwidth.
Experts' understanding of partial derivatives using the partial derivative machine
NASA Astrophysics Data System (ADS)
Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.
2015-12-01
[This paper is part of the Focused Collection on Upper Division Physics Courses.] Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of students in STEM disciplines. In this paper, we report on an initial study of expert understanding of partial derivatives across three disciplines: physics, engineering, and mathematics. We report on the central research question of how disciplinary experts understand partial derivatives, and how their concept images of partial derivatives differ, with a focus on experimentally measured quantities. Using the partial derivative machine (PDM), we probed expert understanding of partial derivatives in an experimental context without a known functional form. In particular, we investigated which representations were cued by the experts' interactions with the PDM. Whereas the physicists and engineers were quick to use measurements to find a numeric approximation for a derivative, the mathematicians repeatedly returned to speculation as to the functional form; although they were comfortable drawing qualitative conclusions about the system from measurements, they were reluctant to approximate the derivative through measurement. On a theoretical front, we found ways in which existing frameworks for the concept of derivative could be expanded to include numerical approximation.
Parallel supercomputing with commodity components
NASA Technical Reports Server (NTRS)
Warren, M. S.; Goda, M. P.; Becker, D. J.
1997-01-01
We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.
Parallel supercomputing with commodity components
Warren, M.S.; Goda, M.P.; Becker, D.J.
1997-09-01
We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.
Parallel multiplex laser feedback interferometry
Zhang, Song; Tan, Yidong; Zhang, Shulian
2013-12-15
We present a parallel multiplex laser feedback interferometer based on spatial multiplexing which avoids the signal crosstalk in the former feedback interferometer. The interferometer outputs two close parallel laser beams, whose frequencies are shifted by two acousto-optic modulators by 2Ω simultaneously. A static reference mirror is inserted into one of the optical paths as the reference optical path. The other beam impinges on the target as the measurement optical path. Phase variations of the two feedback laser beams are simultaneously measured through heterodyne demodulation with two different detectors. Their subtraction accurately reflects the target displacement. Under typical room conditions, experimental results show a resolution of 1.6 nm and accuracy of 7.8 nm within the range of 100 μm.
A generalized parallel replica dynamics
Binder, Andrew; Lelièvre, Tony; Simpson, Gideon
2015-03-01
Metastability is a common obstacle to performing long molecular dynamics simulations. Many numerical methods have been proposed to overcome it. One method is parallel replica dynamics, which relies on the rapid convergence of the underlying stochastic process to a quasi-stationary distribution. Two requirements for applying parallel replica dynamics are knowledge of the time scale on which the process converges to the quasi-stationary distribution and a mechanism for generating samples from this distribution. By combining a Fleming–Viot particle system with convergence diagnostics to simultaneously identify when the process converges while also generating samples, we can address both points. This variation on the algorithm is illustrated with various numerical examples, including those with entropic barriers and the 2D Lennard-Jones cluster of seven atoms.
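The sampling mechanism mentioned in this abstract can be sketched as a toy Fleming-Viot particle system. This is an illustration under stated assumptions, not the authors' implementation: it assumes 1-D overdamped Langevin dynamics in the double-well potential V(x) = (x^2 - 1)^2, takes the left well {x < 0} as the metastable state, uses an Euler-Maruyama discretization, and omits the convergence diagnostics altogether. All names and parameter values are hypothetical.

```python
import math
import random


def fleming_viot(n_particles=50, n_steps=2000, dt=1e-3, beta=3.0, seed=0):
    """Toy Fleming-Viot particle system restricted to the well {x < 0}.

    Whenever a particle leaves the well it is killed and restarted from the
    current position of a randomly chosen surviving particle. In the long
    run the empirical distribution of the particles approximates the
    quasi-stationary distribution of the well.
    """
    rng = random.Random(seed)
    xs = [-1.0] * n_particles
    sigma = math.sqrt(2.0 * dt / beta)  # noise amplitude per step
    for _ in range(n_steps):
        # Euler-Maruyama step for dX = -V'(X) dt + sqrt(2 / beta) dW.
        for i in range(n_particles):
            grad = 4.0 * xs[i] * (xs[i] ** 2 - 1.0)  # V'(x)
            xs[i] += -grad * dt + sigma * rng.gauss(0.0, 1.0)
        # Branching step: replace particles that exited the well.
        for i in range(n_particles):
            if xs[i] >= 0.0:
                survivors = [x for x in xs if x < 0.0]
                xs[i] = rng.choice(survivors)
    return xs
```

In a full algorithm, a convergence diagnostic applied to the particle trajectories would decide when to stop, and the final positions would serve as initial conditions for the parallel replicas.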
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
ASP: a parallel computing technology
NASA Astrophysics Data System (ADS)
Lea, R. M.
1990-09-01
ASP modules constitute the basis of a parallel computing technology platform for the rapid development of a broad range of numeric and symbolic information processing systems. Based on off-the-shelf general-purpose hardware and software modules, ASP technology is intended to increase productivity in the development (and competitiveness in the marketing) of cost-effective low-MIMD/high-SIMD Massively Parallel Processors (MPPs). The paper discusses ASP module philosophy and demonstrates how ASP modules can satisfy the market, algorithmic, architectural, and engineering requirements of such MPPs. In particular, two specific ASP modules based on VLSI and WSI technologies are studied as case examples of ASP technology, the latter reporting 1 TOPS/ft^3, 1 GOPS/W, and 1 MOPS/$ as ball-park figures-of-merit of cost-effectiveness.
Parallel processing spacecraft communication system
NASA Technical Reports Server (NTRS)
Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)
1998-01-01
An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used; and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID. Each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data is further handled using special double buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.
A generalized parallel replica dynamics
NASA Astrophysics Data System (ADS)
Binder, Andrew; Lelièvre, Tony; Simpson, Gideon
2015-03-01
Metastability is a common obstacle to performing long molecular dynamics simulations. Many numerical methods have been proposed to overcome it. One method is parallel replica dynamics, which relies on the rapid convergence of the underlying stochastic process to a quasi-stationary distribution. Two requirements for applying parallel replica dynamics are knowledge of the time scale on which the process converges to the quasi-stationary distribution and a mechanism for generating samples from this distribution. By combining a Fleming-Viot particle system with convergence diagnostics to simultaneously identify when the process converges while also generating samples, we can address both points. This variation on the algorithm is illustrated with various numerical examples, including those with entropic barriers and the 2D Lennard-Jones cluster of seven atoms.
High performance parallel implicit CFD.
Gropp, W. D.; Kaushik, D. K.; Keyes, D. E.; Smith, B. F.; Mathematics and Computer Science; Old Dominion Univ.
2001-03-01
Fluid dynamical simulations based on finite discretizations on (quasi-)static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating point operation rates without special attention to layout and access ordering of data. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models, for which we quote earlier work, presenting primarily experimental results. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.
Time parallelization of plasma simulations using the parareal algorithm
Samaddar, D.; Houlberg, Wayne A; Berry, Lee A; Elwasif, Wael R; Huysmans, G; Batchelor, Donald B
2011-01-01
Simulation of fusion plasmas involves a broad range of timescales. In magnetically confined plasmas, such as in ITER, the timescales associated with the microturbulence responsible for transport and the confinement timescales differ by a factor of 10^6 to 10^9. Simulating this entire range of timescales is currently impossible, even on the most powerful supercomputers available. Space parallelization has so far been the most common approach to solving partial differential equations. Space parallelization alone has led to computational saturation for fluid codes, which means that the wall time for computation does not decrease linearly with the increasing number of processors used. The application of the parareal algorithm to simulations of fusion plasmas opens a new avenue of parallelization, namely temporal parallelization. The algorithm has been successfully applied to plasma turbulence simulations, prior to which it had been applied to other, relatively simpler problems. This work explores the extension of the applicability of the parareal algorithm to ITER-relevant problems, starting with a diffusion-convection model.
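The structure of the parareal iteration can be shown on a toy problem. The sketch below is a minimal illustration, not the plasma code described in the abstract: it assumes a scalar ODE y' = f(t, y), uses forward Euler for both the coarse propagator G (one step per time slice) and the fine propagator F (many substeps per slice), and runs serially. The function name and parameters are hypothetical.

```python
def parareal(f, y0, t0, t1, n_slices, n_iters, fine_substeps=100):
    """Parareal iteration for y' = f(t, y); returns values at slice boundaries."""
    dT = (t1 - t0) / n_slices

    def coarse(t, y):  # G: a single Euler step across the slice
        return y + dT * f(t, y)

    def fine(t, y):    # F: many Euler substeps across the slice
        dt = dT / fine_substeps
        for s in range(fine_substeps):
            y = y + dt * f(t + s * dt, y)
        return y

    ts = [t0 + n * dT for n in range(n_slices + 1)]

    # Initial guess: one serial sweep with the cheap coarse propagator.
    y = [y0]
    for n in range(n_slices):
        y.append(coarse(ts[n], y[n]))

    for _ in range(n_iters):
        # The fine solves are independent across slices: this is the step
        # that would be distributed over processors.
        f_old = [fine(ts[n], y[n]) for n in range(n_slices)]
        g_old = [coarse(ts[n], y[n]) for n in range(n_slices)]
        # Cheap serial correction sweep.
        new = [y0]
        for n in range(n_slices):
            new.append(coarse(ts[n], new[n]) + f_old[n] - g_old[n])
        y = new
    return y
```

After k iterations the first k slice boundaries agree with the serial fine solution, so the method pays off only when an acceptable answer is reached in far fewer iterations than there are time slices.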
Accuracy of different impression materials in parallel and nonparallel implants
Vojdani, Mahroo; Torabi, Kianoosh; Ansarifard, Elham
2015-01-01
Background: A precise impression is mandatory to obtain passive fit in implant-supported prostheses. The aim of this study was to compare the accuracy of three impression materials in both parallel and nonparallel implant positions. Materials and Methods: In this experimental study, two partially dentate maxillary acrylic models with four implant analogues in the canine and lateral incisor areas were used. One model simulated the parallel condition and the other the nonparallel condition, in which implants were tilted 30° buccally and 20° in either mesial or distal directions. Thirty stone casts were made from each model using polyether (Impregum), addition silicone (Monopren) and vinyl siloxanether (Identium), with the open tray technique. The distortion values in three dimensions (X, Y and Z-axis) were measured with a coordinate measuring machine. Two-way analysis of variance (ANOVA), one-way ANOVA and Tukey tests were used for data analysis (α = 0.05). Results: Under the parallel condition, all the materials produced comparably accurate casts (P = 0.74). In the presence of angulated implants, while Monopren showed more accurate results compared to Impregum (P = 0.01), Identium yielded results similar to those produced by Impregum (P = 0.27) and Monopren (P = 0.26). Conclusion: Within the limitations of this study, in parallel conditions, the type of impression material does not affect the accuracy of the implant impressions; however, in nonparallel conditions, polyvinyl siloxane is shown to be a better choice, followed by vinyl siloxanether and polyether respectively. PMID:26288620
Dynamic Load Balancing Strategies for Parallel Reacting Flow Simulations
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Meneses, Esteban; Givi, Peyman
2014-11-01
Load balancing in parallel computing aims at distributing the work as evenly as possible among the processors. This is a critical issue in the performance of parallel, time accurate, flow simulators. The constraint of time accuracy requires that all processes must be finished with their calculation for a given time step before any process can begin calculation of the next time step. Thus, an unevenly balanced compute load will result in idle time for many processes at each iteration and thus increased walltimes for calculations. Two existing dynamic load balancing approaches are applied to the simplified case of a partially stirred reactor for methane combustion. The first is Zoltan, a parallel partitioning, load balancing, and data management library developed at the Sandia National Laboratories. The second is Charm++, which is its own machine independent parallel programming system developed at the University of Illinois at Urbana-Champaign. The performance of these two approaches is compared, and the prospects for their application to full 3D, reacting flow solvers are assessed.
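The cost of imbalance under the time-accuracy constraint can be made concrete with a small calculation. The sketch below is illustrative only and uses neither Zoltan nor Charm++; the function name and the example loads are hypothetical.

```python
def step_cost(loads):
    """Wall time, per-process idle time and efficiency for one time step.

    loads[i] is the compute time of process i. Because every process must
    wait for the slowest one before the next step begins, the wall time of
    the step is the maximum load.
    """
    wall = max(loads)
    idle = [wall - t for t in loads]
    efficiency = sum(loads) / (wall * len(loads))
    return wall, idle, efficiency


# Same total work (8 units) on 4 processes, distributed two different ways.
print(step_cost([4, 1, 1, 2]))  # imbalanced: wall time 4, efficiency 0.5
print(step_cost([2, 2, 2, 2]))  # balanced: wall time 2, efficiency 1.0
```

With the same total work, the imbalanced distribution takes twice as long per step, and that loss recurs at every iteration unless the work is redistributed.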
Numerical computation on massively parallel hypercubes. [Connection machine]
McBryan, O.A.
1986-01-01
We describe numerical computations on the Connection Machine, a massively parallel hypercube architecture with 65,536 single-bit processors and 32 Mbytes of memory. A parallel extension of COMMON LISP provides access to the processors and network. The rich software environment is further enhanced by a powerful virtual processor capability, which extends the degree of fine-grained parallelism beyond 1,000,000. We briefly describe the hardware and indicate the principal features of the parallel programming environment. We then present implementations of SOR, multigrid and pre-conditioned conjugate gradient algorithms for solving partial differential equations on the Connection Machine. Despite the lack of floating point hardware, computation rates above 100 megaflops have been achieved in PDE solution. Virtual processors prove to be a real advantage, easing the effort of software development while improving system performance significantly. The software development effort is also facilitated by the fact that hypercube communications prove to be fast and essentially independent of distance. 29 refs., 4 figs.
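Of the three solvers named in this abstract, SOR is the simplest to sketch. The example below is a serial Python illustration, not the Connection Machine implementation (which the abstract says was written in a parallel extension of COMMON LISP). It assumes the Poisson problem -Δu = f on a square grid with u = 0 on the boundary and a red-black ordering, which is a common way to expose fine-grained parallelism. The function name and default parameters are hypothetical.

```python
def sor_poisson(f, h, omega=1.5, sweeps=200):
    """Red-black SOR for -laplace(u) = f on a square grid, u = 0 on the boundary.

    f is a list of lists of right-hand-side values, h is the grid spacing.
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        # Points of one colour depend only on points of the other colour,
        # so all points of a colour could be updated simultaneously.
        for colour in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != colour:
                        continue
                    gs = 0.25 * (u[i + 1][j] + u[i - 1][j] + u[i][j + 1]
                                 + u[i][j - 1] + h * h * f[i][j])
                    u[i][j] += omega * (gs - u[i][j])  # over-relaxed update
    return u
```

On a machine with one (virtual) processor per grid point, each colour sweep maps to a single data-parallel step with nearest-neighbour communication.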
Toward an automated parallel computing environment for geosciences
NASA Astrophysics Data System (ADS)
Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping
2007-08-01
Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.
Task parallelism and high-performance languages
Foster, I.
1996-03-01
The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. The subject of this paper is to incorporate support for task parallelism. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with for example a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit to both data- and task-parallel solutions, with the better solution depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications, while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages.
Parallel Symmetric Eigenvalue Problem Solvers
2015-05-01
Park, NC 27709-2211 Trace minimization, saddle-point problems, lowest eigenpairs, sampling the spectrum REPORT DOCUMENTATION PAGE 11. SPONSOR...eigenpairs or seeking a large number of eigenpairs in any interval of the spectrum . Numerical experiments demonstrate clearly that Trace Minimization is a...the Fiedler vector . . . . . . . . . . . . . . . . . 59 6.1.5 Computing interior eigenpairs via spectrum folding . . . . . 60 6.1.6 My parallel
CSRD Parallel Service Machine Enhancement
1989-11-30
problems by C. Kamath and A. Sameh in [Kama86]. Several theoretical and numerical results have been obtained for RP methods in the last year. Among them...145-163, 19(1979). [BrSa88] R. Bramley and A. Sameh , A Robust Parallel Solver for Block Tridiagonal Systems, CSRD Technical Report #806, Center for...Supercomputing Research and Development, University of Illinois - Urbana, 1988. [BrSa89a] R. Bramley and A. Sameh , Row Projection Methods for Large
National Combustion Code: Parallel Performance
NASA Technical Reports Server (NTRS)
Babrauckas, Theresa
2001-01-01
This report discusses the National Combustion Code (NCC). The NCC is an integrated system of codes for the design and analysis of combustion systems. The advanced features of the NCC meet designers' requirements for model accuracy and turn-around time. The fundamental features at the inception of the NCC were parallel processing and unstructured mesh. The design and performance of the NCC are discussed.
Parallelism in Manipulator Dynamics. Revision.
1983-12-01
excessive, and a VLSI implementation architecture is suggested. We indicate possible applications to incorporating dynamical considerations into...Inverse Dynamics problem. It investigates the high degree of parallelism inherent in the computations, and presents two "mathematically exact" formulations...0. OVERVIEW The Inverse Dynamics problem consists (loosely) of computing the motor torques necessary to
Lightweight Specifications for Parallel Correctness
2012-12-05
this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204... George Necula Professor David Wessel Fall 2012 1 Abstract Lightweight Specifications for Parallel Correctness by Jacob Samuels Burnim Doctor of Philosophy...enthusiasm and endless flow of ideas, and for his keen research sense. I would also like to thank George Necula for chairing my qualifying exam committee and
Parallel strategies for SAR processing
NASA Astrophysics Data System (ADS)
Segoviano, Jesus A.
2004-12-01
This article proposes a series of strategies for improving the computer processing of Synthetic Aperture Radar (SAR) signals, following the three usual lines of action for speeding up the execution of any computer program. On the one hand, the optimization of both the data structures and the application architecture is studied. On the other hand, hardware improvement is considered. For the former, both the data structures usually employed in SAR processing (for which parallel alternatives are proposed) and the way the algorithms are parallelized are studied. In addition, the parallel application architecture classifies processes as fine or coarse grain. These are assigned to individual processors or divided among processors, each in its corresponding architecture. For the latter, the hardware employed for parallel SAR processing is studied. The improvement here refers to the several kinds of platforms on which SAR processing is implemented: shared memory multicomputers and distributed memory multiprocessors. A comparison between them gives some guidelines for obtaining maximum throughput with minimum latency, and maximum effectiveness with minimum cost, all with limited complexity. It is concluded that processing the algorithms in a GNU/Linux environment on a Beowulf cluster platform offers, under certain conditions, the best compromise between performance and cost, and promises the greatest development in the coming years for compute-hungry SAR applications.
Parallel Power Grid Simulation Toolkit
Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol
2015-09-14
ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete event parallel simulations. The code is designed using modern object-oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.
Data Communication in Parallel Architectures.
1986-03-01
proposed by Sameh [14]. (D) Two- and Three-Dimensional Mesh Connected arrays. Two-dimensional and three-dimensional arrays are popular among partial...Dept Computer Science. Yale University, 1985 14 A.H. Sameh, Solving the Linear Least Squares Problem on a Linear Array of Processors. Purdue
Highly parallel sparse Cholesky factorization
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
Parallelism in integrated fluidic circuits
NASA Astrophysics Data System (ADS)
Bousse, Luc J.; Kopf-Sill, Anne R.; Parce, J. W.
1998-04-01
Many research groups around the world are working on integrated microfluidics. The goal of these projects is to automate and integrate the handling of liquid samples and reagents for measurement and assay procedures in chemistry and biology. Ultimately, it is hoped that this will lead to a revolution in chemical and biological procedures similar to that caused in electronics by the invention of the integrated circuit. The optimal size scale of channels for liquid flow is determined by basic constraints to be somewhere between 10 and 100 micrometers. In larger channels, mixing by diffusion takes too long; in smaller channels, the number of molecules present is so low it makes detection difficult. At Caliper, we are making fluidic systems in glass chips with channels in this size range, based on electroosmotic flow and fluorescence detection. One application of this technology is rapid assays for drug screening, such as enzyme assays and binding assays. A further challenge in this area is to perform multiple functions on a chip in parallel, without a large increase in the number of inputs and outputs. A first step in this direction is a fluidic serial-to-parallel converter. Fluidic circuits will be shown with the ability to distribute an incoming serial sample stream to multiple parallel channels.
Parallel Environment for Quantum Computing
NASA Astrophysics Data System (ADS)
Tabakin, Frank; Diaz, Bruno Julia
2009-03-01
To facilitate numerical study of noise and decoherence in QC algorithms, and of the efficacy of error correction schemes, we have developed a Fortran 90 quantum computer simulator with parallel processing capabilities. It permits rapid evaluation of quantum algorithms for a large number of qubits and for various "noise" scenarios. State vectors are distributed over many processors, to employ a large number of qubits. Parallel processing is implemented by the Message-Passing Interface protocol. A description of how to spread the wave function components over many processors, along with how to efficiently describe the action of general one- and two-qubit operators on these state vectors, will be delineated. Grover's search and Shor's factoring algorithms with noise will be discussed as examples. A major feature of this work is that concurrent versions of the algorithms can be evaluated with each version subject to diverse noise effects, corresponding to solving a stochastic Schrodinger equation. The density matrix for the ensemble of such noise cases is constructed using parallel distribution methods to evaluate its associated entropy. Applications of this powerful tool are made to delineate the stability and correction of QC processes using Hamiltonian based dynamics.
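To make the idea of applying a one-qubit operator to a distributed state vector concrete, here is a single-process Python sketch. It is not the Fortran 90/MPI simulator from the abstract. The function names and the contiguous block distribution (each processor holding `local_size` consecutive amplitudes, a power of two) are assumptions for illustration only. A one-qubit gate mixes pairs of amplitudes whose indices differ only in the target bit; under the assumed distribution, a pair stays on one processor only when that bit's stride is smaller than the local block.

```python
import math


def apply_one_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector (list of amplitudes)."""
    stride = 1 << target
    new_state = list(state)
    for i in range(len(state)):
        if i & stride == 0:  # i has target bit 0; its partner has target bit 1
            j = i | stride
            a0, a1 = state[i], state[j]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


def needs_communication(target, local_size):
    """True if paired amplitudes live on different processors.

    Assumes a contiguous block distribution with `local_size` amplitudes
    per processor, where `local_size` is a power of two.
    """
    return (1 << target) >= local_size


h = 1 / math.sqrt(2)
hadamard = [[h, h], [h, -h]]

state = [1.0, 0.0, 0.0, 0.0]  # two qubits, initial state |00>
state = apply_one_qubit_gate(state, hadamard, target=1)
print(state)
print(needs_communication(target=1, local_size=2))
```

In this toy case the gate on qubit 1 pairs indices 0 and 2, which fall in different blocks of size 2, so a distributed implementation would have to exchange amplitudes between processors for that gate.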
Parallel processing of genomics data
NASA Astrophysics Data System (ADS)
Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario
2016-10-01
The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, thus the analysis of this enormous flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to handle high-dimensional data and resulting in good response time. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
Parallel Markov chain Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Ren, Ruichao; Orkoulas, G.
2007-06-01
With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.
Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring
NASA Technical Reports Server (NTRS)
Padovan, Joe; Kwang, Abel
1994-01-01
This paper develops a parallelizable multilevel multiple constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure,_sequential, as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort due both to updating and inversion.
Computationally efficient implementation of combustion chemistry in parallel PDF calculations
NASA Astrophysics Data System (ADS)
Lu, Liuyan; Lantz, Steven R.; Ren, Zhuyin; Pope, Stephen B.
2009-08-01
In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
Parallel molecular dynamics: Communication requirements for massively parallel machines
NASA Astrophysics Data System (ADS)
Taylor, Valerie E.; Stevens, Rick L.; Arnold, Kathryn E.
1995-05-01
Molecular mechanics and dynamics are becoming widely used to perform simulations of molecular systems from large-scale computations of materials to the design and modeling of drug compounds. In this paper we address two major issues: a good decomposition method that can take advantage of future massively parallel processing systems for modest-sized problems in the range of 50,000 atoms and the communication requirements needed to achieve 30 to 40% efficiency on MPPs. We analyzed a scalable benchmark molecular dynamics program executing on the Intel Touchstone Delta parallelized with an interaction decomposition method. Using a validated analytical performance model of the code, we determined that for an MPP with a four-dimensional mesh topology and 400 MHz processors the communication startup time must be at most 30 clock cycles and the network bandwidth must be at least 2.3 GB/s. This configuration results in 30 to 40% efficiency of the MPP for a problem with 50,000 atoms executing on 50,000 processors.
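The figures quoted in this abstract can be put into more familiar units with a short Python calculation. Only the conversion of 30 clock cycles at 400 MHz is taken directly from the abstract. The linear message-cost model (startup time plus size divided by bandwidth) and the reading of 2.3 GB/s as 2.3e9 bytes per second are assumptions for illustration, not the paper's validated performance model, and the function names are hypothetical.

```python
def cycles_to_seconds(cycles, clock_hz):
    # Convert a count of clock cycles to wall-clock seconds.
    return cycles / clock_hz


def message_time(n_bytes, startup_s, bandwidth_bytes_per_s):
    # Assumed linear cost model: fixed startup plus transfer time.
    return startup_s + n_bytes / bandwidth_bytes_per_s


startup_s = cycles_to_seconds(30, 400e6)   # 30 cycles at 400 MHz = 75 ns
atoms_per_processor = 50_000 / 50_000      # one atom per processor

# Cost of a hypothetical 1000-byte message under the assumed model.
t_msg = message_time(1000, startup_s, 2.3e9)

print(f"startup: {startup_s * 1e9:.1f} ns")
print(f"atoms per processor: {atoms_per_processor:.0f}")
print(f"1000-byte message: {t_msg * 1e9:.1f} ns")
```

The startup budget of 75 ns and the ratio of one atom per processor show how fine-grained the target configuration is: each processor has very little computation with which to hide communication cost.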
Parallelizing alternating direction implicit solver on GPUs
USDA-ARS's Scientific Manuscript database
We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
Parallel computational fluid dynamics - Implementations and results
NASA Technical Reports Server (NTRS)
Simon, Horst D. (Editor)
1992-01-01
The present volume on parallel CFD discusses implementations on parallel machines, numerical algorithms for parallel CFD, and performance evaluation and computer science issues. Attention is given to a parallel algorithm for compressible flows through rotor-stator combinations, a massively parallel Euler solver for unstructured grids, a fast scheme to analyze 3D disk airflow on a parallel computer, and a block implicit multigrid solution of the Euler equations. Topics addressed include a 3D ADI algorithm on distributed memory multiprocessors, clustered element-by-element computations for fluid flow, hypercube FFT and the Fourier pseudospectral method, and an investigation of parallel iterative algorithms for CFD. Also discussed are fluid dynamics using interface methods on parallel processors, sorting for particle flow simulation on the connection machine, a large grain mapping method, and efforts toward a Teraflops capability for CFD.
Implementing clips on a parallel computer
NASA Technical Reports Server (NTRS)
Riley, Gary
1987-01-01
The C language integrated production system (CLIPS) is a forward chaining rule based language to provide training and delivery for expert systems. Conceptually, rule based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism also can be employed for use with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.
Global Arrays Parallel Programming Toolkit
Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel
2011-01-01
The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message-passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message-passing or one-sided communication, offer performance and scalability but they are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving
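The get/compute/put pattern described in this abstract, in which data moves explicitly between a global address space and local storage, can be sketched in plain Python. This is a toy, single-process model and not the Global Arrays API: the class `ToyGlobalArray` and its methods `owner`, `get`, and `put` are hypothetical names, and the per-process blocks are ordinary lists rather than memory on separate nodes.

```python
class ToyGlobalArray:
    """One logical 1-D array whose elements live in per-process blocks."""

    def __init__(self, length, nprocs):
        self.length = length
        self.block = -(-length // nprocs)  # ceiling division: block size
        self.blocks = [
            [0.0] * max(0, min(self.block, length - p * self.block))
            for p in range(nprocs)
        ]

    def owner(self, i):
        # Locality query: which process holds global index i.
        return i // self.block

    def get(self, lo, hi):
        # Copy global elements lo..hi-1 into a new local buffer.
        return [self.blocks[self.owner(i)][i % self.block] for i in range(lo, hi)]

    def put(self, lo, values):
        # Copy a local buffer back into the global array starting at lo.
        for k, value in enumerate(values):
            i = lo + k
            self.blocks[self.owner(i)][i % self.block] = value


ga = ToyGlobalArray(length=10, nprocs=3)
ga.put(2, [1.0, 2.0, 3.0, 4.0])   # this range spans two owners' blocks

local = ga.get(3, 6)              # fetch a patch into local storage
local = [2 * x for x in local]    # compute on the local copy only
ga.put(3, local)                  # write the result back

print(ga.get(0, 10))
print(ga.owner(3), ga.owner(5))
```

The caller addresses the array by global index and never handles messages, yet the `owner` query keeps locality visible so that work can be arranged around the data a process already holds.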
High Performance Parallel Computational Nanotechnology
NASA Technical Reports Server (NTRS)
Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. In as much as it seems clear that the application of such methods in nanotechnology will require powerful, highly powerful systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to
Parallel machine architecture and compiler design facilities
NASA Technical Reports Server (NTRS)
Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex
1990-01-01
The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility for rapid prototyping of parallelizing compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.
Force user's manual: A portable, parallel FORTRAN
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.
1990-01-01
The use of Force, a parallel, portable FORTRAN for shared memory parallel computers, is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.
Parallel multi-computers and artificial intelligence
Uhr, L.
1986-01-01
This book examines the present state and future direction of multicomputer parallel architectures for artificial intelligence research and development of artificial intelligence applications. The book provides a survey of the large variety of parallel architectures, describing the current state of the art and suggesting promising architectures to produce artificial intelligence systems such as intelligent robots. This book integrates artificial intelligence and parallel processing research areas and discusses parallel processing from the viewpoint of artificial intelligence.
The electron signature of parallel electric fields
NASA Astrophysics Data System (ADS)
Burch, J. L.; Gurgiolo, C.; Menietti, J. D.
1990-12-01
Dynamics Explorer I High-Altitude Plasma Instrument electron data are presented. The electron distribution functions have characteristics expected of a region of parallel electric fields. The data are consistent with previous test-particle simulations for observations within parallel electric field regions which indicate that typical hole, bump, and loss-cone electron distributions, which contain evidence for parallel potential differences both above and below the point of observation, are not expected to occur in regions containing actual parallel electric fields.
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.
High Productivity Implantation ''PARTIAL IMPLANT''
Hino, Masayoshi; Miyamoto, Naoki; Sakai, Shigeki; Matsumoto, Takao
2008-11-03
The patterned ion implantation 'PARTIAL IMPLANT' has been developed as a productivity improvement tool. The Partial Implant can form several different ion dose areas on the wafer surface by controlling the speed of wafer movement and the stepwise rotation of the twist axis. The Partial Implant system contains two implant methods. One method is 'DIVIDE PARTIAL IMPLANT', which is aimed at reducing the consumption of wafers. The Divide Partial Implant evenly divides the dose area on one wafer surface into two or three different dose parts. Any dose can be selected in each area, so the consumption of wafers for experimental implantation can be reduced. The second method is 'RING PARTIAL IMPLANT', which is aimed at improving yield by correcting the electrical characteristics of devices. The Ring Partial Implant can form concentric ion dose areas. The dose of the outer wafer area can be selected to be within plus or minus 30% of the dose of the central wafer area, so the electrical characteristics of devices can be corrected by controlling the dose at the wafer edge.
Heart Fibrillation and Parallel Supercomputers
NASA Technical Reports Server (NTRS)
Kogan, B. Y.; Karplus, W. J.; Chudin, E. E.
1997-01-01
The Luo and Rudy 3 cardiac cell mathematical model is implemented on the parallel supercomputer CRAY - T3D. The splitting algorithm combined with variable time step and an explicit method of integration provide reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by intracellular calcium and dynamics and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.
Scheduling Tasks In Parallel Processing
NASA Technical Reports Server (NTRS)
Price, Camille C.; Salama, Moktar A.
1989-01-01
Algorithms sought to minimize time and cost of computation. Report describes research on scheduling of computational tasks in system of multiple identical data processors operating in parallel. Computational intractability requires use of suboptimal heuristic algorithms. First algorithm called "list heuristic", variation of classical list scheduling. Second algorithm called "cluster heuristic" applied to tightly coupled tasks and consists of four phases. Third algorithm called "exchange heuristic", iterative-improvement algorithm beginning with initial feasible assignment of tasks to processors and periods of time. Fourth algorithm is iterative one for optimal assignment of tasks and based on concept called "simulated annealing" because of mathematical resemblance to aspects of physical annealing processes.
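The classical list scheduling on which the first heuristic is based can be sketched as follows. This is a minimal illustration for independent tasks on identical processors, not the report's own variant; the function name and the input format are assumptions made for the example.

```python
import heapq


def list_schedule(durations, num_processors):
    """Greedy list scheduling: take tasks in list order and give each one
    to the processor that becomes free first (hypothetical helper)."""
    # Heap of (time at which the processor becomes free, processor index).
    free_at = [(0, p) for p in range(num_processors)]
    heapq.heapify(free_at)
    assignment = []  # (task index, processor index, start time)
    for task, duration in enumerate(durations):
        start, proc = heapq.heappop(free_at)
        assignment.append((task, proc, start))
        heapq.heappush(free_at, (start + duration, proc))
    makespan = max(t for t, _ in free_at)
    return assignment, makespan


if __name__ == "__main__":
    print(list_schedule([3, 2, 2, 1], num_processors=2))
```

The order of the task list determines the quality of the schedule, which is why heuristic variants typically differ in how they build that list.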
True Shear Parallel Plate Viscometer
NASA Technical Reports Server (NTRS)
Ethridge, Edwin; Kaukler, William
2010-01-01
This viscometer (which can also be used as a rheometer) is designed for use with liquids over a large temperature range. The device consists of horizontally disposed, similarly sized, parallel plates with a precisely known gap. The lower plate is driven laterally with a motor to apply shear to the liquid in the gap. The upper plate is freely suspended from a double-arm pendulum with a sufficiently long radius to reduce height variations during the swing to negligible levels. A sensitive load cell measures the shear force applied by the liquid to the upper plate. Viscosity is measured by taking the ratio of shear stress to shear rate.
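The ratio in the last sentence can be written out as a small calculation. The sketch below assumes simple planar shear, with shear stress taken as force over wetted plate area and shear rate as plate speed over gap; the function and parameter names are illustrative and do not come from the report.

```python
def viscosity(force_n, plate_area_m2, plate_speed_m_s, gap_m):
    """Dynamic viscosity in Pa*s from parallel-plate measurements.

    Assumes uniform planar shear between the plates (an idealization).
    """
    shear_stress = force_n / plate_area_m2   # Pa
    shear_rate = plate_speed_m_s / gap_m     # 1/s
    return shear_stress / shear_rate


if __name__ == "__main__":
    # Illustrative numbers only: 2 mN on a 100 cm^2 plate, 0.1 m/s, 1 mm gap.
    print(viscosity(0.002, 0.01, 0.1, 0.001))
```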
Scalable Parallel Algebraic Multigrid Solvers
Bank, R; Lu, S; Tong, C; Vassilevski, P
2005-03-23
The authors propose a parallel algebraic multilevel algorithm (AMG), which has the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved; and the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.
Parallel Assembly of LIGA Components
Christenson, T.R.; Feddema, J.T.
1999-03-04
In this paper, a prototype robotic workcell for the parallel assembly of LIGA components is described. A Cartesian robot is used to press 386 and 485 micron diameter pins into a LIGA substrate and then place a 3-inch diameter wafer with LIGA gears onto the pins. Upward and downward looking microscopes are used to locate holes in the LIGA substrate, pins to be pressed in the holes, and gears to be placed on the pins. This vision system can locate parts within 3 microns, while the Cartesian manipulator can place the parts within 0.4 microns.
Parallel BLAST on split databases.
Mathog, David R
2003-09-22
BLAST programs often run on large SMP machines where multiple threads can work simultaneously and there is enough memory to cache the databases between program runs. A group of programs is described which allows comparable performance to be achieved with a Beowulf configuration in which no node has enough memory to cache a database but the cluster as an aggregate does. To achieve this result, databases are split into equal sized pieces and stored locally on each node. Each query is run on all nodes in parallel and the resultant BLAST output files from all nodes merged to yield the final output. Source code is available from ftp://saf.bio.caltech.edu/
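The split, search, and merge pattern described above can be sketched in a few lines. This is a toy illustration and not the author's programs: the scoring function (shared 3-mers) is a stand-in for a real BLAST run, threads stand in for cluster nodes, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_database(records, num_pieces):
    """Split records into num_pieces chunks whose sizes differ by at most one."""
    base, extra = divmod(len(records), num_pieces)
    pieces, start = [], 0
    for i in range(num_pieces):
        size = base + (1 if i < extra else 0)
        pieces.append(records[start:start + size])
        start += size
    return pieces


def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search_piece(query, piece):
    """Toy stand-in for one node's search: score = number of shared 3-mers."""
    q = kmers(query)
    hits = []
    for name, seq in piece:
        score = len(q & kmers(seq))
        if score:
            hits.append((score, name))
    return hits


def parallel_search(query, records, num_pieces):
    """Run the query against every piece concurrently and merge the hits."""
    pieces = split_database(records, num_pieces)
    with ThreadPoolExecutor(max_workers=num_pieces) as pool:
        partial_hits = list(pool.map(lambda p: search_piece(query, p), pieces))
    merged = [hit for hits in partial_hits for hit in hits]
    # Best score first; ties broken by record name for a stable ordering.
    return sorted(merged, key=lambda hit: (-hit[0], hit[1]))
```

Because each piece is searched independently, the merge step only has to combine and re-rank the per-piece hit lists.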
Coherent-mode decomposition of partially polarized, partially coherent sources.
Gori, Franco; Santarsiero, Massimo; Simon, Raja; Piquero, Gemma; Borghi, Riccardo; Guattari, Giorgio
2003-01-01
It is shown that any partially polarized, partially coherent source can be expressed in terms of a suitable superposition of transverse coherent modes with orthogonal polarization states. Such modes are determined through the solution of a system of two coupled integral equations. An example, for which the modal decomposition is obtained in closed form in terms of fully linearly polarized Hermite Gaussian modes, is given.
Experimental generating the partially coherent and partially polarized electromagnetic source.
Ostrovsky, Andrey S; Rodríguez-Zurita, Gustavo; Meneses-Fabián, Cruz; Olvera-Santamaría, Miguel A; Rickenstorff-Parrao, Carolina
2010-06-07
The technique for generating the partially coherent and partially polarized source starting from the completely coherent and completely polarized laser source is proposed and analyzed. This technique differs from the known ones by the simplicity of its physical realization. The efficiency of the proposed technique is illustrated with the results of physical experiment in which an original technique for characterizing the coherence and polarization properties of the generated source is employed.
Parallel Processing at the High School Level.
ERIC Educational Resources Information Center
Sheary, Kathryn Anne
This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…
Parallel Computing Using Web Servers and "Servlets".
ERIC Educational Resources Information Center
Lo, Alfred; Bloor, Chris; Choi, Y. K.
2000-01-01
Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
Reservoir Thermal Recover Simulation on Parallel Computers
NASA Astrophysics Data System (ADS)
Li, Baoyan; Ma, Yuanle
The rapid development of parallel computers has provided the hardware basis for massive, refined reservoir simulation. However, the lack of parallel reservoir simulation software has hindered the application of parallel computers to reservoir simulation. Although a variety of parallel methods have been studied and applied to black oil, compositional, and chemical model numerical simulations, only limited parallel software is available for reservoir simulation. In particular, the parallelization of reservoir thermal recovery simulation has not been fully studied, because of the complexity of its models and algorithms. The authors make use of the message passing interface (MPI) standard communication library, the domain decomposition method, the block Jacobi iteration algorithm, and the dynamic memory allocation technique to parallelize their serial thermal recovery simulation software NUMSIP, which is used in the petroleum industry in China. The parallel software PNUMSIP was tested on both IBM SP2 and Dawn 1000A distributed-memory parallel computers. The experimental results show that the parallelization of I/O has a great effect on the efficiency of PNUMSIP; the data communication bandwidth is also an important factor influencing software efficiency. Keywords: domain decomposition method, block Jacobi iteration algorithm, reservoir thermal recovery simulation, distributed-memory parallel computer
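For readers unfamiliar with block Jacobi iteration, the following serial sketch shows the update that a domain-decomposed code would distribute, one block per process. It is not taken from NUMSIP or PNUMSIP; MPI communication is omitted, the dense solver is a plain Gaussian elimination written for the example, and all names are assumptions.

```python
def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def block_jacobi(A, b, blocks, iterations=50):
    """blocks: index lists partitioning range(len(b)), one per subdomain."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = x[:]
        # Each block uses only the previous iterate outside itself, so the
        # block solves are independent and could run on separate processes.
        for idx in blocks:
            inside = set(idx)
            rhs = [b[i] - sum(A[i][j] * x[j] for j in range(n) if j not in inside)
                   for i in idx]
            A_local = [[A[i][j] for j in idx] for i in idx]
            for i, value in zip(idx, solve_dense(A_local, rhs)):
                x_new[i] = value
        x = x_new
    return x
```

In a distributed-memory setting, the only data exchanged per iteration would be the entries of the previous iterate that couple neighbouring blocks.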
Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism
ERIC Educational Resources Information Center
Agarwal, Mayank
2009-01-01
The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…
Coordination in serial-parallel image processing
NASA Astrophysics Data System (ADS)
Wójcik, Waldemar; Dubovoi, Vladymyr M.; Duda, Marina E.; Romaniuk, Ryszard S.; Yesmakhanova, Laura; Kozbakova, Ainur
2015-12-01
Serial-parallel systems are used to convert images. Controlling their operation requires solving a coordination problem. The paper summarizes a model of the coordination of resource allocation for the task of synchronizing parallel processes; a genetic algorithm for coordination is developed, and its adequacy is verified for the process of parallel image processing.
Zhdanov, V. M.; Stepanenko, A. A.
2013-12-15
The influence of resonant charge exchange for ion-atom interaction on the viscosity of partially ionized plasma embedded in the magnetic field is investigated. The general system of equations used to derive the viscosity coefficients for an arbitrary plasma component in the 21-moment approximation of Grad’s method is presented. The expressions for the coefficients of total and partial viscosities of a multicomponent partially ionized plasma in the magnetic field are obtained. As an example, the coefficients of the parallel and transverse viscosities for the ionic and neutral components of the partially ionized hydrogen plasma are calculated. It is shown that the account for resonant charge exchange can lead to a substantial change of the parallel and transverse viscosity of the plasma components in the region of low degrees of ionization on the order of 0.1.
Xyce parallel electronic simulator design.
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
A massively asynchronous, parallel brain
Zeki, Semir
2015-01-01
Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously—with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871
Parallel job-scheduling algorithms
Rodger, S.H.
1989-01-01
In this thesis, we consider solving job scheduling problems on the CREW PRAM model. We show how to adapt Cole's pipeline merge technique to yield several efficient parallel algorithms for a number of job scheduling problems and one optimal parallel algorithm for the following job scheduling problem: Given a set of n jobs defined by release times, deadlines and processing times, find a schedule that minimizes the maximum lateness of the jobs and allows preemption when the jobs are scheduled to run on one machine. In addition, we present the first NC algorithm for the following job scheduling problem: Given a set of n jobs defined by release times, deadlines and unit processing times, determine if there is a schedule of jobs on one machine, and calculate the schedule if it exists. We identify the notion of a canonical schedule, which is the type of schedule our algorithm computes if there is a schedule. Our algorithm runs in O((log n)^2) time and uses O(n^2 k^2) processors, where k is the minimum number of distinct offsets of release times or deadlines.
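As a point of reference for the first problem (release times, deadlines, preemption, one machine, minimum maximum lateness), the standard sequential approach is the preemptive earliest-deadline-first rule. The sketch below is that sequential baseline, not the parallel CREW PRAM algorithm of the thesis; the function name and the job tuple layout are assumptions, and processing times are assumed positive.

```python
import heapq


def preemptive_edf_max_lateness(jobs):
    """jobs: list of (release, deadline, processing) with processing > 0.

    Always runs the released job with the earliest deadline, preempting
    when a new job is released. Returns the maximum lateness.
    """
    jobs = sorted(jobs)
    ready = []  # heap of [deadline, remaining processing time]
    time, i, n = 0, 0, len(jobs)
    max_lateness = float("-inf")
    while i < n or ready:
        if not ready:
            time = max(time, jobs[i][0])  # idle until the next release
        while i < n and jobs[i][0] <= time:
            heapq.heappush(ready, [jobs[i][1], jobs[i][2]])
            i += 1
        deadline, remaining = heapq.heappop(ready)
        next_release = jobs[i][0] if i < n else float("inf")
        run = min(remaining, next_release - time)
        time += run
        if run < remaining:
            # Preempted by a release; put the unfinished job back.
            heapq.heappush(ready, [deadline, remaining - run])
        else:
            max_lateness = max(max_lateness, time - deadline)
    return max_lateness
```

For example, with jobs (0, 10, 4) and (1, 3, 2), preempting the first job at time 1 lets the second finish by its deadline, which a non-preemptive schedule starting with the first job could not do.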
NASA Astrophysics Data System (ADS)
Galo, J. R.; Albarreal, I. I.; Calzada, M. C.; Cruz, J. L.; Fernández-Cara, E.; Marín, M.
2008-12-01
For the solution of elliptic problems, fractional step methods and in particular alternating directions (ADI) methods are iterative methods where fractional steps are sequential. Therefore, they only accept parallelization at low level. In [T. Lu, P. Neittaanmäki, X.C. Tai, A parallel splitting-up method for partial differential equations and its applications to Navier-Stokes equations, RAIRO Modél. Math. Anal. Numér. 26 (6) (1992) 673-708], Lu et al. proposed a method where the fractional steps can be performed in parallel. We can thus speak of parallel fractional step (PFS) methods and, in particular, simultaneous directions (SDI) methods. In this paper, we perform a detailed analysis of the convergence and optimization of PFS and SDI methods, complementing what was done in [T. Lu, P. Neittaanmäki, X.C. Tai, A parallel splitting-up method for partial differential equations and its applications to Navier-Stokes equations, RAIRO Modél. Math. Anal. Numér. 26 (6) (1992) 673-708]. We describe the behavior of the method and we specify the good choice of the parameters. We also study the efficiency of the parallelization. Some 2D, 3D and high-dimensional tests confirm our results.
Partial-Payload Support Structure
NASA Technical Reports Server (NTRS)
Mitchell, R.; Freeman, M.
1984-01-01
Partial-payload support structure (PPSS) is modular, bridge like structure supporting experiments weighing up to 2 tons. PPSS handles such experiments more economically than standard Spacelab pallet system.
Implementation and performance of parallelized elegant.
Wang, Y.; Borland, M.; Accelerator Systems Division
2008-01-01
The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.
Partial arthrodeses of the wrist.
Marcuzzi, A; Cristiani, G; Castagnini, L; Castagnetti, C; Caroli, A
1995-01-01
The authors report 16 cases of partial arthrodeses of the wrist for the treatment of Kienboeck's disease, pseudarthrosis of the scaphoid, rotatory subluxation of the scaphoid, rheumatoid arthritis, etc. Based on the good results obtained (76.6%) the authors believe that partial arthrodeses constitute the type of treatment indicated for the treatment of pathologies that involve only some of the carpal bones, and they also emphasize that this type of surgery represents a valid alternative to total arthrodesis of the wrist.
Partial metamorphosis in Anomia simplex.
LOOSANOFF, V L
1961-06-30
Many larvae of the common bivalve, Anomia simplex, when grown under laboratory conditions, exhibited a partial metamorphosis. They attained a considerably larger size than that at which larvae normally set. The partial metamorphosis was also characterized by the disappearance of velum, but the retention of a functional foot. Moreover, these organisms were not able to attach to the substratum, and their shells showed a distinct demarcation line between larval and adult portions
Partitioning in parallel processing of production systems
Oflazer, K.
1987-01-01
This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.
Parallel approach to incorporating face image information into dialogue processing
NASA Astrophysics Data System (ADS)
Ren, Fuji
2000-10-01
There are many kinds of so-called irregular expressions in natural dialogues. Even if the content of a conversation is the same in words, different meanings can be interpreted by a person's feeling or face expression. To have a good understanding of dialogues, it is required in a flexible dialogue processing system to infer the speaker's view properly. However, it is difficult to obtain the meaning of the speaker's sentences in various scenes using traditional methods. In this paper, a new approach for dialogue processing that incorporates information from the speaker's face is presented. We first divide conversation statements into several simple tasks. Second, we process each simple task using an independent processor. Third, we employ some speaker's face information to estimate the view of the speakers to solve ambiguities in dialogues. The approach presented in this paper can work efficiently, because independent processors run in parallel, writing partial results to a shared memory, incorporating partial results at appropriate points, and complementing each other. A parallel algorithm and a method for employing the face information in a dialogue machine translation will be discussed, and some results will be included in this paper.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1991-01-01
The main contribution of the effort in the last two years is the introduction of the MOPPS system. After doing extensive literature search, we introduced the system which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.
Parallel computation of Gaussian processes
NASA Astrophysics Data System (ADS)
Preuss, R.; von Toussaint, U.
2017-06-01
Within the Bayesian framework we utilize Gaussian processes for parametric studies of long running computer codes. Since the simulations are expensive it is necessary to exploit the computational budget in the best possible manner. Employing the sum over variances - being indicators for the quality of the fit - as the utility function we established an optimized and automated sequential parameter selection procedure. However, often it is also desirable to utilize the parallel running capabilities of present computer technology and abandon the sequential parameter selection for a faster overall turn-around time (wall-clock time). The paper proposes to achieve this by marginalizing over the expected outcomes at optimized test points in order to set up a pool of starting values for batch execution.
Device for balancing parallel strings
Mashikian, Matthew S.
1985-01-01
A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.
Information hiding in parallel programs
Foster, I.
1992-01-30
A fundamental principle in program design is to isolate difficult or changeable design decisions. Application of this principle to parallel programs requires identification of decisions that are difficult or subject to change, and the development of techniques for hiding these decisions. We experiment with three complex applications, and identify mapping, communication, and scheduling as areas in which decisions are particularly problematic. We develop computational abstractions that hide such decisions, and show that these abstractions can be used to develop elegant solutions to programming problems. In particular, they allow us to encode common structures, such as transforms, reductions, and meshes, as software cells and templates that can be reused in different applications. An important characteristic of these structures is that they do not incorporate mapping, communication, or scheduling decisions: these aspects of the design are specified separately, when composing existing structures to form applications. This separation of concerns allows the same cells and templates to be reused in different contexts.
Parallel spinors on flat manifolds
NASA Astrophysics Data System (ADS)
Sadowski, Michał
2006-05-01
Let p(M) be the dimension of the vector space of parallel spinors on a closed spin manifold M. We prove that every finite group G is the holonomy group of a closed flat spin manifold M(G) such that p(M(G))>0. If the holonomy group Hol(M) of M is cyclic, then we give an explicit formula for p(M) different from that given in [R.J. Miatello, R.A. Podesta, The spectrum of twisted Dirac operators on compact flat manifolds, Trans. Am. Math. Soc., in press]. We answer the question of when p(M)>0 if Hol(M) is a cyclic group of prime order or dim M ≤ 4.
Hybrid Optimization Parallel Search PACKage
2009-11-10
HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.
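To give a feel for derivative-free generating set search, the sketch below implements its simplest unconstrained form, compass search, which polls the coordinate directions and shrinks the step when no poll point improves. It is an illustration only and does not use or reproduce the HOPSPACK API; HOPSPACK's GSS solver also handles constraints, caching, and asynchronous parallel evaluation, none of which appear here. All names are assumptions.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=1000):
    """Minimize f without derivatives by polling x +/- step * e_i."""
    x, fx = list(x0), f(x0)
    n = len(x)
    for _ in range(max_iter):
        if step < tol:
            break
        # Build the poll set; these evaluations are independent of each
        # other, so a parallel framework could run them concurrently.
        trials = []
        for i in range(n):
            for sign in (1.0, -1.0):
                y = x[:]
                y[i] += sign * step
                trials.append(y)
        values = [f(y) for y in trials]
        best = min(range(len(trials)), key=values.__getitem__)
        if values[best] < fx:
            x, fx = trials[best], values[best]  # successful poll: move
        else:
            step *= 0.5                         # unsuccessful poll: contract
    return x, fx


if __name__ == "__main__":
    print(compass_search(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0]))
```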
Parallel Performance Characterization of Columbia
NASA Technical Reports Server (NTRS)
Biswas, Rupak
2004-01-01
Using a collection of benchmark problems of increasing levels of realism and computational effort, we will characterize the strengths and limitations of the 10,240 processor Columbia system to deliver supercomputing value to application scientists. Scientists need to be able to determine if and how they can utilize Columbia to carry extreme workloads, either in terms of ultra-large applications that cannot be run otherwise (capability), or in terms of very large ensembles of medium-scale applications to populate response matrices (capacity). We select existing application benchmarks that scale from a small number of processors to the entire machine, and that highlight different issues in running supercomputing-class applications, such as the various types of memory access, file I/O, inter- and intra-node communications and parallelization paradigms. http://www.nas.nasa.gov/Software/NPB/
Multiparameter Parallel Search Branch Switching
NASA Astrophysics Data System (ADS)
Henderson, Michael E.
A continuation method (sometimes called path following) is a way to compute solution curves of a nonlinear system of equations with a parameter. We derive a simple algorithm for branch switching at bifurcation points for multiple parameter continuation, where surfaces bifurcate along singular curves on a surface. It is a generalization of the parallel search technique used in the continuation code AUTO, and avoids the need for second derivatives and a full analysis of the bifurcation point. The one parameter case is special. While the generalization is not difficult, it is nontrivial, and the geometric interpretation may be of some interest. An additional tangent calculation at a point near the singular point is used to estimate the tangent to the singular set.
Embodied and Distributed Parallel DJing.
Cappelen, Birgitta; Andersson, Anders-Petter
2016-01-01
Everyone has a right to take part in cultural events and activities, such as music performances and music making. Enforcing that right, within Universal Design, is often limited to a focus on physical access to public areas, hearing aids etc., or groups of persons with special needs performing in traditional ways. The latter might be people with disabilities, being musicians playing traditional instruments, or actors playing theatre. In this paper we focus on the innovative potential of including people with special needs, when creating new cultural activities. In our project RHYME our goal was to create health promoting activities for children with severe disabilities, by developing new musical and multimedia technologies. Because of the users' extreme demands and rich contribution, we ended up creating both a new genre of musical instruments and a new art form. We call this new art form Embodied and Distributed Parallel DJing, and the new genre of instruments for Empowering Multi-Sensorial Things.
Parallel network simulations with NEURON.
Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L
2006-10-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.
Parallel Network Simulations with NEURON
Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.
2009-01-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488
Parallel computing in enterprise modeling.
Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.
2008-08-01
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent-based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Partial-depth lock-release flows
NASA Astrophysics Data System (ADS)
Khodkar, M. A.; Nasr-Azadani, M. M.; Meiburg, E.
2017-06-01
We extend the vorticity-based modeling concept for stratified flows introduced by Borden and Meiburg [Z. Borden and E. Meiburg, J. Fluid Mech. 726, R1 (2013), 10.1017/jfm.2013.239] to unsteady flow fields that cannot be rendered quasisteady by a change of reference frames. Towards this end, we formulate a differential control volume balance for the conservation of mass and vorticity in the fully unsteady parts of the flow, which we refer to as the differential vorticity model. We furthermore show that with the additional assumptions of locally uniform parallel flow within each layer, the unsteady vorticity modeling approach reproduces the familiar two-layer shallow-water equations. To evaluate its accuracy, we then apply the vorticity model approach to partial-depth lock-release flows. Consistent with the shallow water analysis of Rottman and Simpson [J. W. Rottman and J. E. Simpson, J. Fluid Mech. 135, 95 (1983), 10.1017/S0022112083002979], the vorticity model demonstrates the formation of a quasisteady gravity current front, a fully unsteady expansion wave, and a propagating bore that is present only if the lock depth exceeds half the channel height. When this bore forms, it travels with a velocity that does not depend on the lock height and the interface behind it is always at half the channel depth. We demonstrate that such a bore is energy conserving. The differential vorticity model gives predictions for the height and velocity of the gravity current and the bore, as well as for the propagation velocities of the edges of the expansion fan, as a function of the lock height. All of these predictions are seen to be in good agreement with the direct numerical simulation data and, where available, with experimental results. An energy analysis shows lock-release flows to be energy conserving only for the case of a full lock, whereas they are always dissipative for partial-depth locks.
Domain decomposition methods for the parallel computation of reacting flows
NASA Technical Reports Server (NTRS)
Keyes, David E.
1988-01-01
Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
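The basic idea of assigning subdomains to processors and coordinating at the interfaces can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the method of the report: it uses plain Jacobi relaxation on a 1D Laplace problem instead of Newton iteration with a preconditioned Krylov solver, the subdomains are processed in a sequential loop standing in for independent processors, and all function names are hypothetical.

```python
def jacobi_sweep_serial(u):
    """One Jacobi sweep for u'' = 0 on a 1D grid with fixed end values."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new

def split_interior(n_points, n_subdomains):
    """Contiguous index ranges [start, stop) covering interior points 1..n_points-2."""
    base, extra = divmod(n_points - 2, n_subdomains)
    ranges, start = [], 1
    for k in range(n_subdomains):
        stop = start + base + (1 if k < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def jacobi_sweep_decomposed(u, ranges):
    """One sweep in which each subdomain works on a local copy plus one halo cell per side."""
    new = u[:]
    for start, stop in ranges:              # each pass could run on its own processor
        local = u[start - 1:stop + 1]       # interface coordination: copy neighbour values
        for j in range(1, len(local) - 1):
            new[start + j - 1] = 0.5 * (local[j - 1] + local[j + 1])
    return new

u_serial = [0.0] * 9 + [1.0]
u_split = u_serial[:]
ranges = split_interior(len(u_serial), 3)
for _ in range(50):
    u_serial = jacobi_sweep_serial(u_serial)
    u_split = jacobi_sweep_decomposed(u_split, ranges)
print(ranges, u_split == u_serial)
```

Because every subdomain reads only the previous iterate and its two halo values, the decomposed sweep reproduces the serial sweep exactly; the per-sweep halo copy is the "periodic coordination" cost.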
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1987-01-01
The development of numerical methods and software tools for parallel processors can be aided through the use of a hardware test-bed. The test-bed architecture must be flexible enough to support investigations into architecture-algorithm interactions. One way to implement a test-bed is to use a commercial parallel processor. Unfortunately, most commercial parallel processors are fixed in their interconnection and/or processor architecture. In this paper, we describe a modified n-cube architecture, called the hypercluster, which is a superset of many other processor and interconnection architectures. The hypercluster is intended to support research into parallel processing of computational fluid and structural mechanics problems which may require a number of different architectural configurations. An example of how a typical partial differential equation solution algorithm maps onto the hypercluster is given.
[Parallel PLS algorithm using MapReduce and its application in spectral modeling].
Yang, Hui-Hua; Du, Ling-Ling; Li, Ling-Qiao; Tang, Tian-Biao; Guo, Tuo; Liang, Qiong-Lin; Wang, Yi-Ming; Luo, Guo-An
2012-09-01
Partial least squares (PLS) has been widely used in spectral analysis and modeling, but it is computation-intensive and time-consuming when dealing with massive data. To solve this problem effectively, a novel parallel PLS using MapReduce is proposed, which consists of two procedures: the parallelization of data standardization and the parallelization of principal component computation. Using NIR spectral modeling as an example, experiments were conducted on a Hadoop cluster, which is a collection of ordinary computers. The experimental results demonstrate that the proposed parallel PLS algorithm can handle massive spectra, significantly cuts down the modeling time, achieves a nearly linear speedup, and can be easily scaled up.
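The first of the two procedures, parallel data standardization, fits the map-reduce pattern naturally. The sketch below is an in-process illustration only, not the authors' Hadoop implementation: the function names and the partition layout are assumptions, and the sum-of-squares formula is chosen for brevity rather than numerical robustness.

```python
from functools import reduce
import math

def map_partition(rows):
    """Map step: per-partition row count, column sums, and column sums of squares."""
    n_cols = len(rows[0])
    sums = [0.0] * n_cols
    squares = [0.0] * n_cols
    for row in rows:
        for j, x in enumerate(row):
            sums[j] += x
            squares[j] += x * x
    return len(rows), sums, squares

def combine(a, b):
    """Reduce step: merge two partial statistics."""
    return (
        a[0] + b[0],
        [x + y for x, y in zip(a[1], b[1])],
        [x + y for x, y in zip(a[2], b[2])],
    )

def column_mean_std(partitions):
    n, sums, squares = reduce(combine, map(map_partition, partitions))
    means = [s / n for s in sums]
    # Population standard deviation; clamp tiny negative rounding errors to zero.
    stds = [math.sqrt(max(q / n - m * m, 0.0)) for q, m in zip(squares, means)]
    return means, stds

def standardize(rows, means, stds):
    """Second map step: scale every row (assumes no column has zero spread)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Two partitions of a toy 4 x 2 "spectra" matrix.
partitions = [
    [[1.0, 10.0], [2.0, 20.0]],
    [[3.0, 30.0], [4.0, 40.0]],
]
means, stds = column_mean_std(partitions)
scaled = [standardize(p, means, stds) for p in partitions]
print(means, stds)
```

Only the small per-partition statistics travel between workers, so the communication cost does not grow with the number of spectra in each partition.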
Autocalibrating Tiled Projectors on Piecewise Smooth Vertically Extruded Surfaces.
Sajadi, Behzad; Majumder, Aditi
2011-09-01
In this paper, we present a novel technique to calibrate multiple casually aligned projectors on fiducial-free piecewise smooth vertically extruded surfaces using a single camera. Such surfaces include cylindrical displays and CAVEs, common in immersive virtual reality systems. We impose two priors on the display surface. We assume the surface is a piecewise smooth vertically extruded surface for which the aspect ratio of the rectangle formed by the four corners of the surface is known and the boundary is visible and segmentable. Using these priors, we can estimate the display's 3D geometry and camera extrinsic parameters using a nonlinear optimization technique from a single image without any explicit display-to-camera correspondences. Using the estimated camera and display properties, the intrinsic and extrinsic parameters of each projector are recovered using a single projected pattern seen by the camera. This in turn is used to register the images on the display from any arbitrary viewpoint, making it appropriate for virtual reality systems. The fast convergence and robustness of this method are achieved via a novel dimension reduction technique for camera parameter estimation and a novel deterministic technique for projector property estimation. The simplicity, efficiency, and robustness of our method enable several coveted features for nonplanar projection-based displays. First, it allows fast recalibration in the face of projector, display or camera movements and even change in display shape. Second, this opens up, for the first time, the possibility of allowing multiple projectors to overlap on the corners of the CAVE, a popular immersive VR display system. Finally, this opens up the possibility of easily deploying multiprojector displays on aesthetic novel shapes for edutainment and digital signage applications.
How to: applying and interpreting the SWAT autocalibration tools
USDA-ARS's Scientific Manuscript database
Watershed-level modelers have expressed a need, through ongoing discussions within the USDA-ARS Conservation Effects Assessment Program and the broader international research community, for a better understanding of uncertainty related to hard-to-measure input parameters and to the remaining interna...
Online camera-gyroscope autocalibration for cell phones.
Jia, Chao; Evans, Brian L
2014-12-01
The gyroscope plays a key role in helping estimate 3D camera rotation for various vision applications on cell phones, including video stabilization and feature tracking. Successful fusion of gyroscope and camera data requires that the camera, gyroscope, and their relative pose be calibrated. In addition, the timestamps of gyroscope readings and video frames are usually not well synchronized. Previous work performed camera-gyroscope calibration and synchronization offline after the entire video sequence had been captured, with restrictions on the camera motion, which is unnecessarily restrictive for everyday users running apps that directly use the gyroscope. In this paper, we propose an online method that estimates all the necessary parameters while a user is capturing video. Our contributions are: 1) simultaneous online camera self-calibration and camera-gyroscope calibration based on an implicit extended Kalman filter and 2) generalization of the multiple-view coplanarity constraint on camera rotation in a rolling shutter camera model for cell phones. The proposed method is able to estimate the needed calibration and synchronization parameters online with all kinds of camera motion and can be embedded in gyro-aided applications, such as video stabilization and feature tracking. Both Monte Carlo simulation and cell phone experiments show that the proposed online calibration and synchronization method converges quickly to the ground truth values.
Integrated Task and Data Parallel Programming
NASA Technical Reports Server (NTRS)
Grimshaw, A. S.
1998-01-01
This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object-oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated
Partial synchronization and partial amplitude death in mesoscale network motifs
NASA Astrophysics Data System (ADS)
Poel, Winnie; Zakharova, Anna; Schöll, Eckehard
2015-02-01
We study the interplay between network topology and complex space-time patterns and introduce a concept to analytically predict complex patterns in networks of Stuart-Landau oscillators with linear symmetric and instantaneous coupling based solely on the network topology. These patterns consist of partial amplitude death and partial synchronization and are found to exist in large variety for all undirected networks of up to 5 nodes. The underlying concept is proved to be robust with respect to frequency mismatch and can also be extended to larger networks. In addition it directly links the stability of complete in-phase synchronization to only a small subset of topological eigenvalues of a network.
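For orientation, a commonly used form of a network of Stuart-Landau oscillators with linear, symmetric, instantaneous coupling is written out below. This exact form, including the symbols $\lambda$, $\omega$, $\sigma$ and $A_{jk}$, is an assumption made here for illustration and is not quoted from the paper.

```latex
\dot{z}_j = \left(\lambda + i\omega - |z_j|^2\right) z_j
          + \sigma \sum_{k=1}^{N} A_{jk}\,\left(z_k - z_j\right),
\qquad j = 1,\dots,N .
```

Here $z_j$ is the complex amplitude of node $j$, $\lambda$ the bifurcation parameter, $\omega$ the oscillation frequency, $\sigma$ the coupling strength, and $A$ the symmetric adjacency matrix. In this form the coupling term equals $-\sigma \sum_k L_{jk} z_k$ with the graph Laplacian $L = D - A$, which makes it plausible that pattern stability can be tied to eigenvalues of the topology; a node in amplitude death corresponds to $z_j = 0$.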
The ParaScope parallel programming environment
NASA Technical Reports Server (NTRS)
Cooper, Keith D.; Hall, Mary W.; Hood, Robert T.; Kennedy, Ken; Mckinley, Kathryn S.; Mellor-Crummey, John M.; Torczon, Linda; Warren, Scott K.
1993-01-01
The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in FORTRAN D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2015-11-01
Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fixed-boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2014-10-01
Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2013-10-01
Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Preliminary results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
Parallel processing considerations for image recognition tasks
NASA Astrophysics Data System (ADS)
Simske, Steven J.
2011-01-01
Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows (as diverse as optical character recognition [OCR], document classification, and barcode reading) to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
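Parallel processing by image region can be sketched as a map step over strips of one image followed by a reduce step. The example below is a minimal illustration with hypothetical function names and a toy per-region task (counting dark pixels); it is not taken from the paper. Threads are used only to keep the sketch self-contained; for CPU-bound work in CPython a process pool would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_strips(image, n_strips):
    """Split a list-of-rows image into at most n_strips horizontal strips."""
    rows_per_strip = -(-len(image) // n_strips)  # ceiling division
    return [image[i:i + rows_per_strip] for i in range(0, len(image), rows_per_strip)]

def count_dark_pixels(strip, threshold=128):
    """Map step: the same task applied to one image region."""
    return sum(1 for row in strip for px in row if px < threshold)

def parallel_dark_pixel_count(image, n_strips=4):
    strips = split_into_strips(image, n_strips)
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        partial_counts = list(pool.map(count_dark_pixels, strips))
    return sum(partial_counts)  # reduce step

# Synthetic 10 x 8 grayscale image with values in 0..255.
image = [[(r * 7 + c * 13) % 256 for c in range(8)] for r in range(10)]
total = parallel_dark_pixel_count(image, n_strips=4)
print(total)
```

The strips here are disjoint, which suits a pixel-wise task; region-level tasks such as face detection would typically need overlapping regions so that objects straddling a boundary are not missed.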
Computer-Aided Parallelizer and Optimizer
NASA Technical Reports Server (NTRS)
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
[Complex temporal partial status epilepticus].
Béquet, D; Bodiguel, E; Renard, J L; Goasguen, J
1990-01-01
Nonconvulsive complex partial status epilepticus is one cause of a confusional state. One such case in an adult is reported here; its clinical presentation was persistent amnesia together with elements of a frontal syndrome. The aetiology was most probably a viral meningoencephalitis. The clinical semiology of complex partial status epilepticus ("EMPC") is variable, consisting either of recurrent partial seizures, sometimes with automatisms, or of a continuous, possibly fluctuating, confusional state. Seizures shown on the EEG are partial or generalized with a variable start, sometimes bilateral, and either continuous or discontinuous. The onset is most often temporal or frontal. The cause is very rarely identified. The outcome is usually good, but prolonged memory deficits have been described, linked to the duration (more than 12 hours) of EMPC. Therefore, treatment must be started early, using diazepam or phenytoin.
Landsliding in partially saturated materials
Godt, J.W.; Baum, R.L.; Lu, N.
2009-01-01
Rainfall-induced landslides are pervasive in hillslope environments around the world and among the most costly and deadly natural hazards. However, capturing their occurrence with scientific instrumentation in a natural setting is extremely rare. The prevailing thinking on landslide initiation, particularly for those landslides that occur under intense precipitation, is that the failure surface is saturated and has positive pore-water pressures acting on it. Most analytic methods used for landslide hazard assessment are based on the above perception and assume that the failure surface is located beneath a water table. By monitoring the pore water and soil suction response to rainfall, we observed shallow landslide occurrence under partially saturated conditions for the first time in a natural setting. We show that the partially saturated shallow landslide at this site is predictable using measured soil suction and water content and a novel unified effective stress concept for partially saturated earth materials. Copyright 2009 by the American Geophysical Union.
Partial Priapism Treated with Pentoxifylline
Cooper, Meghan A.; Carrion, Rafael E.; Yang, Christopher
2015-01-01
ABSTRACT Main findings: A 26-year-old man suffering from partial priapism was successfully treated with a regimen including pentoxifylline, a nonspecific phosphodiesterase inhibitor that is often used to conservatively treat Peyronie's disease. Case hypothesis: Partial priapism is an extremely rare urological condition that is characterized by thrombosis within the proximal segment of a single corpus cavernosum. There have only been 36 reported cases to date. Although several factors have been associated with this unusual disorder, such as trauma or bicycle riding, the etiology is still not completely understood. Treatment is usually conservative and consists of a non-steroidal anti-inflammatory and anti-thrombotic. Promising future implications: This case report supports the utilization of pentoxifylline in patients with partial priapism due to its anti-fibrogenic and anti-thrombotic properties. PMID:26401875
Designing successful removable partial dentures.
Daher, Tony; Hall, Dan; Goodacre, Charles J
2006-03-01
In today's busy dental offices, removable partial denture design is often abdicated by dentists, both as a result of a lack of experience and consensus of design and because of educational failure on the part of dental schools. The result is delegation of the clinical design process to the lab technician. The lack of clinical data provided to the dental technician jeopardizes the quality of care. This article will focus on a logical and simple approach to this problem, making removable partial denture design simple and predictably achievable. The clinical evidence related to removable partial denture design will be described, along with a checklist to simplify the process and make it practical and applicable to everyday clinical practice.
[Indications for removable partial dentures].
van Waas, M A J
2009-11-01
Since there are many ways of preserving a natural dentition, if necessary with support of solitary crowns and fixed partial dentures, sometimes on dental implants, removable partial dentures are nowadays primarily indicated in patients with complaints about missing teeth in the aesthetic zone, which cannot be solved in another way. In addition to this, a removable partial denture is indicated in patients with extremely reduced dentitions or large or multiple edentulous areas, in patients with severe periodontitis or excessive loss of alveolar bone, in patients who are physically or emotionally vulnerable, as an interim solution on the way to edentulousness, as a temporary solution waiting for more extensive treatment and for patients who cannot afford an alternative.
Towards Distributed Memory Parallel Program Analysis
Quinlan, D; Barany, G; Panas, T
2008-06-17
This paper presents a parallel attribute evaluation for distributed memory parallel computer architectures where previously only shared memory parallel support for this technique has been developed. Attribute evaluation is a part of how attribute grammars are used for program analysis within modern compilers. Within this work, we have extended ROSE, an open compiler infrastructure, with a distributed memory parallel attribute evaluation mechanism to support user-defined global program analysis required for some forms of security analysis which cannot be addressed by a file-by-file view of large-scale applications. As a result, user-defined security analyses may now run in parallel without the user having to specify the way data is communicated between processors. The automation of communication enables an extensible open-source parallel program analysis infrastructure.
Parallel reactor systems for bioprocess development.
Weuster-Botz, Dirk
2005-01-01
Controlled parallel bioreactor systems allow fed-batch operation at early stages of process development. The characteristics of shaken bioreactors operated in parallel (shake flask, microtiter plate), sparged bioreactors (small-scale bubble column) and stirred bioreactors (stirred-tank, stirred column) are briefly summarized. Parallel fed-batch operation is achieved with an intermittent feeding and pH-control system for up to 16 bioreactors operated in parallel on a scale of 100 ml. Examples of the scale-up and scale-down of pH-controlled microbial fed-batch processes demonstrate that controlled parallel reactor systems can result in more effective bioprocess development. Future developments are also outlined, including units of 48 parallel stirred-tank reactors with individual pH- and pO2-controls and automation as well as a liquid handling system, operated on a milliliter scale.
Linearly exact parallel closures for slab geometry
NASA Astrophysics Data System (ADS)
Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun
2013-08-01
Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).
Runtime volume visualization for parallel CFD
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu
1995-01-01
This paper discusses some aspects of design of a data distributed, massively parallel volume rendering library for runtime visualization of parallel computational fluid dynamics simulations in a message-passing environment. Unlike the traditional scheme in which visualization is a postprocessing step, the rendering is done in place on each node processor. Computational scientists who run large-scale simulations on a massively parallel computer can thus perform interactive monitoring of their simulations. The current library provides an interface to handle volume data on rectilinear grids. The same design principles can be generalized to handle other types of grids. For demonstration, we run a parallel Navier-Stokes solver making use of this rendering library on the Intel Paragon XP/S. The interactive visual response achieved is found to be very useful. Performance studies show that the parallel rendering process is scalable with the size of the simulation as well as with the parallel computer.
Design considerations for parallel graphics libraries
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism, to keep a large number of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large-scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary. Program title: SWsolver. Catalogue identifier: AEGY_v1_0. Program summary URL
Parallel computation of invariant measures
Ding, J.; Liu, Y.
1995-12-01
A parallel numerical algorithm for computing invariant measures is presented. Let $I^N \equiv [0,1]^N$ be the unit $N$-cube in $\mathbb{R}^N$ and let $S : I^N \to I^N$ be a nonsingular transformation, that is, $S$ is Borel-measurable and $m(A) = 0$ implies $m(S^{-1}(A)) = 0$, where $m$ is the Lebesgue measure. The motivation of this study is the parallel computation of an absolutely continuous invariant measure $\mu$ under $S$, that is, $\mu \ll m$ and $\mu(A) = \mu(S^{-1}(A))$ for all Borel sets $A \subset I^N$. It is well known that an absolutely continuous finite invariant measure $\mu$ can be obtained by computing a fixed density of the Frobenius-Perron operator $P_S : L^1(I^N) \to L^1(I^N)$ associated with $S$, which is defined by (1) $\int_A P_S f \, dm = \int_{S^{-1}(A)} f \, dm$, $\forall f \in L^1(I^N)$. Using any suitable discretization scheme, the infinite-dimensional eigenvector problem $P_S f = f$ in $L^1(I^N)$ can be approximated by an algebraic eigenvector problem $P_l f_l = f_l$ in $\nabla_l$, where $P_l$ is a finite approximation of $P_S$ associated with a finite element subspace $\nabla_l$ of $L^1(I^N) \cap L^\infty(I^N)$. It has been shown that for $P_l$ arising from Galerkin's projection principle or the Markov finite approximation principle, there always exists an eigenvector $f_l$ of $P_l$, and that a sequence of normalized eigenvectors $(f_l)$ converges to the density of an absolutely continuous probability invariant measure $\mu$ for a class of piecewise $C^2$ expanding maps of $I^N$ under which the existence of $\mu$ is guaranteed by Gora-Boyarsky's theorem, which reduces to the Lasota-Yorke theorem when $N = 1$.
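The finite eigenvector problem described in this abstract can be illustrated with a small serial sketch. The code below is not the authors' parallel algorithm: it is a one-dimensional, piecewise-constant (Ulam-style) Markov approximation, assuming a hypothetical example map `beta_map` (x -> beta*x mod 1 with beta the golden ratio, a piecewise expanding map) and hypothetical helper names `ulam_rows` and `invariant_vector`. Transition probabilities are estimated by sampling points in each bin, and the fixed vector is found by plain power iteration.

```python
import math

# Assumed example map (not from the abstract): x -> beta * x mod 1.
BETA = (1.0 + math.sqrt(5.0)) / 2.0

def beta_map(x):
    """Piecewise expanding map of [0, 1) with slope BETA > 1."""
    return (BETA * x) % 1.0

def ulam_rows(S, n_bins, samples_per_bin=500):
    """Estimate the row-stochastic transition matrix between bins.

    Row i is stored as a dict {target_bin: probability}; entry (i, k)
    approximates the fraction of bin i that S maps into bin k.
    """
    rows = []
    for i in range(n_bins):
        row = {}
        for j in range(samples_per_bin):
            # Evenly spaced sample points strictly inside bin i.
            x = (i + (j + 0.5) / samples_per_bin) / n_bins
            k = min(int(S(x) * n_bins), n_bins - 1)
            row[k] = row.get(k, 0.0) + 1.0 / samples_per_bin
        rows.append(row)
    return rows

def invariant_vector(rows, iterations=500):
    """Power iteration v <- v P, started from the uniform vector."""
    n = len(rows)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [0.0] * n
        for i, row in enumerate(rows):
            for k, p in row.items():
                w[k] += v[i] * p
        total = sum(w)
        v = [x / total for x in w]  # renormalize to a probability vector
    return v

N_BINS = 200
rows = ulam_rows(beta_map, N_BINS)
v = invariant_vector(rows)
density = [N_BINS * mass for mass in v]   # piecewise-constant density estimate
mass_left_half = sum(v[: N_BINS // 2])    # estimated measure of [0, 0.5)
```

For this particular map the exact invariant density is piecewise constant with a single jump at 1/beta, so `mass_left_half` should land close to 0.585. The row-by-row construction in `ulam_rows` is independent per bin, which is the part a parallel implementation would most naturally distribute.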
Partially coherent vectorial nonparaxial beams.
Duan, Kailiang; Lü, Baida
2004-10-01
Generalized vectorial Rayleigh-Sommerfeld diffraction integrals are developed for the cross-spectral-density matrices of spatially partially coherent beams. Using the Gaussian Schell-model (GSM) beam as an example, we derive the expressions for the propagation of cross-spectral-density matrices and intensity of partially coherent vectorial nonparaxial beams, and the corresponding far-field asymptotic forms, beyond the paraxial approximation. The propagation of the vectorial nonparaxial GSM beams is evaluated and analyzed. It is shown that a 3 x 3 cross-spectral-density matrix or a vector theory is required for the exact description of nonparaxial GSM beams.
Partial pressure analysis of plasmas
Dylla, H.F.
1984-11-01
The application of partial pressure analysis for plasma diagnostic measurements is reviewed. A comparison is made between the techniques of plasma flux analysis and partial pressure analysis for mass spectrometry of plasmas. Emphasis is given to the application of quadrupole mass spectrometers (QMS). The interface problems associated with the coupling of a QMS to a plasma device are discussed including: differential-pumping requirements, electromagnetic interferences from the plasma environment, the detection of surface-active species, ion source interactions, and calibration procedures. Example measurements are presented from process monitoring of glow discharge plasmas which are useful for cleaning and conditioning vacuum vessels.
Inverse Kinematics for a Parallel Myoelectric Elbow
2001-10-25
Inverse Kinematics for a Parallel Myoelectric Elbow A. Z. Escudero, Ja. Álvarez, L. Leija. Center of Research and Advanced Studies of the IPN...replacement above elbow are serial mechanisms driven by a DC motor and they include only one active articulation for the elbow [1]. Parallel mechanisms...are rather scarce [2]. The inverse kinematics model of a 3-degree of freedom parallel prosthetic elbow mechanism is reported. The mathematical
Algorithmically Specialized Parallel Architecture For Robotics
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1991-01-01
Computing system called Robot Mathematics Processor (RMP) contains large number of processor elements (PE's) connected in various parallel and serial combinations reconfigurable via software. Special-purpose architecture designed for solving diverse computational problems in robot control, simulation, trajectory generation, workspace analysis, and like. System an MIMD-SIMD parallel architecture capable of exploiting parallelism in different forms and at several computational levels. Major advantage lies in design of cells, which provides flexibility and reconfigurability superior to previous SIMD processors.
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.
Parallel Architectures and Algorithms for Image Understanding
1990-05-17
Title: Parallel Architectures and Algorithms for Image Understanding. Final Technical Report. Personal author(s): R. Nevatia and V.K. Prasanna-Kumar. Type of report: Final Technical...identified. A generic parallel model of computation employing electro-optical devices is developed. Parallel techniques for image computations are
A Time-Optimal On-the-Fly Parallel Algorithm for Model Checking of Weak LTL Properties
NASA Astrophysics Data System (ADS)
Barnat, Jiří; Brim, Luboš; Ročkai, Petr
One of the most important open problems of parallel LTL model-checking is to design an on-the-fly scalable parallel algorithm with linear time complexity. Such an algorithm would give the optimality we have in sequential LTL model-checking. In this paper we give a partial solution to the problem. We propose an algorithm that has the required properties for a very rich subset of LTL properties, namely those expressible by weak Büchi automata.
A role for partial endothelial-mesenchymal transitions in angiogenesis?
Welch-Reardon, Katrina M.; Wu, Nan; Hughes, Christopher C.W.
2016-01-01
The contribution of epithelial-to-mesenchymal transitions (EMT) in both developmental and pathological conditions has been widely recognized and studied. In a parallel process, governed by a similar set of signaling and transcription factors, endothelial-to-mesenchymal transitions (EndoMT) contribute to heart valve formation and the generation of cancer-associated fibroblasts. During angiogenic sprouting, endothelial cells express many of the same genes and break down basement membrane; however, they retain intercellular junctions and migrate as a connected “train” of cells rather than as individual cells. This has been termed a partial EndoMT. A key regulatory checkpoint determines whether cells undergo a full or a partial EMT/EndoMT; however, very little is known about how this switch is controlled. Here we discuss these developmental/pathologic pathways, with a particular focus on their role in vascular biology. PMID:25425619
Apparatus for generating partially coherent radiation
Naulleau, Patrick P.
2005-02-22
Techniques for generating partially coherent radiation and particularly for converting effectively coherent radiation from a synchrotron to partially coherent EUV radiation suitable for projection lithography.
Parallel auto-correlative statistics with VTK.
Pebay, Philippe Pierre; Bennett, Janine Camille
2013-08-01
This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by means of C++ code snippets, and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the auto-correlative statistics engine.
Parallel Algorithms for the Exascale Era
Robey, Robert W.
2016-10-19
New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
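The reproducibility issue for global sums mentioned in this abstract can be shown in a few lines. This is a minimal sketch, not the method developed in the LANL projects: it assumes a hypothetical helper `naive_sum` that accumulates left to right (as each processor would for its local data) and contrasts it with Python's standard `math.fsum`, which returns the correctly rounded sum and is therefore independent of summation order.

```python
import math

def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total

# The same four numbers in two different orders, as might result from
# two different distributions of data across processors.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]

naive_a = naive_sum(order_a)   # 1.0: the first 1.0 is lost when added to 1e16
naive_b = naive_sum(order_b)   # 2.0: both small terms survive
exact_a = math.fsum(order_a)   # 2.0
exact_b = math.fsum(order_b)   # 2.0
```

The naive results differ only because floating-point addition is not associative; a parallel reduction that changes the order of operations from run to run can therefore change the answer unless a reproducible summation scheme is used.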
A parallel algorithm for global routing
NASA Technical Reports Server (NTRS)
Brouwer, Randall J.; Banerjee, Prithviraj
1990-01-01
A Parallel Hierarchical algorithm for Global Routing (PHIGURE) is presented. The router is based on the work of Burstein and Pelavin, but has many extensions for general global routing and parallel execution. Main features of the algorithm include structured hierarchical decomposition into separate independent tasks which are suitable for parallel execution and adaptive simplex solution for adding feedthroughs and adjusting channel heights for row-based layout. Alternative decomposition methods and the various levels of parallelism available in the algorithm are examined closely. The algorithm is described and results are presented for a shared-memory multiprocessor implementation.
Conformal pure radiation with parallel rays
NASA Astrophysics Data System (ADS)
Leistner, Thomas; Nurowski, Paweł
2012-03-01
We define pure radiation metrics with parallel rays to be n-dimensional pseudo-Riemannian metrics that admit a parallel null line bundle K and whose Ricci tensor vanishes on vectors that are orthogonal to K. We give necessary conditions in terms of the Weyl, Cotton and Bach tensors for a pseudo-Riemannian metric to be conformal to a pure radiation metric with parallel rays. Then, we derive conditions in terms of the tractor calculus that are equivalent to the existence of a pure radiation metric with parallel rays in a conformal class. We also give analogous results for n-dimensional pseudo-Riemannian pp-waves.
Remarks on parallel computations in MATLAB environment
NASA Astrophysics Data System (ADS)
Opalska, Katarzyna; Opalski, Leszek
2013-10-01
The paper summarizes the authors' investigation of the parallel computation capability of the MATLAB environment in solving large systems of ordinary differential equations (ODEs). Two MATLAB versions and two parallelization techniques were tested: one used multiple processor cores, the other CUDA-compatible graphics processing units (GPUs). A set of parameterized test problems was specially designed to expose the capabilities and limitations of the different variants of the parallel computation environment tested. The presented results clearly illustrate the superiority of the newer MATLAB version and the elapsed-time advantage of GPU-parallelized computations over multiple processor cores for high-dimensional problems (with a speed-up factor strongly dependent on the problem structure).
Parallel programming in Split-C
Culler, D.E.; Dusseau, A.; Goldstein, S.C.; Krishnamurthy, A.; Lumetta, S.; Eicken, T. von; Yelick, K.
1993-12-31
The authors introduce the Split-C language, a parallel extension of C intended for high performance programming on distributed memory multiprocessors, and demonstrate the use of the language in optimizing parallel programs. Split-C provides a global address space with a clear concept of locality and unusual assignment operators. These are used as tools to reduce the frequency and cost of remote access. The language allows a mixture of shared memory, message passing, and data parallel programming styles while providing efficient access to the underlying machine. They demonstrate the basic language concepts using regular and irregular parallel programs and give performance results for various stages of program optimization.
Shared-memory parallel programming in C++
Beck, B.
1990-07-01
This paper discusses how researchers have produced a set of portable parallel-programming constructs for C, implemented in M4 macros. These parallel-programming macros are available under the name Parmacs. The Parmacs macros let one write parallel C programs for shared-memory, distributed-memory, and mixed-memory (shared and distributed) systems. They have been implemented on several machines. Because Parmacs offers useful parallel-programming features, the author has considered how these problems might be overcome or avoided. The author thought that using C++, rather than C, would address these problems adequately, and describes the C++ features exploited. The work described addresses shared-memory constructs.
Parallel computations and control of adaptive structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)
1991-01-01
The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the approach presented for applications in disciplines other than the aerospace industry is assessed.
Data-parallel algorithms for image computing
NASA Astrophysics Data System (ADS)
Carlotto, Mark J.
1990-11-01
Data-parallel algorithms for image computing on the Connection Machine are described. After a brief review of some basic programming concepts in *Lisp, a parallel extension of Common Lisp, data-parallel programming paradigms based on a local (diffusion-like) model of computation, the scan model of computation, a general interprocessor communications model, and a region-based model are introduced. Algorithms for connected component labeling, distance transformation, Voronoi diagrams, finding minimum cost paths, local means, shape-from-shading, hidden surface calculations, affine transformation, oblique parallel projection, and spatial operations over regions are presented. A new algorithm for interpolating irregularly spaced data via Voronoi diagrams is also described.
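The scan model of computation named in this abstract can be sketched as follows. This is an illustrative serial simulation, not code from the paper and not *Lisp: it assumes a hypothetical function `inclusive_scan` that follows the well-known Hillis-Steele doubling scheme, in which every element is updated simultaneously at each step. The list comprehension stands in for one lock-step update across all processors.

```python
import operator

def inclusive_scan(values, op=operator.add):
    """Inclusive prefix scan using log2(n) data-parallel steps.

    At each step, element i (for i >= offset) is combined with the
    element `offset` positions to its left; all updates read the
    previous step's values, as on a SIMD machine.
    """
    result = list(values)
    n = len(result)
    offset = 1
    while offset < n:
        result = [
            result[i] if i < offset else op(result[i - offset], result[i])
            for i in range(n)
        ]
        offset *= 2
    return result

row = [3, 1, 7, 0, 4, 1, 6, 3]
prefix_sums = inclusive_scan(row)        # running totals along the row
running_max = inclusive_scan(row, max)   # same pattern with a different operator
```

Any associative operator can be substituted for addition, which is why a single scan primitive can serve many different image operations on a data-parallel machine.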
Parallel Genetic Algorithm for Alpha Spectra Fitting
NASA Astrophysics Data System (ADS)
García-Orellana, Carlos J.; Rubio-Montero, Pilar; González-Velasco, Horacio
2005-01-01
We present a performance study of alpha-particle spectra fitting using a parallel Genetic Algorithm (GA). The method uses a two-step approach. In the first step we run the parallel GA to find an initial solution for the second step, in which we use the Levenberg-Marquardt (LM) method for a precise final fit. GA is a highly resource-demanding method, so we use a Beowulf cluster for parallel simulation. The relationship between simulation time (and parallel efficiency) and the number of processors is studied using several alpha spectra, with the aim of obtaining a method to estimate the optimal number of processors for a simulation.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1995-01-01
The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
Partially molten magma ocean model
Shirley, D.N.
1983-02-15
The properties of the lunar crust and upper mantle can be explained if the outer 300-400 km of the moon was initially only partially molten rather than fully molten. The top of the partially molten region contained about 20% melt and decreased to 0% at 300-400 km depth. Nuclei of anorthositic crust formed over localized bodies of magma segregated from the partial melt, then grew peripherally until they covered the moon. Throughout most of its growth period the anorthosite crust floated on a layer of magma a few km thick. The thickness of this layer is regulated by the opposing forces of loss of material by fractional crystallization and addition of magma from the partial melt below. Concentrations of Sr, Eu, and Sm in pristine ferroan anorthosites are found to be consistent with this model, as are trends for the ferroan anorthosites and Mg-rich suites on a diagram of An in plagioclase vs. mg in mafics. The clustering of Eu, Sr, and mg values found among pristine ferroan anorthosites is predicted by this model.
Covert Reinforcement: A Partial Replication.
ERIC Educational Resources Information Center
Ripstra, Constance C.; And Others
A partial replication of an investigation of the effect of covert reinforcement on a perceptual estimation task is described. The study was extended to include an extinction phase. There were five treatment groups: covert reinforcement, neutral scene reinforcement, noncontingent covert reinforcement, and two control groups. Each subject estimated…
Leadership in Partially Distributed Teams
ERIC Educational Resources Information Center
Plotnick, Linda
2009-01-01
Inter-organizational collaboration is becoming more common. When organizations collaborate they often do so in partially distributed teams (PDTs). A PDT is a hybrid team that has at least one collocated subteam and at least two subteams that are geographically distributed and communicate primarily through electronic media. While PDTs share many…