Sample records for autocalibrating partially parallel

  1. Feasibility of through-time spiral generalized autocalibrating partial parallel acquisition for low latency accelerated real-time MRI of speech.

    PubMed

    Lingala, Sajan Goud; Zhu, Yinghua; Lim, Yongwan; Toutios, Asterios; Ji, Yunhua; Lo, Wei-Ching; Seiberlich, Nicole; Narayanan, Shrikanth; Nayak, Krishna S

    2017-12-01

    To evaluate the feasibility of through-time spiral generalized autocalibrating partial parallel acquisition (GRAPPA) for low-latency accelerated real-time MRI of speech. Through-time spiral GRAPPA (spiral GRAPPA), a fast linear reconstruction method, is applied to spiral (k-t) data acquired from an eight-channel custom upper-airway coil. Fully sampled data were retrospectively down-sampled to evaluate spiral GRAPPA at undersampling factors R = 2 to 6. Pseudo-golden-angle spiral acquisitions were used for prospective studies. Three subjects were imaged while performing a range of speech tasks that involved rapid articulator movements, including fluent speech and beat-boxing. Spiral GRAPPA was compared with view sharing, and a parallel imaging and compressed sensing (PI-CS) method. Spiral GRAPPA captured spatiotemporal dynamics of vocal tract articulators at undersampling factors ≤4. Spiral GRAPPA at 18 ms/frame and 2.4 mm 2 /pixel outperformed view sharing in depicting rapidly moving articulators. Spiral GRAPPA and PI-CS provided equivalent temporal fidelity. Reconstruction latency per frame was 14 ms for view sharing and 116 ms for spiral GRAPPA, using a single processor. Spiral GRAPPA kept up with the MRI data rate of 18ms/frame with eight processors. PI-CS required 17 minutes to reconstruct 5 seconds of dynamic data. Spiral GRAPPA enabled 4-fold accelerated real-time MRI of speech with a low reconstruction latency. This approach is applicable to wide range of speech RT-MRI experiments that benefit from real-time feedback while visualizing rapid articulator movement. Magn Reson Med 78:2275-2282, 2017. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.

  2. Simultaneous auto-calibration and gradient delays estimation (SAGE) in non-Cartesian parallel MRI using low-rank constraints.

    PubMed

    Jiang, Wenwen; Larson, Peder E Z; Lustig, Michael

    2018-03-09

    To correct gradient timing delays in non-Cartesian MRI while simultaneously recovering corruption-free auto-calibration data for parallel imaging, without additional calibration scans. The calibration matrix constructed from multi-channel k-space data should be inherently low-rank. This property is used to construct reconstruction kernels or sensitivity maps. Delays between the gradient hardware across different axes and RF receive chain, which are relatively benign in Cartesian MRI (excluding EPI), lead to trajectory deviations and hence data inconsistencies for non-Cartesian trajectories. These in turn lead to higher rank and corrupted calibration information which hampers the reconstruction. Here, a method named Simultaneous Auto-calibration and Gradient delays Estimation (SAGE) is proposed that estimates the actual k-space trajectory while simultaneously recovering the uncorrupted auto-calibration data. This is done by estimating the gradient delays that result in the lowest rank of the calibration matrix. The Gauss-Newton method is used to solve the non-linear problem. The method is validated in simulations using center-out radial, projection reconstruction and spiral trajectories. Feasibility is demonstrated on phantom and in vivo scans with center-out radial and projection reconstruction trajectories. SAGE is able to estimate gradient timing delays with high accuracy at a signal to noise ratio level as low as 5. The method is able to effectively remove artifacts resulting from gradient timing delays and restore image quality in center-out radial, projection reconstruction, and spiral trajectories. The low-rank based method introduced simultaneously estimates gradient timing delays and provides accurate auto-calibration data for improved image quality, without any additional calibration scans. © 2018 International Society for Magnetic Resonance in Medicine.

  3. Parallel Reconstruction Using Null Operations (PRUNO)

    PubMed Central

    Zhang, Jian; Liu, Chunlei; Moseley, Michael E.

    2011-01-01

    A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD). And an iterative conjugate- gradient approach is proposed to efficiently solve missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high accelerating rates. With the aid of PRUO reconstruction, ultra high accelerating parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290

  4. Measuring signal-to-noise ratio in partially parallel imaging MRI

    PubMed Central

    Goerner, Frank L.; Clarke, Geoffrey D.

    2011-01-01

    Purpose: To assess five different methods of signal-to-noise ratio (SNR) measurement for partially parallel imaging (PPI) acquisitions. Methods: Measurements were performed on a spherical phantom and three volunteers using a multichannel head coil a clinical 3T MRI system to produce echo planar, fast spin echo, gradient echo, and balanced steady state free precession image acquisitions. Two different PPI acquisitions, generalized autocalibrating partially parallel acquisition algorithm and modified sensitivity encoding with acceleration factors (R) of 2–4, were evaluated and compared to nonaccelerated acquisitions. Five standard SNR measurement techniques were investigated and Bland–Altman analysis was used to determine agreement between the various SNR methods. The estimated g-factor values, associated with each method of SNR calculation and PPI reconstruction method, were also subjected to assessments that considered the effects on SNR due to reconstruction method, phase encoding direction, and R-value. Results: Only two SNR measurement methods produced g-factors in agreement with theoretical expectations (g ≥ 1). Bland–Altman tests demonstrated that these two methods also gave the most similar results relative to the other three measurements. R-value was the only factor of the three we considered that showed significant influence on SNR changes. Conclusions: Non-signal methods used in SNR evaluation do not produce results consistent with expectations in the investigated PPI protocols. Two of the methods studied provided the most accurate and useful results. Of these two methods, it is recommended, when evaluating PPI protocols, the image subtraction method be used for SNR calculations due to its relative accuracy and ease of implementation. PMID:21978049

  5. A comparison of five standard methods for evaluating image intensity uniformity in partially parallel imaging MRI

    PubMed Central

    Goerner, Frank L.; Duong, Timothy; Stafford, R. Jason; Clarke, Geoffrey D.

    2013-01-01

    Purpose: To investigate the utility of five different standard measurement methods for determining image uniformity for partially parallel imaging (PPI) acquisitions in terms of consistency across a variety of pulse sequences and reconstruction strategies. Methods: Images were produced with a phantom using a 12-channel head matrix coil in a 3T MRI system (TIM TRIO, Siemens Medical Solutions, Erlangen, Germany). Images produced using echo-planar, fast spin echo, gradient echo, and balanced steady state free precession pulse sequences were evaluated. Two different PPI reconstruction methods were investigated, generalized autocalibrating partially parallel acquisition algorithm (GRAPPA) and modified sensitivity-encoding (mSENSE) with acceleration factors (R) of 2, 3, and 4. Additionally images were acquired with conventional, two-dimensional Fourier imaging methods (R = 1). Five measurement methods of uniformity, recommended by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) were considered. The methods investigated were (1) an ACR method and a (2) NEMA method for calculating the peak deviation nonuniformity, (3) a modification of a NEMA method used to produce a gray scale uniformity map, (4) determining the normalized absolute average deviation uniformity, and (5) a NEMA method that focused on 17 areas of the image to measure uniformity. Changes in uniformity as a function of reconstruction method at the same R-value were also investigated. Two-way analysis of variance (ANOVA) was used to determine whether R-value or reconstruction method had a greater influence on signal intensity uniformity measurements for partially parallel MRI. Results: Two of the methods studied had consistently negative slopes when signal intensity uniformity was plotted against R-value. The results obtained comparing mSENSE against GRAPPA found no consistent difference between GRAPPA and mSENSE with regard to signal intensity uniformity

  6. Autocalibration method for non-stationary CT bias correction.

    PubMed

    Vegas-Sánchez-Ferrero, Gonzalo; Ledesma-Carbayo, Maria J; Washko, George R; Estépar, Raúl San José

    2018-02-01

    Computed tomography (CT) is a widely used imaging modality for screening and diagnosis. However, the deleterious effects of radiation exposure inherent in CT imaging require the development of image reconstruction methods which can reduce exposure levels. The development of iterative reconstruction techniques is now enabling the acquisition of low-dose CT images whose quality is comparable to that of CT images acquired with much higher radiation dosages. However, the characterization and calibration of the CT signal due to changes in dosage and reconstruction approaches is crucial to provide clinically relevant data. Although CT scanners are calibrated as part of the imaging workflow, the calibration is limited to select global reference values and does not consider other inherent factors of the acquisition that depend on the subject scanned (e.g. photon starvation, partial volume effect, beam hardening) and result in a non-stationary noise response. In this work, we analyze the effect of reconstruction biases caused by non-stationary noise and propose an autocalibration methodology to compensate it. Our contributions are: 1) the derivation of a functional relationship between observed bias and non-stationary noise, 2) a robust and accurate method to estimate the local variance, 3) an autocalibration methodology that does not necessarily rely on a calibration phantom, attenuates the bias caused by noise and removes the systematic bias observed in devices from different vendors. The validation of the proposed methodology was performed with a physical phantom and clinical CT scans acquired with different configurations (kernels, doses, algorithms including iterative reconstruction). The results confirmed the suitability of the proposed methods for removing the intra-device and inter-device reconstruction biases. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Rocket measurement of auroral partial parallel distribution functions

    NASA Astrophysics Data System (ADS)

    Lin, C.-A.

    1980-01-01

    The auroral partial parallel distribution functions are obtained by using the observed energy spectra of electrons. The experiment package was launched by a Nike-Tomahawk rocket from Poker Flat, Alaska over a bright auroral band and covered an altitude range of up to 180 km. Calculated partial distribution functions are presented with emphasis on their slopes. The implications of the slopes are discussed. It should be pointed out that the slope of the partial parallel distribution function obtained from one energy spectra will be changed by superposing another energy spectra on it.

  8. A 2D MTF approach to evaluate and guide dynamic imaging developments.

    PubMed

    Chao, Tzu-Cheng; Chung, Hsiao-Wen; Hoge, W Scott; Madore, Bruno

    2010-02-01

    As the number and complexity of partially sampled dynamic imaging methods continue to increase, reliable strategies to evaluate performance may prove most useful. In the present work, an analytical framework to evaluate given reconstruction methods is presented. A perturbation algorithm allows the proposed evaluation scheme to perform robustly without requiring knowledge about the inner workings of the method being evaluated. A main output of the evaluation process consists of a two-dimensional modulation transfer function, an easy-to-interpret visual rendering of a method's ability to capture all combinations of spatial and temporal frequencies. Approaches to evaluate noise properties and artifact content at all spatial and temporal frequencies are also proposed. One fully sampled phantom and three fully sampled cardiac cine datasets were subsampled (R = 4 and 8) and reconstructed with the different methods tested here. A hybrid method, which combines the main advantageous features observed in our assessments, was proposed and tested in a cardiac cine application, with acceleration factors of 3.5 and 6.3 (skip factors of 4 and 8, respectively). This approach combines features from methods such as k-t sensitivity encoding, unaliasing by Fourier encoding the overlaps in the temporal dimension-sensitivity encoding, generalized autocalibrating partially parallel acquisition, sensitivity profiles from an array of coils for encoding and reconstruction in parallel, self, hybrid referencing with unaliasing by Fourier encoding the overlaps in the temporal dimension and generalized autocalibrating partially parallel acquisition, and generalized autocalibrating partially parallel acquisition-enhanced sensitivity maps for sensitivity encoding reconstructions.

  9. Data consistency criterion for selecting parameters for k-space-based reconstruction in parallel imaging.

    PubMed

    Nana, Roger; Hu, Xiaoping

    2010-01-01

    k-space-based reconstruction in parallel imaging depends on the reconstruction kernel setting, including its support. An optimal choice of the kernel depends on the calibration data, coil geometry and signal-to-noise ratio, as well as the criterion used. In this work, data consistency, imposed by the shift invariance requirement of the kernel, is introduced as a goodness measure of k-space-based reconstruction in parallel imaging and demonstrated. Data consistency error (DCE) is calculated as the sum of squared difference between the acquired signals and their estimates obtained based on the interpolation of the estimated missing data. A resemblance between DCE and the mean square error in the reconstructed image was found, demonstrating DCE's potential as a metric for comparing or choosing reconstructions. When used for selecting the kernel support for generalized autocalibrating partially parallel acquisition (GRAPPA) reconstruction and the set of frames for calibration as well as the kernel support in temporal GRAPPA reconstruction, DCE led to improved images over existing methods. Data consistency error is efficient to evaluate, robust for selecting reconstruction parameters and suitable for characterizing and optimizing k-space-based reconstruction in parallel imaging.

  10. 3D hyperpolarized C-13 EPI with calibrationless parallel imaging

    NASA Astrophysics Data System (ADS)

    Gordon, Jeremy W.; Hansen, Rie B.; Shin, Peter J.; Feng, Yesu; Vigneron, Daniel B.; Larson, Peder E. Z.

    2018-04-01

    With the translation of metabolic MRI with hyperpolarized 13C agents into the clinic, imaging approaches will require large volumetric FOVs to support clinical applications. Parallel imaging techniques will be crucial to increasing volumetric scan coverage while minimizing RF requirements and temporal resolution. Calibrationless parallel imaging approaches are well-suited for this application because they eliminate the need to acquire coil profile maps or auto-calibration data. In this work, we explored the utility of a calibrationless parallel imaging method (SAKE) and corresponding sampling strategies to accelerate and undersample hyperpolarized 13C data using 3D blipped EPI acquisitions and multichannel receive coils, and demonstrated its application in a human study of [1-13C]pyruvate metabolism.

  11. Prototype of an auto-calibrating, context-aware, hybrid brain-computer interface.

    PubMed

    Faller, J; Torrellas, S; Miralles, F; Holzner, C; Kapeller, C; Guger, C; Bund, J; Müller-Putz, G R; Scherer, R

    2012-01-01

    We present the prototype of a context-aware framework that allows users to control smart home devices and to access internet services via a Hybrid BCI system of an auto-calibrating sensorimotor rhythm (SMR) based BCI and another assistive device (Integra Mouse mouth joystick). While there is extensive literature that describes the merit of Hybrid BCIs, auto-calibrating and co-adaptive ERD BCI training paradigms, specialized BCI user interfaces, context-awareness and smart home control, there is up to now, no system that includes all these concepts in one integrated easy-to-use framework that can truly benefit individuals with severe functional disabilities by increasing independence and social inclusion. Here we integrate all these technologies in a prototype framework that does not require expert knowledge or excess time for calibration. In a first pilot-study, 3 healthy volunteers successfully operated the system using input signals from an ERD BCI and an Integra Mouse and reached average positive predictive values (PPV) of 72 and 98% respectively. Based on what we learned here we are planning to improve the system for a test with a larger number of healthy volunteers so we can soon bring the system to benefit individuals with severe functional disability.

  12. Auto-Calibration and Fault Detection and Isolation of Skewed Redundant Accelerometers in Measurement While Drilling Systems.

    PubMed

    Seyed Moosavi, Seyed Mohsen; Moaveni, Bijan; Moshiri, Behzad; Arvan, Mohammad Reza

    2018-02-27

    The present study designed skewed redundant accelerometers for a Measurement While Drilling (MWD) tool and executed auto-calibration, fault diagnosis and isolation of accelerometers in this tool. The optimal structure includes four accelerometers was selected and designed precisely in accordance with the physical shape of the existing MWD tool. A new four-accelerometer structure was designed, implemented and installed on the current system, replacing the conventional orthogonal structure. Auto-calibration operation of skewed redundant accelerometers and all combinations of three accelerometers have been done. Consequently, biases, scale factors, and misalignment factors of accelerometers have been successfully estimated. By defecting the sensors in the new optimal skewed redundant structure, the fault was detected using the proposed FDI method and the faulty sensor was diagnosed and isolated. The results indicate that the system can continue to operate with at least three correct sensors.

  13. Auto-Calibration and Fault Detection and Isolation of Skewed Redundant Accelerometers in Measurement While Drilling Systems

    PubMed Central

    Seyed Moosavi, Seyed Mohsen; Moshiri, Behzad; Arvan, Mohammad Reza

    2018-01-01

    The present study designed skewed redundant accelerometers for a Measurement While Drilling (MWD) tool and executed auto-calibration, fault diagnosis and isolation of accelerometers in this tool. The optimal structure includes four accelerometers was selected and designed precisely in accordance with the physical shape of the existing MWD tool. A new four-accelerometer structure was designed, implemented and installed on the current system, replacing the conventional orthogonal structure. Auto-calibration operation of skewed redundant accelerometers and all combinations of three accelerometers have been done. Consequently, biases, scale factors, and misalignment factors of accelerometers have been successfully estimated. By defecting the sensors in the new optimal skewed redundant structure, the fault was detected using the proposed FDI method and the faulty sensor was diagnosed and isolated. The results indicate that the system can continue to operate with at least three correct sensors. PMID:29495434

  14. Quantitative metrics for evaluating parallel acquisition techniques in diffusion tensor imaging at 3 Tesla.

    PubMed

    Ardekani, Siamak; Selva, Luis; Sayre, James; Sinha, Usha

    2006-11-01

    Single-shot echo-planar based diffusion tensor imaging is prone to geometric and intensity distortions. Parallel imaging is a means of reducing these distortions while preserving spatial resolution. A quantitative comparison at 3 T of parallel imaging for diffusion tensor images (DTI) using k-space (generalized auto-calibrating partially parallel acquisitions; GRAPPA) and image domain (sensitivity encoding; SENSE) reconstructions at different acceleration factors, R, is reported here. Images were evaluated using 8 human subjects with repeated scans for 2 subjects to estimate reproducibility. Mutual information (MI) was used to assess the global changes in geometric distortions. The effects of parallel imaging techniques on random noise and reconstruction artifacts were evaluated by placing 26 regions of interest and computing the standard deviation of apparent diffusion coefficient and fractional anisotropy along with the error of fitting the data to the diffusion model (residual error). The larger positive values in mutual information index with increasing R values confirmed the anticipated decrease in distortions. Further, the MI index of GRAPPA sequences for a given R factor was larger than the corresponding mSENSE images. The residual error was lowest in the images acquired without parallel imaging and among the parallel reconstruction methods, the R = 2 acquisitions had the least error. The standard deviation, accuracy, and reproducibility of the apparent diffusion coefficient and fractional anisotropy in homogenous tissue regions showed that GRAPPA acquired with R = 2 had the least amount of systematic and random noise and of these, significant differences with mSENSE, R = 2 were found only for the fractional anisotropy index. Evaluation of the current implementation of parallel reconstruction algorithms identified GRAPPA acquired with R = 2 as optimal for diffusion tensor imaging.

  15. Spelling is Just a Click Away - A User-Centered Brain-Computer Interface Including Auto-Calibration and Predictive Text Entry.

    PubMed

    Kaufmann, Tobias; Völker, Stefan; Gunesch, Laura; Kübler, Andrea

    2012-01-01

    Brain-computer interfaces (BCI) based on event-related potentials (ERP) allow for selection of characters from a visually presented character-matrix and thus provide a communication channel for users with neurodegenerative disease. Although they have been topic of research for more than 20 years and were multiply proven to be a reliable communication method, BCIs are almost exclusively used in experimental settings, handled by qualified experts. This study investigates if ERP-BCIs can be handled independently by laymen without expert support, which is inevitable for establishing BCIs in end-user's daily life situations. Furthermore we compared the classic character-by-character text entry against a predictive text entry (PTE) that directly incorporates predictive text into the character-matrix. N = 19 BCI novices handled a user-centered ERP-BCI application on their own without expert support. The software individually adjusted classifier weights and control parameters in the background, invisible to the user (auto-calibration). All participants were able to operate the software on their own and to twice correctly spell a sentence with the auto-calibrated classifier (once with PTE, once without). Our PTE increased spelling speed and, importantly, did not reduce accuracy. In sum, this study demonstrates feasibility of auto-calibrating ERP-BCI use, independently by laymen and the strong benefit of integrating predictive text directly into the character-matrix.

  16. Auto-calibration of GF-1 WFV images using flat terrain

    NASA Astrophysics Data System (ADS)

    Zhang, Guo; Xu, Kai; Huang, Wenchao

    2017-12-01

    Four wide field view (WFV) cameras with 16-m multispectral medium-resolution and a combined swath of 800 km are onboard the Gaofen-1 (GF-1) satellite, which can increase the revisit frequency to less than 4 days and enable large-scale land monitoring. The detection and elimination of WFV camera distortions is key for subsequent applications. Due to the wide swath of WFV images, geometric calibration using either conventional methods based on the ground control field (GCF) or GCF independent methods is problematic. This is predominantly because current GCFs in China fail to cover the whole WFV image and most GCF independent methods are used for close-range photogrammetry or computer vision fields. This study proposes an auto-calibration method using flat terrain to detect nonlinear distortions of GF-1 WFV images. First, a classic geometric calibration model is built for the GF1 WFV camera, and at least two images with an overlap area that cover flat terrain are collected, then the elevation residuals between the real elevation and that calculated by forward intersection are used to solve nonlinear distortion parameters in WFV images. Experiments demonstrate that the orientation accuracy of the proposed method evaluated by GCF CPs is within 0.6 pixel, and residual errors manifest as random errors. Validation using Google Earth CPs further proves the effectiveness of auto-calibration, and the whole scene is undistorted compared to not using calibration parameters. The orientation accuracy of the proposed method and the GCF method is compared. The maximum difference is approximately 0.3 pixel, and the factors behind this discrepancy are analyzed. Generally, this method can effectively compensate for distortions in the GF-1 WFV camera.

  17. Partial fourier and parallel MR image reconstruction with integrated gradient nonlinearity correction.

    PubMed

    Tao, Shengzhen; Trzasko, Joshua D; Shu, Yunhong; Weavers, Paul T; Huston, John; Gray, Erin M; Bernstein, Matt A

    2016-06-01

    To describe how integrated gradient nonlinearity (GNL) correction can be used within noniterative partial Fourier (homodyne) and parallel (SENSE and GRAPPA) MR image reconstruction strategies, and demonstrate that performing GNL correction during, rather than after, these routines mitigates the image blurring and resolution loss caused by postreconstruction image domain based GNL correction. Starting from partial Fourier and parallel magnetic resonance imaging signal models that explicitly account for GNL, noniterative image reconstruction strategies for each accelerated acquisition technique are derived under the same core mathematical assumptions as their standard counterparts. A series of phantom and in vivo experiments on retrospectively undersampled data were performed to investigate the spatial resolution benefit of integrated GNL correction over conventional postreconstruction correction. Phantom and in vivo results demonstrate that the integrated GNL correction reduces the image blurring introduced by the conventional GNL correction, while still correcting GNL-induced coarse-scale geometrical distortion. Images generated from undersampled data using the proposed integrated GNL strategies offer superior depiction of fine image detail, for example, phantom resolution inserts and anatomical tissue boundaries. Noniterative partial Fourier and parallel imaging reconstruction methods with integrated GNL correction reduce the resolution loss that occurs during conventional postreconstruction GNL correction while preserving the computational efficiency of standard reconstruction techniques. Magn Reson Med 75:2534-2544, 2016. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  18. Spelling is Just a Click Away – A User-Centered Brain–Computer Interface Including Auto-Calibration and Predictive Text Entry

    PubMed Central

    Kaufmann, Tobias; Völker, Stefan; Gunesch, Laura; Kübler, Andrea

    2012-01-01

    Brain–computer interfaces (BCI) based on event-related potentials (ERP) allow for selection of characters from a visually presented character-matrix and thus provide a communication channel for users with neurodegenerative disease. Although they have been topic of research for more than 20 years and were multiply proven to be a reliable communication method, BCIs are almost exclusively used in experimental settings, handled by qualified experts. This study investigates if ERP–BCIs can be handled independently by laymen without expert support, which is inevitable for establishing BCIs in end-user’s daily life situations. Furthermore we compared the classic character-by-character text entry against a predictive text entry (PTE) that directly incorporates predictive text into the character-matrix. N = 19 BCI novices handled a user-centered ERP–BCI application on their own without expert support. The software individually adjusted classifier weights and control parameters in the background, invisible to the user (auto-calibration). All participants were able to operate the software on their own and to twice correctly spell a sentence with the auto-calibrated classifier (once with PTE, once without). Our PTE increased spelling speed and, importantly, did not reduce accuracy. In sum, this study demonstrates feasibility of auto-calibrating ERP–BCI use, independently by laymen and the strong benefit of integrating predictive text directly into the character-matrix. PMID:22833713

  19. Autocalibrating motion-corrected wave-encoding for highly accelerated free-breathing abdominal MRI.

    PubMed

    Chen, Feiyu; Zhang, Tao; Cheng, Joseph Y; Shi, Xinwei; Pauly, John M; Vasanawala, Shreyas S

    2017-11-01

    To develop a motion-robust wave-encoding technique for highly accelerated free-breathing abdominal MRI. A comprehensive 3D wave-encoding-based method was developed to enable fast free-breathing abdominal imaging: (a) auto-calibration for wave-encoding was designed to avoid extra scan for coil sensitivity measurement; (b) intrinsic butterfly navigators were used to track respiratory motion; (c) variable-density sampling was included to enable compressed sensing; (d) golden-angle radial-Cartesian hybrid view-ordering was incorporated to improve motion robustness; and (e) localized rigid motion correction was combined with parallel imaging compressed sensing reconstruction to reconstruct the highly accelerated wave-encoded datasets. The proposed method was tested on six subjects and image quality was compared with standard accelerated Cartesian acquisition both with and without respiratory triggering. Inverse gradient entropy and normalized gradient squared metrics were calculated, testing whether image quality was improved using paired t-tests. For respiratory-triggered scans, wave-encoding significantly reduced residual aliasing and blurring compared with standard Cartesian acquisition (metrics suggesting P < 0.05). For non-respiratory-triggered scans, the proposed method yielded significantly better motion correction compared with standard motion-corrected Cartesian acquisition (metrics suggesting P < 0.01). The proposed methods can reduce motion artifacts and improve overall image quality of highly accelerated free-breathing abdominal MRI. Magn Reson Med 78:1757-1766, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.

  20. Partial Overhaul and Initial Parallel Optimization of KINETICS, a Coupled Dynamics and Chemistry Atmosphere Model

    NASA Technical Reports Server (NTRS)

    Nguyen, Howard; Willacy, Karen; Allen, Mark

    2012-01-01

    KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.

  1. Highly accelerated cardiac cine parallel MRI using low-rank matrix completion and partial separability model

    NASA Astrophysics Data System (ADS)

    Lyu, Jingyuan; Nakarmi, Ukash; Zhang, Chaoyi; Ying, Leslie

    2016-05-01

    This paper presents a new approach to highly accelerated dynamic parallel MRI using low rank matrix completion, partial separability (PS) model. In data acquisition, k-space data is moderately randomly undersampled at the center kspace navigator locations, but highly undersampled at the outer k-space for each temporal frame. In reconstruction, the navigator data is reconstructed from undersampled data using structured low-rank matrix completion. After all the unacquired navigator data is estimated, the partial separable model is used to obtain partial k-t data. Then the parallel imaging method is used to acquire the entire dynamic image series from highly undersampled data. The proposed method has shown to achieve high quality reconstructions with reduction factors up to 31, and temporal resolution of 29ms, when the conventional PS method fails.

  2. Polarization Imaging Apparatus with Auto-Calibration

    NASA Technical Reports Server (NTRS)

    Zou, Yingyin Kevin (Inventor); Zhao, Hongzhi (Inventor); Chen, Qiushui (Inventor)

    2013-01-01

    A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned 22.5 deg, a second variable phase retarder with its optical axis aligned 45 deg, a linear polarizer, a imaging sensor for sensing the intensity images of the sample, a controller and a computer. Two variable phase retarders were controlled independently by a computer through a controller unit which generates a sequential of voltages to control the phase retardations of the first and second variable phase retarders. A auto-calibration procedure was incorporated into the polarization imaging apparatus to correct the misalignment of first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images, I(sub 0), I(sub 1), I(sub 2) and I(sub 3) of the sample were captured by imaging sensor when the phase retardations of VPRs were set at (0,0), (pi,0), (pi,pi) and (pi/2,pi), respectively. Then four Stokes components of a Stokes image, S(sub 0), S(sub 1), S(sub 2) and S(sub 3) were calculated using the four intensity images.

  3. Polarization imaging apparatus with auto-calibration

    DOEpatents

    Zou, Yingyin Kevin; Zhao, Hongzhi; Chen, Qiushui

    2013-08-20

    A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned 22.5.degree., a second variable phase retarder with its optical axis aligned 45.degree., a linear polarizer, a imaging sensor for sensing the intensity images of the sample, a controller and a computer. Two variable phase retarders were controlled independently by a computer through a controller unit which generates a sequential of voltages to control the phase retardations of the first and second variable phase retarders. A auto-calibration procedure was incorporated into the polarization imaging apparatus to correct the misalignment of first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images, I.sub.0, I.sub.1, I.sub.2 and I.sub.3 of the sample were captured by imaging sensor when the phase retardations of VPRs were set at (0,0), (.pi.,0), (.pi.,.pi.) and (.pi./2,.pi.), respectively. Then four Stokes components of a Stokes image, S.sub.0, S.sub.1, S.sub.2 and S.sub.3 were calculated using the four intensity images.

  4. Time Parallel Solution of Linear Partial Differential Equations on the Intel Touchstone Delta Supercomputer

    NASA Technical Reports Server (NTRS)

    Toomarian, N.; Fijany, A.; Barhen, J.

    1993-01-01

    Evolutionary partial differential equations are usually solved by decretization in time and space, and by applying a marching in time procedure to data and algorithms potentially parallelized in the spatial domain.

  5. Complexity of parallel implementation of domain decomposition techniques for elliptic partial differential equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gropp, W.D.; Keyes, D.E.

    1988-03-01

    The authors discuss the parallel implementation of preconditioned conjugate gradient (PCG)-based domain decomposition techniques for self-adjoint elliptic partial differential equations in two dimensions on several architectures. The complexity of these methods is described on a variety of message-passing parallel computers as a function of the size of the problem, number of processors and relative communication speeds of the processors. They show that communication startups are very important, and that even the small amount of global communication in these methods can significantly reduce the performance of many message-passing architectures.

  6. Analysis and Modeling of Parallel Photovoltaic Systems under Partial Shading Conditions

    NASA Astrophysics Data System (ADS)

    Buddala, Santhoshi Snigdha

    Since the industrial revolution, fossil fuels like petroleum, coal, oil, natural gas and other non-renewable energy sources have been used as the primary energy source. The consumption of fossil fuels releases various harmful gases into the atmosphere as byproducts which are hazardous in nature and they tend to deplete the protective layers and affect the overall environmental balance. Also the fossil fuels are bounded resources of energy and rapid depletion of these sources of energy, have prompted the need to investigate alternate sources of energy called renewable energy. One such promising source of renewable energy is the solar/photovoltaic energy. This work focuses on investigating a new solar array architecture with solar cells connected in parallel configuration. By retaining the structural simplicity of the parallel architecture, a theoretical small signal model of the solar cell is proposed and modeled to analyze the variations in the module parameters when subjected to partial shading conditions. Simulations were run in SPICE to validate the model implemented in Matlab. The voltage limitations of the proposed architecture are addressed by adopting a simple dc-dc boost converter and evaluating the performance of the architecture in terms of efficiencies by comparing it with the traditional architectures. SPICE simulations are used to compare the architectures and identify the best one in terms of power conversion efficiency under partial shading conditions.

  7. Simulation of partially coherent light propagation using parallel computing devices

    NASA Astrophysics Data System (ADS)

    Magalhães, Tiago C.; Rebordão, José M.

    2017-08-01

    Light acquires or loses coherence and coherence is one of the few optical observables. Spectra can be derived from coherence functions and understanding any interferometric experiment is also relying upon coherence functions. Beyond the two limiting cases (full coherence or incoherence) the coherence of light is always partial and it changes with propagation. We have implemented a code to compute the propagation of partially coherent light from the source plane to the observation plane using parallel computing devices (PCDs). In this paper, we restrict the propagation in free space only. To this end, we used the Open Computing Language (OpenCL) and the open-source toolkit PyOpenCL, which gives access to OpenCL parallel computation through Python. To test our code, we chose two coherence source models: an incoherent source and a Gaussian Schell-model source. In the former case, we divided into two different source shapes: circular and rectangular. The results were compared to the theoretical values. Our implemented code allows one to choose between the PyOpenCL implementation and a standard one, i.e using the CPU only. To test the computation time for each implementation (PyOpenCL and standard), we used several computer systems with different CPUs and GPUs. We used powers of two for the dimensions of the cross-spectral density matrix (e.g. 324, 644) and a significant speed increase is observed in the PyOpenCL implementation when compared to the standard one. This can be an important tool for studying new source models.

  8. Development of a generic auto-calibration package for regional ecological modeling and application in the Central Plains of the United States

    USGS Publications Warehouse

    Wu, Yiping; Liu, Shuguang; Li, Zhengpeng; Dahal, Devendra; Young, Claudia J.; Schmidt, Gail L.; Liu, Jinxun; Davis, Brian; Sohl, Terry L.; Werner, Jeremy M.; Oeding, Jennifer

    2014-01-01

    Process-oriented ecological models are frequently used for predicting potential impacts of global changes such as climate and land-cover changes, which can be useful for policy making. It is critical but challenging to automatically derive optimal parameter values at different scales, especially at regional scale, and validate the model performance. In this study, we developed an automatic calibration (auto-calibration) function for a well-established biogeochemical model—the General Ensemble Biogeochemical Modeling System (GEMS)-Erosion Deposition Carbon Model (EDCM)—using data assimilation technique: the Shuffled Complex Evolution algorithm and a model-inversion R package—Flexible Modeling Environment (FME). The new functionality can support multi-parameter and multi-objective auto-calibration of EDCM at the both pixel and regional levels. We also developed a post-processing procedure for GEMS to provide options to save the pixel-based or aggregated county-land cover specific parameter values for subsequent simulations. In our case study, we successfully applied the updated model (EDCM-Auto) for a single crop pixel with a corn–wheat rotation and a large ecological region (Level II)—Central USA Plains. The evaluation results indicate that EDCM-Auto is applicable at multiple scales and is capable to handle land cover changes (e.g., crop rotations). The model also performs well in capturing the spatial pattern of grain yield production for crops and net primary production (NPP) for other ecosystems across the region, which is a good example for implementing calibration and validation of ecological models with readily available survey data (grain yield) and remote sensing data (NPP) at regional and national levels. The developed platform for auto-calibration can be readily expanded to incorporate other model inversion algorithms and potential R packages, and also be applied to other ecological models.

  9. Solution of partial differential equations on vector and parallel computers

    NASA Technical Reports Server (NTRS)

    Ortega, J. M.; Voigt, R. G.

    1985-01-01

    The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

  10. Auto-calibration of a one-dimensional hydrodynamic-ecological model using a Monte Carlo approach: simulation of hypoxic events in a polymictic lake

    NASA Astrophysics Data System (ADS)

    Luo, L.

    2011-12-01

    Automated calibration of complex deterministic water quality models with a large number of biogeochemical parameters can reduce time-consuming iterative simulations involving empirical judgements of model fit. We undertook auto-calibration of the one-dimensional hydrodynamic-ecological lake model DYRESM-CAEDYM, using a Monte Carlo sampling (MCS) method, in order to test the applicability of this procedure for shallow, polymictic Lake Rotorua (New Zealand). The calibration procedure involved independently minimising the root-mean-square-error (RMSE), maximizing the Pearson correlation coefficient (r) and Nash-Sutcliffe efficient coefficient (Nr) for comparisons of model state variables against measured data. An assigned number of parameter permutations was used for 10,000 simulation iterations. The 'optimal' temperature calibration produced a RMSE of 0.54 °C, Nr-value of 0.99 and r-value of 0.98 through the whole water column based on comparisons with 540 observed water temperatures collected between 13 July 2007 - 13 January 2009. The modeled bottom dissolved oxygen concentration (20.5 m below surface) was compared with 467 available observations. The calculated RMSE of the simulations compared with the measurements was 1.78 mg L-1, the Nr-value was 0.75 and the r-value was 0.87. The autocalibrated model was further tested for an independent data set by simulating bottom-water hypoxia events for the period 15 January 2009 to 8 June 2011 (875 days). This verification produced an accurate simulation of five hypoxic events corresponding to DO < 2 mg L-1 during summer of 2009-2011. The RMSE was 2.07 mg L-1, Nr-value 0.62 and r-value of 0.81, based on the available data set of 738 days. The auto-calibration software of DYRESM-CAEDYM developed here is substantially less time-consuming and more efficient in parameter optimisation than traditional manual calibration which has been the standard tool practiced for similar complex water quality models.

  11. Simultaneous orthogonal plane imaging.

    PubMed

    Mickevicius, Nikolai J; Paulson, Eric S

    2017-11-01

    Intrafraction motion can result in a smearing of planned external beam radiation therapy dose distributions, resulting in an uncertainty in dose actually deposited in tissue. The purpose of this paper is to present a pulse sequence that is capable of imaging a moving target at a high frame rate in two orthogonal planes simultaneously for MR-guided radiotherapy. By balancing the zero gradient moment on all axes, slices in two orthogonal planes may be spatially encoded simultaneously. The orthogonal slice groups may be acquired with equal or nonequal echo times. A Cartesian spoiled gradient echo simultaneous orthogonal plane imaging (SOPI) sequence was tested in phantom and in vivo. Multiplexed SOPI acquisitions were performed in which two parallel slices were imaged along two orthogonal axes simultaneously. An autocalibrating phase-constrained 2D-SENSE-GRAPPA (generalized autocalibrating partially parallel acquisition) algorithm was implemented to reconstruct the multiplexed data. SOPI images without intraslice motion artifacts were reconstructed at a maximum frame rate of 8.16 Hz. The 2D-SENSE-GRAPPA reconstruction separated the parallel slices aliased along each orthogonal axis. The high spatiotemporal resolution provided by SOPI has the potential to be beneficial for intrafraction motion management during MR-guided radiation therapy or other MRI-guided interventions. Magn Reson Med 78:1700-1710, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.

  12. Renal magnetic resonance angiography at 3.0 Tesla using a 32-element phased-array coil system and parallel imaging in 2 directions.

    PubMed

    Fenchel, Michael; Nael, Kambiz; Deshpande, Vibhas S; Finn, J Paul; Kramer, Ulrich; Miller, Stephan; Ruehm, Stefan; Laub, Gerhard

    2006-09-01

    The aim of the present study was to assess the feasibility of renal magnetic resonance angiography at 3.0 T using a phased-array coil system with 32-coil elements. Specifically, high parallel imaging factors were used for an increased spatial resolution and anatomic coverage of the whole abdomen. Signal-to-noise values and the g-factor distribution of the 32 element coil were examined in phantom studies for the magnetic resonance angiography (MRA) sequence. Eleven volunteers (6 men, median age of 30.0 years) were examined on a 3.0-T MR scanner (Magnetom Trio, Siemens Medical Solutions, Malvern, PA) using a 32-element phased-array coil (prototype from In vivo Corp.). Contrast-enhanced 3D-MRA (TR 2.95 milliseconds, TE 1.12 milliseconds, flip angle 25-30 degrees , bandwidth 650 Hz/pixel) was acquired with integrated generalized autocalibrating partially parallel acquisition (GRAPPA), in both phase- and slice-encoding direction. Images were assessed by 2 independent observers with regard to image quality, noise and presence of artifacts. Signal-to-noise levels of 22.2 +/- 22.0 and 57.9 +/- 49.0 were measured with (GRAPPAx6) and without parallel-imaging, respectively. The mean g-factor of the 32-element coil for GRAPPA with an acceleration of 3 and 2 in the phase-encoding and slice-encoding direction, respectively, was 1.61. High image quality was found in 9 of 11 volunteers (2.6 +/- 0.8) with good overall interobserver agreement (k = 0.87). Relatively low image quality with higher noise levels were encountered in 2 volunteers. MRA at 3.0 T using a 32-element phased-array coil is feasible in healthy volunteers. High diagnostic image quality and extended anatomic coverage could be achieved with application of high parallel imaging factors.

  13. Autocalibration of multiprojector CAVE-like immersive environments.

    PubMed

    Sajadi, Behzad; Majumder, Aditi

    2012-03-01

    In this paper, we present the first method for the geometric autocalibration of multiple projectors on a set of CAVE-like immersive display surfaces including truncated domes and 4 or 5-wall CAVEs (three side walls, floor, and/or ceiling). All such surfaces can be categorized as swept surfaces and multiple projectors can be registered on them using a single uncalibrated camera without using any physical markers on the surface. Our method can also handle nonlinear distortion in the projectors, common in compact setups where a short throw lens is mounted on each projector. Further, when the whole swept surface is not visible from a single camera view, we can register the projectors using multiple pan and tilted views of the same camera. Thus, our method scales well with different size and resolution of the display. Since we recover the 3D shape of the display, we can achieve registration that is correct from any arbitrary viewpoint appropriate for head-tracked single-user virtual reality systems. We can also achieve wallpapered registration, more appropriate for multiuser collaborative explorations. Though much more immersive than common surfaces like planes and cylinders, general swept surfaces are used today only for niche display environments. Even the more popular 4 or 5-wall CAVE is treated as a piecewise planar surface for calibration purposes and hence projectors are not allowed to be overlapped across the corners. Our method opens up the possibility of using such swept surfaces to create more immersive VR systems without compromising the simplicity of having a completely automatic calibration technique. Such calibration allows completely arbitrary positioning of the projectors in a 5-wall CAVE, without respecting the corners.

  14. Interleaved EPI diffusion imaging using SPIRiT-based reconstruction with virtual coil compression.

    PubMed

    Dong, Zijing; Wang, Fuyixue; Ma, Xiaodong; Zhang, Zhe; Dai, Erpeng; Yuan, Chun; Guo, Hua

    2018-03-01

    To develop a novel diffusion imaging reconstruction framework based on iterative self-consistent parallel imaging reconstruction (SPIRiT) for multishot interleaved echo planar imaging (iEPI), with computation acceleration by virtual coil compression. As a general approach for autocalibrating parallel imaging, SPIRiT improves the performance of traditional generalized autocalibrating partially parallel acquisitions (GRAPPA) methods in that the formulation with self-consistency is better conditioned, suggesting SPIRiT to be a better candidate in k-space-based reconstruction. In this study, a general SPIRiT framework is adopted to incorporate both coil sensitivity and phase variation information as virtual coils and then is applied to 2D navigated iEPI diffusion imaging. To reduce the reconstruction time when using a large number of coils and shots, a novel shot-coil compression method is proposed for computation acceleration in Cartesian sampling. Simulations and in vivo experiments were conducted to evaluate the performance of the proposed method. Compared with the conventional coil compression, the shot-coil compression achieved higher compression rates with reduced errors. The simulation and in vivo experiments demonstrate that the SPIRiT-based reconstruction outperformed the existing method, realigned GRAPPA, and provided superior images with reduced artifacts. The SPIRiT-based reconstruction with virtual coil compression is a reliable method for high-resolution iEPI diffusion imaging. Magn Reson Med 79:1525-1531, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.

  15. SPIRiT: Iterative Self-consistent Parallel Imaging Reconstruction from Arbitrary k-Space

    PubMed Central

    Lustig, Michael; Pauly, John M.

    2010-01-01

    A new approach to autocalibrating, coil-by-coil parallel imaging reconstruction is presented. It is a generalized reconstruction framework based on self consistency. The reconstruction problem is formulated as an optimization that yields the most consistent solution with the calibration and acquisition data. The approach is general and can accurately reconstruct images from arbitrary k-space sampling patterns. The formulation can flexibly incorporate additional image priors such as off-resonance correction and regularization terms that appear in compressed sensing. Several iterative strategies to solve the posed reconstruction problem in both image and k-space domain are presented. These are based on a projection over convex sets (POCS) and a conjugate gradient (CG) algorithms. Phantom and in-vivo studies demonstrate efficient reconstructions from undersampled Cartesian and spiral trajectories. Reconstructions that include off-resonance correction and nonlinear ℓ1-wavelet regularization are also demonstrated. PMID:20665790

  16. A Simple Application of Compressed Sensing to Further Accelerate Partially Parallel Imaging

    PubMed Central

    Miao, Jun; Guo, Weihong; Narayan, Sreenath; Wilson, David L.

    2012-01-01

    Compressed Sensing (CS) and partially parallel imaging (PPI) enable fast MR imaging by reducing the amount of k-space data required for reconstruction. Past attempts to combine these two have been limited by the incoherent sampling requirement of CS, since PPI routines typically sample on a regular (coherent) grid. Here, we developed a new method, “CS+GRAPPA,” to overcome this limitation. We decomposed sets of equidistant samples into multiple random subsets. Then, we reconstructed each subset using CS, and averaging the results to get a final CS k-space reconstruction. We used both a standard CS, and an edge and joint-sparsity guided CS reconstruction. We tested these intermediate results on both synthetic and real MR phantom data, and performed a human observer experiment to determine the effectiveness of decomposition, and to optimize the number of subsets. We then used these CS reconstructions to calibrate the GRAPPA complex coil weights. In vivo parallel MR brain and heart data sets were used. An objective image quality evaluation metric, Case-PDM, was used to quantify image quality. Coherent aliasing and noise artifacts were significantly reduced using two decompositions. More decompositions further reduced coherent aliasing and noise artifacts but introduced blurring. However, the blurring was effectively minimized using our new edge and joint-sparsity guided CS using two decompositions. Numerical results on parallel data demonstrated that the combined method greatly improved image quality as compared to standard GRAPPA, on average halving Case-PDM scores across a range of sampling rates. The proposed technique allowed the same Case-PDM scores as standard GRAPPA, using about half the number of samples. We conclude that the new method augments GRAPPA by combining it with CS, allowing CS to work even when the k-space sampling pattern is equidistant. PMID:22902065

  17. Autocalibration of a one-dimensional hydrodynamic-ecological model (DYRESM 4.0-CAEDYM 3.1) using a Monte Carlo approach: simulations of hypoxic events in a polymictic lake

    NASA Astrophysics Data System (ADS)

    Luo, Liancong; Hamilton, David; Lan, Jia; McBride, Chris; Trolle, Dennis

    2018-03-01

    Automated calibration of complex deterministic water quality models with a large number of biogeochemical parameters can reduce time-consuming iterative simulations involving empirical judgements of model fit. We undertook autocalibration of the one-dimensional hydrodynamic-ecological lake model DYRESM-CAEDYM, using a Monte Carlo sampling (MCS) method, in order to test the applicability of this procedure for shallow, polymictic Lake Rotorua (New Zealand). The calibration procedure involved independently minimizing the root-mean-square error (RMSE), maximizing the Pearson correlation coefficient (r) and Nash-Sutcliffe efficient coefficient (Nr) for comparisons of model state variables against measured data. An assigned number of parameter permutations was used for 10 000 simulation iterations. The "optimal" temperature calibration produced a RMSE of 0.54 °C, Nr value of 0.99, and r value of 0.98 through the whole water column based on comparisons with 540 observed water temperatures collected between 13 July 2007 and 13 January 2009. The modeled bottom dissolved oxygen concentration (20.5 m below surface) was compared with 467 available observations. The calculated RMSE of the simulations compared with the measurements was 1.78 mg L-1, the Nr value was 0.75, and the r value was 0.87. The autocalibrated model was further tested for an independent data set by simulating bottom-water hypoxia events from 15 January 2009 to 8 June 2011 (875 days). This verification produced an accurate simulation of five hypoxic events corresponding to DO < 2 mg L-1 during summer of 2009-2011. The RMSE was 2.07 mg L-1, Nr value 0.62, and r value of 0.81, based on the available data set of 738 days. The autocalibration software of DYRESM-CAEDYM developed here is substantially less time-consuming and more efficient in parameter optimization than traditional manual calibration which has been the standard tool practiced for similar complex water quality models.

  18. Motions of the hand expose the partial and parallel activation of stereotypes.

    PubMed

    Freeman, Jonathan B; Ambady, Nalini

    2009-10-01

    Perceivers spontaneously sort other people's faces into social categories and activate the stereotype knowledge associated with those categories. In the work described here, participants, presented with sex-typical and sex-atypical faces (i.e., faces containing a mixture of male and female features), identified which of two gender stereotypes (one masculine and one feminine) was appropriate for the face. Meanwhile, their hand movements were measured by recording the streaming x, y coordinates of the computer mouse. As participants stereotyped sex-atypical faces, real-time motor responses exhibited a continuous spatial attraction toward the opposite-gender stereotype. These data provide evidence for the partial and parallel activation of stereotypes belonging to alternate social categories. Thus, perceptual cues of the face can trigger a graded mixture of simultaneously active stereotype knowledge tied to alternate social categories, and this mixture settles over time onto ultimate judgments.

  19. Parallels between control PDE's (Partial Differential Equations) and systems of ODE's (Ordinary Differential Equations)

    NASA Technical Reports Server (NTRS)

    Hunt, L. R.; Villarreal, Ramiro

    1987-01-01

    System theorists understand that the same mathematical objects which determine controllability for nonlinear control systems of ordinary differential equations (ODEs) also determine hypoellipticity for linear partial differentail equations (PDEs). Moreover, almost any study of ODE systems begins with linear systems. It is remarkable that Hormander's paper on hypoellipticity of second order linear p.d.e.'s starts with equations due to Kolmogorov, which are shown to be analogous to the linear PDEs. Eigenvalue placement by state feedback for a controllable linear system can be paralleled for a Kolmogorov equation if an appropriate type of feedback is introduced. Results concerning transformations of nonlinear systems to linear systems are similar to results for transforming a linear PDE to a Kolmogorov equation.

  20. Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Earl, Christopher; Might, Matthew; Bagusetty, Abhishek

    This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.

  1. Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations

    DOE PAGES

    Earl, Christopher; Might, Matthew; Bagusetty, Abhishek; ...

    2016-01-26

    This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.

  2. Automatic Camera Orientation and Structure Recovery with Samantha

    NASA Astrophysics Data System (ADS)

    Gherardi, R.; Toldo, R.; Garro, V.; Fusiello, A.

    2011-09-01

    SAMANTHA is a software capable of computing camera orientation and structure recovery from a sparse block of casual images without human intervention. It can process both calibrated images or uncalibrated, in which case an autocalibration routine is run. Pictures are organized into a hierarchical tree which has single images as leaves and partial reconstructions as internal nodes. The method proceeds bottom up until it reaches the root node, corresponding to the final result. This framework is one order of magnitude faster than sequential approaches, inherently parallel, less sensitive to the error accumulation causing drift. We have verified the quality of our reconstructions both qualitatively producing compelling point clouds and quantitatively, comparing them with laser scans serving as ground truth.

  3. Efficient operator splitting algorithm for joint sparsity-regularized SPIRiT-based parallel MR imaging reconstruction.

    PubMed

    Duan, Jizhong; Liu, Yu; Jing, Peiguang

    2018-02-01

    Self-consistent parallel imaging (SPIRiT) is an auto-calibrating model for the reconstruction of parallel magnetic resonance imaging, which can be formulated as a regularized SPIRiT problem. The Projection Over Convex Sets (POCS) method was used to solve the formulated regularized SPIRiT problem. However, the quality of the reconstructed image still needs to be improved. Though methods such as NonLinear Conjugate Gradients (NLCG) can achieve higher spatial resolution, these methods always demand very complex computation and converge slowly. In this paper, we propose a new algorithm to solve the formulated Cartesian SPIRiT problem with the JTV and JL1 regularization terms. The proposed algorithm uses the operator splitting (OS) technique to decompose the problem into a gradient problem and a denoising problem with two regularization terms, which is solved by our proposed split Bregman based denoising algorithm, and adopts the Barzilai and Borwein method to update step size. Simulation experiments on two in vivo data sets demonstrate that the proposed algorithm is 1.3 times faster than ADMM for datasets with 8 channels. Especially, our proposal is 2 times faster than ADMM for the dataset with 32 channels. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Digital tomosynthesis mammography using a parallel maximum-likelihood reconstruction method

    NASA Astrophysics Data System (ADS)

    Wu, Tao; Zhang, Juemin; Moore, Richard; Rafferty, Elizabeth; Kopans, Daniel; Meleis, Waleed; Kaeli, David

    2004-05-01

    A parallel reconstruction method, based on an iterative maximum likelihood (ML) algorithm, is developed to provide fast reconstruction for digital tomosynthesis mammography. Tomosynthesis mammography acquires 11 low-dose projections of a breast by moving an x-ray tube over a 50° angular range. In parallel reconstruction, each projection is divided into multiple segments along the chest-to-nipple direction. Using the 11 projections, segments located at the same distance from the chest wall are combined to compute a partial reconstruction of the total breast volume. The shape of the partial reconstruction forms a thin slab, angled toward the x-ray source at a projection angle 0°. The reconstruction of the total breast volume is obtained by merging the partial reconstructions. The overlap region between neighboring partial reconstructions and neighboring projection segments is utilized to compensate for the incomplete data at the boundary locations present in the partial reconstructions. A serial execution of the reconstruction is compared to a parallel implementation, using clinical data. The serial code was run on a PC with a single PentiumIV 2.2GHz CPU. The parallel implementation was developed using MPI and run on a 64-node Linux cluster using 800MHz Itanium CPUs. The serial reconstruction for a medium-sized breast (5cm thickness, 11cm chest-to-nipple distance) takes 115 minutes, while a parallel implementation takes only 3.5 minutes. The reconstruction time for a larger breast using a serial implementation takes 187 minutes, while a parallel implementation takes 6.5 minutes. No significant differences were observed between the reconstructions produced by the serial and parallel implementations.

  5. Parallel grid population

    DOEpatents

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.

  6. Dynamic grid refinement for partial differential equations on parallel computers

    NASA Technical Reports Server (NTRS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems.

  7. Experimental study on heat transfer enhancement of laminar ferrofluid flow in horizontal tube partially filled porous media under fixed parallel magnet bars

    NASA Astrophysics Data System (ADS)

    Sheikhnejad, Yahya; Hosseini, Reza; Saffar Avval, Majid

    2017-02-01

    In this study, steady state laminar ferroconvection through circular horizontal tube partially filled with porous media under constant heat flux is experimentally investigated. Transverse magnetic fields were applied on ferrofluid flow by two fixed parallel magnet bar positioned on a certain distance from beginning of the test section. The results show promising notable enhancement in heat transfer as a consequence of partially filled porous media and magnetic field, up to 2.2 and 1.4 fold enhancement were observed in heat transfer coefficient respectively. It was found that presence of both porous media and magnetic field simultaneously can highly improve heat transfer up to 2.4 fold. Porous media of course plays a major role in this configuration. Virtually, application of Magnetic field and porous media also insert higher pressure loss along the pipe which again porous media contribution is higher that magnetic field.

  8. Parallel-In-Time For Moving Meshes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Falgout, R. D.; Manteuffel, T. A.; Southworth, B.

    2016-02-04

    With steadily growing computational resources available, scientists must develop e ective ways to utilize the increased resources. High performance, highly parallel software has be- come a standard. However until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial di erential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing se- quential codes with only minor modi cations. In this work, a rezoning-type moving mesh is appliedmore » to a di usion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.« less

  9. Parallel architectures for iterative methods on adaptive, block structured grids

    NASA Technical Reports Server (NTRS)

    Gannon, D.; Vanrosendale, J.

    1983-01-01

    A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.

  10. Iterative algorithms for large sparse linear systems on parallel computers

    NASA Technical Reports Server (NTRS)

    Adams, L. M.

    1982-01-01

    Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.

  11. On the parallel solution of parabolic equations

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.

  12. InGaAs/InP SPAD photon-counting module with auto-calibrated gate-width generation and remote control

    NASA Astrophysics Data System (ADS)

    Tosi, Alberto; Ruggeri, Alessandro; Bahgat Shehata, Andrea; Della Frera, Adriano; Scarcella, Carmelo; Tisa, Simone; Giudice, Andrea

    2013-01-01

    We present a photon-counting module based on InGaAs/InP SPAD (Single-Photon Avalanche Diode) for detecting single photons up to 1.7 μm. The module exploits a novel architecture for generating and calibrating the gate width, along with other functions (such as module supervision, counting and processing of detected photons, etc.). The gate width, i.e. the time interval when the SPAD is ON, is user-programmable in the range from 500 ps to 1.5 μs, by means of two different delay generation methods implemented with an FPGA (Field-Programmable Gate Array). In order to compensate chip-to-chip delay variation, an auto-calibration circuit picks out a combination of delays in order to match at best the selected gate width. The InGaAs/InP module accepts asynchronous and aperiodic signals and introduces very low timing jitter. Moreover the photon counting module provides other new features like a microprocessor for system supervision, a touch-screen for local user interface, and an Ethernet link for smart remote control. Thanks to the fullyprogrammable and configurable architecture, the overall instrument provides high system flexibility and can easily match all requirements set by many different applications requiring single photon-level sensitivity in the near infrared with very low photon timing jitter.

  13. Free-breathing diffusion-weighted single-shot echo-planar MR imaging using parallel imaging (GRAPPA 2) and high b value for the detection of primary rectal adenocarcinoma.

    PubMed

    Soyer, Philippe; Lagadec, Matthieu; Sirol, Marc; Dray, Xavier; Duchat, Florent; Vignaud, Alexandre; Fargeaudou, Yann; Placé, Vinciane; Gault, Valérie; Hamzi, Lounis; Pocard, Marc; Boudiaf, Mourad

    2010-02-11

    Our objective was to determine the diagnostic accuracy of a free-breathing diffusion-weighted single-shot echo-planar magnetic resonance imaging (FBDW-SSEPI) technique with parallel imaging and high diffusion factor value (b = 1000 s/mm2) in the detection of primary rectal adenocarcinomas. Thirty-one patients (14M and 17F; mean age 67 years) with histopathologically proven primary rectal adenocarcinomas and 31 patients without rectal malignancies (14M and 17F; mean age 63.6 years) were examined with FBDW-SSEPI (repetition time (TR/echo time (TE) 3900/91 ms, gradient strength 45 mT/m, acquisition time 2 min) at 1.5 T using generalized autocalibrating partially parallel acquisitions (GRAPPA, acceleration factor 2) and a b value of 1000 s/mm2. Apparent diffusion coefficients (ADCs) of rectal adenocarcinomas and normal rectal wall were measured. FBDW-SSEPI images were evaluated for tumour detection by 2 readers. Sensitivity, specificity, accuracy and Youden score for rectal adenocarcinoma detection were calculated with their 95% confidence intervals (CI) for ADC value measurement and visual image analysis. Rectal adenocarcinomas had significantly lower ADCs (mean 1.036 x 10(-3)+/- 0.107 x 10(-3) mm2/s; median 1.015 x 10(-3) mm2/s; range (0.827-1.239) x 10(-3) mm2/s) compared with the rectal wall of control subjects (mean 1.387 x 10(-3)+/- 0.106 x 10(-3) mm2/s; median 1.385 x 10(-3) mm2/s; range (1.176-1.612) x 10(-3) mm2/s) (p < 0.0001). Using a threshold value < or = 1.240 x 10(-3) mm2/s, all rectal adenocarcinomas were correctly categorized and 100% sensitivity (31/31; 95% CI 95-100%), 94% specificity (31/33; 95% CI 88-100%), 97% accuracy (60/62; 95% CI 92-100%) and Youden index 0.94 were obtained for the diagnosis of rectal adenocarcinoma. FBDW-SSEPI image analysis allowed depiction of all rectal adenocarcinomas but resulted in 2 false-positive findings, yielding 100% sensitivity (31/31; 95% CI 95-100%), 94% specificity (31/33; 95% CI 88-100%), 97% accuracy (60

  14. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  15. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.

  16. Some fast elliptic solvers on parallel architectures and their complexities

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Y.

    1989-01-01

    The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. In this paper, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational compelxity lower than that of parallel BCR.

  17. Some fast elliptic solvers on parallel architectures and their complexities

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    The discretization of separable elliptic partial differential equations leads to linear systems with special block triangular matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconsistant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.

  18. Solving Partial Differential Equations in a data-driven multiprocessor environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.

    1988-12-31

    Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-how architectures have been proposed for the exploitation of large scale parallelism. The implementation of some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asychronous methods in a data-driven environment.more » New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in dataflow PDE program graphs.« less

  19. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  20. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  1. Parallel adaptive wavelet collocation method for PDEs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu

    2015-10-01

    A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using tree-like structure with tree roots starting at a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allowsmore » fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048{sup 3} using as many as 2048 CPU cores.« less

  2. LORAKS Makes Better SENSE: Phase-Constrained Partial Fourier SENSE Reconstruction without Phase Calibration

    PubMed Central

    Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P.

    2016-01-01

    Purpose Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. Theory and Methods The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly-accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely-used calibrationless uniformly-undersampled trajectories. Results Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high-acceleration rates relative to existing state-of-the-art reconstruction approaches. Conclusion The SENSE-LORAKS framework provides promising new opportunities for highly-accelerated MRI. PMID:27037836

  3. K-space reconstruction with anisotropic kernel support (KARAOKE) for ultrafast partially parallel imaging.

    PubMed

    Miao, Jun; Wong, Wilbur C K; Narayan, Sreenath; Wilson, David L

    2011-11-01

    Partially parallel imaging (PPI) greatly accelerates MR imaging by using surface coil arrays and under-sampling k-space. However, the reduction factor (R) in PPI is theoretically constrained by the number of coils (N(C)). A symmetrically shaped kernel is typically used, but this often prevents even the theoretically possible R from being achieved. Here, the authors propose a kernel design method to accelerate PPI faster than R = N(C). K-space data demonstrates an anisotropic pattern that is correlated with the object itself and to the asymmetry of the coil sensitivity profile, which is caused by coil placement and B(1) inhomogeneity. From spatial analysis theory, reconstruction of such pattern is best achieved by a signal-dependent anisotropic shape kernel. As a result, the authors propose the use of asymmetric kernels to improve k-space reconstruction. The authors fit a bivariate Gaussian function to the local signal magnitude of each coil, then threshold this function to extract the kernel elements. A perceptual difference model (Case-PDM) was employed to quantitatively evaluate image quality. A MR phantom experiment showed that k-space anisotropy increased as a function of magnetic field strength. The authors tested a K-spAce Reconstruction with AnisOtropic KErnel support ("KARAOKE") algorithm with both MR phantom and in vivo data sets, and compared the reconstructions to those produced by GRAPPA, a popular PPI reconstruction method. By exploiting k-space anisotropy, KARAOKE was able to better preserve edges, which is particularly useful for cardiac imaging and motion correction, while GRAPPA failed at a high R near or exceeding N(C). KARAOKE performed comparably to GRAPPA at low Rs. As a rule of thumb, KARAOKE reconstruction should always be used for higher quality k-space reconstruction, particularly when PPI data is acquired at high Rs and/or high field strength.

  4. K-space reconstruction with anisotropic kernel support (KARAOKE) for ultrafast partially parallel imaging

    PubMed Central

    Miao, Jun; Wong, Wilbur C. K.; Narayan, Sreenath; Wilson, David L.

    2011-01-01

    Purpose: Partially parallel imaging (PPI) greatly accelerates MR imaging by using surface coil arrays and under-sampling k-space. However, the reduction factor (R) in PPI is theoretically constrained by the number of coils (NC). A symmetrically shaped kernel is typically used, but this often prevents even the theoretically possible R from being achieved. Here, the authors propose a kernel design method to accelerate PPI faster than R = NC. Methods: K-space data demonstrates an anisotropic pattern that is correlated with the object itself and to the asymmetry of the coil sensitivity profile, which is caused by coil placement and B1 inhomogeneity. From spatial analysis theory, reconstruction of such pattern is best achieved by a signal-dependent anisotropic shape kernel. As a result, the authors propose the use of asymmetric kernels to improve k-space reconstruction. The authors fit a bivariate Gaussian function to the local signal magnitude of each coil, then threshold this function to extract the kernel elements. A perceptual difference model (Case-PDM) was employed to quantitatively evaluate image quality. Results: A MR phantom experiment showed that k-space anisotropy increased as a function of magnetic field strength. The authors tested a K-spAce Reconstruction with AnisOtropic KErnel support (“KARAOKE”) algorithm with both MR phantom and in vivo data sets, and compared the reconstructions to those produced by GRAPPA, a popular PPI reconstruction method. By exploiting k-space anisotropy, KARAOKE was able to better preserve edges, which is particularly useful for cardiac imaging and motion correction, while GRAPPA failed at a high R near or exceeding NC. KARAOKE performed comparably to GRAPPA at low Rs. Conclusions: As a rule of thumb, KARAOKE reconstruction should always be used for higher quality k-space reconstruction, particularly when PPI data is acquired at high Rs and∕or high field strength. PMID:22047378

  5. Parallel PWMs Based Fully Digital Transmitter with Wide Carrier Frequency Range

    PubMed Central

    Zhou, Bo; Zhang, Kun; Zhou, Wenbiao; Zhang, Yanjun; Liu, Dake

    2013-01-01

    The carrier-frequency (CF) and intermediate-frequency (IF) pulse-width modulators (PWMs) based on delay lines are proposed, where baseband signals are conveyed by both positions and pulse widths or densities of the carrier clock. By combining IF-PWM and precorrected CF-PWM, a fully digital transmitter with unit-delay autocalibration is implemented in 180 nm CMOS for high reconfiguration. The proposed architecture achieves wide CF range of 2 M–1 GHz, high power efficiency of 70%, and low error vector magnitude (EVM) of 3%, with spectrum purity of 20 dB optimized in comparison to the existing designs. PMID:24223503

  6. A Performance Comparison of the Parallel Preconditioners for Iterative Methods for Large Sparse Linear Systems Arising from Partial Differential Equations on Structured Grids

    NASA Astrophysics Data System (ADS)

    Ma, Sangback

    In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDE (Partial Differential Equation)s on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioner, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i. e., their real parts of the eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices are mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The

  7. LORAKS makes better SENSE: Phase-constrained partial fourier SENSE reconstruction without phase calibration.

    PubMed

    Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P

    2017-03-01

    Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely used calibrationless uniformly undersampled trajectories. Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high-acceleration rates relative to existing state-of-the-art reconstruction approaches. The SENSE-LORAKS framework provides promising new opportunities for highly accelerated MRI. Magn Reson Med 77:1021-1035, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.

  8. Integrated water system simulation by considering hydrological and biogeochemical processes: model development, with parameter sensitivity and autocalibration

    NASA Astrophysics Data System (ADS)

    Zhang, Y. Y.; Shao, Q. X.; Ye, A. Z.; Xing, H. T.; Xia, J.

    2016-02-01

    Integrated water system modeling is a feasible approach to understanding severe water crises in the world and promoting the implementation of integrated river basin management. In this study, a classic hydrological model (the time variant gain model: TVGM) was extended to an integrated water system model by coupling multiple water-related processes in hydrology, biogeochemistry, water quality, and ecology, and considering the interference of human activities. A parameter analysis tool, which included sensitivity analysis, autocalibration and model performance evaluation, was developed to improve modeling efficiency. To demonstrate the model performances, the Shaying River catchment, which is the largest highly regulated and heavily polluted tributary of the Huai River basin in China, was selected as the case study area. The model performances were evaluated on the key water-related components including runoff, water quality, diffuse pollution load (or nonpoint sources) and crop yield. Results showed that our proposed model simulated most components reasonably well. The simulated daily runoff at most regulated and less-regulated stations matched well with the observations. The average correlation coefficient and Nash-Sutcliffe efficiency were 0.85 and 0.70, respectively. Both the simulated low and high flows at most stations were improved when the dam regulation was considered. The daily ammonium-nitrogen (NH4-N) concentration was also well captured with the average correlation coefficient of 0.67. Furthermore, the diffuse source load of NH4-N and the corn yield were reasonably simulated at the administrative region scale. This integrated water system model is expected to improve the simulation performances with extension to more model functionalities, and to provide a scientific basis for the implementation in integrated river basin managements.

  9. Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations

    NASA Technical Reports Server (NTRS)

    Fijany, Amir

    1993-01-01

    In this paper, fast time- and Space -Parallel agorithms for solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solution of the problem can be completed decoupled.

  10. Reduction in menopause-related symptoms associated with use of a noninvasive neurotechnology for autocalibration of neural oscillations.

    PubMed

    Tegeler, Charles H; Tegeler, Catherine L; Cook, Jared F; Lee, Sung W; Pajewski, Nicholas M

    2015-06-01

    Increased amplitudes in high-frequency brain electrical activity are reported with menopausal hot flashes. We report outcomes associated with the use of High-resolution, relational, resonance-based, electroencephalic mirroring--a noninvasive neurotechnology for autocalibration of neural oscillations--by women with perimenopausal and postmenopausal hot flashes. Twelve women with hot flashes (median age, 56 y; range, 46-69 y) underwent a median of 13 (range, 8-23) intervention sessions for a median of 9.5 days (range, 4-32). This intervention uses algorithmic analysis of brain electrical activity and near real-time translation of brain frequencies into variable tones for acoustic stimulation. Hot flash frequency and severity were recorded by daily diary. Primary outcomes included hot flash severity score, sleep, and depressive symptoms. High-frequency amplitudes (23-36 Hz) from bilateral temporal scalp recordings were measured at baseline and during serial sessions. Self-reported symptom inventories for sleep and depressive symptoms were collected. The median change in hot flash severity score was -0.97 (range, -3.00 to 1.00; P = 0.015). Sleep and depression scores decreased by -8.5 points (range, -20 to -1; P = 0.022) and -5.5 points (range, -32 to 8; P = 0.015), respectively. The median sum of amplitudes for the right and left temporal high-frequency brain electrical activity was 8.44 μV (range, 6.27-16.66) at baseline and decreased by a median of -2.96 μV (range, -11.05 to -0.65; P = 0.0005) by the final session. Hot flash frequency and severity, symptoms of insomnia and depression, and temporal high-frequency brain electrical activity decrease after High-resolution, relational, resonance-based, electroencephalic mirroring. Larger controlled trials with longer follow-up are warranted.

  11. The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations.

    PubMed

    Mitchell, William F

    1998-01-01

    Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given.

  12. The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations

    PubMed Central

    Mitchell, William F.

    1998-01-01

    Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given. PMID:28009355

  13. Accelerated T1ρ acquisition for knee cartilage quantification using compressed sensing and data-driven parallel imaging: A feasibility study.

    PubMed

    Pandit, Prachi; Rivoire, Julien; King, Kevin; Li, Xiaojuan

    2016-03-01

    Quantitative T1ρ imaging is beneficial for early detection for osteoarthritis but has seen limited clinical use due to long scan times. In this study, we evaluated the feasibility of accelerated T1ρ mapping for knee cartilage quantification using a combination of compressed sensing (CS) and data-driven parallel imaging (ARC-Autocalibrating Reconstruction for Cartesian sampling). A sequential combination of ARC and CS, both during data acquisition and reconstruction, was used to accelerate the acquisition of T1ρ maps. Phantom, ex vivo (porcine knee), and in vivo (human knee) imaging was performed on a GE 3T MR750 scanner. T1ρ quantification after CS-accelerated acquisition was compared with non CS-accelerated acquisition for various cartilage compartments. Accelerating image acquisition using CS did not introduce major deviations in quantification. The coefficient of variation for the root mean squared error increased with increasing acceleration, but for in vivo measurements, it stayed under 5% for a net acceleration factor up to 2, where the acquisition was 25% faster than the reference (only ARC). To the best of our knowledge, this is the first implementation of CS for in vivo T1ρ quantification. These early results show that this technique holds great promise in making quantitative imaging techniques more accessible for clinical applications. © 2015 Wiley Periodicals, Inc.

  14. DVS-SOFTWARE: An Effective Tool for Applying Highly Parallelized Hardware To Computational Geophysics

    NASA Astrophysics Data System (ADS)

    Herrera, I.; Herrera, G. S.

    2015-12-01

    Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized super-computers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers using 'non-overlapping discretizations' have produced the DVS-Software which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM)REFERENCES [1]. Herrera Ismael and George F. Pinder, Mathematical Modelling in Science and Engineering: An axiomatic approach", John Wiley, 243p., 2012. [2]. Herrera, I., de la Cruz L.M. and Rosas-Medina A. "Non Overlapping Discretization Methods for Partial, Differential Equations". NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num 21852. (Open source) [3]. Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)

  15. Parallel Element Agglomeration Algebraic Multigrid and Upscaling Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barker, Andrew T.; Benson, Thomas R.; Lee, Chak Shing

    ParELAG is a parallel C++ library for numerical upscaling of finite element discretizations and element-based algebraic multigrid solvers. It provides optimal complexity algorithms to build multilevel hierarchies and solvers that can be used for solving a wide class of partial differential equations (elliptic, hyperbolic, saddle point problems) on general unstructured meshes. Additionally, a novel multilevel solver for saddle point problems with divergence constraint is implemented.

  16. Parallel pivoting combined with parallel reduction

    NASA Technical Reports Server (NTRS)

    Alaghband, Gita

    1987-01-01

    Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.

  17. Single-Shot MR Spectroscopic Imaging with Partial Parallel Imaging

    PubMed Central

    Posse, Stefan; Otazo, Ricardo; Tsai, Shang-Yueh; Yoshimoto, Akio Ernesto; Lin, Fa-Hsuan

    2010-01-01

    An MR spectroscopic imaging (MRSI) pulse sequence based on Proton-Echo-Planar-Spectroscopic-Imaging (PEPSI) is introduced that measures 2-dimensional metabolite maps in a single excitation. Echo-planar spatial-spectral encoding was combined with interleaved phase encoding and parallel imaging using SENSE to reconstruct absorption mode spectra. The symmetrical k-space trajectory compensates phase errors due to convolution of spatial and spectral encoding. Single-shot MRSI at short TE was evaluated in phantoms and in vivo on a 3 T whole body scanner equipped with 12-channel array coil. Four-step interleaved phase encoding and 4-fold SENSE acceleration were used to encode a 16×16 spatial matrix with 390 Hz spectral width. Comparison with conventional PEPSI and PEPSI with 4-fold SENSE acceleration demonstrated comparable sensitivity per unit time when taking into account g-factor related noise increases and differences in sampling efficiency. LCModel fitting enabled quantification of Inositol, Choline, Creatine and NAA in vivo with concentration values in the ranges measured with conventional PEPSI and SENSE-accelerated PEPSI. Cramer-Rao lower bounds were comparable to those obtained with conventional SENSE-accelerated PEPSI at the same voxel size and measurement time. This single-shot MRSI method is therefore suitable for applications that require high temporal resolution to monitor temporal dynamics or to reduce sensitivity to tissue movement. PMID:19097245

  18. Modulated heat pulse propagation and partial transport barriers in chaotic magnetic fields

    DOE PAGES

    del-Castillo-Negrete, Diego; Blazevski, Daniel

    2016-04-01

    Direct numerical simulations of the time dependent parallel heat transport equation modeling heat pulses driven by power modulation in 3-dimensional chaotic magnetic fields are presented. The numerical method is based on the Fourier formulation of a Lagrangian-Green's function method that provides an accurate and efficient technique for the solution of the parallel heat transport equation in the presence of harmonic power modulation. The numerical results presented provide conclusive evidence that even in the absence of magnetic flux surfaces, chaotic magnetic field configurations with intermediate levels of stochasticity exhibit transport barriers to modulated heat pulse propagation. In particular, high-order islands and remnants of destroyed flux surfaces (Cantori) act as partial barriers that slow down or even stop the propagation of heat waves at places where the magnetic field connection length exhibits a strong gradient. The key parameter ismore » $$\\gamma=\\sqrt{\\omega/2 \\chi_\\parallel}$$ that determines the length scale, $$1/\\gamma$$, of the heat wave penetration along the magnetic field line. For large perturbation frequencies, $$\\omega \\gg 1$$, or small parallel thermal conductivities, $$\\chi_\\parallel \\ll 1$$, parallel heat transport is strongly damped and the magnetic field partial barriers act as robust barriers where the heat wave amplitude vanishes and its phase speed slows down to a halt. On the other hand, in the limit of small $$\\gamma$$, parallel heat transport is largely unimpeded, global transport is observed and the radial amplitude and phase speed of the heat wave remain finite. Results on modulated heat pulse propagation in fully stochastic fields and across magnetic islands are also presented. In qualitative agreement with recent experiments in LHD and DIII-D, it is shown that the elliptic (O) and hyperbolic (X) points of magnetic islands have a direct impact on the spatio-temporal dependence of the amplitude and the time delay

  19. Radial k-t SPIRiT: autocalibrated parallel imaging for generalized phase-contrast MRI.

    PubMed

    Santelli, Claudio; Schaeffter, Tobias; Kozerke, Sebastian

    2014-11-01

    To extend SPIRiT to additionally exploit temporal correlations for highly accelerated generalized phase-contrast MRI and to compare the performance of the proposed radial k-t SPIRiT method relative to frame-by-frame SPIRiT and radial k-t GRAPPA reconstruction for velocity and turbulence mapping in the aortic arch. Free-breathing navigator-gated two-dimensional radial cine imaging with three-directional multi-point velocity encoding was implemented and fully sampled data were obtained in the aortic arch of healthy volunteers. Velocities were encoded with three different first gradient moments per axis to permit quantification of mean velocity and turbulent kinetic energy. Velocity and turbulent kinetic energy maps from up to 14-fold undersampled data were compared for k-t SPIRiT, frame-by-frame SPIRiT, and k-t GRAPPA relative to the fully sampled reference. Using k-t SPIRiT, improvements in magnitude and velocity reconstruction accuracy were found. Temporally resolved magnitude profiles revealed a reduction in spatial blurring with k-t SPIRiT compared with frame-by-frame SPIRiT and k-t GRAPPA for all velocity encodings, leading to improved estimates of turbulent kinetic energy. k-t SPIRiT offers improved reconstruction accuracy at high radial undersampling factors and hence facilitates the use of generalized phase-contrast MRI for routine use. Copyright © 2013 Wiley Periodicals, Inc.

  20. Extendability of parallel sections in vector bundles

    NASA Astrophysics Data System (ADS)

    Kirschner, Tim

    2016-01-01

    I address the following question: Given a differentiable manifold M, what are the open subsets U of M such that, for all vector bundles E over M and all linear connections ∇ on E, any ∇-parallel section in E defined on U extends to a ∇-parallel section in E defined on M? For simply connected manifolds M (among others) I describe the entirety of all such sets U which are, in addition, the complement of a C1 submanifold, boundary allowed, of M. This delivers a partial positive answer to a problem posed by Antonio J. Di Scala and Gianni Manno (2014). Furthermore, in case M is an open submanifold of Rn, n ≥ 2, I prove that the complement of U in M, not required to be a submanifold now, can have arbitrarily large n-dimensional Lebesgue measure.

  1. Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations

    NASA Technical Reports Server (NTRS)

    Chrisochoides, Nikos

    1995-01-01

    We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDE's) on multiprocessors. Multithreading is used as a means of exploring concurrency in the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used an a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDE's. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability not an easy task, and increases software complexity.

  2. Toward an automated parallel computing environment for geosciences

    NASA Astrophysics Data System (ADS)

    Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping

    2007-08-01

    Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.

  3. Partial structure of the phylloxin gene from the giant monkey frog, Phyllomedusa bicolor: parallel cloning of precursor cDNA and genomic DNA from lyophilized skin secretion.

    PubMed

    Chen, Tianbao; Gagliardo, Ron; Walker, Brian; Zhou, Mei; Shaw, Chris

    2005-12-01

    Phylloxin is a novel prototype antimicrobial peptide from the skin of Phyllomedusa bicolor. Here, we describe parallel identification and sequencing of phylloxin precursor transcript (mRNA) and partial gene structure (genomic DNA) from the same sample of lyophilized skin secretion using our recently-described cloning technique. The open-reading frame of the phylloxin precursor was identical in nucleotide sequence to that previously reported and alignment with the nucleotide sequence derived from genomic DNA indicated the presence of a 175 bp intron located in a near identical position to that found in the dermaseptins. The highly-conserved structural organization of skin secretion peptide genes in P. bicolor can thus be extended to include that encoding phylloxin (plx). These data further reinforce our assertion that application of the described methodology can provide robust genomic/transcriptomic/peptidomic data without the need for specimen sacrifice.

  4. Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation

    NASA Technical Reports Server (NTRS)

    Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.

    1996-01-01

    We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and, economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.

  5. Parallel approach to incorporating face image information into dialogue processing

    NASA Astrophysics Data System (ADS)

    Ren, Fuji

    2000-10-01

    There are many kinds of so-called irregular expressions in natural dialogues. Even if the content of a conversation is the same in words, different meanings can be interpreted by a person's feeling or face expression. To have a good understanding of dialogues, it is required in a flexible dialogue processing system to infer the speaker's view properly. However, it is difficult to obtain the meaning of the speaker's sentences in various scenes using traditional methods. In this paper, a new approach for dialogue processing that incorporates information from the speaker's face is presented. We first divide conversation statements into several simple tasks. Second, we process each simple task using an independent processor. Third, we employ some speaker's face information to estimate the view of the speakers to solve ambiguities in dialogues. The approach presented in this paper can work efficiently, because independent processors run in parallel, writing partial results to a shared memory, incorporating partial results at appropriate points, and complementing each other. A parallel algorithm and a method for employing the face information in a dialogue machine translation will be discussed, and some results will be included in this paper.

  6. Automating the parallel processing of fluid and structural dynamics calculations

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.; Cole, Gary L.

    1987-01-01

    The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations, that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.

  7. Physics Structure Analysis of Parallel Waves Concept of Physics Teacher Candidate

    NASA Astrophysics Data System (ADS)

    Sarwi, S.; Supardi, K. I.; Linuwih, S.

    2017-04-01

    The aim of this research was to find a parallel structure concept of wave physics and the factors that influence on the formation of parallel conceptions of physics teacher candidates. The method used qualitative research which types of cross-sectional design. These subjects were five of the third semester of basic physics and six of the fifth semester of wave course students. Data collection techniques used think aloud and written tests. Quantitative data were analysed with descriptive technique-percentage. The data analysis technique for belief and be aware of answers uses an explanatory analysis. Results of the research include: 1) the structure of the concept can be displayed through the illustration of a map containing the theoretical core, supplements the theory and phenomena that occur daily; 2) the trend of parallel conception of wave physics have been identified on the stationary waves, resonance of the sound and the propagation of transverse electromagnetic waves; 3) the influence on the parallel conception that reading textbooks less comprehensive and knowledge is partial understanding as forming the structure of the theory.

  8. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  9. Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver

    NASA Technical Reports Server (NTRS)

    Mavriplis, Dimitri J.

    1998-01-01

    A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.

  10. Single-shot magnetic resonance spectroscopic imaging with partial parallel imaging.

    PubMed

    Posse, Stefan; Otazo, Ricardo; Tsai, Shang-Yueh; Yoshimoto, Akio Ernesto; Lin, Fa-Hsuan

    2009-03-01

    A magnetic resonance spectroscopic imaging (MRSI) pulse sequence based on proton-echo-planar-spectroscopic-imaging (PEPSI) is introduced that measures two-dimensional metabolite maps in a single excitation. Echo-planar spatial-spectral encoding was combined with interleaved phase encoding and parallel imaging using SENSE to reconstruct absorption mode spectra. The symmetrical k-space trajectory compensates phase errors due to convolution of spatial and spectral encoding. Single-shot MRSI at short TE was evaluated in phantoms and in vivo on a 3-T whole-body scanner equipped with a 12-channel array coil. Four-step interleaved phase encoding and fourfold SENSE acceleration were used to encode a 16 x 16 spatial matrix with a 390-Hz spectral width. Comparison with conventional PEPSI and PEPSI with fourfold SENSE acceleration demonstrated comparable sensitivity per unit time when taking into account g-factor-related noise increases and differences in sampling efficiency. LCModel fitting enabled quantification of inositol, choline, creatine, and N-acetyl-aspartate (NAA) in vivo with concentration values in the ranges measured with conventional PEPSI and SENSE-accelerated PEPSI. Cramer-Rao lower bounds were comparable to those obtained with conventional SENSE-accelerated PEPSI at the same voxel size and measurement time. This single-shot MRSI method is therefore suitable for applications that require high temporal resolution to monitor temporal dynamics or to reduce sensitivity to tissue movement.

  11. Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring

    NASA Technical Reports Server (NTRS)

    Padovan, Joe; Kwang, Abel

    1994-01-01

    This paper develops a parallelizable multilevel multiple constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure,_sequential, as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort due both to updating and inversion.

  12. Self-Calibration and Optimal Response in Intelligent Sensors Design Based on Artificial Neural Networks

    PubMed Central

    Rivera, José; Carrillo, Mariano; Chacón, Mario; Herrera, Gilberto; Bojorquez, Gilberto

    2007-01-01

    The development of smart sensors involves the design of reconfigurable systems capable of working with different input sensors. Reconfigurable systems ideally should spend the least possible amount of time in their calibration. An autocalibration algorithm for intelligent sensors should be able to fix major problems such as offset, variation of gain and lack of linearity, as accurately as possible. This paper describes a new autocalibration methodology for nonlinear intelligent sensors based on artificial neural networks, ANN. The methodology involves analysis of several network topologies and training algorithms. The proposed method was compared against the piecewise and polynomial linearization methods. Method comparison was achieved using different number of calibration points, and several nonlinear levels of the input signal. This paper also shows that the proposed method turned out to have a better overall accuracy than the other two methods. Besides, experimentation results and analysis of the complete study, the paper describes the implementation of the ANN in a microcontroller unit, MCU. In order to illustrate the method capability to build autocalibration and reconfigurable systems, a temperature measurement system was designed and tested. The proposed method is an improvement over the classic autocalibration methodologies, because it impacts on the design process of intelligent sensors, autocalibration methodologies and their associated factors, like time and cost.

  13. Partially orthogonal resonators for magnetic resonance imaging

    NASA Astrophysics Data System (ADS)

    Chacon-Caldera, Jorge; Malzacher, Matthias; Schad, Lothar R.

    2017-02-01

    Resonators for signal reception in magnetic resonance are traditionally planar to restrict coil material and avoid coil losses. Here, we present a novel concept to model resonators partially in a plane with maximum sensitivity to the magnetic resonance signal and partially in an orthogonal plane with reduced signal sensitivity. Thus, properties of individual elements in coil arrays can be modified to optimize physical planar space and increase the sensitivity of the overall array. A particular case of the concept is implemented to decrease H-field destructive interferences in planar concentric in-phase arrays. An increase in signal to noise ratio of approximately 20% was achieved with two resonators placed over approximately the same planar area compared to common approaches at a target depth of 10 cm at 3 Tesla. Improved parallel imaging performance of this configuration is also demonstrated. The concept can be further used to increase coil density.

  14. Some thoughts about parallel process and psychotherapy supervision: when is a parallel just a parallel?

    PubMed

    Watkins, C Edward

    2012-09-01

    In a way not done before, Tracey, Bludworth, and Glidden-Tracey ("Are there parallel processes in psychotherapy supervision: An empirical examination," Psychotherapy, 2011, advance online publication, doi.10.1037/a0026246) have shown us that parallel process in psychotherapy supervision can indeed be rigorously and meaningfully researched, and their groundbreaking investigation provides a nice prototype for future supervision studies to emulate. In what follows, I offer a brief complementary comment to Tracey et al., addressing one matter that seems to be a potentially important conceptual and empirical parallel process consideration: When is a parallel just a parallel? PsycINFO Database Record (c) 2012 APA, all rights reserved.

  15. Fault Tolerant Parallel Implementations of Iterative Algorithms for Optimal Control Problems

    DTIC Science & Technology

    1988-01-21

    p/.V)] steps, but did not discuss any specific parallel implementation. Gajski [51 improved upon this result by performing the SIMD computation in...N = p2. our approach reduces to that of [51, except that Gajski presents the coefficient computation and partial solution phases as a single...8217>. the SIMD algo- rithm presented by Gajski [5] can be most efficiently mapped to a unidirec- tional ring network with broadcasting capability. Based

  16. a Non-Overlapping Discretization Method for Partial Differential Equations

    NASA Astrophysics Data System (ADS)

    Rosas-Medina, A.; Herrera, I.

    2013-05-01

    Mathematical models of many systems of interest, including very important continuous systems of Engineering and Science, lead to a great variety of partial differential equations whose solution methods are based on the computational processing of large-scale algebraic systems. Furthermore, the incredible expansion experienced by the existing computational hardware and software has made amenable to effective treatment problems of an ever increasing diversity and complexity, posed by engineering and scientific applications. The emergence of parallel computing prompted on the part of the computational-modeling community a continued and systematic effort with the purpose of harnessing it for the endeavor of solving boundary-value problems (BVPs) of partial differential equations. Very early after such an effort began, it was recognized that domain decomposition methods (DDM) were the most effective technique for applying parallel computing to the solution of partial differential equations, since such an approach drastically simplifies the coordination of the many processors that carry out the different tasks and also reduces very much the requirements of information-transmission between them. Ideally, DDMs intend producing algorithms that fulfill the DDM-paradigm; i.e., such that "the global solution is obtained by solving local problems defined separately in each subdomain of the coarse-mesh -or domain-decomposition-". Stated in a simplistic manner, the basic idea is that, when the DDM-paradigm is satisfied, full parallelization can be achieved by assigning each subdomain to a different processor. When intensive DDM research began much attention was given to overlapping DDMs, but soon after attention shifted to non-overlapping DDMs. This evolution seems natural when the DDM-paradigm is taken into account: it is easier to uncouple the local problems when the subdomains are separated. However, an important limitation of non-overlapping domain decompositions, as that

  17. Partial Arc Curvilinear Direct Drive Servomotor

    NASA Technical Reports Server (NTRS)

    Sun, Xiuhong (Inventor)

    2014-01-01

    A partial arc servomotor assembly having a curvilinear U-channel with two parallel rare earth permanent magnet plates facing each other and a pivoted ironless three phase coil armature winding moves between the plates. An encoder read head is fixed to a mounting plate above the coil armature winding and a curvilinear encoder scale is curved to be co-axis with the curvilinear U-channel permanent magnet track formed by the permanent magnet plates. Driven by a set of miniaturized power electronics devices closely looped with a positioning feedback encoder, the angular position and velocity of the pivoted payload is programmable and precisely controlled.

  18. Immediate versus delayed loading of strategic mini dental implants for the stabilization of partial removable dental prostheses: a patient cluster randomized, parallel-group 3-year trial.

    PubMed

    Mundt, Torsten; Al Jaghsi, Ahmad; Schwahn, Bernd; Hilgert, Janina; Lucas, Christian; Biffar, Reiner; Schwahn, Christian; Heinemann, Friedhelm

    2016-07-30

    Acceptable short-term survival rates (>90 %) of mini-implants (diameter < 3.0 mm) are only documented for mandibular overdentures. Sound data for mini-implants as strategic abutments for a better retention of partial removable dental prosthesis (PRDP) are not available. The purpose of this study is to test the hypothesis that immediately loaded mini-implants show more bone loss and less success than strategic mini-implants with delayed loading. In this four-center (one university hospital, three dental practices in Germany), parallel-group, controlled clinical trial, which is cluster randomized on patient level, a total of 80 partially edentulous patients with unfavourable number and distribution of remaining abutment teeth in at least one jaw will receive supplementary min-implants to stabilize their PRDP. The mini-implant are either immediately loaded after implant placement (test group) or delayed after four months (control group). Follow-up of the patients will be performed for 36 months. The primary outcome is the radiographic bone level changes at implants. The secondary outcome is the implant success as a composite variable. Tertiary outcomes include clinical, subjective (quality of life, satisfaction, chewing ability) and dental or technical complications. Strategic implants under an existing PRDP are only documented for standard-diameter implants. Mini-implants could be a minimal invasive and low cost solution for this treatment modality. The trial is registered at Deutsches Register Klinischer Studien (German register of clinical trials) under DRKS-ID: DRKS00007589 ( www.germanctr.de ) on January 13(th), 2015.

  19. Parallelization Issues and Particle-In Codes.

    NASA Astrophysics Data System (ADS)

    Elster, Anne Cathrine

    1994-01-01

    "Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field, show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies, becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache -line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on

  20. Parallel rendering

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  1. Tailoring of the partial magnonic gap in three-dimensional magnetoferritin-based magnonic crystals

    NASA Astrophysics Data System (ADS)

    Mamica, S.

    2013-07-01

    We investigate theoretically the use of magnetoferritin nanoparticles, self-assembled in the protein crystallization process, as the basis for the realization of 3D magnonic crystals in which the interparticle space is filled with a ferromagnetic material. Using the plane wave method we study the dependence of the width of the partial band gap and its central frequency on the total magnetic moment of the magnetoferritin core and the lattice constant of the magnetoferritin crystal. We show that by adjusting the combination of these two parameters the partial gap can be tailored in a wide frequency range and shifted to sub-terahertz frequencies. Moreover, the difference in the width of the partial gap for spin waves propagating in planes parallel and perpendicular to the external field allows for switching on and off the partial magnonic gap by changing the direction of the applied field.

  2. Electrical Conductivity of Partially Molten Peridotite Analogue Under Shear: Supporting Evidence to the Partial Melting Hypothesis for the Oceanic Plate Motion

    NASA Astrophysics Data System (ADS)

    Manthilake, G.; Matsuzaki, T.; Yoshino, T.; Yamazaki, D.; Yoneda, A.; Ito, E.; Katsura, T.

    2008-12-01

    So far, two hypotheses have been proposed to explain softening of the oceanic asthenosphere allowing smooth motion of the oceanic lithosphere. One is partial melting, and the other is hydraulitic weakening. Although the hydraulitic weakening hypothesis is popular recently, Yoshino et al. [2006] suggested that this hypothesis cannot explain the high and anisotropic conductivity at the top of the asthenosphere near East Pacific Rise observed by Evans et al. [2005]. In order to explain the conductivity anisotropy over one order of magnitude by the partial melting hypothesis, we measured conductivity of partially molten peridotite analogue under shear conditions. The measured samples were mixtures of forsterite and chemically simplified basalt. The samples were pre- synthesized using a piston-cylinder apparatus at 1600 K and 2 GPa to obtain textural equilibrium. The pre- synthesized samples were formed to a disk with 3 mm in diameter and 1 mm in thickness. Conductivity measurement was carried out also at 1600 K and 2 GPa in a cubic-anvil apparatus with an additional uniaxial piston. The sample was sandwiched by two alumina pistons whose top was cut to 45 degree slope to generate shear. The shear strain rates of the sample were calibrated using a Mo strain marker in separate runs. The lower alumina piston was pushed by a tungsten carbide piston embedded in a bottom anvil with a constant speed. Conductivity was measured in the directions normal and parallel to the shear direction simultaneously. We mainly studied the sample with 1.6 volume percent of basaltic component. The shear strain rates were 0, 1.2x10(-6) and 5.2x10(-6) /s. The sample without shear did not show conductivity anisotropy. In contrast, the samples with shear showed one order of magnitude higher conductivity in the direction parallel to the shear than that normal to the shear. After the total strains reached 0.3, the magnitude of anisotropy became almost constant for both of the strain rates. The

  3. Domain decomposition methods for the parallel computation of reacting flows

    NASA Technical Reports Server (NTRS)

    Keyes, David E.

    1988-01-01

    Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.

  4. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also be used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results show from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.

  5. Super-resolved Parallel MRI by Spatiotemporal Encoding

    PubMed Central

    Schmidt, Rita; Baishya, Bikash; Ben-Eliezer, Noam; Seginer, Amir; Frydman, Lucio

    2016-01-01

    Recent studies described an alternative “ultrafast” scanning method based on spatiotemporal (SPEN) principles. SPEN demonstrates numerous potential advantages over EPI-based alternatives, at no additional expense in experimental complexity. An important aspect that SPEN still needs to achieve for providing a competitive acquisition alternative entails exploiting parallel imaging algorithms, without compromising its proven capabilities. The present work introduces a combination of multi-band frequency-swept pulses simultaneously encoding multiple, partial fields-of-view; together with a new algorithm merging a Super-Resolved SPEN image reconstruction and SENSE multiple-receiving methods. The ensuing approach enables one to reduce both the excitation and acquisition times of ultrafast SPEN acquisitions by the customary acceleration factor R, without compromises in either the ensuing spatial resolution, SAR deposition, or the capability to operate in multi-slice mode. The performance of these new single-shot imaging sequences and their ancillary algorithms were explored on phantoms and human volunteers at 3T. The gains of the parallelized approach were particularly evident when dealing with heterogeneous systems subject to major T2/T2* effects, as is the case upon single-scan imaging near tissue/air interfaces. PMID:24120293

  6. IOPA: I/O-aware parallelism adaption for parallel programs

    PubMed Central

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236

  7. IOPA: I/O-aware parallelism adaption for parallel programs.

    PubMed

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.

  8. High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

    ERIC Educational Resources Information Center

    von Davier, Matthias

    2016-01-01

    This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…

  9. Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool

    NASA Astrophysics Data System (ADS)

    Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.

    1997-12-01

    Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.

  10. SPSS and SAS programs for determining the number of components using parallel analysis and velicer's MAP test.

    PubMed

    O'Connor, B P

    2000-08-01

    Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.

  11. Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system

    DOEpatents

    Gooding, Thomas Michael; McCarthy, Patrick Joseph

    2010-03-02

    A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.

  12. Accuracy of different impression materials in parallel and nonparallel implants

    PubMed Central

    Vojdani, Mahroo; Torabi, Kianoosh; Ansarifard, Elham

    2015-01-01

    Background: A precise impression is mandatory to obtain passive fit in implant-supported prostheses. The aim of this study was to compare the accuracy of three impression materials in both parallel and nonparallel implant positions. Materials and Methods: In this experimental study, two partial dentate maxillary acrylic models with four implant analogues in canines and lateral incisors areas were used. One model was simulating the parallel condition and the other nonparallel one, in which implants were tilted 30° bucally and 20° in either mesial or distal directions. Thirty stone casts were made from each model using polyether (Impregum), additional silicone (Monopren) and vinyl siloxanether (Identium), with open tray technique. The distortion values in three-dimensions (X, Y and Z-axis) were measured by coordinate measuring machine. Two-way analysis of variance (ANOVA), one-way ANOVA and Tukey tests were used for data analysis (α = 0.05). Results: Under parallel condition, all the materials showed comparable, accurate casts (P = 0.74). In the presence of angulated implants, while Monopren showed more accurate results compared to Impregum (P = 0.01), Identium yielded almost similar results to those produced by Impregum (P = 0.27) and Monopren (P = 0.26). Conclusion: Within the limitations of this study, in parallel conditions, the type of impression material cannot affect the accuracy of the implant impressions; however, in nonparallel conditions, polyvinyl siloxane is shown to be a better choice, followed by vinyl siloxanether and polyether respectively. PMID:26288620

  13. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  14. Accelerated proton echo planar spectroscopic imaging (PEPSI) using GRAPPA with a 32-channel phased-array coil.

    PubMed

    Tsai, Shang-Yueh; Otazo, Ricardo; Posse, Stefan; Lin, Yi-Ru; Chung, Hsiao-Wen; Wald, Lawrence L; Wiggins, Graham C; Lin, Fa-Hsuan

    2008-05-01

    Parallel imaging has been demonstrated to reduce the encoding time of MR spectroscopic imaging (MRSI). Here we investigate up to 5-fold acceleration of 2D proton echo planar spectroscopic imaging (PEPSI) at 3T using generalized autocalibrating partial parallel acquisition (GRAPPA) with a 32-channel coil array, 1.5 cm(3) voxel size, TR/TE of 15/2000 ms, and 2.1 Hz spectral resolution. Compared to an 8-channel array, the smaller RF coil elements in this 32-channel array provided a 3.1-fold and 2.8-fold increase in signal-to-noise ratio (SNR) in the peripheral region and the central region, respectively, and more spatial modulated information. Comparison of sensitivity-encoding (SENSE) and GRAPPA reconstruction using an 8-channel array showed that both methods yielded similar quantitative metabolite measures (P > 0.1). Concentration values of N-acetyl-aspartate (NAA), total creatine (tCr), choline (Cho), myo-inositol (mI), and the sum of glutamate and glutamine (Glx) for both methods were consistent with previous studies. Using the 32-channel array coil the mean Cramer-Rao lower bounds (CRLB) were less than 8% for NAA, tCr, and Cho and less than 15% for mI and Glx at 2-fold acceleration. At 4-fold acceleration the mean CRLB for NAA, tCr, and Cho was less than 11%. In conclusion, the use of a 32-channel coil array and GRAPPA reconstruction can significantly reduce the measurement time for mapping brain metabolites. (c) 2008 Wiley-Liss, Inc.

  15. Parallel integer sorting with medium and fine-scale parallelism

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.

  16. Applications of New Surrogate Global Optimization Algorithms including Efficient Synchronous and Asynchronous Parallelism for Calibration of Expensive Nonlinear Geophysical Simulation Models.

    NASA Astrophysics Data System (ADS)

    Shoemaker, C. A.; Pang, M.; Akhtar, T.; Bindel, D.

    2016-12-01

    New parallel surrogate global optimization algorithms are developed and applied to objective functions that are expensive simulations (possibly with multiple local minima). The algorithms can be applied to most geophysical simulations, including those with nonlinear partial differential equations. The optimization does not require simulations be parallelized. Asynchronous (and synchronous) parallel execution is available in the optimization toolbox "pySOT". The parallel algorithms are modified from serial to eliminate fine grained parallelism. The optimization is computed with open source software pySOT, a Surrogate Global Optimization Toolbox that allows user to pick the type of surrogate (or ensembles), the search procedure on surrogate, and the type of parallelism (synchronous or asynchronous). pySOT also allows the user to develop new algorithms by modifying parts of the code. In the applications here, the objective function takes up to 30 minutes for one simulation, and serial optimization can take over 200 hours. Results from Yellowstone (NSF) and NCSS (Singapore) supercomputers are given for groundwater contaminant hydrology simulations with applications to model parameter estimation and decontamination management. All results are compared with alternatives. The first results are for optimization of pumping at many wells to reduce cost for decontamination of groundwater at a superfund site. The optimization runs with up to 128 processors. Superlinear speed up is obtained for up to 16 processors, and efficiency with 64 processors is over 80%. Each evaluation of the objective function requires the solution of nonlinear partial differential equations to describe the impact of spatially distributed pumping and model parameters on model predictions for the spatial and temporal distribution of groundwater contaminants. The second application uses an asynchronous parallel global optimization for groundwater quality model calibration. The time for a single objective

  17. Parallel block schemes for large scale least squares computations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Golub, G.H.; Plemmons, R.J.; Sameh, A.

    1986-04-01

    Large scale least squares computations arise in a variety of scientific and engineering problems, including geodetic adjustments and surveys, medical image analysis, molecular structures, partial differential equations and substructuring methods in structural engineering. In each of these problems, matrices often arise which possess a block structure which reflects the local connection nature of the underlying physical problem. For example, such super-large nonlinear least squares computations arise in geodesy. Here the coordinates of positions are calculated by iteratively solving overdetermined systems of nonlinear equations by the Gauss-Newton method. The US National Geodetic Survey will complete this year (1986) the readjustment ofmore » the North American Datum, a problem which involves over 540 thousand unknowns and over 6.5 million observations (equations). The observation matrix for these least squares computations has a block angular form with 161 diagnonal blocks, each containing 3 to 4 thousand unknowns. In this paper parallel schemes are suggested for the orthogonal factorization of matrices in block angular form and for the associated backsubstitution phase of the least squares computations. In addition, a parallel scheme for the calculation of certain elements of the covariance matrix for such problems is described. It is shown that these algorithms are ideally suited for multiprocessors with three levels of parallelism such as the Cedar system at the University of Illinois. 20 refs., 7 figs.« less

  18. A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications

    NASA Technical Reports Server (NTRS)

    Povitsky, Alex; Morris, Philip J.

    1999-01-01

    In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.

  19. Bilingual parallel programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.; Overbeek, R.

    1990-01-01

    Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach providesmore » and effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.« less

  20. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth

  1. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  2. A Multiscale Parallel Computing Architecture for Automated Segmentation of the Brain Connectome

    PubMed Central

    Knobe, Kathleen; Newton, Ryan R.; Schlimbach, Frank; Blower, Melanie; Reid, R. Clay

    2015-01-01

    Several groups in neurobiology have embarked into deciphering the brain circuitry using large-scale imaging of a mouse brain and manual tracing of the connections between neurons. Creating a graph of the brain circuitry, also called a connectome, could have a huge impact on the understanding of neurodegenerative diseases such as Alzheimer’s disease. Although considerably smaller than a human brain, a mouse brain already exhibits one billion connections and manually tracing the connectome of a mouse brain can only be achieved partially. This paper proposes to scale up the tracing by using automated image segmentation and a parallel computing approach designed for domain experts. We explain the design decisions behind our parallel approach and we present our results for the segmentation of the vasculature and the cell nuclei, which have been obtained without any manual intervention. PMID:21926011

  3. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  4. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective opeartion through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  5. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  6. Parallel Computation of Flow in Heterogeneous Media Modelled by Mixed Finite Elements

    NASA Astrophysics Data System (ADS)

    Cliffe, K. A.; Graham, I. G.; Scheichl, R.; Stals, L.

    2000-11-01

    In this paper we describe a fast parallel method for solving highly ill-conditioned saddle-point systems arising from mixed finite element simulations of stochastic partial differential equations (PDEs) modelling flow in heterogeneous media. Each realisation of these stochastic PDEs requires the solution of the linear first-order velocity-pressure system comprising Darcy's law coupled with an incompressibility constraint. The chief difficulty is that the permeability may be highly variable, especially when the statistical model has a large variance and a small correlation length. For reasonable accuracy, the discretisation has to be extremely fine. We solve these problems by first reducing the saddle-point formulation to a symmetric positive definite (SPD) problem using a suitable basis for the space of divergence-free velocities. The reduced problem is solved using parallel conjugate gradients preconditioned with an algebraically determined additive Schwarz domain decomposition preconditioner. The result is a solver which exhibits a good degree of robustness with respect to the mesh size as well as to the variance and to physically relevant values of the correlation length of the underlying permeability field. Numerical experiments exhibit almost optimal levels of parallel efficiency. The domain decomposition solver (DOUG, http://www.maths.bath.ac.uk/~parsoft) used here not only is applicable to this problem but can be used to solve general unstructured finite element systems on a wide range of parallel architectures.

  7. An object-oriented approach for parallel self adaptive mesh refinement on block structured grids

    NASA Technical Reports Server (NTRS)

    Lemke, Max; Witsch, Kristian; Quinlan, Daniel

    1993-01-01

    Self-adaptive mesh refinement dynamically matches the computational demands of a solver for partial differential equations to the activity in the application's domain. In this paper we present two C++ class libraries, P++ and AMR++, which significantly simplify the development of sophisticated adaptive mesh refinement codes on (massively) parallel distributed memory architectures. The development is based on our previous research in this area. The C++ class libraries provide abstractions to separate the issues of developing parallel adaptive mesh refinement applications into those of parallelism, abstracted by P++, and adaptive mesh refinement, abstracted by AMR++. P++ is a parallel array class library to permit efficient development of architecture independent codes for structured grid applications, and AMR++ provides support for self-adaptive mesh refinement on block-structured grids of rectangular non-overlapping blocks. Using these libraries, the application programmers' work is greatly simplified to primarily specifying the serial single grid application and obtaining the parallel and self-adaptive mesh refinement code with minimal effort. Initial results for simple singular perturbation problems solved by self-adaptive multilevel techniques (FAC, AFAC), being implemented on the basis of prototypes of the P++/AMR++ environment, are presented. Singular perturbation problems frequently arise in large applications, e.g. in the area of computational fluid dynamics. They usually have solutions with layers which require adaptive mesh refinement and fast basic solvers in order to be resolved efficiently.

  8. Bit-parallel arithmetic in a massively-parallel associative processor

    NASA Technical Reports Server (NTRS)

    Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

    1992-01-01

    A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.

  9. Lamotrigine add-on for drug-resistant partial epilepsy.

    PubMed

    Ramaratnam, S; Marson, A G; Baker, G A

    2001-01-01

    Epilepsy is a common neurological disorder, affecting almost 0.5 to 1% of the population. Nearly 30% of patients with epilepsy are refractory to currently available drugs. Lamotrigine is one of the newer antiepileptic drugs and is the topic of this review. To examine the effects of lamotrigine on seizures, side effects, cognition and quality of life, when used as an add-on treatment for patients with drug-resistant partial epilepsy. We searched the Cochrane Epilepsy Group trials register, the Cochrane Controlled Trials Register (Cochrane Library Issue 2, 2001), MEDLINE (January 1966 to April 2001) and reference lists of articles. We also contacted the manufacturers of lamotrigine (Glaxo-Wellcome). Randomized placebo controlled trials, of patients with drug-resistant partial epilepsy of any age, in which an adequate method of concealment of randomization was used. The studies may be double, single or unblinded. For crossover studies, the first treatment period was treated as a parallel trial. Two reviewers independently assessed the trials for inclusion and extracted data. Primary analyses were by intention to treat. Outcomes included 50% or greater reduction in seizure frequency, treatment withdrawal (any reason), side effects, effects on cognition, and quality of life. We found three parallel add-on studies and eight cross-over studies, which included 1243 patients (199 children and 1044 adults). The overall Peto's Odds Ratio (OR) and 95% confidence intervals (CIs) across all studies for 50% or greater reduction in seizure frequency was 2.71 (1.87, 3.91) indicating that lamotrigine is significantly more effective than placebo in reducing seizure frequency. The overall OR (95%CI) for treatment withdrawal (for any reason) is 1.12 (0.78, 1.61). The 99% CIs for ataxia, dizziness, nausea, and diplopia do not include unity, indicating that they are significantly associated with lamotrigine. The limited data available precludes any conclusions about effects on cognition

  10. Lamotrigine add-on for drug-resistant partial epilepsy.

    PubMed

    Ramaratnam, S; Marson, A G; Baker, G A

    2000-01-01

    Epilepsy is a common neurological disorder, affecting almost 0.5 to 1% of the population. Nearly 30% of patients with epilepsy are refractory to currently available drugs. Lamotrigine is one of the newer antiepileptic drugs and is the topic of this review. To examine the effects of lamotrigine on seizures, side effects, cognition and quality of life, when used as an add-on treatment for patients with drug-resistant partial epilepsy. We searched the Cochrane Epilepsy Group trials register, the Cochrane Controlled Trials Register (Cochrane Library Issue 1, 2000), MEDLINE (January 1966 to December 1999) and reference lists of articles. We also contacted the manufacturers of lamotrigine (Glaxo-Wellcome). Randomized placebo controlled trials, of patients with drug-resistant partial epilepsy of any age, in which an adequate method of concealment of randomization was used. The studies may be double, single or unblinded. For crossover studies, the first treatment period was treated as a parallel trial. Two reviewers independently assessed the trials for inclusion and extracted data. Primary analyses were by intention to treat. Outcomes included 50% or greater reduction in seizure frequency, treatment withdrawal (any reason), side effects, effects on cognition, and quality of life. We found three parallel add-on studies and eight cross-over studies, which included 1243 patients (199 children and 1044 adults). The overall Peto's Odds Ratio (OR) and 95% confidence intervals (CIs) across all studies for 50% or greater reduction in seizure frequency was 2.71 (1.87, 3.91) indicating that lamotrigine is significantly more effective than placebo in reducing seizure frequency. The overall OR (95%CI) for treatment withdrawal (for any reason) is 1.12 (0.78, 1. 61). The 99% CIs for ataxia, dizziness, nausea, and diplopia do not include unity, indicating that they are significantly associated with lamotrigine. The limited data available preclude any conclusions about effects on

  11. Hybrid Parallelization of Adaptive MHD-Kinetic Module in Multi-Scale Fluid-Kinetic Simulation Suite

    DOE PAGES

    Borovikov, Sergey; Heerikhuisen, Jacob; Pogorelov, Nikolai

    2013-04-01

    The Multi-Scale Fluid-Kinetic Simulation Suite has a computational tool set for solving partially ionized flows. In this paper we focus on recent developments of the kinetic module which solves the Boltzmann equation using the Monte-Carlo method. The module has been recently redesigned to utilize intra-node hybrid parallelization. We describe in detail the redesign process, implementation issues, and modifications made to the code. Finally, we conduct a performance analysis.

  12. A scalable parallel black oil simulator on distributed memory parallel computers

    NASA Astrophysics Data System (ADS)

    Wang, Kun; Liu, Hui; Chen, Zhangxin

    2015-11-01

    This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.

  13. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  14. Parallel algorithm for determining motion vectors in ice floe images by matching edge features

    NASA Technical Reports Server (NTRS)

    Manohar, M.; Ramapriyan, H. K.; Strong, J. P.

    1988-01-01

    A parallel algorithm is described to determine motion vectors of ice floes using time sequences of images of the Arctic ocean obtained from the Synthetic Aperture Radar (SAR) instrument flown on-board the SEASAT spacecraft. Researchers describe a parallel algorithm which is implemented on the MPP for locating corresponding objects based on their translationally and rotationally invariant features. The algorithm first approximates the edges in the images by polygons or sets of connected straight-line segments. Each such edge structure is then reduced to a seed point. Associated with each seed point are the descriptions (lengths, orientations and sequence numbers) of the lines constituting the corresponding edge structure. A parallel matching algorithm is used to match packed arrays of such descriptions to identify corresponding seed points in the two images. The matching algorithm is designed such that fragmentation and merging of ice floes are taken into account by accepting partial matches. The technique has been demonstrated to work on synthetic test patterns and real image pairs from SEASAT in times ranging from .5 to 0.7 seconds for 128 x 128 images.

  15. All-Optical Two-Dimensional Serial-to-Parallel Pulse Converter Using an Organic Film with Femtosecond Optical Response

    NASA Astrophysics Data System (ADS)

    Tatsuura, Satoshi; Wada, Osamu; Furuki, Makoto; Tian, Minquan; Sato, Yasuhiro; Iwasa, Izumi; Pu, Lyong Sun

    2001-04-01

    In this study, we introduce a new concept of all-optical two-dimensional serial-to-parallel pulse converters. Femtosecond optical pulses can be understood as thin plates of light traveling in space. When a femtosecond signal-pulse train and a single gate pulse were fed onto a material with a finite incident angle, each signal-pulse plate met the gate-pulse plate at different locations in the material due to the time-of-flight effect. Meeting points can be made two-dimensional by adding a partial time delay to the gate pulse. By placing a nonlinear optical material at an appropriate position, two-dimensional serial-to-parallel conversion of a signal-pulse train can be achieved with a single gate pulse. We demonstrated the detection of parallel outputs from a 1-Tb/s optical-pulse train through the use of a BaB2O4 crystal. We also succeeded in demonstrating 1-Tb/s serial-to-parallel operation through the use of a novel organic nonlinear optical material, squarylium-dye J-aggregate film, which exhibits ultrafast recovery of bleached absorption.

  16. Parallel computing works

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of manymore » computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.« less

  17. Tape casting and partial melting of Bi-2212 thick films

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buhl, D.; Lang, T.; Heeb, B.

    1994-12-31

    To produce Bi-2212 thick films with high critical current densities tape casting and partial melting is a promising fabrication method. Bi-2212 powder and organic additives were mixed into a slurry and tape casted onto glass by the doctor blade tape casting process. The films were cut from the green tape and partially molten on Ag foils during heat treatment. We obtained almost single-phase and well-textured films over the whole thickness of 20 {mu}m. The orientation of the (a,b)-plane of the grains were parallel to the substrate with a misalignment of less than 6{degrees}. At 77K/OT a critical current density ofmore » 15`000 A/cm{sup 2} was reached in films of the dimension 1cm x 2cm x 20{mu}m (1{mu}V/cm criterion, resistively measured). At 4K/OT the highest value was 350`000 A/cm{sup 2} (1nV/cm criterion, magnetically measured).« less

  18. Tape casting and partial melting of Bi-2212 thick films

    NASA Technical Reports Server (NTRS)

    Buhl, D.; Lang, TH.; Heeb, B.; Gauckler, L. J.

    1995-01-01

    To produce Bi-2212 thick films with high critical current densities tape casting and partial melting is a promising fabrication method. Bi-2212 powder and organic additives were mixed into a slurry and tape casted onto glass by the doctor blade tape casting process. The films were cut from the green tape and partially molten on Ag foils during heat treatment. We obtained almost single-phase and well-textured films over the whole thickness of 20 microns. The orientation of the (a,b)-plane of the grains was parallel to the substrate with a misalignment of less than 6 deg. At 77 K/0T a critical current density of 15, 000 A/sq cm was reached in films of the dimension 1 cm x 2 cm x 20 microns (1 micron V/cm criterion, resistively measured). At 4 K/0T the highest value was 350,000 A/sq cm (1 nV/cm criterion, magnetically measured).

  19. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.

  20. The role of single immediate loading implant in long Class IV Kennedy mandibular partial denture.

    PubMed

    Mohamed, Gehan F; El Sawy, Amal A

    2012-10-01

    The treatment of long-span Kennedy class IV considers a prosthodontic challenge. This study evaluated the integrity of principle abutments in long Kennedy class IV clinically and radiographically, when rehabilitated with conventional metallic partial denture as a control group and mandibular partial overdentures supported with single immediately loaded implant in symphyseal as a study group. Twelve male patients were divided randomly allotted into two equal groups. First group patients received removable metallic partial denture, whereas in the second group, patients received partial overdentures supported with single immediately loaded implant in symphyseal region. The partial dentures design in both groups was the same. Long-cone paralleling technique and transmission densitometer were used at the time of denture insertion, 3, 6, and 12 months. Gingival index, bone loss, and optical density were measured for principle abutments during the follow-up. A significant reduction in bone loss and density were detected in group II comparing with group I. Gingival index had no significant change (p-value < 0.05). A single symphyseal implant in long span class IV Kennedy can play a pivotal role to improve the integrity of the principle abutments and alveolar bone support. © 2010 Wiley Periodicals, Inc.

  1. Research in parallel computing

    NASA Technical Reports Server (NTRS)

    Ortega, James M.; Henderson, Charles

    1994-01-01

    This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.

  2. [Constrained competition in parallel drug importation: the case of simvastatin in Germany, the Netherlands, and the United Kingdom].

    PubMed

    Costa-Font, Joan; Kanavos, Panos

    2007-01-01

    To examine the effects of parallel simvastatin importation on drug price in three of the main parallel importing countries in the European Union, namely the United Kingdom, Germany, and the Netherlands. To estimate the market share of parallel imported simvastatin and the unit price -both locally produced and parallel imported- adjusted by defined daily dose in the importing country and in the exporting country (Spain). Ordinary least squares regression was used to examine the potential price competition resulting from parallel drug trade between 1997 and 2002. The market share of parallel imported simvastatin progressively expanded (especially in the United Kingdom and Germany) in the period examined, although the price difference between parallel imported and locally sourced simvastatin was not significant. Prices tended to rise in the United Kingdom and Germany and declined in the Netherlands. We found no evidence of pro-competitive effects resulting from the expansion of parallel trade. The development of parallel drug importation in the European Union produced unexpected effects (limited competition) on prices that differ from those expected by the introduction of a new competitor. This is partially the result of drug price regulation scant incentives to competition and of the lack of transparency in the drug reimbursement system, especially due to the effect of informal discounts (not observable to researchers). The case of simvastatin reveals that savings to the health system from parallel trade are trivial. Finally, of the three countries examined, the only country that shows a moderate downward pattern in simvastatin prices is the Netherlands. This effect can be attributed to the existence of a system that claws back informal discounts.

  3. Synchronous partial melting, deformation, and magmatism: evidence from in an exhumed Proterozoic orogen

    NASA Astrophysics Data System (ADS)

    Levine, J. S. F.; Mosher, S.

    2017-12-01

    Older orogenic belts that now expose the middle and lower crust record interaction between partial melting, magmatism, and deformation. A field- and microstructural-based case study from the Wet Mountains of central Colorado, an exhumed section of Proterozoic rock, shows structures associated with anatexis and magmatism, from the grain- to the kilometer-scale, that indicate the interconnection between deformation, partial melting, and magmatism, and allow reconstructions of the processes occurring in hot active orogens. Metamorphic grade, along with the degree of deformation, partial melting, and magmatism increase from northwest to southeast. Deformation synchronous with this high-grade metamorphic event is localized into areas with greater quantities of former melt, and preferential melting occurs within high-strain locations. In the less deformed northwest, partial melting occurs dominantly via muscovite-dehydration melting, with a low abundance of partial melting, and an absence of granitic magmatism. The central Wet Mountains are characterized by biotite dehydration melting, abundant former melt and foliation-parallel inferred melt channels along grain boundaries, and the presence of a nearby granitic pluton. Rocks in the southern portion of the Wet Mountains are characterized by partial melting via both biotite dehydration and granitic wet melting, with widespread partial melting as evidenced by well-preserved former melt microstructures and evidence for back reaction between melt and the host rocks. The southern Wet Mountains has more intense deformation and widespread plutonism than other locations and two generations of dikes and sills. Recognition of textures and fabrics associated with partial melting in older orogens is paramount for interpreting the complex interplay of processes occurring in the cores of orogenic systems.

  4. Three-dimensional through-time radial GRAPPA for renal MR angiography.

    PubMed

    Wright, Katherine L; Lee, Gregory R; Ehses, Philipp; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole

    2014-10-01

    To achieve high temporal and spatial resolution for contrast-enhanced time-resolved MR angiography exams (trMRAs), fast imaging techniques such as non-Cartesian parallel imaging must be used. In this study, the three-dimensional (3D) through-time radial generalized autocalibrating partially parallel acquisition (GRAPPA) method is used to reconstruct highly accelerated stack-of-stars data for time-resolved renal MRAs. Through-time radial GRAPPA has been recently introduced as a method for non-Cartesian GRAPPA weight calibration, and a similar concept can also be used in 3D acquisitions. By combining different sources of calibration information, acquisition time can be reduced. Here, different GRAPPA weight calibration schemes are explored in simulation, and the results are applied to reconstruct undersampled stack-of-stars data. Simulations demonstrate that an accurate and efficient approach to 3D calibration is to combine a small number of central partitions with as many temporal repetitions as exam time permits. These findings were used to reconstruct renal trMRA data with an in-plane acceleration factor as high as 12.6 with respect to the Nyquist sampling criterion, where the lowest root mean squared error value of 16.4% was achieved when using a calibration scheme with 8 partitions, 16 repetitions, and a 4 projection × 8 read point segment size. 3D through-time radial GRAPPA can be used to successfully reconstruct highly accelerated non-Cartesian data. By using in-plane radial undersampling, a trMRA can be acquired with a temporal footprint less than 4s/frame with a spatial resolution of approximately 1.5 mm × 1.5 mm × 3 mm. © 2014 Wiley Periodicals, Inc.

  5. Robot-assisted partial nephrectomy: Superiority over laparoscopic partial nephrectomy.

    PubMed

    Shiroki, Ryoichi; Fukami, Naohiko; Fukaya, Kosuke; Kusaka, Mamoru; Natsume, Takahiro; Ichihara, Takashi; Toyama, Hiroshi

    2016-02-01

    Nephron-sparing surgery has been proven to positively impact the postoperative quality of life for the treatment of small renal tumors, possibly leading to functional improvements. Laparoscopic partial nephrectomy is still one of the most demanding procedures in urological surgery. Laparoscopic partial nephrectomy sometimes results in extended warm ischemic time and severe complications, such as open conversion, postoperative hemorrhage and urine leakage. Robot-assisted partial nephrectomy exploits the advantages offered by the da Vinci Surgical System to laparoscopic partial nephrectomy, equipped with 3-D vision and a better degree in the freedom of surgical instruments. The introduction of the da Vinci Surgical System made nephron-sparing surgery, specifically robot-assisted partial nephrectomy, safe with promising results, leading to the shortening of warm ischemic time and a reduction in perioperative complications. Even for complex and challenging tumors, robotic assistance is expected to provide the benefit of minimally-invasive surgery with safe and satisfactory renal function. Warm ischemic time is the modifiable factor during robot-assisted partial nephrectomy to affect postoperative kidney function. We analyzed the predictive factors for extended warm ischemic time from our robot-assisted partial nephrectomy series. The surface area of the tumor attached to the kidney parenchyma was shown to significantly affect the extended warm ischemic time during robot-assisted partial nephrectomy. In cases with tumor-attached surface area more than 15 cm(2) , we should consider switching robot-assisted partial nephrectomy to open partial nephrectomy under cold ischemia if it is imperative. In Japan, a nationwide prospective study has been carried out to show the superiority of robot-assisted partial nephrectomy to laparoscopic partial nephrectomy in improving warm ischemic time and complications. By facilitating robotic technology, robot-assisted partial nephrectomy

  6. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.

  7. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL

  8. Correction for Eddy Current-Induced Echo-Shifting Effect in Partial-Fourier Diffusion Tensor Imaging.

    PubMed

    Truong, Trong-Kha; Song, Allen W; Chen, Nan-Kuei

    2015-01-01

    In most diffusion tensor imaging (DTI) studies, images are acquired with either a partial-Fourier or a parallel partial-Fourier echo-planar imaging (EPI) sequence, in order to shorten the echo time and increase the signal-to-noise ratio (SNR). However, eddy currents induced by the diffusion-sensitizing gradients can often lead to a shift of the echo in k-space, resulting in three distinct types of artifacts in partial-Fourier DTI. Here, we present an improved DTI acquisition and reconstruction scheme, capable of generating high-quality and high-SNR DTI data without eddy current-induced artifacts. This new scheme consists of three components, respectively, addressing the three distinct types of artifacts. First, a k-space energy-anchored DTI sequence is designed to recover eddy current-induced signal loss (i.e., Type 1 artifact). Second, a multischeme partial-Fourier reconstruction is used to eliminate artificial signal elevation (i.e., Type 2 artifact) associated with the conventional partial-Fourier reconstruction. Third, a signal intensity correction is applied to remove artificial signal modulations due to eddy current-induced erroneous T2(∗) -weighting (i.e., Type 3 artifact). These systematic improvements will greatly increase the consistency and accuracy of DTI measurements, expanding the utility of DTI in translational applications where quantitative robustness is much needed.

  9. Correction for Eddy Current-Induced Echo-Shifting Effect in Partial-Fourier Diffusion Tensor Imaging

    PubMed Central

    Truong, Trong-Kha; Song, Allen W.; Chen, Nan-kuei

    2015-01-01

    In most diffusion tensor imaging (DTI) studies, images are acquired with either a partial-Fourier or a parallel partial-Fourier echo-planar imaging (EPI) sequence, in order to shorten the echo time and increase the signal-to-noise ratio (SNR). However, eddy currents induced by the diffusion-sensitizing gradients can often lead to a shift of the echo in k-space, resulting in three distinct types of artifacts in partial-Fourier DTI. Here, we present an improved DTI acquisition and reconstruction scheme, capable of generating high-quality and high-SNR DTI data without eddy current-induced artifacts. This new scheme consists of three components, respectively, addressing the three distinct types of artifacts. First, a k-space energy-anchored DTI sequence is designed to recover eddy current-induced signal loss (i.e., Type 1 artifact). Second, a multischeme partial-Fourier reconstruction is used to eliminate artificial signal elevation (i.e., Type 2 artifact) associated with the conventional partial-Fourier reconstruction. Third, a signal intensity correction is applied to remove artificial signal modulations due to eddy current-induced erroneous T 2 ∗-weighting (i.e., Type 3 artifact). These systematic improvements will greatly increase the consistency and accuracy of DTI measurements, expanding the utility of DTI in translational applications where quantitative robustness is much needed. PMID:26413505

  10. Increased Energy Delivery for Parallel Battery Packs with No Regulated Bus

    NASA Astrophysics Data System (ADS)

    Hsu, Chung-Ti

    In this dissertation, a new approach to paralleling different battery types is presented. A method for controlling charging/discharging of different battery packs by using low-cost bi-directional switches instead of DC-DC converters is proposed. The proposed system architecture, algorithms, and control techniques allow batteries with different chemistry, voltage, and SOC to be properly charged and discharged in parallel without causing safety problems. The physical design and cost for the energy management system is substantially reduced. Additionally, specific types of failures in the maximum power point tracking (MPPT) in a photovoltaic (PV) system when tracking only the load current of a DC-DC converter are analyzed. The periodic nonlinear load current will lead MPPT realized by the conventional perturb and observe (P&O) algorithm to be problematic. A modified MPPT algorithm is proposed and it still only requires typically measured signals, yet is suitable for both linear and periodic nonlinear loads. Moreover, for a modular DC-DC converter using several converters in parallel, the input power from PV panels is processed and distributed at the module level. Methods for properly implementing distributed MPPT are studied. A new approach to efficient MPPT under partial shading conditions is presented. The power stage architecture achieves fast input current change rate by combining a current-adjustable converter with a few converters operating at a constant current.

  11. The parallel-sequential field subtraction technique for coherent nonlinear ultrasonic imaging

    NASA Astrophysics Data System (ADS)

    Cheng, Jingwei; Potter, Jack N.; Drinkwater, Bruce W.

    2018-06-01

    Nonlinear imaging techniques have recently emerged which have the potential to detect cracks at a much earlier stage than was previously possible and have sensitivity to partially closed defects. This study explores a coherent imaging technique based on the subtraction of two modes of focusing: parallel, in which the elements are fired together with a delay law and sequential, in which elements are fired independently. In the parallel focusing a high intensity ultrasonic beam is formed in the specimen at the focal point. However, in sequential focusing only low intensity signals from individual elements enter the sample and the full matrix of transmit-receive signals is recorded and post-processed to form an image. Under linear elastic assumptions, both parallel and sequential images are expected to be identical. Here we measure the difference between these images and use this to characterise the nonlinearity of small closed fatigue cracks. In particular we monitor the change in relative phase and amplitude at the fundamental frequencies for each focal point and use this nonlinear coherent imaging metric to form images of the spatial distribution of nonlinearity. The results suggest the subtracted image can suppress linear features (e.g. back wall or large scatters) effectively when instrumentation noise compensation in applied, thereby allowing damage to be detected at an early stage (c. 15% of fatigue life) and reliably quantified in later fatigue life.

  12. Solving the Cauchy-Riemann equations on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations; a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, and SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.

  13. Parallel digital forensics infrastructure.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebrock, Lorie M.; Duggan, David Patrick

    2009-10-01

    This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexicomore » Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the implementation of the parallel digital forensics (PDF) infrastructure architecture and implementation.« less

  14. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  15. Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu

    2012-10-01

    We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decompositionmore » corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel Fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes, (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re

  16. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  17. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  18. Parallel Algorithms and Patterns

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robey, Robert W.

    2016-06-16

    This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.

  19. Application Portable Parallel Library

    NASA Technical Reports Server (NTRS)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  20. Parallel Logic Programming and Parallel Systems Software and Hardware

    DTIC Science & Technology

    1989-07-29

    Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Uilisation", Group d’ Intelligence Artificielle , Universite d...completed. Tools were provided for software development using artificial intelligence techniques. Al software for massively parallel architectures was...using artificial intelligence tech- niques. Al software for massively parallel architectures was started. 1. Introduction We describe research conducted

  1. Feasibility of the determination of polycyclic aromatic hydrocarbons in edible oils via unfolded partial least-squares/residual bilinearization and parallel factor analysis of fluorescence excitation emission matrices.

    PubMed

    Alarcón, Francis; Báez, María E; Bravo, Manuel; Richter, Pablo; Escandar, Graciela M; Olivieri, Alejandro C; Fuentes, Edwar

    2013-01-15

    The possibility of simultaneously determining seven concerned heavy polycyclic aromatic hydrocarbons (PAHs) of the US-EPA priority pollutant list, in extra virgin olive and sunflower oils was examined using unfolded partial least-squares with residual bilinearization (U-PLS/RBL) and parallel factor analysis (PARAFAC). Both of these methods were applied to fluorescence excitation emission matrices. The compounds studied were benzo[a]anthracene, benzo[b]fluoranthene, benzo[k]fluoranthene, benzo[a]pyrene, dibenz[a,h]anthracene, benzo[g,h,i]perylene and indeno[1,2,3-c,d]-pyrene. The analysis was performed using fluorescence spectroscopy after a microwave assisted liquid-liquid extraction and solid-phase extraction on silica. The U-PLS/RBL algorithm exhibited the best performance for resolving the heavy PAH mixture in the presence of both the highly complex oil matrix and other unpredicted PAHs of the US-EPA list. The obtained limit of detection for the proposed method ranged from 0.07 to 2 μg kg(-1). The predicted U-PLS/RBL concentrations were satisfactorily compared with those obtained using high-performance liquid chromatography with fluorescence detection. A simple analysis with a considerable reduction in time and solvent consumption in comparison with chromatography are the principal advantages of the proposed method. Copyright © 2012 Elsevier B.V. All rights reserved.

  2. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

    Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single onemore » of the endpoints in the geometry, an instruction for the collective operation.« less

  3. Trench-parallel flow beneath the nazca plate from seismic anisotropy.

    PubMed

    Russo, R M; Silver, P G

    1994-02-25

    Shear-wave splitting of S and SKS phases reveals the anisotropy and strain field of the mantle beneath the subducting Nazca plate, Cocos plate, and the Caribbean region. These observations can be used to test models of mantle flow. Two-dimensional entrained mantle flow beneath the subducting Nazca slab is not consistent with the data. Rather, there is evidence for horizontal trench-parallel flow in the mantle beneath the Nazca plate along much of the Andean subduction zone. Trench-parallel flow is attributale utable to retrograde motion of the slab, the decoupling of the slab and underlying mantle, and a partial barrier to flow at depth, resulting in lateral mantle flow beneath the slab. Such flow facilitates the transfer of material from the shrinking mantle reservoir beneath the Pacific basin to the growing mantle reservoir beneath the Atlantic basin. Trenchparallel flow may explain the eastward motions of the Caribbean and Scotia sea plates, the anomalously shallow bathymetry of the eastern Nazca plate, the long-wavelength geoid high over western South America, and it may contribute to the high elevation and intense deformation of the central Andes.

  4. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU.

    PubMed

    Xia, Yong; Wang, Kuanquan; Zhang, Henggui

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations.

  5. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    PubMed

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Lesoinne, Michel

    1993-01-01

    Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.

  7. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Data communications in a parallel active messaging interface ('PAMI') or a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution of a compute node, including specification of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications instruction, the instruction characterized by instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance witht the instruction type, the transfer data from the origin endpoin to the target endpoint.

  8. Parallels in History.

    ERIC Educational Resources Information Center

    Mugleston, William F.

    2000-01-01

    Believes that by focusing on the recurrent situations and problems, or parallels, throughout history, students will understand the relevance of history to their own times and lives. Provides suggestions for parallels in history that may be introduced within lectures or as a means to class discussions. (CMK)

  9. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  10. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated

  11. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-29

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  12. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2014-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  13. File concepts for parallel I/O

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1989-01-01

    The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, organizations are suggested, using multiple storage devices. Problem areas are also identified and discussed.

  14. Non-Cartesian Parallel Imaging Reconstruction

    PubMed Central

    Wright, Katherine L.; Hamilton, Jesse I.; Griswold, Mark A.; Gulani, Vikas; Seiberlich, Nicole

    2014-01-01

    Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be employed to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the non-homogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian GRAPPA, and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. PMID:24408499

  15. A numerical differentiation library exploiting parallel architectures

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

    2009-08-01

    We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h), and O(h), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summaryProgram title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such

  16. An observational and thermodynamic investigation of carbonate partial melting

    NASA Astrophysics Data System (ADS)

    Floess, David; Baumgartner, Lukas P.; Vonlanthen, Pierre

    2015-01-01

    Melting experiments available in the literature show that carbonates and pelites melt at similar conditions in the crust. While partial melting of pelitic rocks is common and well-documented, reports of partial melting in carbonates are rare and ambiguous, mainly because of intensive recrystallization and the resulting lack of criteria for unequivocal identification of melting. Here we present microstructural, textural, and geochemical evidence for partial melting of calcareous dolomite marbles in the contact aureole of the Tertiary Adamello Batholith. Petrographic observations and X-ray micro-computed tomography (X-ray μCT) show that calcite crystallized either in cm- to dm-scale melt pockets, or as an interstitial phase forming an interconnected network between dolomite grains. Calcite-dolomite thermometry yields a temperature of at least 670 °C, which is well above the minimum melting temperature of ∼600 °C reported for the CaO-MgO-CO2-H2O system. Rare-earth element (REE) partition coefficients (KDcc/do) range between 9-35 for adjacent calcite-dolomite pairs. These KD values are 3-10 times higher than equilibrium values between dolomite and calcite reported in the literature. They suggest partitioning of incompatible elements into a melt phase. The δ18O and δ13C isotopic values of calcite and dolomite support this interpretation. Crystallographic orientations measured by electron backscattered diffraction (EBSD) show a clustering of c-axes for dolomite and interstitial calcite normal to the foliation plane, a typical feature for compressional deformation, whereas calcite crystallized in pockets shows a strong clustering of c-axes parallel to the pocket walls, suggesting that it crystallized after deformation had stopped. All this together suggests the formation of partial melts in these carbonates. A Schreinemaker analysis of the experimental data for a CO2-H2O fluid-saturated system indeed predicts formation of calcite-rich melt between 650-880 °C, in

  17. Parallel k-means++

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique.more » We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.« less

  18. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2015-11-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fix boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.

  19. Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    NASA Technical Reports Server (NTRS)

    Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

    1994-01-01

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.

  20. In-memory integration of existing software components for parallel adaptive unstructured mesh workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett

    Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general threedimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.

  1. In-memory integration of existing software components for parallel adaptive unstructured mesh workflows

    DOE PAGES

    Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett; ...

    2017-01-01

    Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general threedimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.

  2. Unsteady stokes flow of dusty fluid between two parallel plates through porous medium in the presence of magnetic field

    NASA Astrophysics Data System (ADS)

    Sasikala, R.; Govindarajan, A.; Gayathri, R.

    2018-04-01

    This paper focus on the result of dust particle between two parallel plates through porous medium in the presence of magnetic field with constant suction in the upper plate and constant injection in the lower plate. The partial differential equations governing the flow are solved by similarity transformation. The velocity of the fluid and the dust particle decreases when there is an increase in the Hartmann number.

  3. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    NASA Astrophysics Data System (ADS)

    Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

    2017-07-01

    Calculation of the matrix-vector multiplication in the real-world problems often involves large matrix with arbitrary size. Therefore, parallelization is needed to speed up the calculation process that usually takes a long time. Graph partitioning techniques that have been discussed in the previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that can only solve the square and symmetric matrix. Hypergraph partitioning techniques will overcome the shortcomings of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).

  4. Partial melting of amphibolite to trondhjemite at Nunatak Fiord, St. Elias Mountains, Alaska

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barker, F.; McLellan, E.L.; Plafker, G.

    1985-01-01

    At Nunatak Fiord, 55km NE of Yakutat, Alaska, a uniform layer of Cretaceous basalt ca. 3km thick was metamorphosed ca. 67 million years ago to amphibolite and locally partially melted to pegmatitic trondhjemite. Segregations of plagioclase-quartz+/-biotite rock, leucosomes in amphibolite matrix, range from stringers 5-10mm thick to blunt pods as thick as 6m. They tend to be parallel to foliation of the amphibolite, but crosscutting is common. The assemblage aluminous hornblende-plagioclase-epidote-sphene-quartz gave a hydrous melt that crystallized to plagioclase-quartz+/-biotite pegmatitic trondhjemite. 5-10% of the rock melted. Eu at 2x chondrites is positively anomalous. REE partitioning in melt/residum was controlled largelymore » by hornblende and sphene. Though the mineralogical variability precludes quantitative modeling, partial melting of garnet-free amphibolite to heavy-REE-depleted trondhjemitic melt is a viable process.« less

  5. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU

    PubMed Central

    Xia, Yong; Zhang, Henggui

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations. PMID:26581957

  6. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  7. A Spaceborne Synthetic Aperture Radar Partial Fixed-Point Imaging System Using a Field- Programmable Gate Array—Application-Specific Integrated Circuit Hybrid Heterogeneous Parallel Acceleration Technique

    PubMed Central

    Li, Bingyi; Chen, Liang; Wei, Chunpeng; Xie, Yizhuang; Chen, He; Yu, Wenyue

    2017-01-01

    With the development of satellite load technology and very large scale integrated (VLSI) circuit technology, onboard real-time synthetic aperture radar (SAR) imaging systems have become a solution for allowing rapid response to disasters. A key goal of the onboard SAR imaging system design is to achieve high real-time processing performance with severe size, weight, and power consumption constraints. In this paper, we analyse the computational burden of the commonly used chirp scaling (CS) SAR imaging algorithm. To reduce the system hardware cost, we propose a partial fixed-point processing scheme. The fast Fourier transform (FFT), which is the most computation-sensitive operation in the CS algorithm, is processed with fixed-point, while other operations are processed with single precision floating-point. With the proposed fixed-point processing error propagation model, the fixed-point processing word length is determined. The fidelity and accuracy relative to conventional ground-based software processors is verified by evaluating both the point target imaging quality and the actual scene imaging quality. As a proof of concept, a field- programmable gate array—application-specific integrated circuit (FPGA-ASIC) hybrid heterogeneous parallel accelerating architecture is designed and realized. The customized fixed-point FFT is implemented using the 130 nm complementary metal oxide semiconductor (CMOS) technology as a co-processor of the Xilinx xc6vlx760t FPGA. A single processing board requires 12 s and consumes 21 W to focus a 50-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384. PMID:28672813

  8. A Spaceborne Synthetic Aperture Radar Partial Fixed-Point Imaging System Using a Field- Programmable Gate Array-Application-Specific Integrated Circuit Hybrid Heterogeneous Parallel Acceleration Technique.

    PubMed

    Yang, Chen; Li, Bingyi; Chen, Liang; Wei, Chunpeng; Xie, Yizhuang; Chen, He; Yu, Wenyue

    2017-06-24

    With the development of satellite load technology and very large scale integrated (VLSI) circuit technology, onboard real-time synthetic aperture radar (SAR) imaging systems have become a solution for allowing rapid response to disasters. A key goal of the onboard SAR imaging system design is to achieve high real-time processing performance with severe size, weight, and power consumption constraints. In this paper, we analyse the computational burden of the commonly used chirp scaling (CS) SAR imaging algorithm. To reduce the system hardware cost, we propose a partial fixed-point processing scheme. The fast Fourier transform (FFT), which is the most computation-sensitive operation in the CS algorithm, is processed with fixed-point, while other operations are processed with single precision floating-point. With the proposed fixed-point processing error propagation model, the fixed-point processing word length is determined. The fidelity and accuracy relative to conventional ground-based software processors is verified by evaluating both the point target imaging quality and the actual scene imaging quality. As a proof of concept, a field- programmable gate array-application-specific integrated circuit (FPGA-ASIC) hybrid heterogeneous parallel accelerating architecture is designed and realized. The customized fixed-point FFT is implemented using the 130 nm complementary metal oxide semiconductor (CMOS) technology as a co-processor of the Xilinx xc6vlx760t FPGA. A single processing board requires 12 s and consumes 21 W to focus a 50-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384.

  9. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Writtenmore » in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2 32 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.« less

  10. CRUNCH_PARALLEL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shumaker, Dana E.; Steefel, Carl I.

    The code CRUNCH_PARALLEL is a parallel version of the CRUNCH code. CRUNCH code version 2.0 was previously released by LLNL, (UCRL-CODE-200063). Crunch is a general purpose reactive transport code developed by Carl Steefel and Yabusake (Steefel Yabsaki 1996). The code handles non-isothermal transport and reaction in one, two, and three dimensions. The reaction algorithm is generic in form, handling an arbitrary number of aqueous and surface complexation as well as mineral dissolution/precipitation. A standardized database is used containing thermodynamic and kinetic data. The code includes advective, dispersive, and diffusive transport.

  11. A simple hyperbolic model for communication in parallel processing environments

    NASA Technical Reports Server (NTRS)

    Stoica, Ion; Sultan, Florin; Keyes, David

    1994-01-01

    We introduce a model for communication costs in parallel processing environments called the 'hyperbolic model,' which generalizes two-parameter dedicated-link models in an analytically simple way. Dedicated interprocessor links parameterized by a latency and a transfer rate that are independent of load are assumed by many existing communication models; such models are unrealistic for workstation networks. The communication system is modeled as a directed communication graph in which terminal nodes represent the application processes that initiate the sending and receiving of the information and in which internal nodes, called communication blocks (CBs), reflect the layered structure of the underlying communication architecture. The direction of graph edges specifies the flow of the information carried through messages. Each CB is characterized by a two-parameter hyperbolic function of the message size that represents the service time needed for processing the message. The parameters are evaluated in the limits of very large and very small messages. Rules are given for reducing a communication graph consisting of many to an equivalent two-parameter form, while maintaining an approximation for the service time that is exact in both large and small limits. The model is validated on a dedicated Ethernet network of workstations by experiments with communication subprograms arising in scientific applications, for which a tight fit of the model predictions with actual measurements of the communication and synchronization time between end processes is demonstrated. The model is then used to evaluate the performance of two simple parallel scientific applications from partial differential equations: domain decomposition and time-parallel multigrid. In an appropriate limit, we also show the compatibility of the hyperbolic model with the recently proposed LogP model.

  12. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  13. CSM parallel structural methods research

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.

    1989-01-01

    Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.

  14. Detection of partial-thickness tears in ligaments and tendons by Stokes-polarimetry imaging

    NASA Astrophysics Data System (ADS)

    Kim, Jihoon; John, Raheel; Walsh, Joseph T.

    2008-02-01

    A Stokes polarimetry imaging (SPI) system utilizes an algorithm developed to construct degree of polarization (DoP) image maps from linearly polarized light illumination. Partial-thickness tears of turkey tendons were imaged by the SPI system in order to examine the feasibility of the system to detect partial-thickness rotator cuff tear or general tendon pathology. The rotating incident polarization angle (IPA) for the linearly polarized light provides a way to analyze different tissue types which may be sensitive to IPA variations. Degree of linear polarization (DoLP) images revealed collagen fiber structure, related to partial-thickness tears, better than standard intensity images. DoLP images also revealed structural changes in tears that are related to the tendon load. DoLP images with red-wavelength-filtered incident light may show tears and related organization of collagen fiber structure at a greater depth from the tendon surface. Degree of circular polarization (DoCP) images exhibited well the horizontal fiber orientation that is not parallel to the vertically aligned collagen fibers of the tendon. The SPI system's DOLP images reveal alterations in tendons and ligaments, which have a tissue matrix consisting largely of collagen, better than intensity images. All polarized images showed modulated intensity as the IPA was varied. The optimal detection of the partial-thickness tendon tears at a certain IPA was observed. The SPI system with varying IPA and spectral information can improve the detection of partial-thickness rotator cuff tears by higher visibility of fiber orientations and thereby improve diagnosis and treatment of tendon related injuries.

  15. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining

  16. Simplified Parallel Domain Traversal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erickson III, David J

    2011-01-01

    Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep bymore » performing teleconnection analysis across ensemble runs of terascale atmospheric CO{sub 2} and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.« less

  17. The neural basis of parallel saccade programming: an fMRI study.

    PubMed

    Hu, Yanbo; Walker, Robin

    2011-11-01

    The neural basis of parallel saccade programming was examined in an event-related fMRI study using a variation of the double-step saccade paradigm. Two double-step conditions were used: one enabled the second saccade to be partially programmed in parallel with the first saccade while in a second condition both saccades had to be prepared serially. The intersaccadic interval, observed in the parallel programming (PP) condition, was significantly reduced compared with latency in the serial programming (SP) condition and also to the latency of single saccades in control conditions. The fMRI analysis revealed greater activity (BOLD response) in the frontal and parietal eye fields for the PP condition compared with the SP double-step condition and when compared with the single-saccade control conditions. By contrast, activity in the supplementary eye fields was greater for the double-step condition than the single-step condition but did not distinguish between the PP and SP requirements. The role of the frontal eye fields in PP may be related to the advanced temporal preparation and increased salience of the second saccade goal that may mediate activity in other downstream structures, such as the superior colliculus. The parietal lobes may be involved in the preparation for spatial remapping, which is required in double-step conditions. The supplementary eye fields appear to have a more general role in planning saccade sequences that may be related to error monitoring and the control over the execution of the correct sequence of responses.

  18. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  19. Parallel flow diffusion battery

    DOEpatents

    Yeh, H.C.; Cheng, Y.S.

    1984-01-01

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  20. Parallel flow diffusion battery

    DOEpatents

    Yeh, Hsu-Chi; Cheng, Yung-Sung

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  1. Parallel consistent labeling algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samal, A.; Henderson, T.

    Mackworth and Freuder have analyzed the time complexity of several constraint satisfaction algorithms. Mohr and Henderson have given new algorithms, AC-4 and PC-3, for arc and path consistency, respectively, and have shown that the arc consistency algorithm is optimal in time complexity and of the same order space complexity as the earlier algorithms. In this paper, they give parallel algorithms for solving node and arc consistency. They show that any parallel algorithm for enforcing arc consistency in the worst case must have O(na) sequential steps, where n is number of nodes, and a is the number of labels per node.more » They give several parallel algorithms to do arc consistency. It is also shown that they all have optimal time complexity. The results of running the parallel algorithms on a BBN Butterfly multiprocessor are also presented.« less

  2. Experts' understanding of partial derivatives using the partial derivative machine

    NASA Astrophysics Data System (ADS)

    Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.

    2015-12-01

    [This paper is part of the Focused Collection on Upper Division Physics Courses.] Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of students in STEM disciplines. In this paper, we report on an initial study of expert understanding of partial derivatives across three disciplines: physics, engineering, and mathematics. We report on the central research question of how disciplinary experts understand partial derivatives, and how their concept images of partial derivatives differ, with a focus on experimentally measured quantities. Using the partial derivative machine (PDM), we probed expert understanding of partial derivatives in an experimental context without a known functional form. In particular, we investigated which representations were cued by the experts' interactions with the PDM. Whereas the physicists and engineers were quick to use measurements to find a numeric approximation for a derivative, the mathematicians repeatedly returned to speculation as to the functional form; although they were comfortable drawing qualitative conclusions about the system from measurements, they were reluctant to approximate the derivative through measurement. On a theoretical front, we found ways in which existing frameworks for the concept of derivative could be expanded to include numerical approximation.

  3. Partitioning in parallel processing of production systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oflazer, K.

    1987-01-01

    This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpretermore » with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.« less

  4. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

    1991-01-01

    A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  5. Regional alveolar partial pressure of oxygen measurement with parallel accelerated hyperpolarized gas MRI.

    PubMed

    Kadlecek, Stephen; Hamedani, Hooman; Xu, Yinan; Emami, Kiarash; Xin, Yi; Ishii, Masaru; Rizi, Rahim

    2013-10-01

    Alveolar oxygen tension (Pao2) is sensitive to the interplay between local ventilation, perfusion, and alveolar-capillary membrane permeability, and thus reflects physiologic heterogeneity of healthy and diseased lung function. Several hyperpolarized helium ((3)He) magnetic resonance imaging (MRI)-based Pao2 mapping techniques have been reported, and considerable effort has gone toward reducing Pao2 measurement error. We present a new Pao2 imaging scheme, using parallel accelerated MRI, which significantly reduces measurement error. The proposed Pao2 mapping scheme was computer-simulated and was tested on both phantoms and five human subjects. Where possible, correspondence between actual local oxygen concentration and derived values was assessed for both bias (deviation from the true mean) and imaging artifact (deviation from the true spatial distribution). Phantom experiments demonstrated a significantly reduced coefficient of variation using the accelerated scheme. Simulation results support this observation and predict that correspondence between the true spatial distribution and the derived map is always superior using the accelerated scheme, although the improvement becomes less significant as the signal-to-noise ratio increases. Paired measurements in the human subjects, comparing accelerated and fully sampled schemes, show a reduced Pao2 distribution width for 41 of 46 slices. In contrast to proton MRI, acceleration of hyperpolarized imaging has no signal-to-noise penalty; its use in Pao2 measurement is therefore always beneficial. Comparison of multiple schemes shows that the benefit arises from a longer time-base during which oxygen-induced depolarization modifies the signal strength. Demonstration of the accelerated technique in human studies shows the feasibility of the method and suggests that measurement error is reduced here as well, particularly at low signal-to-noise levels. Copyright © 2013 AUR. Published by Elsevier Inc. All rights reserved.

  6. Sublattice parallel replica dynamics.

    PubMed

    Martínez, Enrique; Uberuaga, Blas P; Voter, Arthur F

    2014-06-01

    Exascale computing presents a challenge for the scientific community as new algorithms must be developed to take full advantage of the new computing paradigm. Atomistic simulation methods that offer full fidelity to the underlying potential, i.e., molecular dynamics (MD) and parallel replica dynamics, fail to use the whole machine speedup, leaving a region in time and sample size space that is unattainable with current algorithms. In this paper, we present an extension of the parallel replica dynamics algorithm [A. F. Voter, Phys. Rev. B 57, R13985 (1998)] by combining it with the synchronous sublattice approach of Shim and Amar [ and , Phys. Rev. B 71, 125432 (2005)], thereby exploiting event locality to improve the algorithm scalability. This algorithm is based on a domain decomposition in which events happen independently in different regions in the sample. We develop an analytical expression for the speedup given by this sublattice parallel replica dynamics algorithm and compare it with parallel MD and traditional parallel replica dynamics. We demonstrate how this algorithm, which introduces a slight additional approximation of event locality, enables the study of physical systems unreachable with traditional methodologies and promises to better utilize the resources of current high performance and future exascale computers.

  7. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  8. Parallel computation with the force

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1985-01-01

    A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiproceossor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.

  9. Design considerations for parallel graphics libraries

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  10. Experts' Understanding of Partial Derivatives Using the Partial Derivative Machine

    ERIC Educational Resources Information Center

    Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.

    2015-01-01

    Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of…

  11. PyPWA: A partial-wave/amplitude analysis software framework

    NASA Astrophysics Data System (ADS)

    Salgado, Carlos

    2016-05-01

    The PyPWA project aims to develop a software framework for Partial Wave and Amplitude Analysis of data; providing the user with software tools to identify resonances from multi-particle final states in photoproduction. Most of the code is written in Python. The software is divided into two main branches: one general-shell where amplitude's parameters (or any parametric model) are to be estimated from the data. This branch also includes software to produce simulated data-sets using the fitted amplitudes. A second branch contains a specific realization of the isobar model (with room to include Deck-type and other isobar model extensions) to perform PWA with an interface into the computer resources at Jefferson Lab. We are currently implementing parallelism and vectorization using the Intel's Xeon Phi family of coprocessors.

  12. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  13. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  14. Parallelization of the FLAPW method

    NASA Astrophysics Data System (ADS)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  15. Using Parallel Processing for Problem Solving.

    DTIC Science & Technology

    1979-12-01

    are the basic parallel proces- sing primitive . Different goals of the system can be pursued in parallel by placing them in separate activities...Language primitives are provided for manipulating running activities. Viewpoints are a generalization of context FOM -(over "*’ DD I FON 1473 ’EDITION OF I...arc the basic parallel processing primitive . Different goals of the system can be pursued in parallel by placing them in separate activities. Language

  16. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

    2001-01-01

    This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.

  17. Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains.

    PubMed

    Jha, Ashwani; Flurchick, K M; Bikdash, Marwan; Kc, Dukka B

    2016-01-01

    Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10-15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors.

  18. Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains

    PubMed Central

    Jha, Ashwani; Flurchick, K. M.; Bikdash, Marwan

    2016-01-01

    Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10–15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors. PMID:27747230

  19. Turbomachinery CFD on parallel computers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.

    1992-01-01

    The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.

  20. Parallel CE/SE Computations via Domain Decomposition

    NASA Technical Reports Server (NTRS)

    Himansu, Ananda; Jorgenson, Philip C. E.; Wang, Xiao-Yen; Chang, Sin-Chung

    2000-01-01

    This paper describes the parallelization strategy and achieved parallel efficiency of an explicit time-marching algorithm for solving conservation laws. The Space-Time Conservation Element and Solution Element (CE/SE) algorithm for solving the 2D and 3D Euler equations is parallelized with the aid of domain decomposition. The parallel efficiency of the resultant algorithm on a Silicon Graphics Origin 2000 parallel computer is checked.

  1. Survey of the status of finite element methods for partial differential equations

    NASA Technical Reports Server (NTRS)

    Temam, Roger

    1986-01-01

    The finite element methods (FEM) have proved to be a powerful technique for the solution of boundary value problems associated with partial differential equations of either elliptic, parabolic, or hyperbolic type. They also have a good potential for utilization on parallel computers particularly in relation to the concept of domain decomposition. This report is intended as an introduction to the FEM for the nonspecialist. It contains a survey which is totally nonexhaustive, and it also contains as an illustration, a report on some new results concerning two specific applications, namely a free boundary fluid-structure interaction problem and the Euler equations for inviscid flows.

  2. Partial tooth gear bearings

    NASA Technical Reports Server (NTRS)

    Vranish, John M. (Inventor)

    2010-01-01

    A partial gear bearing including an upper half, comprising peak partial teeth, and a lower, or bottom, half, comprising valley partial teeth. The upper half also has an integrated roller section between each of the peak partial teeth with a radius equal to the gear pitch radius of the radially outwardly extending peak partial teeth. Conversely, the lower half has an integrated roller section between each of the valley half teeth with a radius also equal to the gear pitch radius of the peak partial teeth. The valley partial teeth extend radially inwardly from its roller section. The peak and valley partial teeth are exactly out of phase with each other, as are the roller sections of the upper and lower halves. Essentially, the end roller bearing of the typical gear bearing has been integrated into the normal gear tooth pattern.

  3. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless test compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.

  4. On a model of three-dimensional bursting and its parallel implementation

    NASA Astrophysics Data System (ADS)

    Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.

    2008-04-01

    A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a second-order accurate in both space and time, linearly-implicit finite difference method in equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared space address paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors that affect the performance of the paralled implementations, is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice more efficient than the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.

  5. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelisms in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application, and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF.more » Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects over 90% cache missing rates. With this loop rewritten, similar speedup as the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speedup model calibration. To run calibration on clusters as a single task, the Levenberg Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100 200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most of the existing groundwater model codes for many applications.« less

  6. Parallel processing considerations for image recognition tasks

    NASA Astrophysics Data System (ADS)

    Simske, Steven J.

    2011-01-01

    Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows-as diverse as optical character recognition [OCR], document classification and barcode reading-to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.

  7. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  8. Expressing Parallelism with ROOT

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  9. Massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Fung, L. W. (Inventor)

    1983-01-01

    An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.

  10. Parallel computing for probabilistic fatigue analysis

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

    1993-01-01

    This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.

  11. Exploiting Symmetry on Parallel Architectures.

    NASA Astrophysics Data System (ADS)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry -exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered it a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  12. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN.

  13. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.

  14. Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

    PubMed

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-06-01

    We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.

  15. An object-oriented approach to nested data parallelism

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Chatterjee, Siddhartha

    1994-01-01

    This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.

  16. Growth responses of Neurospora crassa to increased partial pressures of the noble gases and nitrogen.

    PubMed

    Buchheit, R G; Schreiner, H R; Doebbler, G F

    1966-02-01

    Buchheit, R. G. (Union Carbide Corp., Tonawanda, N.Y.), H. R. Schreiner, and G. F. Doebbler. Growth responses of Neurospora crassa to increased partial pressures of the noble gases and nitrogen. J. Bacteriol. 91:622-627. 1966.-Growth rate of the fungus Neurospora crassa depends in part on the nature of metabolically "inert gas" present in its environment. At high partial pressures, the noble gas elements (helium, neon, argon, krypton, and xenon) inhibit growth in the order: Xe > Kr> Ar > Ne > He. Nitrogen (N(2)) closely resembles He in inhibitory effectiveness. Partial pressures required for 50% inhibition of growth were: Xe (0.8 atm), Kr (1.6 atm), Ar (3.8 atm), Ne (35 atm), and He ( approximately 300 atm). With respect to inhibition of growth, the noble gases and N(2) differ qualitatively and quantitatively from the order of effectiveness found with other biological effects, i.e., narcosis, inhibition of insect development, depression of O(2)-dependent radiation sensitivity, and effects on tissue-slice glycolysis and respiration. Partial pressures giving 50% inhibition of N. crassa growth parallel various physical properties (i.e., solubilities, solubility ratios, etc.) of the noble gases. Linear correlation of 50% inhibition pressures to the polarizability and of the logarithm of pressure to the first and second ionization potentials suggests the involvement of weak intermolecular interactions or charge-transfer in the biological activity of the noble gases.

  17. Performance Evaluation in Network-Based Parallel Computing

    NASA Technical Reports Server (NTRS)

    Dezhgosha, Kamyar

    1996-01-01

    Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine) which is a software system for linking clusters of machines. Second, a set of three basic applications were selected. The applications consist of a parallel search, a parallel sort, a parallel matrix multiplication. These application programs were implemented in C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes in many cases is the restricting factor to performance. That is, coarse-grain parallelism which requires less frequent communication between processes will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.

  18. Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lifflander, Jonathan; Meneses, Esteban; Menon, Harshita

    2014-09-22

    Deterministic replay of a parallel application is commonly used for discovering bugs or to recover from a hard fault with message-logging fault tolerance. For message passing programs, a major source of overhead during forward execution is recording the order in which messages are sent and received. During replay, this ordering must be used to deterministically reproduce the execution. Previous work in replay algorithms often makes minimal assumptions about the programming model and application in order to maintain generality. However, in many cases, only a partial order must be recorded due to determinism intrinsic in the code, ordering constraints imposed bymore » the execution model, and events that are commutative (their relative execution order during replay does not need to be reproduced exactly). In this paper, we present a novel algebraic framework for reasoning about the minimum dependencies required to represent the partial order for different concurrent orderings and interleavings. By exploiting this theory, we improve on an existing scalable message-logging fault tolerance scheme. The improved scheme scales to 131,072 cores on an IBM BlueGene/P with up to 2x lower overhead than one that records a total order.« less

  19. Simulation Exploration through Immersive Parallel Planes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunhart-Lupo, Nicholas J; Bush, Brian W; Gruchalla, Kenny M

    We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinate's mapping the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, eachmore » individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used to both explore existing data as well as to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.« less

  20. Parallel multiscale simulations of a brain aneurysm

    PubMed Central

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2012-01-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver εκ αr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers ( εκ αr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future

  1. Parallel multiscale simulations of a brain aneurysm.

    PubMed

    Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver εκ αr . The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers ( εκ αr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future

  2. Parallel multiscale simulations of a brain aneurysm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em, E-mail: george_karniadakis@brown.edu

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm.more » The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed

  3. Hypercluster Parallel Processor

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

    1992-01-01

    Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.

  4. MOOSE: A parallel computational framework for coupled systems of nonlinear equations.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Derek Gaston; Chris Newman; Glen Hansen

    Systems of coupled, nonlinear partial differential equations (PDEs) often arise in simulation of nuclear processes. MOOSE: Multiphysics Object Oriented Simulation Environment, a parallel computational framework targeted at the solution of such systems, is presented. As opposed to traditional data-flow oriented computational frameworks, MOOSE is instead founded on the mathematical principle of Jacobian-free Newton-Krylov (JFNK) solution methods. Utilizing the mathematical structure present in JFNK, physics expressions are modularized into `Kernels,'' allowing for rapid production of new simulation tools. In addition, systems are solved implicitly and fully coupled, employing physics based preconditioning, which provides great flexibility even with large variance in timemore » scales. A summary of the mathematics, an overview of the structure of MOOSE, and several representative solutions from applications built on the framework are presented.« less

  5. Neoclassical parallel flow calculation in the presence of external parallel momentum sources in Heliotron J

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nishioka, K.; Nakamura, Y.; Nishimura, S.

    A moment approach to calculate neoclassical transport in non-axisymmetric torus plasmas composed of multiple ion species is extended to include the external parallel momentum sources due to unbalanced tangential neutral beam injections (NBIs). The momentum sources that are included in the parallel momentum balance are calculated from the collision operators of background particles with fast ions. This method is applied for the clarification of the physical mechanism of the neoclassical parallel ion flows and the multi-ion species effect on them in Heliotron J NBI plasmas. It is found that parallel ion flow can be determined by the balance between themore » parallel viscosity and the external momentum source in the region where the external source is much larger than the thermodynamic force driven source in the collisional plasmas. This is because the friction between C{sup 6+} and D{sup +} prevents a large difference between C{sup 6+} and D{sup +} flow velocities in such plasmas. The C{sup 6+} flow velocities, which are measured by the charge exchange recombination spectroscopy system, are numerically evaluated with this method. It is shown that the experimentally measured C{sup 6+} impurity flow velocities do not contradict clearly with the neoclassical estimations, and the dependence of parallel flow velocities on the magnetic field ripples is consistent in both results.« less

  6. Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

    PubMed

    Bhandarkar, S M; Chirravuri, S; Arnold, J

    1996-01-01

    Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.

  7. The Relation between Reconnected Flux, the Parallel Electric Field, and the Reconnection Rate in a Three-Dimensional Kinetic Simulation of Magnetic Reconnection

    NASA Astrophysics Data System (ADS)

    Wendel, D. E.; Olson, D. K.; Hesse, M.; Karimabadi, H.; Daughton, W. S.

    2013-12-01

    We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection of a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of topological features such as separators and null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a correspondence between the locus of changes in magnetic connectivity, or the quasi-separatrix layer, and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we compare the distribution of parallel electric fields along field lines with the reconnection rate. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first-order trends in the parallel electric field, while the contribution from high amplitude parallel fluctuations, such as electron holes, is negligible. The results impact the determination of reconnection sites within models of 3D turbulent reconnection as well as the inference of reconnection rates from in situ spacecraft measurements. It is difficult through direct observation to isolate the locus of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the partial sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.

  8. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with themore » endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.« less

  9. Expressing Parallelism with ROOT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piparo, D.; Tejedor, E.; Guiraud, E.

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module inmore » Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.« less

  10. Parallel programming of industrial applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heroux, M; Koniges, A; Simon, H

    1998-07-21

    In the introductory material, we overview the typical MPP environment for real application computing and the special tools available such as parallel debuggers and performance analyzers. Next, we draw from a series of real applications codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from thesemore » applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled: "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN l-55860-54).« less

  11. Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

    DOE PAGES

    Zhang, Hong; Zapol, Peter; Dixon, David A.; ...

    2015-11-17

    The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPsmore » is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.« less

  12. Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Hong; Zapol, Peter; Dixon, David A.

    The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPsmore » is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.« less

  13. Parallel Event Analysis Under Unix

    NASA Astrophysics Data System (ADS)

    Looney, S.; Nilsson, B. S.; Oest, T.; Pettersson, T.; Ranjard, F.; Thibonnier, J.-P.

    The ALEPH experiment at LEP, the CERN CN division and Digital Equipment Corp. have, in a joint project, developed a parallel event analysis system. The parallel physics code is identical to ALEPH's standard analysis code, ALPHA, only the organisation of input/output is changed. The user may switch between sequential and parallel processing by simply changing one input "card". The initial implementation runs on an 8-node DEC 3000/400 farm, using the PVM software, and exhibits a near-perfect speed-up linearity, reducing the turn-around time by a factor of 8.

  14. Partial volume segmentation in 3D of lesions and tissues in magnetic resonance images

    NASA Astrophysics Data System (ADS)

    Johnston, Brian; Atkins, M. Stella; Booth, Kellogg S.

    1994-05-01

    An important first step in diagnosis and treatment planning using tomographic imaging is differentiating and quantifying diseased as well as healthy tissue. One of the difficulties encountered in solving this problem to date has been distinguishing the partial volume constituents of each voxel in the image volume. Most proposed solutions to this problem involve analysis of planar images, in sequence, in two dimensions only. We have extended a model-based method of image segmentation which applies the technique of iterated conditional modes in three dimensions. A minimum of user intervention is required to train the algorithm. Partial volume estimates for each voxel in the image are obtained yielding fractional compositions of multiple tissue types for individual voxels. A multispectral approach is applied, where spatially registered data sets are available. The algorithm is simple and has been parallelized using a dataflow programming environment to reduce the computational burden. The algorithm has been used to segment dual echo MRI data sets of multiple sclerosis patients using lesions, gray matter, white matter, and cerebrospinal fluid as the partial volume constituents. The results of the application of the algorithm to these datasets is presented and compared to the manual lesion segmentation of the same data.

  15. Collectively loading an application in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

    Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.

  16. Integrated Task And Data Parallel Programming: Language Design

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    his research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program m. Additional 1995 Activities During the fall I collaborated

  17. Performance of the Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance 1/O to applications the applications that rely on them. In Section 3 we describe that access data in patterns that have been observed to be common.

  18. Parallel computing on Unix workstation arrays

    NASA Astrophysics Data System (ADS)

    Reale, F.; Bocchino, F.; Sciortino, S.

    1994-12-01

    We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular we have solved numerically a demanding test problem with a 2D hydrodynamic code, generally developed to study astrophysical flows, by exucuting it on arrays either of DECstations 5000/200 on Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed in our Institute, and easily extended to work on Unix workstation arrays by using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1), equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that the convenience depends essentially on the size of the computational domain as compared to the relative processor power and network bandwidth. We point out that for future perspectives a parallel development of processor and network technology is important, and that the software still offers great opportunities of improvement, especially in terms of latency times in the message-passing protocols. In conditions of significant gain in terms of speedup, such workstation arrays represent a cost-effective approach to massive parallel computations.

  19. Revisiting Parallel Cyclic Reduction and Parallel Prefix-Based Algorithms for Block Tridiagonal System of Equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seal, Sudip K; Perumalla, Kalyan S; Hirshman, Steven Paul

    2013-01-01

    Simulations that require solutions of block tridiagonal systems of equations rely on fast parallel solvers for runtime efficiency. Leading parallel solvers that are highly effective for general systems of equations, dense or sparse, are limited in scalability when applied to block tridiagonal systems. This paper presents scalability results as well as detailed analyses of two parallel solvers that exploit the special structure of block tridiagonal matrices to deliver superior performance, often by orders of magnitude. A rigorous analysis of their relative parallel runtimes is shown to reveal the existence of a critical block size that separates the parameter space spannedmore » by the number of block rows, the block size and the processor count, into distinct regions that favor one or the other of the two solvers. Dependence of this critical block size on the above parameters as well as on machine-specific constants is established. These formal insights are supported by empirical results on up to 2,048 cores of a Cray XT4 system. To the best of our knowledge, this is the highest reported scalability for parallel block tridiagonal solvers to date.« less

  20. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.

  1. Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Dimpsey, Robert Tod

    1992-01-01

    In practice, the performance evaluation of supercomputers is still substantially driven by singlepoint estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that, multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for the applications. In this type of environment, the most important performance metric is the completion of response time of a given application. However, there are a few evaluation efforts addressing this issue.

  2. PARALLEL PERTURBATION MODEL FOR CYCLE TO CYCLE VARIABILITY PPM4CCV

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ameen, Muhsin Mohammed; Som, Sibendu

    This code consists of a Fortran 90 implementation of the parallel perturbation model to compute cyclic variability in spark ignition (SI) engines. Cycle-to-cycle variability (CCV) is known to be detrimental to SI engine operation resulting in partial burn and knock, and result in an overall reduction in the reliability of the engine. Numerical prediction of cycle-to-cycle variability (CCV) in SI engines is extremely challenging for two key reasons: (i) high-fidelity methods such as large eddy simulation (LES) are required to accurately capture the in-cylinder turbulent flow field, and (ii) CCV is experienced over long timescales and hence the simulations needmore » to be performed for hundreds of consecutive cycles. In the new technique, the strategy is to perform multiple parallel simulations, each of which encompasses 2-3 cycles, by effectively perturbing the simulation parameters such as the initial and boundary conditions. The PPM4CCV code is a pre-processing code and can be coupled with any engine CFD code. PPM4CCV was coupled with Converge CFD code and a 10-time speedup was demonstrated over the conventional multi-cycle LES in predicting the CCV for a motored engine. Recently, the model is also being applied to fired engines including port fuel injected (PFI) and direct injection spark ignition engines and the preliminary results are very encouraging.« less

  3. Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

    PubMed Central

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-01-01

    We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529

  4. A parallel variable metric optimization algorithm

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.

    1973-01-01

    An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.

  5. Genetic Parallel Programming: design and implementation.

    PubMed

    Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

    2006-01-01

    This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

  6. Parallel programming with Easy Java Simulations

    NASA Astrophysics Data System (ADS)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a java-based programming environment to treat problems in the usual undergraduate curriculum. We use the easy java simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.

  7. Parallel auto-correlative statistics with VTK.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.

  8. Xyce parallel electronic simulator : users' guide.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-artmore » algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a

  9. Capabilities of Fully Parallelized MHD Stability Code MARS

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2016-10-01

    Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. Parallel version of MARS, named PMARS, has been recently developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.

  10. Implementation and performance of parallel Prolog interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, S.; Kale, L.V.; Balkrishna, R.

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.

  11. Force user's manual: A portable, parallel FORTRAN

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1990-01-01

    The use of Force, a parallel, portable FORTRAN on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, Alliant computers on which it is installed.

  12. Partial (focal) seizure

    MedlinePlus

    ... Jacksonian seizure; Seizure - partial (focal); Temporal lobe seizure; Epilepsy - partial seizures ... Abou-Khalil BW, Gallagher MJ, Macdonald RL. Epilepsies. In: Daroff ... Practice . 7th ed. Philadelphia, PA: Elsevier; 2016:chap 101. ...

  13. Effective Parallel Algorithm Animation

    DTIC Science & Technology

    1994-03-01

    parallel computer. The system incorporates the 14 Parallel Processing System us" r User User UMe PMwuM Progra Propu Plropm ýData Dots Data Daft...that produce meaningful animations. The following sections outline characteristics 146 Animation 0 71 r 40 02 I 5 * *2! 4 Idle Bu~sy Send Recv 7...Event Simulation. Technical Report, Georgia Institute of Technology, 1992. 22. Garey, Michael R . and David S. Johnson. Computers and Intractability: A

  14. MOOSE: A PARALLEL COMPUTATIONAL FRAMEWORK FOR COUPLED SYSTEMS OF NONLINEAR EQUATIONS.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    G. Hansen; C. Newman; D. Gaston

    Systems of coupled, nonlinear partial di?erential equations often arise in sim- ulation of nuclear processes. MOOSE: Multiphysics Ob ject Oriented Simulation Environment, a parallel computational framework targeted at solving these systems is presented. As opposed to traditional data / ?ow oriented com- putational frameworks, MOOSE is instead founded on mathematics based on Jacobian-free Newton Krylov (JFNK). Utilizing the mathematical structure present in JFNK, physics are modularized into “Kernels” allowing for rapid production of new simulation tools. In addition, systems are solved fully cou- pled and fully implicit employing physics based preconditioning allowing for a large amount of ?exibility even withmore » large variance in time scales. Background on the mathematics, an inspection of the structure of MOOSE and several rep- resentative solutions from applications built on the framework are presented.« less

  15. Biodynamic feedback training to assure learning partial load bearing on forearm crutches.

    PubMed

    Krause, Daniel; Wünnemann, Martin; Erlmann, Andre; Hölzchen, Timo; Mull, Melanie; Olivier, Norbert; Jöllenbeck, Thomas

    2007-07-01

    To examine how biodynamic feedback training affects the learning of prescribed partial load bearing (200N). Three pre-post experiments. Biomechanics laboratory in a German university. A volunteer sample of 98 uninjured subjects who had not used crutches recently. There were 24 subjects in experiment 1 (mean age, 23.2y); 64 in experiment 2 (mean age, 43.6y); and 10 in experiment 3 (mean age, 40.3y), parallelized by arm force. Video instruction and feedback training: In experiment 1, 2 varied instruction videos and reduced feedback frequency; in experiment 2, varied frequencies of changing tasks (contextual interference); and in experiment 3, feedback training (walking) and transfer (stair tasks). Vertical ground reaction force. Absolute error of practiced tasks was significantly reduced for all samples (P<.050). Varied contextual interference conditions did not significantly affect retention (P=.798) or transfer (P=.897). Positive transfer between tasks was significant in experiment 2 (P<.001) and was contrary to findings in experiment 3 (P=.071). Biodynamic feedback training is applicable for learning prescribed partial load bearing. The frequency of changing tasks is irrelevant. Despite some support for transfer effects, additional practice in climbing and descending stairs might be beneficial.

  16. Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

    PubMed

    Pessôa, Rodrigo; Watanabe, Jaqueline Tomoko; Nukui, Youko; Pereira, Juliana; Casseb, Jorge; Kasseb, Jorge; de Oliveira, Augusto César Penalva; Segurado, Aluisio Cotrim; Sanabani, Sabri Saeed

    2014-01-01

    Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data will add to our current understanding of the

  17. Parallel Algorithms for the Exascale Era

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robey, Robert W.

    New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this workmore » has been done by undergraduates and published in leading scientific journals.« less

  18. Growth Responses of Neurospora crassa to Increased Partial Pressures of the Noble Gases and Nitrogen

    PubMed Central

    Buchheit, R. G.; Schreiner, H. R.; Doebbler, G. F.

    1966-01-01

    Buchheit, R. G. (Union Carbide Corp., Tonawanda, N.Y.), H. R. Schreiner, and G. F. Doebbler. Growth responses of Neurospora crassa to increased partial pressures of the noble gases and nitrogen. J. Bacteriol. 91:622–627. 1966.—Growth rate of the fungus Neurospora crassa depends in part on the nature of metabolically “inert gas” present in its environment. At high partial pressures, the noble gas elements (helium, neon, argon, krypton, and xenon) inhibit growth in the order: Xe > Kr> Ar ≫ Ne ≫ He. Nitrogen (N2) closely resembles He in inhibitory effectiveness. Partial pressures required for 50% inhibition of growth were: Xe (0.8 atm), Kr (1.6 atm), Ar (3.8 atm), Ne (35 atm), and He (∼ 300 atm). With respect to inhibition of growth, the noble gases and N2 differ qualitatively and quantitatively from the order of effectiveness found with other biological effects, i.e., narcosis, inhibition of insect development, depression of O2-dependent radiation sensitivity, and effects on tissue-slice glycolysis and respiration. Partial pressures giving 50% inhibition of N. crassa growth parallel various physical properties (i.e., solubilities, solubility ratios, etc.) of the noble gases. Linear correlation of 50% inhibition pressures to the polarizability and of the logarithm of pressure to the first and second ionization potentials suggests the involvement of weak intermolecular interactions or charge-transfer in the biological activity of the noble gases. PMID:5883104

  19. Learning in Parallel: Using Parallel Corpora to Enhance Written Language Acquisition at the Beginning Level

    ERIC Educational Resources Information Center

    Bluemel, Brody

    2014-01-01

    This article illustrates the pedagogical value of incorporating parallel corpora in foreign language education. It explores the development of a Chinese/English parallel corpus designed specifically for pedagogical application. The corpus tool was created to aid language learners in reading comprehension and writing development by making foreign…

  20. Partial-depth lock-release flows

    NASA Astrophysics Data System (ADS)

    Khodkar, M. A.; Nasr-Azadani, M. M.; Meiburg, E.

    2017-06-01

    We extend the vorticity-based modeling concept for stratified flows introduced by Borden and Meiburg [Z. Borden and E. Meiburg, J. Fluid Mech. 726, R1 (2013), 10.1017/jfm.2013.239] to unsteady flow fields that cannot be rendered quasisteady by a change of reference frames. Towards this end, we formulate a differential control volume balance for the conservation of mass and vorticity in the fully unsteady parts of the flow, which we refer to as the differential vorticity model. We furthermore show that with the additional assumptions of locally uniform parallel flow within each layer, the unsteady vorticity modeling approach reproduces the familiar two-layer shallow-water equations. To evaluate its accuracy, we then apply the vorticity model approach to partial-depth lock-release flows. Consistent with the shallow water analysis of Rottman and Simpson [J. W. Rottman and J. E. Simpson, J. Fluid Mech. 135, 95 (1983), 10.1017/S0022112083002979], the vorticity model demonstrates the formation of a quasisteady gravity current front, a fully unsteady expansion wave, and a propagating bore that is present only if the lock depth exceeds half the channel height. When this bore forms, it travels with a velocity that does not depend on the lock height and the interface behind it is always at half the channel depth. We demonstrate that such a bore is energy conserving. The differential vorticity model gives predictions for the height and velocity of the gravity current and the bore, as well as for the propagation velocities of the edges of the expansion fan, as a function of the lock height. All of these predictions are seen to be in good agreement with the direct numerical simulation data and, where available, with experimental results. An energy analysis shows lock-release flows to be energy conserving only for the case of a full lock, whereas they are always dissipative for partial-depth locks.

  1. Collisions between quasi-parallel shocks

    NASA Technical Reports Server (NTRS)

    Cargill, Peter J.

    1991-01-01

    The collision between pairs of quasi-parallel shocks is examined using hybrid numerical simulations. In the interaction, the two shocks are transmitted through each other leaving behind a hot plasma with a population of particles with energies in excess of 40 E0, where E0 is the kinetic energy of particles in the shock frame prior to the collision. The energization is more efficient for quasi-parallel shocks than parallel shocks. Collisions between shocks of equal strengths are more efficient than those that are unequal. The results are of importance for phenomena during the impulsive phase of solar flares, in the distant solar wind and at planetary bow shocks.

  2. Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

    ERIC Educational Resources Information Center

    Agarwal, Mayank

    2009-01-01

    The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…

  3. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.

  4. Identification of the Bovine Arachnomelia Mutation by Massively Parallel Sequencing Implicates Sulfite Oxidase (SUOX) in Bone Development

    PubMed Central

    Drögemüller, Cord; Tetens, Jens; Sigurdsson, Snaevar; Gentile, Arcangelo; Testoni, Stefania; Lindblad-Toh, Kerstin; Leeb, Tosso

    2010-01-01

    Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development. PMID:20865119

  5. Parallel machine architecture and compiler design facilities

    NASA Technical Reports Server (NTRS)

    Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

    1990-01-01

    The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.

  6. Shift-connected SIMD array architectures for digital optical computing systems, with algorithms for numerical transforms and partial differential equations

    NASA Astrophysics Data System (ADS)

    Drabik, Timothy J.; Lee, Sing H.

    1986-11-01

    The intrinsic parallelism characteristics of easily realizable optical SIMD arrays prompt their present consideration in the implementation of highly structured algorithms for the numerical solution of multidimensional partial differential equations and the computation of fast numerical transforms. Attention is given to a system, comprising several spatial light modulators (SLMs), an optical read/write memory, and a functional block, which performs simple, space-invariant shifts on images with sufficient flexibility to implement the fastest known methods for partial differential equations as well as a wide variety of numerical transforms in two or more dimensions. Either fixed or floating-point arithmetic may be used. A performance projection of more than 1 billion floating point operations/sec using SLMs with 1000 x 1000-resolution and operating at 1-MHz frame rates is made.

  7. A Tutorial on Parallel and Concurrent Programming in Haskell

    NASA Astrophysics Data System (ADS)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.

  8. Parallel Computing Using Web Servers and "Servlets".

    ERIC Educational Resources Information Center

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  9. Parallel closure theory for toroidally confined plasmas

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Held, Eric D.

    2017-10-01

    We solve a system of general moment equations to obtain parallel closures for electrons and ions in an axisymmetric toroidal magnetic field. Magnetic field gradient terms are kept and treated using the Fourier series method. Assuming lowest order density (pressure) and temperature to be flux labels, the parallel heat flow, friction, and viscosity are expressed in terms of radial gradients of the lowest-order temperature and pressure, parallel gradients of temperature and parallel flow, and the relative electron-ion parallel flow velocity. Convergence of closure quantities is demonstrated as the number of moments and Fourier modes are increased. Properties of the moment equations in the collisionless limit are also discussed. Combining closures with fluid equations parallel mass flow and electric current are also obtained. Work in collaboration with the PSI Center and supported by the U.S. DOE under Grant Nos. DE-SC0014033, DE-SC0016256, and DE-FG02-04ER54746.

  10. Linearly exact parallel closures for slab geometry

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-01

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).

  11. Implementing Shared Memory Parallelism in MCBEND

    NASA Astrophysics Data System (ADS)

    Bird, Adam; Long, David; Dobson, Geoff

    2017-09-01

    MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheelers's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.

  12. Configuration affects parallel stent grafting results.

    PubMed

    Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L

    2018-05-01

    A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31

  13. Coil Compression for Accelerated Imaging with Cartesian Sampling

    PubMed Central

    Zhang, Tao; Pauly, John M.; Vasanawala, Shreyas S.; Lustig, Michael

    2012-01-01

    MRI using receiver arrays with many coil elements can provide high signal-to-noise ratio and increase parallel imaging acceleration. At the same time, the growing number of elements results in larger datasets and more computation in the reconstruction. This is of particular concern in 3D acquisitions and in iterative reconstructions. Coil compression algorithms are effective in mitigating this problem by compressing data from many channels into fewer virtual coils. In Cartesian sampling there often are fully sampled k-space dimensions. In this work, a new coil compression technique for Cartesian sampling is presented that exploits the spatially varying coil sensitivities in these non-subsampled dimensions for better compression and computation reduction. Instead of directly compressing in k-space, coil compression is performed separately for each spatial location along the fully-sampled directions, followed by an additional alignment process that guarantees the smoothness of the virtual coil sensitivities. This important step provides compatibility with autocalibrating parallel imaging techniques. Its performance is not susceptible to artifacts caused by a tight imaging fieldof-view. High quality compression of in-vivo 3D data from a 32 channel pediatric coil into 6 virtual coils is demonstrated. PMID:22488589

  14. Matching pursuit parallel decomposition of seismic data

    NASA Astrophysics Data System (ADS)

    Li, Chuanhui; Zhang, Fanchang

    2017-07-01

    In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. We pick a fixed number of envelope peaks from the current signal in every iteration according to the number of compute nodes and assign them to the compute nodes on average to search the optimal Morlet wavelets in parallel. With the help of parallel computer systems and Message Passing Interface, the parallel algorithm gives full play to the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition and also has good expandability. Besides, searching only one optimal Morlet wavelet by every compute node in every iteration is the most efficient implementation.

  15. Parallel tempering for the traveling salesman problem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Percus, Allon; Wang, Richard; Hyman, Jeffrey

    We explore the potential of parallel tempering as a combinatorial optimization method, applying it to the traveling salesman problem. We compare simulation results of parallel tempering with a benchmark implementation of simulated annealing, and study how different choices of parameters affect the relative performance of the two methods. We find that a straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects. When parameters are chosen appropriately, both methods yield close approximation to the actual minimum distance for an instance with 200 nodes. However, parallel tempering yields more consistently accurate results when a series of independent simulationsmore » are performed. Our results suggest that parallel tempering might offer a simple but powerful alternative to simulated annealing for combinatorial optimization problems.« less

  16. Exploration of a Permanent Magnet Synchronous Generator with Compensated Reactance Windings in Parallel Rod Configuration

    NASA Astrophysics Data System (ADS)

    Lyan, Oleg; Jankunas, Valdas; Guseinoviene, Eleonora; Pašilis, Aleksas; Senulis, Audrius; Knolis, Audrius; Kurt, Erol

    2018-02-01

    In this study, a permanent magnet synchronous generator (PMSG) topology with compensated reactance windings in parallel rod configuration is proposed to reduce the armature reactance X L and to achieve higher efficiency of PMSG. The PMSG was designed using iron-cored bifilar coil topology to overcome problems of market-dominant rotary type generators. Often the problem is a comparatively high armature reactance X L, which is usually bigger than armature resistance R a. Therefore, the topology is proposed to partially compensate or negligibly reduce the PMSG reactance. The study was performed by using finite element method (FEM) analysis and experimental investigation. FEM analysis was used to investigate magnetic field flux distribution and density in PMSG. The PMSG experimental analyses of no-load losses and electromotive force versus frequency (i.e., speed) was performed. Also terminal voltage, power output and efficiency relation with load current at different frequencies have been evaluated. The reactance of PMSG has low value and a linear relation with operating frequency. The low reactance gives a small variation of efficiency (from 90% to 95%) in a wide range of load (from 3 A to 10 A) and operation frequency (from 44 Hz to 114 Hz). The comparison of PMSG characteristics with parallel and series winding connection showed insignificant power variation. The research results showed that compensated reactance winding in parallel rod configuration in PMSG design provides lower reactance and therefore, higher efficiency under wider load and frequency variation.

  17. General design approach and practical realization of decoupling matrices for parallel transmission coils.

    PubMed

    Mahmood, Zohaib; McDaniel, Patrick; Guérin, Bastien; Keil, Boris; Vester, Markus; Adalsteinsson, Elfar; Wald, Lawrence L; Daniel, Luca

    2016-07-01

    In a coupled parallel transmit (pTx) array, the power delivered to a channel is partially distributed to other channels because of coupling. This power is dissipated in circulators resulting in a significant reduction in power efficiency. In this study, a technique for designing robust decoupling matrices interfaced between the RF amplifiers and the coils is proposed. The decoupling matrices ensure that most forward power is delivered to the load without loss of encoding capabilities of the pTx array. The decoupling condition requires that the impedance matrix seen by the power amplifiers is a diagonal matrix whose entries match the characteristic impedance of the power amplifiers. In this work, the impedance matrix of the coupled coils is diagonalized by a successive multiplication by its eigenvectors. A general design procedure and software are developed to generate automatically the hardware that implements diagonalization using passive components. The general design method is demonstrated by decoupling two example parallel transmit arrays. Our decoupling matrices achieve better than -20 db decoupling in both cases. A robust framework for designing decoupling matrices for pTx arrays is presented and validated. The proposed decoupling strategy theoretically scales to any arbitrary number of channels. Magn Reson Med 76:329-339, 2016. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  18. Parallel evolutionary computation in bioinformatics applications.

    PubMed

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  19. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  20. Parallel MR imaging: a user's guide.

    PubMed

    Glockner, James F; Hu, Houchun H; Stanley, David W; Angelos, Lisa; King, Kevin

    2005-01-01

    Parallel imaging is a recently developed family of techniques that take advantage of the spatial information inherent in phased-array radiofrequency coils to reduce acquisition times in magnetic resonance imaging. In parallel imaging, the number of sampled k-space lines is reduced, often by a factor of two or greater, thereby significantly shortening the acquisition time. Parallel imaging techniques have only recently become commercially available, and the wide range of clinical applications is just beginning to be explored. The potential clinical applications primarily involve reduction in acquisition time, improved spatial resolution, or a combination of the two. Improvements in image quality can be achieved by reducing the echo train lengths of fast spin-echo and single-shot fast spin-echo sequences. Parallel imaging is particularly attractive for cardiac and vascular applications and will likely prove valuable as 3-T body and cardiovascular imaging becomes part of standard clinical practice. Limitations of parallel imaging include reduced signal-to-noise ratio and reconstruction artifacts. It is important to consider these limitations when deciding when to use these techniques. (c) RSNA, 2005.

  1. Relative Debugging of Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2002-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify, the program execution with out changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  2. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Gryphon, Coranth D.; Miller, Mark D.

    1991-01-01

    PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.

  3. Multitasking TORT under UNICOS: Parallel performance models and measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnett, A.; Azmy, Y.Y.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  4. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Azmy, Y.Y.; Barnett, D.A.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates were updated to function in a UNI-COS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  5. Parallel Processing at the High School Level.

    ERIC Educational Resources Information Center

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  6. Simulation Exploration through Immersive Parallel Planes: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny

    We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinate's mapping the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, eachmore » individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used to both explore existing data as well as to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.« less

  7. Parallel integrated frame synchronizer chip

    NASA Technical Reports Server (NTRS)

    Solomon, Jeffrey Michael (Inventor); Ghuman, Parminder Singh (Inventor); Bennett, Toby Dennis (Inventor)

    2000-01-01

    A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.

  8. Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.

  9. Massively parallel multicanonical simulations

    NASA Astrophysics Data System (ADS)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.

  10. Introducing parallelism to histogramming functions for GEM systems

    NASA Astrophysics Data System (ADS)

    Krawczyk, Rafał D.; Czarski, Tomasz; Kolasinski, Piotr; Pozniak, Krzysztof T.; Linczuk, Maciej; Byszuk, Adrian; Chernyshova, Maryna; Juszczyk, Bartlomiej; Kasprowicz, Grzegorz; Wojenski, Andrzej; Zabolotny, Wojciech

    2015-09-01

    This article is an assessment of potential parallelization of histogramming algorithms in GEM detector system. Histogramming and preprocessing algorithms in MATLAB were analyzed with regard to adding parallelism. Preliminary implementation of parallel strip histogramming resulted in speedup. Analysis of algorithms parallelizability is presented. Overview of potential hardware and software support to implement parallel algorithm is discussed.

  11. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.

  12. Dynamic 2D self-phase-map Nyquist ghost correction for simultaneous multi-slice echo planar imaging.

    PubMed

    Yarach, Uten; Tung, Yi-Hang; Setsompop, Kawin; In, Myung-Ho; Chatnuntawech, Itthi; Yakupov, Renat; Godenschweger, Frank; Speck, Oliver

    2018-02-09

    To develop a reconstruction pipeline that intrinsically accounts for both simultaneous multislice echo planar imaging (SMS-EPI) reconstruction and dynamic slice-specific Nyquist ghosting correction in time-series data. After 1D slice-group average phase correction, the separate polarity (i.e., even and odd echoes) SMS-EPI data were unaliased by slice GeneRalized Autocalibrating Partial Parallel Acquisition. Both the slice-unaliased even and odd echoes were jointly reconstructed using a model-based framework, extended for SMS-EPI reconstruction that estimates a 2D self-phase map, corrects dynamic slice-specific phase errors, and combines data from all coils and echoes to obtain the final images. The percentage ghost-to-signal ratios (%GSRs) and its temporal variations for MB3R y 2 with a field of view/4 shift in a human brain obtained by the proposed dynamic 2D and standard 1D phase corrections were 1.37 ± 0.11 and 2.66 ± 0.16, respectively. Even with a large regularization parameter λ applied in the proposed reconstruction, the smoothing effect in fMRI activation maps was comparable to a very small Gaussian kernel size 1 × 1 × 1 mm 3 . The proposed reconstruction pipeline reduced slice-specific phase errors in SMS-EPI, resulting in reduction of GSR. It is applicable for functional MRI studies because the smoothing effect caused by the regularization parameter selection can be minimal in a blood-oxygen-level-dependent activation map. © 2018 International Society for Magnetic Resonance in Medicine.

  13. Diffusion tensor imaging (DTI) with retrospective motion correction for large-scale pediatric imaging.

    PubMed

    Holdsworth, Samantha J; Aksoy, Murat; Newbould, Rexford D; Yeom, Kristen; Van, Anh T; Ooi, Melvyn B; Barnes, Patrick D; Bammer, Roland; Skare, Stefan

    2012-10-01

    To develop and implement a clinical DTI technique suitable for the pediatric setting that retrospectively corrects for large motion without the need for rescanning and/or reacquisition strategies, and to deliver high-quality DTI images (both in the presence and absence of large motion) using procedures that reduce image noise and artifacts. We implemented an in-house built generalized autocalibrating partially parallel acquisitions (GRAPPA)-accelerated diffusion tensor (DT) echo-planar imaging (EPI) sequence at 1.5T and 3T on 1600 patients between 1 month and 18 years old. To reconstruct the data, we developed a fully automated tailored reconstruction software that selects the best GRAPPA and ghost calibration weights; does 3D rigid-body realignment with importance weighting; and employs phase correction and complex averaging to lower Rician noise and reduce phase artifacts. For select cases we investigated the use of an additional volume rejection criterion and b-matrix correction for large motion. The DTI image reconstruction procedures developed here were extremely robust in correcting for motion, failing on only three subjects, while providing the radiologists high-quality data for routine evaluation. This work suggests that, apart from the rare instance of continuous motion throughout the scan, high-quality DTI brain data can be acquired using our proposed integrated sequence and reconstruction that uses a retrospective approach to motion correction. In addition, we demonstrate a substantial improvement in overall image quality by combining phase correction with complex averaging, which reduces the Rician noise that biases noisy data. Copyright © 2012 Wiley Periodicals, Inc.

  14. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  15. Parallel fabrication of macroporous scaffolds.

    PubMed

    Dobos, Andrew; Grandhi, Taraka Sai Pavan; Godeshala, Sudhakar; Meldrum, Deirdre R; Rege, Kaushal

    2018-07-01

    Scaffolds generated from naturally occurring and synthetic polymers have been investigated in several applications because of their biocompatibility and tunable chemo-mechanical properties. Existing methods for generation of 3D polymeric scaffolds typically cannot be parallelized, suffer from low throughputs, and do not allow for quick and easy removal of the fragile structures that are formed. Current molds used in hydrogel and scaffold fabrication using solvent casting and porogen leaching are often single-use and do not facilitate 3D scaffold formation in parallel. Here, we describe a simple device and related approaches for the parallel fabrication of macroporous scaffolds. This approach was employed for the generation of macroporous and non-macroporous materials in parallel, in higher throughput and allowed for easy retrieval of these 3D scaffolds once formed. In addition, macroporous scaffolds with interconnected as well as non-interconnected pores were generated, and the versatility of this approach was employed for the generation of 3D scaffolds from diverse materials including an aminoglycoside-derived cationic hydrogel ("Amikagel"), poly(lactic-co-glycolic acid) or PLGA, and collagen. Macroporous scaffolds generated using the device were investigated for plasmid DNA binding and cell loading, indicating the use of this approach for developing materials for different applications in biotechnology. Our results demonstrate that the device-based approach is a simple technology for generating scaffolds in parallel, which can enhance the toolbox of current fabrication techniques. © 2018 Wiley Periodicals, Inc.

  16. Parallel solution of sparse one-dimensional dynamic programming problems

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1989-01-01

    Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.

  17. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  18. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    NASA Astrophysics Data System (ADS)

    Nash, Thomas

    1989-12-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.

  19. Piezoelectric and pyroelectric properties of PZT/P(VDF-TrFE) composites with constituent phases poled in parallel or antiparallel directions.

    PubMed

    Ng, K L; Chan, H L; Choy, C L

    2000-01-01

    Composites of lead zirconate titanate (PZT) powder dispersed in a vinylidene fluoride-trifluoroethylene copolymer [P(VDF-TrFE)] matrix have been prepared by compression molding. Three groups of polarized samples have been prepared by poling: only the ceramic phase, the ceramic and polymer phases in parallel directions, and the two phases in antiparallel directions. The measured permittivities of the unpoled composites are consistent with the predictions of the Bruggeman model. The changes in the pyroelectric and piezoelectric coefficients of the poled composites with increasing ceramic volume fraction can be described by modified linear mixture rules. When the ceramic and copolymer phases are poled in the same direction, their pyroelectric activities reinforce while their piezoelectric activities partially cancel. However, when the ceramic and copolymer phases are poled in opposite directions, their piezoelectric activities reinforce while their pyroelectric activities partially cancel.

  20. Increasing processor utilization during parallel computation rundown

    NASA Technical Reports Server (NTRS)

    Jones, W. H.

    1986-01-01

    Some parallel processing environments provide for asynchronous execution and completion of general purpose parallel computations from a single computational phase. When all the computations from such a phase are complete, a new parallel computational phase is begun. Depending upon the granularity of the parallel computations to be performed, there may be a shortage of available work as a particular computational phase draws to a close (computational rundown). This can result in the waste of computing resources and the delay of the overall problem. In many practical instances, strict sequential ordering of phases of parallel computation is not totally required. In such cases, the beginning of one phase can be correctly computed before the end of a previous phase is completed. This allows additional work to be generated somewhat earlier to keep computing resources busy during each computational rundown. The conditions under which this can occur are identified and the frequency of occurrence of such overlapping in an actual parallel Navier-Stokes code is reported. A language construct is suggested and possible control strategies for the management of such computational phase overlapping are discussed.

  1. Modulated heat pulse propagation and partial transport barriers in chaotic magnetic fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castillo-Negrete, Diego del; Blazevski, Daniel

    2016-04-15

    Direct numerical simulations of the time dependent parallel heat transport equation modeling heat pulses driven by power modulation in three-dimensional chaotic magnetic fields are presented. The numerical method is based on the Fourier formulation of a Lagrangian-Green's function method that provides an accurate and efficient technique for the solution of the parallel heat transport equation in the presence of harmonic power modulation. The numerical results presented provide conclusive evidence that even in the absence of magnetic flux surfaces, chaotic magnetic field configurations with intermediate levels of stochasticity exhibit transport barriers to modulated heat pulse propagation. In particular, high-order islands andmore » remnants of destroyed flux surfaces (Cantori) act as partial barriers that slow down or even stop the propagation of heat waves at places where the magnetic field connection length exhibits a strong gradient. Results on modulated heat pulse propagation in fully stochastic fields and across magnetic islands are also presented. In qualitative agreement with recent experiments in large helical device and DIII-D, it is shown that the elliptic (O) and hyperbolic (X) points of magnetic islands have a direct impact on the spatio-temporal dependence of the amplitude of modulated heat pulses.« less

  2. Smart System for Bicarbonate Control in Irrigation for Hydroponic Precision Farming

    PubMed Central

    Cambra, Carlos; Lacuesta, Raquel

    2018-01-01

    Improving the sustainability in agriculture is nowadays an important challenge. The automation of irrigation processes via low-cost sensors can to spread technological advances in a sector very influenced by economical costs. This article presents an auto-calibrated pH sensor able to detect and adjust the imbalances in the pH levels of the nutrient solution used in hydroponic agriculture. The sensor is composed by a pH probe and a set of micropumps that sequentially pour the different liquid solutions to maintain the sensor calibration and the water samples from the channels that contain the nutrient solution. To implement our architecture, we use an auto-calibrated pH sensor connected to a wireless node. Several nodes compose our wireless sensor networks (WSN) to control our greenhouse. The sensors periodically measure the pH level of each hydroponic support and send the information to a data base (DB) which stores and analyzes the data to warn farmers about the measures. The data can then be accessed through a user-friendly, web-based interface that can be accessed through the Internet by using desktop or mobile devices. This paper also shows the design and test bench for both the auto-calibrated pH sensor and the wireless network to check their correct operation. PMID:29693611

  3. Smart System for Bicarbonate Control in Irrigation for Hydroponic Precision Farming.

    PubMed

    Cambra, Carlos; Sendra, Sandra; Lloret, Jaime; Lacuesta, Raquel

    2018-04-25

    Improving the sustainability in agriculture is nowadays an important challenge. The automation of irrigation processes via low-cost sensors can to spread technological advances in a sector very influenced by economical costs. This article presents an auto-calibrated pH sensor able to detect and adjust the imbalances in the pH levels of the nutrient solution used in hydroponic agriculture. The sensor is composed by a pH probe and a set of micropumps that sequentially pour the different liquid solutions to maintain the sensor calibration and the water samples from the channels that contain the nutrient solution. To implement our architecture, we use an auto-calibrated pH sensor connected to a wireless node. Several nodes compose our wireless sensor networks (WSN) to control our greenhouse. The sensors periodically measure the pH level of each hydroponic support and send the information to a data base (DB) which stores and analyzes the data to warn farmers about the measures. The data can then be accessed through a user-friendly, web-based interface that can be accessed through the Internet by using desktop or mobile devices. This paper also shows the design and test bench for both the auto-calibrated pH sensor and the wireless network to check their correct operation.

  4. Parallel regulation of feedforward inhibition and excitation during whisker map plasticity

    PubMed Central

    House, David RC; Elstrott, Justin; Koh, Eileen; Chung, Jason; Feldman, Daniel E.

    2011-01-01

    Sensory experience drives robust plasticity of sensory maps in cerebral cortex, but the role of inhibitory circuits in this process is not fully understood. We show that classical deprivation-induced whisker map plasticity in layer 2/3 (L2/3) of rat somatosensory (S1) cortex involves robust weakening of L4-L2/3 feedforward inhibition. This weakening was caused by reduced L4 excitation onto L2/3 fast-spiking (FS) interneurons, which mediate sensitive feedforward inhibition, and was partially offset by strengthening of unitary FS to L2/3 pyramidal cell synapses. Weakening of feedforward inhibition paralleled the known weakening of feedforward excitation, so that mean excitatory-inhibitory balance and timing onto L2/3 pyramidal cells were preserved. Thus, reduced feedforward inhibition is a covert compensatory process that can maintain excitatory-inhibitory balance during classical deprivation-induced Hebbian map plasticity. PMID:22153377

  5. Parallel computations and control of adaptive structures

    NASA Technical Reports Server (NTRS)

    Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

    1991-01-01

    The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer and the impact of the approach presented for applications in other disciplines than aerospace industry is assessed.

  6. A parallel adaptive mesh refinement algorithm

    NASA Technical Reports Server (NTRS)

    Quirk, James J.; Hanebutte, Ulf R.

    1993-01-01

    Over recent years, Adaptive Mesh Refinement (AMR) algorithms which dynamically match the local resolution of the computational grid to the numerical solution being sought have emerged as powerful tools for solving problems that contain disparate length and time scales. In particular, several workers have demonstrated the effectiveness of employing an adaptive, block-structured hierarchical grid system for simulations of complex shock wave phenomena. Unfortunately, from the parallel algorithm developer's viewpoint, this class of scheme is quite involved; these schemes cannot be distilled down to a small kernel upon which various parallelizing strategies may be tested. However, because of their block-structured nature such schemes are inherently parallel, so all is not lost. In this paper we describe the method by which Quirk's AMR algorithm has been parallelized. This method is built upon just a few simple message passing routines and so it may be implemented across a broad class of MIMD machines. Moreover, the method of parallelization is such that the original serial code is left virtually intact, and so we are left with just a single product to support. The importance of this fact should not be underestimated given the size and complexity of the original algorithm.

  7. Equalizer: a scalable parallel rendering framework.

    PubMed

    Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato

    2009-01-01

    Continuing improvements in CPU and GPU performances as well as increasing multi-core processor and cluster-based parallelism demand for flexible and scalable parallel rendering solutions that can exploit multipipe hardware accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture, the basic API, discuss its advantages over previous approaches, present example configurations and usage scenarios as well as scalability results.

  8. Broadcasting a message in a parallel computer

    DOEpatents

    Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.

  9. Partial polarizer filter

    NASA Technical Reports Server (NTRS)

    Title, A. M. (Inventor)

    1978-01-01

    A birefringent filter module comprises, in seriatum. (1) an entrance polarizer, (2) a first birefringent crystal responsive to optical energy exiting the entrance polarizer, (3) a partial polarizer responsive to optical energy exiting the first polarizer, (4) a second birefringent crystal responsive to optical energy exiting the partial polarizer, and (5) an exit polarizer. The first and second birefringent crystals have fast axes disposed + or -45 deg from the high transmitivity direction of the partial polarizer. Preferably, the second crystal has a length 1/2 that of the first crystal and the high transmitivity direction of the partial polarizer is nine times as great as the low transmitivity direction. To provide tuning, the polarizations of the energy entering the first crystal and leaving the second crystal are varied by either rotating the entrance and exit polarizers, or by sandwiching the entrance and exit polarizers between pairs of half wave plates that are rotated relative to the polarizers. A plurality of the filter modules may be cascaded.

  10. The AIS-5000 parallel processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schmitt, L.A.; Wilson, S.S.

    1988-05-01

    The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has superior cost/performance characteristics than two-dimensional mesh-connected systems. The design of the processing elements and their interconnections as well as the software used to program the system allow a wide variety of algorithms and applications to be implemented. In thismore » paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance bench marks are given for certain simple and complex functions.« less

  11. Fast I/O for Massively Parallel Applications

    NASA Technical Reports Server (NTRS)

    OKeefe, Matthew T.

    1996-01-01

    The two primary goals for this report were the design, contruction and modeling of parallel disk arrays for scientific visualization and animation, and a study of the IO requirements of highly parallel applications. In addition, further work in parallel display systems required to project and animate the very high-resolution frames resulting from our supercomputing simulations in ocean circulation and compressible gas dynamics.

  12. Parallel Algorithms for Groebner-Basis Reduction

    DTIC Science & Technology

    1987-09-25

    22209 ELEMENT NO. NO. NO. ACCESSION NO. 11. TITLE (Include Security Classification) * PARALLEL ALGORITHMS FOR GROEBNER -BASIS REDUCTION 12. PERSONAL...All other editions are obsolete. Productivity Engineering in the UNIXt Environment p Parallel Algorithms for Groebner -Basis Reduction Technical Report

  13. Implementation of Helioseismic Data Reduction and Diagnostic Techniques on Massively Parallel Architectures

    NASA Technical Reports Server (NTRS)

    Korzennik, Sylvain

    1997-01-01

    Under the direction of Dr. Rhodes, and the technical supervision of Dr. Korzennik, the data assimilation of high spatial resolution solar dopplergrams has been carried out throughout the program on the Intel Delta Touchstone supercomputer. With the help of a research assistant, partially supported by this grant, and under the supervision of Dr. Korzennik, code development was carried out at SAO, using various available resources. To ensure cross-platform portability, PVM was selected as the message passing library. A parallel implementation of power spectra computation for helioseismology data reduction, using PVM was successfully completed. It was successfully ported to SMP architectures (i.e. SUN), and to some MPP architectures (i.e. the CM5). Due to limitation of the implementation of PVM on the Cray T3D, the port to that architecture was not completed at the time.

  14. Address tracing for parallel machines

    NASA Technical Reports Server (NTRS)

    Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

    1991-01-01

    Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.

  15. Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Widlund, Olof B.

    2015-06-09

    The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electro-magnetics.These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equation. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver of a coarse model in order to have a performance which is independentmore » of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.« less

  16. INVITED TOPICAL REVIEW: Parallel magnetic resonance imaging

    NASA Astrophysics Data System (ADS)

    Larkman, David J.; Nunes, Rita G.

    2007-04-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed.

  17. Scalable Parallel Density-based Clustering and Applications

    NASA Astrophysics Data System (ADS)

    Patwary, Mostofa Ali

    2014-04-01

    Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.

  18. Three-way analysis of the UPLC-PDA dataset for the multicomponent quantitation of hydrochlorothiazide and olmesartan medoxomil in tablets by parallel factor analysis and three-way partial least squares.

    PubMed

    Dinç, Erdal; Ertekin, Zehra Ceren

    2016-01-01

    An application of parallel factor analysis (PARAFAC) and three-way partial least squares (3W-PLS1) regression models to ultra-performance liquid chromatography-photodiode array detection (UPLC-PDA) data with co-eluted peaks in the same wavelength and time regions was described for the multicomponent quantitation of hydrochlorothiazide (HCT) and olmesartan medoxomil (OLM) in tablets. Three-way dataset of HCT and OLM in their binary mixtures containing telmisartan (IS) as an internal standard was recorded with a UPLC-PDA instrument. Firstly, the PARAFAC algorithm was applied for the decomposition of three-way UPLC-PDA data into the chromatographic, spectral and concentration profiles to quantify the concerned compounds. Secondly, 3W-PLS1 approach was subjected to the decomposition of a tensor consisting of three-way UPLC-PDA data into a set of triads to build 3W-PLS1 regression for the analysis of the same compounds in samples. For the proposed three-way analysis methods in the regression and prediction steps, the applicability and validity of PARAFAC and 3W-PLS1 models were checked by analyzing the synthetic mixture samples, inter-day and intra-day samples, and standard addition samples containing HCT and OLM. Two different three-way analysis methods, PARAFAC and 3W-PLS1, were successfully applied to the quantitative estimation of the solid dosage form containing HCT and OLM. Regression and prediction results provided from three-way analysis were compared with those obtained by traditional UPLC method. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    NASA Astrophysics Data System (ADS)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-07-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.

  20. Parallel Activation in Bilingual Phonological Processing

    ERIC Educational Resources Information Center

    Lee, Su-Yeon

    2011-01-01

    In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…

  1. Anatomic partial nephrectomy: technique evolution.

    PubMed

    Azhar, Raed A; Metcalfe, Charles; Gill, Inderbir S

    2015-03-01

    Partial nephrectomy provides equivalent long-term oncologic and superior functional outcomes as radical nephrectomy for T1a renal masses. Herein, we review the various vascular clamping techniques employed during minimally invasive partial nephrectomy, describe the evolution of our partial nephrectomy technique and provide an update on contemporary thinking about the impact of ischemia on renal function. Recently, partial nephrectomy surgical technique has shifted away from main artery clamping and towards minimizing/eliminating global renal ischemia during partial nephrectomy. Supported by high-fidelity three-dimensional imaging, novel anatomic-based partial nephrectomy techniques have recently been developed, wherein partial nephrectomy can now be performed with segmental, minimal or zero global ischemia to the renal remnant. Sequential innovations have included early unclamping, segmental clamping, super-selective clamping and now culminating in anatomic zero-ischemia surgery. By eliminating 'under-the-gun' time pressure of ischemia for the surgeon, these techniques allow an unhurried, tightly contoured tumour excision with point-specific sutured haemostasis. Recent data indicate that zero-ischemia partial nephrectomy may provide better functional outcomes by minimizing/eliminating global ischemia and preserving greater vascularized kidney volume. Contemporary partial nephrectomy includes a spectrum of surgical techniques ranging from conventional-clamped to novel zero-ischemia approaches. Technique selection should be tailored to each individual case on the basis of tumour characteristics, surgical feasibility, surgeon experience, patient demographics and baseline renal function.

  2. PARAVT: Parallel Voronoi tessellation code

    NASA Astrophysics Data System (ADS)

    González, R. E.

    2016-10-01

    In this study, we present a new open source code for massive parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes neighbors list, Voronoi density, Voronoi cell volume, density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.

  3. Parallelization and automatic data distribution for nuclear reactor simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directlymore » affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.« less

  4. Numerical Prediction of CCV in a PFI Engine using a Parallel LES Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ameen, Muhsin M; Mirzaeian, Mohsen; Millo, Federico

    Cycle-to-cycle variability (CCV) is detrimental to IC engine operation and can lead to partial burn, misfire, and knock. Predicting CCV numerically is extremely challenging due to two key reasons. Firstly, high-fidelity methods such as large eddy simulation (LES) are required to accurately resolve the incylinder turbulent flowfield both spatially and temporally. Secondly, CCV is experienced over long timescales and hence the simulations need to be performed for hundreds of consecutive cycles. Ameen et al. (Int. J. Eng. Res., 2017) developed a parallel perturbation model (PPM) approach to dissociate this long time-scale problem into several shorter timescale problems. The strategy ismore » to perform multiple single-cycle simulations in parallel by effectively perturbing the initial velocity field based on the intensity of the in-cylinder turbulence. This strategy was demonstrated for motored engine and it was shown that the mean and variance of the in-cylinder flowfield was captured reasonably well by this approach. In the present study, this PPM approach is extended to simulate the CCV in a fired port-fuel injected (PFI) SI engine. Two operating conditions are considered – a medium CCV operating case corresponding to 2500 rpm and 16 bar BMEP and a low CCV case corresponding to 4000 rpm and 12 bar BMEP. The predictions from this approach are also shown to be similar to the consecutive LES cycles. Both the consecutive and PPM LES cycles are observed to under-predict the variability in the early stage of combustion. The parallel approach slightly underpredicts the cyclic variability at all stages of combustion as compared to the consecutive LES cycles. However, it is shown that the parallel approach is able to predict the coefficient of variation (COV) of the in-cylinder pressure and burn rate related parameters with sufficient accuracy, and is also able to predict the qualitative trends in CCV with changing operating conditions. The convergence of the statistics

  5. Topological switching between an alpha-beta parallel protein and a remarkably helical molten globule.

    PubMed

    Nabuurs, Sanne M; Westphal, Adrie H; aan den Toorn, Marije; Lindhoud, Simon; van Mierlo, Carlo P M

    2009-06-17

    Partially folded protein species transiently exist during folding of most proteins. Often these species are molten globules, which may be on- or off-pathway to native protein. Molten globules have a substantial amount of secondary structure but lack virtually all the tertiary side-chain packing characteristic of natively folded proteins. These ensembles of interconverting conformers are prone to aggregation and potentially play a role in numerous devastating pathologies, and thus attract considerable attention. The molten globule that is observed during folding of apoflavodoxin from Azotobacter vinelandii is off-pathway, as it has to unfold before native protein can be formed. Here we report that this species can be trapped under nativelike conditions by substituting amino acid residue F44 by Y44, allowing spectroscopic characterization of its conformation. Whereas native apoflavodoxin contains a parallel beta-sheet surrounded by alpha-helices (i.e., the flavodoxin-like or alpha-beta parallel topology), it is shown that the molten globule has a totally different topology: it is helical and contains no beta-sheet. The presence of this remarkably nonnative species shows that single polypeptide sequences can code for distinct folds that swap upon changing conditions. Topological switching between unrelated protein structures is likely a general phenomenon in the protein structure universe.

  6. Synchronization Of Parallel Discrete Event Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

  7. Distributed parallel messaging for multiprocessor systems

    DOEpatents

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.

  8. Knowledge representation into Ada parallel processing

    NASA Technical Reports Server (NTRS)

    Masotto, Tom; Babikyan, Carol; Harper, Richard

    1990-01-01

    The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.

  9. Implementations of BLAST for parallel computers.

    PubMed

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers-the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.

  10. Parallel Algorithms for Image Analysis.

    DTIC Science & Technology

    1982-06-01

    8217 _ _ _ _ _ _ _ 4. TITLE (aid Subtitle) S. TYPE OF REPORT & PERIOD COVERED PARALLEL ALGORITHMS FOR IMAGE ANALYSIS TECHNICAL 6. PERFORMING O4G. REPORT NUMBER TR-1180...Continue on reverse side it neceesary aid Identlfy by block number) Image processing; image analysis ; parallel processing; cellular computers. 20... IMAGE ANALYSIS TECHNICAL 6. PERFORMING ONG. REPORT NUMBER TR-1180 - 7. AUTHOR(&) S. CONTRACT OR GRANT NUMBER(s) Azriel Rosenfeld AFOSR-77-3271 9

  11. Speeding up parallel processing

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.

  12. MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Taft, James R.

    1999-01-01

    Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.

  13. National Combustion Code: Parallel Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

    2000-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.

  14. Automatic Management of Parallel and Distributed System Resources

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  15. A Domain Decomposition Parallelization of the Fast Marching Method

    NASA Technical Reports Server (NTRS)

    Herrmann, M.

    2003-01-01

    In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on load balancing separately the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G(sub 0)-based parallelization will be investigated.

  16. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ...) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration proceeding... the receivership includes the resolution of claims made by customers; or (3) A petition filed under... any of the foregoing with knowledge of a parallel proceeding shall promptly notify the Commission, by...

  17. Phenomenological Partial Specific Volumes for G-Quadruplex DNAs

    PubMed Central

    Hellman, Lance M.; Rodgers, David W.; Fried, Michael G.

    2009-01-01

    Accurate partial specific volume (ν̄) values are required for sedimentation velocity and sedimentation equilibrium analyses. For nucleic acids, the estimation of these values is complicated by the fact that ν̄ depends on base composition, secondary structure, solvation and the concentrations and identities of ions in the surrounding buffer. Here we describe sedimentation equilibrium measurements of the apparent isopotential partial specific volume φ′ for two G-quadruplex DNAs and a single-stranded DNA of similar molecular weight and base composition. The G-quadruplex DNAs are a 22 nucleotide fragment of the human telomere consensus sequence and a 27 nucleotide fragment from the human c-myc promoter. The single-stranded DNA is 26 nucleotides long and is designed to have low propensity to form secondary structures. Parallel measurements were made in buffers containing NaCl and in buffers containing KCl, spanning the range 0.09M ≤ [salt] ≤ 2.3M. Limiting values of φ′, extrapolated to [salt] = 0M, were: 22-mer (NaCl-form), 0.525 ± 0.004 mL/g; 22-mer (KCl-form), 0.531 ± 0.006 mL/g; 27-mer (NaCl-form), 0.548 ± 0.005 mL/g; 27-mer (KCl-form), 0.557 ± 0.006 mL/g; 26-mer (NaCl-form), 0.555 ± 0.004 mL/g; 26-mer (KCl-form), 0.564 ± 0.006 mL/g. Small changes in φ′ with [salt] suggest that large changes in counterion association or hydration are unlikely to take place over these concentration ranges. PMID:19238377

  18. Parallel Implementation of the Discontinuous Galerkin Method

    NASA Technical Reports Server (NTRS)

    Baggag, Abdalkader; Atkins, Harold; Keyes, David

    1999-01-01

    This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in all object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.

  19. Data parallel sorting for particle simulation

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1992-01-01

    Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimun performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.

  20. Parallel/distributed direct method for solving linear systems

    NASA Technical Reports Server (NTRS)

    Lin, Avi

    1990-01-01

    A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate paralleled algorithm is insensitive to the number of processors as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmical changes.

  1. S-HARP: A parallel dynamic spectral partitioner

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sohn, A.; Simon, H.

    1998-01-01

    Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Messagemore » Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.« less

  2. A bioconvection model for a squeezing flow of nanofluid between parallel plates in the presence of gyrotactic microorganisms

    NASA Astrophysics Data System (ADS)

    Bin-Mohsin, Bandar; Ahmed, Naveed; Adnan; Khan, Umar; Tauseef Mohyud-Din, Syed

    2017-04-01

    This article deals with the bioconvection flow in a parallel-plate channel. The plates are parallel and the flowing fluid is saturated with nanoparticles, and water is considered as a base fluid because microorganisms can survive only in water. A highly nonlinear and coupled system of partial differential equations presenting the model of bioconvection flow between parallel plates is reduced to a nonlinear and coupled system (nondimensional bioconvection flow model) of ordinary differential equations with the help of feasible nondimensional variables. In order to find the convergent solution of the system, a semi-analytical technique is utilized called variation of parameters method (VPM). Numerical solution is also computed and the Runge-Kutta scheme of fourth order is employed for this purpose. Comparison between these solutions has been made on the domain of interest and found to be in excellent agreement. Also, influence of various parameters has been discussed for the nondimensional velocity, temperature, concentration and density of the motile microorganisms both for suction and injection cases. Almost inconsequential influence of thermophoretic and Brownian motion parameters on the temperature field is observed. An interesting variation are inspected for the density of the motile microorganisms due to the varying bioconvection parameter in suction and injection cases. At the end, we make some concluding remarks in the light of this article.

  3. Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.

    PubMed

    Ruymgaart, A Peter; Elber, Ron

    2012-11-13

    We report Graphics Processing Unit (GPU) and Open-MP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing-environment in which a CPU with a few cores is attached to a GPU. We discuss in detail the design of the code and we illustrate performance comparable to highly optimized codes such as GROMACS. Beside speed our code shows excellent energy conservation. Utilization of water-specific lists allows the efficient calculations of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU compared to code optimized on a single CPU core for systems larger than 20,000 atoms. This is up four-fold from a factor of 10 reported in our initial GPU implementation that did not include a water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints of all bonds, runs in parallel on multiple Open-MP cores or entirely on the GPU. It is based on Conjugate Gradient solution of the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows an increase of the time step to 2.0fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is expected. The significant speedup of the optimized components transfers the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).

  4. Computational efficiency of parallel combinatorial OR-tree searches

    NASA Technical Reports Server (NTRS)

    Li, Guo-Jie; Wah, Benjamin W.

    1990-01-01

    The performance of parallel combinatorial OR-tree searches is analytically evaluated. This performance depends on the complexity of the problem to be solved, the error allowance function, the dominance relation, and the search strategies. The exact performance may be difficult to predict due to the nondeterminism and anomalies of parallelism. The authors derive the performance bounds of parallel OR-tree searches with respect to the best-first, depth-first, and breadth-first strategies, and verify these bounds by simulation. They show that a near-linear speedup can be achieved with respect to a large number of processors for parallel OR-tree searches. Using the bounds developed, the authors derive sufficient conditions for assuring that parallelism will not degrade performance and necessary conditions for allowing parallelism to have a speedup greater than the ratio of the numbers of processors. These bounds and conditions provide the theoretical foundation for determining the number of processors required to assure a near-linear speedup.

  5. The Parallel Axiom

    ERIC Educational Resources Information Center

    Rogers, Pat

    1972-01-01

    Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclids parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)

  6. Partially Observed Mixtures of IRT Models: An Extension of the Generalized Partial-Credit Model

    ERIC Educational Resources Information Center

    Von Davier, Matthias; Yamamoto, Kentaro

    2004-01-01

    The generalized partial-credit model (GPCM) is used frequently in educational testing and in large-scale assessments for analyzing polytomous data. Special cases of the generalized partial-credit model are the partial-credit model--or Rasch model for ordinal data--and the two parameter logistic (2PL) model. This article extends the GPCM to the…

  7. Observation of Bs-Bsbar Oscillations Using Partially Reconstructed Hadronic Bs Decays

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miles, Jeffrey Robert

    2008-02-01

    This thesis describes the contribution of partially reconstructed hadronic decays in the world's first observation of Bmore » $$0\\atop{s}$$-$$\\bar{B}$$$0\\atop{s}$$ oscillations. The analysis is a core member of a suite of closely related studies whose combined time-dependent measurement of the B$$0\\atop{s}$$-$$\\bar{B}$$$0\\atop{s}$$ oscillation frequency Δm s is of historic significance. Using a data sample of 1 fb -1 of p$$\\bar{p}$$ collisions at √s = 1.96 TeV collected with the CDF-II detector at the Fermilab Tevatron, they find signals of 3150 partially reconstructed hadronic B s decays from the combined decay channels B$$0\\atop{s}$$ → D*$$-\\atop{s}$$ π + and B$$0\\atop{s}$$ → D$$-\\atop{s}$$ ρ + with D$$-\\atop{s}$$ → Φπ -. These events are analyzed in parallel with 2000 fully reconstructed B$$0\\atop{s}$$ → D$$-\\atop{s}$$ π + (D$$-\\atop{s}$$ → Φπ -) decays. The treatment of the data is developed in stages of progressive complexity, using high-statistics samples of hadronic B 0and B + decays to study the attributes of partially reconstructed events. The analysis characterizes the data in mass and proper decay time, noting the potential of the partially reconstructed decays for precise measurement of B branching fractions and lifetimes, but consistently focusing on the effectiveness of the model for the oscillation measurement. They efficiently incorporate the measured quantities of each decay into a maximum likelihood fitting framework, from which they extract amplitude scans and a direct measurement of the oscillation frequency. The features of the amplitude scans are consistent with expected behavior, supporting the correctness of the calibrations for proper time uncertainty and flavor tagging dilution. The likelihood allows for the smooth combination of this analysis with results from other data samples, including 3500 fully reconstructed hadronic B s events and 61,500 partially reconstructed semileptonic B s events. The

  8. Analysis of composite ablators using massively parallel computation

    NASA Technical Reports Server (NTRS)

    Shia, David

    1995-01-01

    In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations. The governing equations are developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small times steps to maintain stability. However, this kind of solution scheme has the disadvantages that complex physics cannot be easily incorporated into the algorithm and that the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems. The gas storage term is included in the explicit pressure calculation of both

  9. k-t accelerated aortic 4D flow MRI in under two minutes: Feasibility and impact of resolution, k-space sampling patterns, and respiratory navigator gating on hemodynamic measurements.

    PubMed

    Bollache, Emilie; Barker, Alex J; Dolan, Ryan Scott; Carr, James C; van Ooij, Pim; Ahmadian, Rouzbeh; Powell, Alex; Collins, Jeremy D; Geiger, Julia; Markl, Michael

    2018-01-01

    To assess the performance of highly accelerated free-breathing aortic four-dimensional (4D) flow MRI acquired in under 2 minutes compared to conventional respiratory gated 4D flow. Eight k-t accelerated nongated 4D flow MRI (parallel MRI with extended and averaged generalized autocalibrating partially parallel acquisition kernels [PEAK GRAPPA], R = 5, TRes = 67.2 ms) using four k y -k z Cartesian sampling patterns (linear, center-out, out-center-out, random) and two spatial resolutions (SRes1 = 3.5 × 2.3 × 2.6 mm 3 , SRes2 = 4.5 × 2.3 × 2.6 mm 3 ) were compared in vitro (aortic coarctation flow phantom) and in 10 healthy volunteers, to conventional 4D flow (16 mm-navigator acceptance window; R = 2; TRes = 39.2 ms; SRes = 3.2 × 2.3 × 2.4 mm 3 ). The best k-t accelerated approach was further assessed in 10 patients with aortic disease. The k-t accelerated in vitro aortic peak flow (Qmax), net flow (Qnet), and peak velocity (Vmax) were lower than conventional 4D flow indices by ≤4.7%, ≤ 11%, and ≤22%, respectively. In vivo k-t accelerated acquisitions were significantly shorter but showed a trend to lower image quality compared to conventional 4D flow. Hemodynamic indices for linear and out-center-out k-space samplings were in agreement with conventional 4D flow (Qmax ≤ 13%, Qnet ≤ 13%, Vmax ≤ 17%, P > 0.05). Aortic 4D flow MRI in under 2 minutes is feasible with moderate underestimation of flow indices. Differences in k-space sampling patterns suggest an opportunity to mitigate image artifacts by an optimal trade-off between scan time, acceleration, and k-space sampling. Magn Reson Med 79:195-207, 2018. © 2018 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.

  10. Parallel processing in finite element structural analysis

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.

    1987-01-01

    A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).

  11. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    NASA Technical Reports Server (NTRS)

    Sun, Xian-He

    1997-01-01

    Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm

  12. Design of co-existence parallel periodic surface structure induced by picosecond laser pulses on the Al/Ti multilayers

    NASA Astrophysics Data System (ADS)

    Petrović, Suzana; Peruško, D.; Kovač, J.; Panjan, P.; Mitrić, M.; Pjević, D.; Kovačević, A.; Jelenković, B.

    2017-09-01

    Formation of periodic nanostructures on the Ti/5x(Al/Ti)/Si multilayers induced by picosecond laser pulses is studied in order to better understand the formation of a laser-induced periodic surface structure (LIPSS). At fluence slightly below the ablation threshold, the formation of low spatial frequency-LIPSS (LSFL) oriented perpendicular to the direction of the laser polarization is observed on the irradiated area. Prolonged irradiation while scanning results in the formation of a high spatial frequency-LIPSS (HSFL), on top of the LSFLs, creating a co-existence parallel periodic structure. HSFL was oriented parallel to the incident laser polarization. Intermixing between the Al and Ti layers with the formation of Al-Ti intermetallic compounds was achieved during the irradiation. The intermetallic region was formed mostly within the heat affected zone of the sample. Surface segregation of aluminium with partial ablation of the top layer of titanium was followed by the formation of an ultra-thin Al2O3 film on the surface of the multi-layered structure.

  13. Enhanced Scattering of Diffuse Ions on Front of the Earth's Quasi-Parallel Bow Shock: a Case Study

    NASA Astrophysics Data System (ADS)

    Kis, A.; Matsukiyo, S.; Otsuka, F.; Hada, T.; Lemperger, I.; Dandouras, I. S.; Barta, V.; Facsko, G. I.

    2017-12-01

    In the analysis we present a case study of three energetic upstream ion events at the Earth's quasi-parallel bow shock based on multi-spacecraft data recorded by Cluster. The CIS-HIA instrument onboard Cluster provides partial energetic ion densities in 4 energy channels between 10 and 32 keV.The difference of the partial ion densities recorded by the individual spacecraft at various distances from the bow shock surface makes possible the determination of the spatial gradient of energetic ions.Using the gradient values we determined the spatial profile of the energetic ion partial densities as a function of distance from the bow shock and we calculated the e-folding distance and the diffusion coefficient for each event and each ion energy range. Results show that in two cases the scattering of diffuse ions takes place in a normal way, as "by the book", and the e-folding distance and diffusion coefficient values are comparable with previous results. On the other hand, in the third case the e-folding distance and the diffusion coefficient values are significantly lower, which suggests that in this case the scattering process -and therefore the diffusive shock acceleration (DSA) mechanism also- is much more efficient. Our analysis provides an explanation for this "enhanced" scattering process recorded in the third case.

  14. Parallel-Processing Test Bed For Simulation Software

    NASA Technical Reports Server (NTRS)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  15. [Three-dimensional parallel collagen scaffold promotes tendon extracellular matrix formation].

    PubMed

    Zheng, Zefeng; Shen, Weiliang; Le, Huihui; Dai, Xuesong; Ouyang, Hongwei; Chen, Weishan

    2016-03-01

    To investigate the effects of three-dimensional parallel collagen scaffold on the cell shape, arrangement and extracellular matrix formation of tendon stem cells. Parallel collagen scaffold was fabricated by unidirectional freezing technique, while random collagen scaffold was fabricated by freeze-drying technique. The effects of two scaffolds on cell shape and extracellular matrix formation were investigated in vitro by seeding tendon stem/progenitor cells and in vivo by ectopic implantation. Parallel and random collagen scaffolds were produced successfully. Parallel collagen scaffold was more akin to tendon than random collagen scaffold. Tendon stem/progenitor cells were spindle-shaped and unified orientated in parallel collagen scaffold, while cells on random collagen scaffold had disorder orientation. Two weeks after ectopic implantation, cells had nearly the same orientation with the collagen substance. In parallel collagen scaffold, cells had parallel arrangement, and more spindly cells were observed. By contrast, cells in random collagen scaffold were disorder. Parallel collagen scaffold can induce cells to be in spindly and parallel arrangement, and promote parallel extracellular matrix formation; while random collagen scaffold can induce cells in random arrangement. The results indicate that parallel collagen scaffold is an ideal structure to promote tendon repairing.

  16. Biomechanical Effects of Stiffness in Parallel With the Knee Joint During Walking.

    PubMed

    Shamaei, Kamran; Cenciarini, Massimo; Adams, Albert A; Gregorczyk, Karen N; Schiffman, Jeffrey M; Dollar, Aaron M

    2015-10-01

    The human knee behaves similarly to a linear torsional spring during the stance phase of walking with a stiffness referred to as the knee quasi-stiffness. The spring-like behavior of the knee joint led us to hypothesize that we might partially replace the knee joint contribution during stance by utilizing an external spring acting in parallel with the knee joint. We investigated the validity of this hypothesis using a pair of experimental robotic knee exoskeletons that provided an external stiffness in parallel with the knee joints in the stance phase. We conducted a series of experiments involving walking with the exoskeletons with four levels of stiffness, including 0%, 33%, 66%, and 100% of the estimated human knee quasi-stiffness, and a pair of joint-less replicas. The results indicated that the ankle and hip joints tend to retain relatively invariant moment and angle patterns under the effects of the exoskeleton mass, articulation, and stiffness. The results also showed that the knee joint responds in a way such that the moment and quasi-stiffness of the knee complex (knee joint and exoskeleton) remains mostly invariant. A careful analysis of the knee moment profile indicated that the knee moment could fully adapt to the assistive moment; whereas, the knee quasi-stiffness fully adapts to values of the assistive stiffness only up to ∼80%. Above this value, we found biarticular consequences emerge at the hip joint.

  17. Application of Direct Parallel Methods to Reconstruction and Forecasting Problems

    NASA Astrophysics Data System (ADS)

    Song, Changgeun

    Many important physical processes in nature are represented by partial differential equations. Numerical weather prediction in particular, requires vast computational resources. We investigate the significance of parallel processing technology to the real world problem of atmospheric prediction. In this paper we consider the classic problem of decomposing the observed wind field into the irrotational and nondivergent components. Recognizing the fact that on a limited domain this problem has a non-unique solution, Lynch (1989) described eight different ways to accomplish the decomposition. One set of elliptic equations is associated with the decomposition--this determines the initial nondivergent state for the forecast model. It is shown that the entire decomposition problem can be solved in a fraction of a second using multi-vector processor such as ALLIANT FX/8. Secondly, the barotropic model is used to track hurricanes. Also, one set of elliptic equations is solved to recover the streamfunction from the forecasted vorticity. A 72 h prediction of Elena is made while it is in the Gulf of Mexico. During this time the hurricane executes a dramatic re-curvature that is captured by the model. Furthermore, an improvement in the track prediction results when a simple assimilation strategy is used. This technique makes use of the wind fields in the 24 h period immediately preceding the initial time for the prediction. In this particular application, solutions to systems of elliptic equations are the center of the computational mechanics. We demonstrate that direct, parallel methods based on accelerated block cyclic reduction (BCR) significantly reduce the computational time required to solve the elliptic equations germane to the decomposition, the forecast and adjoint assimilation.

  18. Graphics applications utilizing parallel processing

    NASA Technical Reports Server (NTRS)

    Rice, John R.

    1990-01-01

    The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.

  19. Visualizing Parallel Computer System Performance

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.

    1988-01-01

    Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkedly adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized Large, complex parallel system pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eluding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.

  20. Fast parallel algorithm for slicing STL based on pipeline

    NASA Astrophysics Data System (ADS)

    Ma, Xulong; Lin, Feng; Yao, Bo

    2016-05-01

    In Additive Manufacturing field, the current researches of data processing mainly focus on a slicing process of large STL files or complicated CAD models. To improve the efficiency and reduce the slicing time, a parallel algorithm has great advantages. However, traditional algorithms can't make full use of multi-core CPU hardware resources. In the paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm. And the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, effects of threads number and layers number are investigated by a serial of experiments. The experimental results show that the threads number and layers number are two remarkable factors to the speedup ratio. The tendency of speedup versus threads number reveals a positive relationship which greatly agrees with the Amdahl's law, and the tendency of speedup versus layers number also keeps a positive relationship agreeing with Gustafson's law. The new algorithm uses topological information to compute contours with a parallel method of speedup. Another parallel algorithm based on data parallel is used in experiments to show that pipeline parallel mode is more efficient. A case study at last shows a suspending performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm can make full use of the multi-core CPU hardware, accelerate the slicing process, and compared with the data parallel slicing algorithm, the new slicing algorithm in this paper adopts a pipeline parallel model, and a much higher speedup ratio and efficiency is achieved.

  1. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.

  2. A possibility of parallel and anti-parallel diffraction measurements on neu- tron diffractometer employing bent perfect crystal monochromator at the monochromatic focusing condition

    NASA Astrophysics Data System (ADS)

    Choi, Yong Nam; Kim, Shin Ae; Kim, Sung Kyu; Kim, Sung Baek; Lee, Chang-Hee; Mikula, Pavel

    2004-07-01

    In a conventional diffractometer having single monochromator, only one position, parallel position, is used for the diffraction experiment (i.e. detection) because the resolution property of the other one, anti-parallel position, is very poor. However, a bent perfect crystal (BPC) monochromator at monochromatic focusing condition can provide a quite flat and equal resolution property at both parallel and anti-parallel positions and thus one can have a chance to use both sides for the diffraction experiment. From the data of the FWHM and the Delta d/d measured on three diffraction geometries (symmetric, asymmetric compression and asymmetric expansion), we can conclude that the simultaneous diffraction measurement in both parallel and anti-parallel positions can be achieved.

  3. Broadcasting collective operation contributions throughout a parallel computer

    DOEpatents

    Faraj, Ahmad [Rochester, MN

    2012-02-21

    Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.

  4. Parallel, Distributed Scripting with Python

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI librarymore » gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.« less

  5. Parallel optoelectronic trinary signed-digit division

    NASA Astrophysics Data System (ADS)

    Alam, Mohammad S.

    1999-03-01

    The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.

  6. A high-speed linear algebra library with automatic parallelism

    NASA Technical Reports Server (NTRS)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  7. Parallel Digital Phase-Locked Loops

    NASA Technical Reports Server (NTRS)

    Sadr, Ramin; Shah, Biren N.; Hinedi, Sami M.

    1995-01-01

    Wide-band microwave receivers of proposed type include digital phase-locked loops in which band-pass filtering and down-conversion of input signals implemented by banks of multirate digital filters operating in parallel. Called "parallel digital phase-locked loops" to distinguish them from other digital phase-locked loops. Systems conceived as cost-effective solution to problem of filtering signals at high sampling rates needed to accommodate wide input frequency bands. Each of M filters process 1/M of spectrum of signal.

  8. Fringe Capacitance of a Parallel-Plate Capacitor.

    ERIC Educational Resources Information Center

    Hale, D. P.

    1978-01-01

    Describes an experiment designed to measure the forces between charged parallel plates, and determines the relationship among the effective electrode area, the measured capacitance values, and the electrode spacing of a parallel plate capacitor. (GA)

  9. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele

    2001-01-01

    This viewgraph presentation provides information on support sources available for the automatic parallelization of computer program. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence, ensuring that the code transformation was accurate.

  10. Parallel community climate model: Description and user`s guide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain intomore » geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.« less

  11. Spatial and temporal accuracy of asynchrony-tolerant finite difference schemes for partial differential equations at extreme scales

    NASA Astrophysics Data System (ADS)

    Kumari, Komal; Donzis, Diego

    2017-11-01

    Highly resolved computational simulations on massively parallel machines are critical in understanding the physics of a vast number of complex phenomena in nature governed by partial differential equations. Simulations at extreme levels of parallelism present many challenges with communication between processing elements (PEs) being a major bottleneck. In order to fully exploit the computational power of exascale machines one needs to devise numerical schemes that relax global synchronizations across PEs. This asynchronous computations, however, have a degrading effect on the accuracy of standard numerical schemes.We have developed asynchrony-tolerant (AT) schemes that maintain order of accuracy despite relaxed communications. We show, analytically and numerically, that these schemes retain their numerical properties with multi-step higher order temporal Runge-Kutta schemes. We also show that for a range of optimized parameters,the computation time and error for AT schemes is less than their synchronous counterpart. Stability of the AT schemes which depends upon history and random nature of delays, are also discussed. Support from NSF is gratefully acknowledged.

  12. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for themore » context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.« less

  13. Programming Probabilistic Structural Analysis for Parallel Processing Computer

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

    1991-01-01

    The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.

  14. Parallel processing of genomics data

    NASA Astrophysics Data System (ADS)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, have made possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, thus the analysis of this enormous flow of data poses several challenges in term of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the parallel preprocessing and statistical analysis of genomics data, able to face high dimension of data and resulting in good response time. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.

  15. Highly parallel sparse Cholesky factorization

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.

  16. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.

  17. Single-agent parallel window search

    NASA Technical Reports Server (NTRS)

    Powley, Curt; Korf, Richard E.

    1991-01-01

    Parallel window search is applied to single-agent problems by having different processes simultaneously perform iterations of Iterative-Deepening-A(asterisk) (IDA-asterisk) on the same problem but with different cost thresholds. This approach is limited by the time to perform the goal iteration. To overcome this disadvantage, the authors consider node ordering. They discuss how global node ordering by minimum h among nodes with equal f = g + h values can reduce the time complexity of serial IDA-asterisk by reducing the time to perform the iterations prior to the goal iteration. Finally, the two ideas of parallel window search and node ordering are combined to eliminate the weaknesses of each approach while retaining the strengths. The resulting approach, called simply parallel window search, can be used to find a near-optimal solution quickly, improve the solution until it is optimal, and then finally guarantee optimality, depending on the amount of time available.

  18. Partial lesions of the intratemporal segment of the facial nerve: graft versus partial reconstruction.

    PubMed

    Bento, Ricardo F; Salomone, Raquel; Brito, Rubens; Tsuji, Robinson K; Hausen, Mariana

    2008-09-01

    In cases of partial lesions of the intratemporal segment of the facial nerve, should the surgeon perform an intraoperative partial reconstruction, or partially remove the injured segment and place a graft? We present results from partial lesion reconstruction on the intratemporal segment of the facial nerve. A retrospective study on 42 patients who presented partial lesions on the intratemporal segment of the facial nerve was performed between 1988 and 2005. The patients were divided into 3 groups based on the procedure used: interposition of the partial graft on the injured area of the nerve (group 1; 12 patients); keeping the preserved part and performing tubulization (group 2; 8 patients); and dividing the parts of the injured nerve (proximal and distal) and placing a total graft of the sural nerve (group 3; 22 patients). Fracture of the temporal bone was the most frequent cause of the lesion in all groups, followed by iatrogenic causes (p < 0.005). Those who obtained results lower than or equal to III on the House-Brackmann scale were 1 (8.3%) of the patients in group 1, none (0.0%) of the patients in group 2, and 15 (68.2%) of the patients in group 3 (p <0.001). The best surgical technique for therapy of a partial lesion of the facial nerve is still questionable. Among these 42 patients, the best results were those from the total graft of the facial nerve.

  19. Parallel Algorithms for Switching Edges in Heterogeneous Graphs.

    PubMed

    Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

    2017-06-01

    An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.

  20. Parallel Algorithms for Switching Edges in Heterogeneous Graphs☆

    PubMed Central

    Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

    2017-01-01

    An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680

  1. Parallel software support for computational structural mechanics

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.

    1987-01-01

    The application of the parallel programming methodology known as the Force was conducted. Two application issues were addressed. The first involves the efficiency of the implementation and its completeness in terms of satisfying the needs of other researchers implementing parallel algorithms. Support for, and interaction with, other Computational Structural Mechanics (CSM) researchers using the Force was the main issue, but some independent investigation of the Barrier construct, which is extremely important to overall performance, was also undertaken. Another efficiency issue which was addressed was that of relaxing the strong synchronization condition imposed on the self-scheduled parallel DO loop. The Force was extended by the addition of logical conditions to the cases of a parallel case construct and by the inclusion of a self-scheduled version of this construct. The second issue involved applying the Force to the parallelization of finite element codes such as those found in the NICE/SPAR testbed system. One of the more difficult problems encountered is the determination of what information in COMMON blocks is actually used outside of a subroutine and when a subroutine uses a COMMON block merely as scratch storage for internal temporary results.

  2. P-HARP: A parallel dynamic spectral partitioner

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sohn, A.; Biswas, R.; Simon, H.D.

    1997-05-01

    Partitioning unstructured graphs is central to the parallel solution of problems in computational science and engineering. The authors have introduced earlier the sequential version of an inertial spectral partitioner called HARP which maintains the quality of recursive spectral bisection (RSB) while forming the partitions an order of magnitude faster than RSB. The serial HARP is known to be the fastest spectral partitioner to date, three to four times faster than similar partitioners on a variety of meshes. This paper presents a parallel version of HARP, called P-HARP. Two types of parallelism have been exploited: loop level parallelism and recursive parallelism.more » P-HARP has been implemented in MPI on the SGI/Cray T3E and the IBM SP2. Experimental results demonstrate that P-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.25 seconds on a 64-processor T3E. Experimental results further show that P-HARP can give nearly a 20-fold speedup on 64 processors. These results indicate that graph partitioning is no longer a major bottleneck that hinders the advancement of computational science and engineering for dynamically-changing real-world applications.« less

  3. A Technique to Facilitate Tooth Modification for Removable Partial Denture Prosthesis Guide Planes.

    PubMed

    Haeberle, C Brent; Abreu, Amara; Metzler, Kurt

    2016-07-01

    The technique in this article was developed to provide a means to create prepared guide planes of proper dimension to ensure a more stable and retentive removable partial denture prosthesis (RPDP) framework when providing this service for a patient. Using commonly found clinical materials, a paralleling device can be fabricated from the modified diagnostic cast of the patient's dental arch requiring an RPDP. Polymethyl methacrylate or composite added to an altered thermoplastic form can be positioned intraorally and used as a guide to predictably adjust tooth structure for guide planes. Since it can potentially minimize the number of impressions and diagnostic casts made during the procedure, this can help achieve the desired result more efficiently and quickly for the patient. © 2015 by the American College of Prosthodontists.

  4. The Complexity of Parallel Algorithms,

    DTIC Science & Technology

    1985-11-01

    programns have been written for se(luiential coiipn ters. Many p~eop~le want coimp ~ilers dihal. will c(nimpile t he, code for parallel machines, to avoid...between two vertices. We also rely on parallel algorithms for maintaining data structures and manipulating graphs. We do not go into the details of these...Jpatlis and maintain connected coimp ~onents. The routine is: - 35 .- ExtendPath(r, Q, V) begin P +-0; s 4- while there is a path in V - P from s to a vertex

  5. Supercomputing on massively parallel bit-serial architectures

    NASA Technical Reports Server (NTRS)

    Iobst, Ken

    1985-01-01

    Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.

  6. Implementation of parallel moment equations in NIMROD

    NASA Astrophysics Data System (ADS)

    Lee, Hankyu Q.; Held, Eric D.; Ji, Jeong-Young

    2017-10-01

    As collisionality is low (the Knudsen number is large) in many plasma applications, kinetic effects become important, particularly in parallel dynamics for magnetized plasmas. Fluid models can capture some kinetic effects when integral parallel closures are adopted. The adiabatic and linear approximations are used in solving general moment equations to obtain the integral closures. In this work, we present an effort to incorporate non-adiabatic (time-dependent) and nonlinear effects into parallel closures. Instead of analytically solving the approximate moment system, we implement exact parallel moment equations in the NIMROD fluid code. The moment code is expected to provide a natural convergence scheme by increasing the number of moments. Work in collaboration with the PSI Center and supported by the U.S. DOE under Grant Nos. DE-SC0014033, DE-SC0016256, and DE-FG02-04ER54746.

  7. Runtime support for parallelizing data mining algorithms

    NASA Astrophysics Data System (ADS)

    Jin, Ruoming; Agrawal, Gagan

    2002-03-01

    With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the technique we have developed starting from a common specification of the algorithm.

  8. Partial quantum information.

    PubMed

    Horodecki, Michał; Oppenheim, Jonathan; Winter, Andreas

    2005-08-04

    Information--be it classical or quantum--is measured by the amount of communication needed to convey it. In the classical case, if the receiver has some prior information about the messages being conveyed, less communication is needed. Here we explore the concept of prior quantum information: given an unknown quantum state distributed over two systems, we determine how much quantum communication is needed to transfer the full state to one system. This communication measures the partial information one system needs, conditioned on its prior information. We find that it is given by the conditional entropy--a quantity that was known previously, but lacked an operational meaning. In the classical case, partial information must always be positive, but we find that in the quantum world this physical quantity can be negative. If the partial information is positive, its sender needs to communicate this number of quantum bits to the receiver; if it is negative, then sender and receiver instead gain the corresponding potential for future quantum communication. We introduce a protocol that we term 'quantum state merging' which optimally transfers partial information. We show how it enables a systematic understanding of quantum network theory, and discuss several important applications including distributed compression, noiseless coding with side information, multiple access channels and assisted entanglement distillation.

  9. Parallel approach in RDF query processing

    NASA Astrophysics Data System (ADS)

    Vajgl, Marek; Parenica, Jan

    2017-07-01

    Parallel approach is nowadays a very cheap solution to increase computational power due to possibility of usage of multithreaded computational units. This hardware became typical part of nowadays personal computers or notebooks and is widely spread. This contribution deals with experiments how evaluation of computational complex algorithm of the inference over RDF data can be parallelized over graphical cards to decrease computational time.

  10. Parallelizing alternating direction implicit solver on GPUs

    USDA-ARS?s Scientific Manuscript database

    We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...

  11. National Combustion Code Parallel Performance Enhancements

    NASA Technical Reports Server (NTRS)

    Quealy, Angela; Benyo, Theresa (Technical Monitor)

    2002-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. The unstructured grid, reacting flow code uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC code to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This report describes recent parallel processing modifications to NCC that have improved the parallel scalability of the code, enabling a two hour turnaround for a 1.3 million element fully reacting combustion simulation on an SGI Origin 2000.

  12. Feasibility and accuracy of computational robot-assisted partial nephrectomy planning by virtual partial nephrectomy analysis.

    PubMed

    Isotani, Shuji; Shimoyama, Hirofumi; Yokota, Isao; China, Toshiyuki; Hisasue, Shin-ichi; Ide, Hisamitsu; Muto, Satoru; Yamaguchi, Raizo; Ukimura, Osamu; Horie, Shigeo

    2015-05-01

    To evaluate the feasibility and accuracy of virtual partial nephrectomy analysis, including a color-coded three-dimensional virtual surgical planning and a quantitative functional analysis, in predicting the surgical outcomes of robot-assisted partial nephrectomy. Between 2012 and 2014, 20 patients underwent virtual partial nephrectomy analysis before undergoing robot-assisted partial nephrectomy. Virtual partial nephrectomy analysis was carried out with the following steps: (i) evaluation of the arterial branch for selective clamping by showing the vascular-supplied area; (ii) simulation of the optimal surgical margin in precise segmented three-dimensional model for prediction of collecting system opening; and (iii) detailed volumetric analyses and estimates of postoperative renal function based on volumetric change. At operation, the surgeon identified the targeted artery and determined the surgical margin according to the virtual partial nephrectomy analysis. The surgical outcomes between the virtual partial nephrectomy analysis and the actual robot-assisted partial nephrectomy were compared. All 20 patients had negative cancer surgical margins and no urological complications. The tumor-specific renal arterial supply areas were shown in color-coded three-dimensional model visualization in all cases. The prediction value of collecting system opening was 85.7% for sensitivity and 100% for specificity. The predicted renal resection volume was significantly correlated with actual resected specimen volume (r(2) = 0.745, P < 0.001). The predicted estimated glomerular filtration rate was significantly correlated with actual postoperative estimated glomerular filtration rate (r(2) = 0.736, P < 0.001). Virtual partial nephrectomy analysis is able to provide the identification of tumor-specific renal arterial supply, prediction of collecting system opening and prediction of postoperative renal function. This technique might allow urologists to compare

  13. The BLAZE language: A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.

  14. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  15. MULTIOBJECTIVE PARALLEL GENETIC ALGORITHM FOR WASTE MINIMIZATION

    EPA Science Inventory

    In this research we have developed an efficient multiobjective parallel genetic algorithm (MOPGA) for waste minimization problems. This MOPGA integrates PGAPack (Levine, 1996) and NSGA-II (Deb, 2000) with novel modifications. PGAPack is a master-slave parallel implementation of a...

  16. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.

  17. Optimisation of a parallel ocean general circulation model

    NASA Astrophysics Data System (ADS)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  18. Parallel computation and the basis system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, G.R.

    1993-05-01

    A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communications costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis andmore » Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.« less

  19. The science of computing - Parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1985-01-01

    Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concommitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.

  20. Symmetry Breaking by Parallel Flow Shear

    NASA Astrophysics Data System (ADS)

    Li, Jiacong; Diamond, Patrick

    2015-11-01

    Plasma rotation is important in reducing turbulent transport, suppressing MHD instabilities, and is beneficial to confinement. Intrinsic rotation without an external momentum input is of interest for its plausible application on ITER. k∥ spectrum asymmetry is required for residual Reynolds stress that drives the intrinsic rotation. Parallel flows are reported in linear devices without magnetic shear. In CSDX, parallel flows are mostly peaked in the core [Thakur et al., 2014]; more robust flows and reversed profiles are seen in PANTA [Oldenburger, et al. 2012]. A novel mechanism for symmetry breaking in momentum transport is proposed. Magnetic shear or mean flow profile are not required. A seed parallel flow shear (PFS) sets the sign of residual stress by selecting certain modes to grow faster. The resulted spectrum imbalance leads to a nonzero residual stress, which further drives a parallel flow with ∇n as the free energy source, adding to the shear until saturated by diffusion. Balanced flow gradient is set by Π∥Res /χϕ . Residual stress is calculated for ITG turbulence and collisional drift wave turbulence where electron-ion and electron-neutral collisions are discussed and compared. Numerical simulation is proposed for testing the effect of PFS.

  1. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta J.; Gimenez, J.; Caubet, J.

    2003-01-01

    Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.

  2. Rubus: A compiler for seamless and extensible parallelism.

    PubMed

    Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been

  3. Rubus: A compiler for seamless and extensible parallelism

    PubMed Central

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been

  4. 3D printed soft parallel actuator

    NASA Astrophysics Data System (ADS)

    Zolfagharian, Ali; Kouzani, Abbas Z.; Khoo, Sui Yang; Noshadi, Amin; Kaynak, Akif

    2018-04-01

    This paper presents a 3-dimensional (3D) printed soft parallel contactless actuator for the first time. The actuator involves an electro-responsive parallel mechanism made of two segments namely active chain and passive chain both 3D printed. The active chain is attached to the ground from one end and constitutes two actuator links made of responsive hydrogel. The passive chain, on the other hand, is attached to the active chain from one end and consists of two rigid links made of polymer. The actuator links are printed using an extrusion-based 3D-Bioplotter with polyelectrolyte hydrogel as printer ink. The rigid links are also printed by a 3D fused deposition modelling (FDM) printer with acrylonitrile butadiene styrene (ABS) as print material. The kinematics model of the soft parallel actuator is derived via transformation matrices notations to simulate and determine the workspace of the actuator. The printed soft parallel actuator is then immersed into NaOH solution with specific voltage applied to it via two contactless electrodes. The experimental data is then collected and used to develop a parametric model to estimate the end-effector position and regulate kinematics model in response to specific input voltage over time. It is observed that the electroactive actuator demonstrates expected behaviour according to the simulation of its kinematics model. The use of 3D printing for the fabrication of parallel soft actuators opens a new chapter in manufacturing sophisticated soft actuators with high dexterity and mechanical robustness for biomedical applications such as cell manipulation and drug release.

  5. A Chip-Capillary Hybrid Device for Automated Transfer of Sample Pre-Separated by Capillary Isoelectric Focusing to Parallel Capillary Gel Electrophoresis for Two-Dimensional Protein Separation

    PubMed Central

    Lu, Joann J.; Wang, Shili; Li, Guanbin; Wang, Wei; Pu, Qiaosheng; Liu, Shaorong

    2012-01-01

    In this report, we introduce a chip-capillary hybrid device to integrate capillary isoelectric focusing (CIEF) with parallel capillary sodium dodecyl sulfate – polyacrylamide gel electrophoresis (SDS-PAGE) or capillary gel electrophoresis (CGE) toward automating two-dimensional (2D) protein separations. The hybrid device consists of three chips that are butted together. The middle chip can be moved between two positions to re-route the fluidic paths, which enables the performance of CIEF and injection of proteins partially resolved by CIEF to CGE capillaries for parallel CGE separations in a continuous and automated fashion. Capillaries are attached to the other two chips to facilitate CIEF and CGE separations and to extend the effective lengths of CGE columns. Specifically, we illustrate the working principle of the hybrid device, develop protocols for producing and preparing the hybrid device, and demonstrate the feasibility of using this hybrid device for automated injection of CIEF-separated sample to parallel CGE for 2D protein separations. Potentials and problems associated with the hybrid device are also discussed. PMID:22830584

  6. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    NASA Astrophysics Data System (ADS)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution, however, the efficiency in terms of the computing resource usage decreases with increasing the number of processors used in the parallel computing.

  7. Aerodynamic simulation on massively parallel systems

    NASA Technical Reports Server (NTRS)

    Haeuser, Jochem; Simon, Horst D.

    1992-01-01

    This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but

  8. Parallel/Vector Integration Methods for Dynamical Astronomy

    NASA Astrophysics Data System (ADS)

    Fukushima, Toshio

    1999-01-01

    This paper reviews three recent works on the numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit(PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of ODE in the form of Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by the fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead an acceleration factor of around 50 in parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. Also the trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).

  9. Screwless fixed detachable partial overdenture treatment for atrophic partial edentulism of the anterior maxilla.

    PubMed

    Flanagan, Dennis

    2008-01-01

    This is a case report of the restoration of a partially edentulous atrophic anterior maxilla and atrophic mandibular posterior ridges. This case report demonstrates one method for successful treatment of partial edentulism at No. 7 to 10, where interlock attachments on natural cuspids and mini dental implants support an acrylic-based screwless fixed detachable partial denture to provide lip support and masticatory function in the anterior maxilla. The presenting qualities of this case were similar to combination syndrome.

  10. Modelling parallel programs and multiprocessor architectures with AXE

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.

    1991-01-01

    AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate for parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior is described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.

  11. Parallel Narrative Structure in Paul Harding's "Tinkers"

    ERIC Educational Resources Information Center

    Çirakli, Mustafa Zeki

    2014-01-01

    The present paper explores the implications of parallel narrative structure in Paul Harding's "Tinkers" (2009). Besides primarily recounting the two sets of parallel narratives, "Tinkers" also comprises of seemingly unrelated fragments such as excerpts from clock repair manuals and diaries. The main stories, however, told…

  12. NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism Correctness

    DTIC Science & Technology

    2011-12-16

    other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a...Lab affiliates National Instruments, NEC, Nokia , NVIDIA, and Samsung. NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism...concurrently update x, some of these CAS’s will fail and those parallel loop iterations will recompute their updates to x and try again. Consider the parallel

  13. Parallel Algorithms for Least Squares and Related Computations.

    DTIC Science & Technology

    1991-03-22

    for dense computations in linear algebra . The work has recently been published in a general reference book on parallel algorithms by SIAM. AFO SR...written his Ph.D. dissertation with the principal investigator. (See publication 6.) • Parallel Algorithms for Dense Linear Algebra Computations. Our...and describe and to put into perspective a selection of the more important parallel algorithms for numerical linear algebra . We give a major new

  14. Improving operating room productivity via parallel anesthesia processing.

    PubMed

    Brown, Michael J; Subramanian, Arun; Curry, Timothy B; Kor, Daryl J; Moran, Steven L; Rohleder, Thomas R

    2014-01-01

    Parallel processing of regional anesthesia may improve operating room (OR) efficiency in patients undergoes upper extremity surgical procedures. The purpose of this paper is to evaluate whether performing regional anesthesia outside the OR in parallel increases total cases per day, improve efficiency and productivity. Data from all adult patients who underwent regional anesthesia as their primary anesthetic for upper extremity surgery over a one-year period were used to develop a simulation model. The model evaluated pure operating modes of regional anesthesia performed within and outside the OR in a parallel manner. The scenarios were used to evaluate how many surgeries could be completed in a standard work day (555 minutes) and assuming a standard three cases per day, what was the predicted end-of-day time overtime. Modeling results show that parallel processing of regional anesthesia increases the average cases per day for all surgeons included in the study. The average increase was 0.42 surgeries per day. Where it was assumed that three cases per day would be performed by all surgeons, the days going to overtime was reduced by 43 percent with parallel block. The overtime with parallel anesthesia was also projected to be 40 minutes less per day per surgeon. Key limitations include the assumption that all cases used regional anesthesia in the comparisons. Many days may have both regional and general anesthesia. Also, as a case study, single-center research may limit generalizability. Perioperative care providers should consider parallel administration of regional anesthesia where there is a desire to increase daily upper extremity surgical case capacity. Where there are sufficient resources to do parallel anesthesia processing, efficiency and productivity can be significantly improved. Simulation modeling can be an effective tool to show practice change effects at a system-wide level.

  15. Line-drawing algorithms for parallel machines

    NASA Technical Reports Server (NTRS)

    Pang, Alex T.

    1990-01-01

    The fact that conventional line-drawing algorithms, when applied directly on parallel machines, can lead to very inefficient codes is addressed. It is suggested that instead of modifying an existing algorithm for a parallel machine, a more efficient implementation can be produced by going back to the invariants in the definition. Popular line-drawing algorithms are compared with two alternatives; distance to a line (a point is on the line if sufficiently close to it) and intersection with a line (a point on the line if an intersection point). For massively parallel single-instruction-multiple-data (SIMD) machines (with thousands of processors and up), the alternatives provide viable line-drawing algorithms. Because of the pixel-per-processor mapping, their performance is independent of the line length and orientation.

  16. Genetic algorithms using SISAL parallel programming language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tejada, S.

    1994-05-06

    Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language which is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I will l discuss the implementation and performance of parallel genetic algorithms in SISAL.

  17. Nemesis I: Parallel Enhancements to ExodusII

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hennigan, Gary L.; John, Matthew S.; Shadid, John N.

    2006-03-28

    NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (APl)can be used to append information to an existing EXODUS II files can be used on files which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which compromise the NEMESIS I API.

  18. Optics Program Modified for Multithreaded Parallel Computing

    NASA Technical Reports Server (NTRS)

    Lou, John; Bedding, Dave; Basinger, Scott

    2006-01-01

    A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.

  19. PISCES: An environment for parallel scientific computation

    NASA Technical Reports Server (NTRS)

    Pratt, T. W.

    1985-01-01

    The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.

  20. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Baxter, Doug

    1988-01-01

    The class of problems that can be effectively compiled by parallelizing compilers is discussed. This is accomplished with the doconsider construct which would allow these compilers to parallelize many problems in which substantial loop-level parallelism is available but cannot be detected by standard compile-time analysis. We describe and experimentally analyze mechanisms used to parallelize the work required for these types of loops. In each of these methods, a new loop structure is produced by modifying the loop to be parallelized. We also present the rules by which these loop transformations may be automated in order that they be included in language compilers. The main application area of the research involves problems in scientific computations and engineering. The workload used in our experiment includes a mixture of real problems as well as synthetically generated inputs. From our extensive tests on the Encore Multimax/320, we have reached the conclusion that for the types of workloads we have investigated, self-execution almost always performs better than pre-scheduling. Further, the improvement in performance that accrues as a result of global topological sorting of indices as opposed to the less expensive local sorting, is not very significant in the case of self-execution.

  1. Thread concept for automatic task parallelization in image analysis

    NASA Astrophysics Data System (ADS)

    Lueckenhaus, Maximilian; Eckstein, Wolfgang

    1998-09-01

    Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derivated from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.

  2. Methods of parallel computation applied on granular simulations

    NASA Astrophysics Data System (ADS)

    Martins, Gustavo H. B.; Atman, Allbens P. F.

    2017-06-01

    Every year, parallel computing has becoming cheaper and more accessible. As consequence, applications were spreading over all research areas. Granular materials is a promising area for parallel computing. To prove this statement we study the impact of parallel computing in simulations of the BNE (Brazil Nut Effect). This property is due the remarkable arising of an intruder confined to a granular media when vertically shaken against gravity. By means of DEM (Discrete Element Methods) simulations, we study the code performance testing different methods to improve clock time. A comparison between serial and parallel algorithms, using OpenMP® is also shown. The best improvement was obtained by optimizing the function that find contacts using Verlet's cells.

  3. Parallel computation using boundary elements in solid mechanics

    NASA Technical Reports Server (NTRS)

    Chien, L. S.; Sun, C. T.

    1990-01-01

    The inherent parallelism of the boundary element method is shown. The boundary element is formulated by assuming the linear variation of displacements and tractions within a line element. Moreover, MACSYMA symbolic program is employed to obtain the analytical results for influence coefficients. Three computational components are parallelized in this method to show the speedup and efficiency in computation. The global coefficient matrix is first formed concurrently. Then, the parallel Gaussian elimination solution scheme is applied to solve the resulting system of equations. Finally, and more importantly, the domain solutions of a given boundary value problem are calculated simultaneously. The linear speedups and high efficiencies are shown for solving a demonstrated problem on Sequent Symmetry S81 parallel computing system.

  4. Randomized controlled trial of zonisamide for the treatment of refractory partial-onset seizures.

    PubMed

    Faught, E; Ayala, R; Montouris, G G; Leppik, I E

    2001-11-27

    Zonisamide is a sulfonamide antiepilepsy drug with sodium and calcium channel-blocking actions. Experience in Japan and a previous European double-blind study have demonstrated its efficacy against partial-onset seizures. A randomized, double-blind, placebo-controlled trial enrolling 203 patients was conducted at 20 United States sites to assess zonisamide efficacy and dose response as adjunctive therapy for refractory partial-onset seizures. Zonisamide dosages were elevated by 100 mg/d each week. The study design allowed parallel comparisons with placebo for three dosages and a final crossover to 400 mg/d of zonisamide for all patients. The primary efficacy comparison was change in seizure frequency from a 4-week placebo baseline to weeks 8 through 12 on blinded therapy. At 400 mg/d, zonisamide reduced the median frequency of all seizures by 40.5% from baseline, compared with a 9% reduction (p = 0.0009) with placebo treatment, and produced a > or =50% seizure reduction (responder rate) in 42% of patients. A dosage of 100 mg/d produced a 20.5% reduction in median seizure frequency (p = 0.038 compared with placebo) and a dosage of 200 mg/d produced a 24.7% reduction in median seizure frequency (p = 0.004 compared with placebo). Dropouts from adverse events (10%) did not differ from placebo (8.2%, NS). The only adverse event differing significantly from placebo was weight loss, though somnolence, anorexia, and ataxia were slightly more common with zonisamide treatment. Serum zonisamide concentrations rose with increasing dose. Zonisamide is effective and well tolerated as an adjunctive agent for refractory partial-onset seizures. The minimal effective dosage was 100 mg/d, but 400 mg/d was the most effective dosage.

  5. Parallel Continuous Flow: A Parallel Suffix Tree Construction Tool for Whole Genomes

    PubMed Central

    Farreras, Montse

    2014-01-01

    Abstract The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in the bioinformatic domain. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. Also the methodologies required to analyze these data have become more complex everyday, requiring fast queries to multiple genomes. In this article, we present parallel continuous flow (PCF), a parallel suffix tree construction method that is suitable for very long genomes. We tested our method for the suffix tree construction of the entire human genome, about 3GB. We showed that PCF can scale gracefully as the size of the input genome grows. Our method can work with an efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 processes. PMID:24597675

  6. Series and parallel arc-fault circuit interrupter tests.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, Jay Dean; Fresquez, Armando J.; Gudgel, Bob

    2013-07-01

    While the 2011 National Electrical Codeª (NEC) only requires series arc-fault protection, some arc-fault circuit interrupter (AFCI) manufacturers are designing products to detect and mitigate both series and parallel arc-faults. Sandia National Laboratories (SNL) has extensively investigated the electrical differences of series and parallel arc-faults and has offered possible classification and mitigation solutions. As part of this effort, Sandia National Laboratories has collaborated with MidNite Solar to create and test a 24-string combiner box with an AFCI which detects, differentiates, and de-energizes series and parallel arc-faults. In the case of the MidNite AFCI prototype, series arc-faults are mitigated by openingmore » the PV strings, whereas parallel arc-faults are mitigated by shorting the array. A range of different experimental series and parallel arc-fault tests with the MidNite combiner box were performed at the Distributed Energy Technologies Laboratory (DETL) at SNL in Albuquerque, NM. In all the tests, the prototype de-energized the arc-faults in the time period required by the arc-fault circuit interrupt testing standard, UL 1699B. The experimental tests confirm series and parallel arc-faults can be successfully mitigated with a combiner box-integrated solution.« less

  7. Parallelization of ARC3D with Computer-Aided Tools

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang; Hribar, Michelle; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    A series of efforts have been devoted to investigating methods of porting and parallelizing applications quickly and efficiently for new architectures, such as the SCSI Origin 2000 and Cray T3E. This report presents the parallelization of a CFD application, ARC3D, using the computer-aided tools, Cesspools. Steps of parallelizing this code and requirements of achieving better performance are discussed. The generated parallel version has achieved reasonably well performance, for example, having a speedup of 30 for 36 Cray T3E processors. However, this performance could not be obtained without modification of the original serial code. It is suggested that in many cases improving serial code and performing necessary code transformations are important parts for the automated parallelization process although user intervention in many of these parts are still necessary. Nevertheless, development and improvement of useful software tools, such as Cesspools, can help trim down many tedious parallelization details and improve the processing efficiency.

  8. Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang; VanderWijngaart, Rob F.

    2003-01-01

    We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in bench-marks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.

  9. Graphical Representation of Parallel Algorithmic Processes

    DTIC Science & Technology

    1990-12-01

    interface with the AAARF main process . The source code for the AAARF class-common library is in the common subdi- rectory and consists of the following files... for public release; distribution unlimited AFIT/GCE/ENG/90D-07 Graphical Representation of Parallel Algorithmic Processes THESIS Presented to the...goal of this study is to develop an algorithm animation facility for parallel processes executing on different architectures, from multiprocessor

  10. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    NASA Technical Reports Server (NTRS)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  11. Partially confined configuration for the growth of semiconductor crystals from the melt in zero-gravity environment

    NASA Technical Reports Server (NTRS)

    Lagowski, J.; Gatos, H. C.; Dabkowski, F. P.

    1985-01-01

    A novel partially confined configuration is proposed for the crystal growth of semiconductors from the melt, including those with volatile constituents. A triangular prism is employed to contain the growth melt. Due to surface tension, the melt will acquire a cylindrical-like shape and thus contact the prism along three parallel lines. The three empty spaces between the cylindrical melt and the edges of the prism will accommodate the expansion of the solidifying semiconductor, and in the case of semiconductor compounds with a volatile constituent, will permit the presence of the desired vapor phase in contact with the melt for controlling the melt stoichiometry. Theoretical and experimental evidence in support of this new type of confinement is presented.

  12. Fast inner-volume imaging of the lumbar spine with a spatially focused excitation using a 3D-TSE sequence.

    PubMed

    Riffel, Philipp; Michaely, Henrik J; Morelli, John N; Paul, Dominik; Kannengiesser, Stephan; Schoenberg, Stefan O; Haneder, Stefan

    2015-04-01

    The purpose of this study was to evaluate the feasibility and technical quality of a zoomed three-dimensional (3D) turbo spin-echo (TSE) sampling perfection with application optimized contrasts using different flip-angle evolutions (SPACE) sequence of the lumbar spine. In this prospective feasibility study, nine volunteers underwent a 3-T magnetic resonance examination of the lumbar spine including 1) a conventional 3D T2-weighted (T2w) SPACE sequence with generalized autocalibrating partially parallel acquisition technique acceleration factor 2 and 2) a zoomed 3D T2w SPACE sequence with a reduced field of view (reduction factor 2). Images were evaluated with regard to image sharpness, signal homogeneity, and the presence of artifacts by two experienced radiologists. For quantitative analysis, signal-to-noise ratio (SNR) values were calculated. Image sharpness of anatomic structures was statistically significantly greater with zoomed SPACE (P < .0001), whereas the signal homogeneity was statistically significantly greater with conventional SPACE (cSPACE; P = .0003). There were no statistically significant differences in extent of artifacts. Acquisition times were 8:20 minutes for cSPACE and 6:30 minutes for zoomed SPACE. Readers 1 and 2 selected zSPACE as the preferred sequence in five of nine cases. In two of nine cases, both sequences were rated as equally preferred by both the readers. SNR values were statistically significantly greater with cSPACE. In comparison to a cSPACE sequences, zoomed SPACE imaging of the lumbar spine provides sharper images in conjunction with a 25% reduction in acquisition time. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.

  13. Partial least squares based identification of Duchenne muscular dystrophy specific genes.

    PubMed

    An, Hui-bo; Zheng, Hua-cheng; Zhang, Li; Ma, Lin; Liu, Zheng-yan

    2013-11-01

    Large-scale parallel gene expression analysis has provided a greater ease for investigating the underlying mechanisms of Duchenne muscular dystrophy (DMD). Previous studies typically implemented variance/regression analysis, which would be fundamentally flawed when unaccounted sources of variability in the arrays existed. Here we aim to identify genes that contribute to the pathology of DMD using partial least squares (PLS) based analysis. We carried out PLS-based analysis with two datasets downloaded from the Gene Expression Omnibus (GEO) database to identify genes contributing to the pathology of DMD. Except for the genes related to inflammation, muscle regeneration and extracellular matrix (ECM) modeling, we found some genes with high fold change, which have not been identified by previous studies, such as SRPX, GPNMB, SAT1, and LYZ. In addition, downregulation of the fatty acid metabolism pathway was found, which may be related to the progressive muscle wasting process. Our results provide a better understanding for the downstream mechanisms of DMD.

  14. NDL-v2.0: A new version of the numerical differentiation library for parallel architectures

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.

    2014-07-01

    We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N2) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 180

  15. Parallel Architecture, Parallel Acquisition Cross-Linguistic Evidence from Nominal and Verbal Domains

    ERIC Educational Resources Information Center

    Sutton, Brett R.

    2017-01-01

    This dissertation explores parallels between Complementizer Phrase (CP) and Determiner Phrase (DP) semantics, syntax, and morphology--including similarities in case-assignment, subject-verb and possessor-possessum agreement, subject and possessor semantics, and overall syntactic structure--in first language acquisition. Applying theoretical…

  16. Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.

    Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neigh boring inner loops may exhibit different concurrency patternsmore » (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.« less

  17. Parallel heuristics for scalable community detection

    DOE PAGES

    Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth

    2015-08-14

    Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method ismore » also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing real speedups of up to 16x using 32 threads.« less

  18. Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction

    NASA Technical Reports Server (NTRS)

    Padovan, Joseph; Krishna, Lala; Gute, Douglas

    1997-01-01

    Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. Overall, the method of hierarchical parallelism involves the partitioning of thermal models into several substructured levels wherein an optimal balance into various associated bandwidths is achieved. The details are described in this report. Overall, the report is organized into two parts. Part 1 describes the parallel modelling methodology and associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.

  19. Partial Deconvolution with Inaccurate Blur Kernel.

    PubMed

    Ren, Dongwei; Zuo, Wangmeng; Zhang, David; Xu, Jun; Zhang, Lei

    2017-10-17

    Most non-blind deconvolution methods are developed under the error-free kernel assumption, and are not robust to inaccurate blur kernel. Unfortunately, despite the great progress in blind deconvolution, estimation error remains inevitable during blur kernel estimation. Consequently, severe artifacts such as ringing effects and distortions are likely to be introduced in the non-blind deconvolution stage. In this paper, we tackle this issue by suggesting: (i) a partial map in the Fourier domain for modeling kernel estimation error, and (ii) a partial deconvolution model for robust deblurring with inaccurate blur kernel. The partial map is constructed by detecting the reliable Fourier entries of estimated blur kernel. And partial deconvolution is applied to wavelet-based and learning-based models to suppress the adverse effect of kernel estimation error. Furthermore, an E-M algorithm is developed for estimating the partial map and recovering the latent sharp image alternatively. Experimental results show that our partial deconvolution model is effective in relieving artifacts caused by inaccurate blur kernel, and can achieve favorable deblurring quality on synthetic and real blurry images.Most non-blind deconvolution methods are developed under the error-free kernel assumption, and are not robust to inaccurate blur kernel. Unfortunately, despite the great progress in blind deconvolution, estimation error remains inevitable during blur kernel estimation. Consequently, severe artifacts such as ringing effects and distortions are likely to be introduced in the non-blind deconvolution stage. In this paper, we tackle this issue by suggesting: (i) a partial map in the Fourier domain for modeling kernel estimation error, and (ii) a partial deconvolution model for robust deblurring with inaccurate blur kernel. The partial map is constructed by detecting the reliable Fourier entries of estimated blur kernel. And partial deconvolution is applied to wavelet-based and learning

  20. A partially reflecting random walk on spheres algorithm for electrical impedance tomography

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maire, Sylvain, E-mail: maire@univ-tln.fr; Simon, Martin, E-mail: simon@math.uni-mainz.de

    2015-12-15

    In this work, we develop a probabilistic estimator for the voltage-to-current map arising in electrical impedance tomography. This novel so-called partially reflecting random walk on spheres estimator enables Monte Carlo methods to compute the voltage-to-current map in an embarrassingly parallel manner, which is an important issue with regard to the corresponding inverse problem. Our method uses the well-known random walk on spheres algorithm inside subdomains where the diffusion coefficient is constant and employs replacement techniques motivated by finite difference discretization to deal with both mixed boundary conditions and interface transmission conditions. We analyze the global bias and the variance ofmore » the new estimator both theoretically and experimentally. Subsequently, the variance of the new estimator is considerably reduced via a novel control variate conditional sampling technique which yields a highly efficient hybrid forward solver coupling probabilistic and deterministic algorithms.« less

  1. On the transmission of partial information: inferences from movement-related brain potentials

    NASA Technical Reports Server (NTRS)

    Osman, A.; Bashore, T. R.; Coles, M. G.; Donchin, E.; Meyer, D. E.

    1992-01-01

    Results are reported from a new paradigm that uses movement-related brain potentials to detect response preparation based on partial information. The paradigm uses a hybrid choice-reaction go/nogo procedure in which decisions about response hand and whether to respond are based on separate stimulus attributes. A lateral asymmetry in the movement-related brain potential was found on nogo trials without overt movement. The direction of this asymmetry depended primarily on the signaled response hand rather than on properties of the stimulus. When the asymmetry first appeared was influenced by the time required to select the signaled hand, and when it began to differ on go and nogo trials was influenced by the time to decide whether to respond. These findings indicate that both stimulus attributes were processed in parallel and that the asymmetry reflected preparation of the response hand that began before the go/nogo decision was completed.

  2. Methods for producing partially digested restriction DNA fragments and for producing a partially modified PCR product

    DOEpatents

    Wong, Kwong-Kwok

    2000-01-01

    The present invention is an improved method of making a partially modified PCR product from a DNA fragment with a polymerase chain reaction (PCR). In a standard PCR process, the DNA fragment is combined with starting deoxynucleoside triphosphates, a primer, a buffer and a DNA polymerase in a PCR mixture. The PCR mixture is then reacted in the PCR producing copies of the DNA fragment. The improvement of the present invention is adding an amount of a modifier at any step prior to completion of the PCR process thereby randomly and partially modifying the copies of the DNA fragment as a partially modified PCR product. The partially modified PCR product may then be digested with an enzyme that cuts the partially modified PCR product at unmodified sites thereby producing an array of DNA restriction fragments.

  3. Image Processing Using a Parallel Architecture.

    DTIC Science & Technology

    1987-12-01

    ENG/87D-25 Abstract This study developed a set o± low level image processing tools on a parallel computer that allows concurrent processing of images...environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations. vI IMAGE...step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer. More important than

  4. The cost of parallel consolidation into visual working memory.

    PubMed

    Rideaux, Reuben; Edwards, Mark

    2016-01-01

    A growing body of evidence indicates that information can be consolidated into visual working memory in parallel. Initially, it was suggested that color information could be consolidated in parallel while orientation was strictly limited to serial consolidation (Liu & Becker, 2013). However, we recently found evidence suggesting that both orientation and motion direction items can be consolidated in parallel, with different levels of accuracy (Rideaux, Apthorp, & Edwards, 2015). Here we examine whether there is a cost associated with parallel consolidation of orientation and direction information by comparing performance, in terms of precision and guess rate, on a target recall task where items are presented either sequentially or simultaneously. The results compellingly indicate that motion direction can be consolidated in parallel, but the evidence for orientation is less conclusive. Further, we find that there is a twofold cost associated with parallel consolidation of direction: Both the probability of failing to consolidate one (or both) item/s increases and the precision at which representations are encoded is reduced. Additionally, we find evidence indicating that the increased consolidation failure may be due to interference between items presented simultaneously, and is moderated by item similarity. These findings suggest that a biased competition model may explain differences in parallel consolidation between features.

  5. Parallel ptychographic reconstruction

    DOE PAGES

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; ...

    2014-12-19

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps tomore » take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.« less

  6. Search asymmetries: parallel processing of uncertain sensory information.

    PubMed

    Vincent, Benjamin T

    2011-08-01

    What is the mechanism underlying search phenomena such as search asymmetry? Two-stage models such as Feature Integration Theory and Guided Search propose parallel pre-attentive processing followed by serial post-attentive processing. They claim search asymmetry effects are indicative of finding pairs of features, one processed in parallel, the other in serial. An alternative proposal is that a 1-stage parallel process is responsible, and search asymmetries occur when one stimulus has greater internal uncertainty associated with it than another. While the latter account is simpler, only a few studies have set out to empirically test its quantitative predictions, and many researchers still subscribe to the 2-stage account. This paper examines three separate parallel models (Bayesian optimal observer, max rule, and a heuristic decision rule). All three parallel models can account for search asymmetry effects and I conclude that either people can optimally utilise the uncertain sensory data available to them, or are able to select heuristic decision rules which approximate optimal performance. Copyright © 2011 Elsevier Ltd. All rights reserved.

  7. ["Dual Guidance"? - parallel combination of ultrasound-guidance and nerve stimulation - Contra].

    PubMed

    Maecken, Tim

    2015-07-01

    Sonography is a highly user-dependent technology. It presupposes a considerable degree of sonoanatomic and sonographic knowledge and requires good practical skills of the examiner. Sonography allows the identification of the puncture target, observes the needle feed and assesses the spread pattern of the local anesthetic in real time. Peripheral electrical nerve stimulation (PNS) cannot offer these advantages to the same degree, but may allow nerve localization under difficult sonographic conditions. The combination of the two locating techniques is complex in its practical implementation. Partially, the use of one location technique is made even more difficult by the combination with the second. PNS in parallel to sonography serves primarily as a warning technology in the case of an invisible cannula tip. It should not be construed as a compensation technique for the lack of sonographic skills or knowledge. However, PNS may be helpful in the sense of a bridging technology as long as the user is aware of its limitations. © Georg Thieme Verlag Stuttgart · New York.

  8. Methanol partial oxidation reformer

    DOEpatents

    Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael

    1999-01-01

    A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.

  9. Methanol partial oxidation reformer

    DOEpatents

    Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael

    2001-01-01

    A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.

  10. The collisional drift mode in a partially ionized plasma. [in the F region

    NASA Technical Reports Server (NTRS)

    Hudson, M. K.; Kennel, C. F.

    1974-01-01

    The structure of the drift instability was examined in several density regimes. Let sub e be the total electron mean free path, k sub z the wave-vector component along the magnetic field, and the ratio of perpendicular ion diffusion to parallel electron streaming rates. At low densities (k sub z lambda 1) the drift mode is isothermal and should be treated kineticly. In the finite heat conduction regime square root of m/M k sub z Lambda sub 1) the drift instability threshold is reduced at low densities and increased at high densities as compared to the isothermal threshold. Finally, in the energy transfer limit (k sub z kambda sub e square root of m/M) the drift instability behaves adiabatically in a fully ionized plasma and isothermally in a partially ionized plasma for an ion-neutral to Coulomb collision frequency ratio.

  11. Parallelized Stochastic Cutoff Method for Long-Range Interacting Systems

    NASA Astrophysics Data System (ADS)

    Endo, Eishin; Toga, Yuta; Sasaki, Munetaka

    2015-07-01

    We present a method of parallelizing the stochastic cutoff (SCO) method, which is a Monte-Carlo method for long-range interacting systems. After interactions are eliminated by the SCO method, we subdivide a lattice into noninteracting interpenetrating sublattices. This subdivision enables us to parallelize the Monte-Carlo calculation in the SCO method. Such subdivision is found by numerically solving the vertex coloring of a graph created by the SCO method. We use an algorithm proposed by Kuhn and Wattenhofer to solve the vertex coloring by parallel computation. This method was applied to a two-dimensional magnetic dipolar system on an L × L square lattice to examine its parallelization efficiency. The result showed that, in the case of L = 2304, the speed of computation increased about 102 times by parallel computation with 288 processors.

  12. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

    PubMed

    Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas

    2016-04-01

    Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes due to systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station, however, due to multiple invocations for a large number of subtasks the full task requires a significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods as well as a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .

  13. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Davis, Kristan D.; Faraj, Daniel A.

    2014-07-22

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.

  14. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Davis, Kristan D; Faraj, Daniel A

    2013-07-09

    Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.

  15. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-09-02

    Eager send data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address.

  16. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-09-16

    Eager send data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address.

  17. Data decomposition method for parallel polygon rasterization considering load balancing

    NASA Astrophysics Data System (ADS)

    Zhou, Chen; Chen, Zhenjie; Liu, Yongxue; Li, Feixue; Cheng, Liang; Zhu, A.-xing; Li, Manchun

    2015-12-01

    It is essential to adopt parallel computing technology to rapidly rasterize massive polygon data. In parallel rasterization, it is difficult to design an effective data decomposition method. Conventional methods ignore load balancing of polygon complexity in parallel rasterization and thus fail to achieve high parallel efficiency. In this paper, a novel data decomposition method based on polygon complexity (DMPC) is proposed. First, four factors that possibly affect the rasterization efficiency were investigated. Then, a metric represented by the boundary number and raster pixel number in the minimum bounding rectangle was developed to calculate the complexity of each polygon. Using this metric, polygons were rationally allocated according to the polygon complexity, and each process could achieve balanced loads of polygon complexity. To validate the efficiency of DMPC, it was used to parallelize different polygon rasterization algorithms and tested on different datasets. Experimental results showed that DMPC could effectively parallelize polygon rasterization algorithms. Furthermore, the implemented parallel algorithms with DMPC could achieve good speedup ratios of at least 15.69 and generally outperformed conventional decomposition methods in terms of parallel efficiency and load balancing. In addition, the results showed that DMPC exhibited consistently better performance for different spatial distributions of polygons.

  18. Tolerance of a standard intact protein formula versus a partially hydrolyzed formula in healthy, term infants

    PubMed Central

    Berseth, Carol Lynn; Mitmesser, Susan Hazels; Ziegler, Ekhard E; Marunycz, John D; Vanderhoof, Jon

    2009-01-01

    Background Parents who perceive common infant behaviors as formula intolerance-related often switch formulas without consulting a health professional. Up to one-half of formula-fed infants experience a formula change during the first six months of life. Methods The objective of this study was to assess discontinuance due to study physician-assessed formula intolerance in healthy, term infants. Infants (335) were randomized to receive either a standard intact cow milk protein formula (INTACT) or a partially hydrolyzed cow milk protein formula (PH) in a 60 day non-inferiority trial. Discontinuance due to study physician-assessed formula intolerance was the primary outcome. Secondary outcomes included number of infants who discontinued for any reason, including parent-assessed. Results Formula intolerance between groups (INTACT, 12.3% vs. PH, 13.7%) was similar for infants who completed the study or discontinued due to study physician-assessed formula intolerance. Overall study discontinuance based on parent- vs. study physician-assessed intolerance for all infants (14.4 vs.11.1%) was significantly different (P = 0.001). Conclusion This study demonstrated no difference in infant tolerance of intact vs. partially hydrolyzed cow milk protein formulas for healthy, term infants over a 60-day feeding trial, suggesting nonstandard partially hydrolyzed formulas are not necessary as a first-choice for healthy infants. Parents frequently perceived infant behavior as formula intolerance, paralleling previous reports of unnecessary formula changes. Trial Registration clinicaltrials.gov: NCT00666120 PMID:19545360

  19. Parallel computer vision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Uhr, L.

    1987-01-01

    This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.

  20. TECA: A Parallel Toolkit for Extreme Climate Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Prabhat, Mr; Ruebel, Oliver; Byna, Surendra

    2012-03-12

    We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

  1. Developments in parallel grafts for aortic arch lesions.

    PubMed

    Kolvenbach, Ralf R; Rabin, Asaf; Karmeli, Ron; Alpaslan, Alper; Schwierz, Elizabeth

    2016-06-01

    Due to the shortage of commercially available off the shelf aortic arch grafts since the last years parallel grafts or chimney grafts have played an increasing role in the treatment of patients with aortic arch lesions. Although there are still issues with type endoleaks and gutters between the chimney graft and the aortic stent-graft remaining. We report our results with the Medtronic thoracic graft in combination with long self-expanding parallel grafts, to ensure an overlapping zone of more than 7 cm between the different grafts. Alternatively, sandwich configurations are used where a direct contact between the parallel graft and the aortic wall is avoided. We have placed a total of 65 parallel grafts into supra-aortic branches. In 21 cases chimney grafts were placed into the carotid artery, in most cases into the left common carotid artery. In 36 cases chimney grafts were placed into left subclavian artery. A maximum number of 4 parallel grafts were placed for total endovascular debranching. In addition, in 8 patients a parallel graft had to be placed into the innominate artery. There was a patency of 69% for all subclavian artery chimney grafts versus 73% for carotid artery parallel grafts. Of note is a stroke rate of 5.2% in all these cases. Only 2 of the patients with an occluded left subclavian artery chimney graft required a bypass procedure for arm claudication or ischemia. We had a primary type I endoleak rate of 28%. In almost 25% secondary interventions were required mainly to treat type I leaks, in those cases where the leak did not resolve spontaneously. The overall mortality rate was 3.5%. The results of parallel graft in the aortic arch are promising, but of major concern is still the high rate of type I endoleaks as well as the neurological complication rate, most probably due to catheter manipulation in patients with severe atherosclerotic arch lesions.

  2. Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers

    NASA Technical Reports Server (NTRS)

    Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

    1996-01-01

    This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamics and aeroacoustics properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP-2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times faster than the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP2. The resultant far-field acoustic field is analyzed with state of-the-art audio and video rendering of the propagating acoustic signals.

  3. Parallel and fault-tolerant algorithms for hypercube multiprocessors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aykanat, C.

    1988-01-01

    Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less

  4. A Parallel Saturation Algorithm on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Ezekiel, Jonathan; Siminiceanu

    2007-01-01

    Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of ring events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the ring of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.

  5. A survey of parallel programming tools

    NASA Technical Reports Server (NTRS)

    Cheng, Doreen Y.

    1991-01-01

    This survey examines 39 parallel programming tools. Focus is placed on those tool capabilites needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions, tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.

  6. Application of partially coherent modes for studying generation of a Gaussian partially coherent laser beam

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suvorov, A A

    2010-10-15

    The problem of steady-state generation of a Gaussian partially coherent beam in a stable-cavity laser is considered within the framework of the method of expansion of the radiation coherence function in partially coherent modes. We discuss the conditions whose fulfilment makes it possible to neglect the intermode beatings of the radiation field and the effect of the gain dispersion on the steady-state generation of multimode partially coherent radiation. Based on the simplified model, we solve the self-consistent problem of generation of a Gaussian partially coherent beam for the given laser pump conditions and the resonator parameters. The dependence of themore » beam characteristics (power, radius, etc.) on the active medium properties and the resonator parameters is obtained. (laser beams)« less

  7. [CMACPAR an modified parallel neuro-controller for control processes].

    PubMed

    Ramos, E; Surós, R

    1999-01-01

    CMACPAR is a Parallel Neurocontroller oriented to real time systems as for example Control Processes. Its characteristics are mainly a fast learning algorithm, a reduced number of calculations, great generalization capacity, local learning and intrinsic parallelism. This type of neurocontroller is used in real time applications required by refineries, hydroelectric centers, factories, etc. In this work we present the analysis and the parallel implementation of a modified scheme of the Cerebellar Model CMAC for the n-dimensional space projection using a mean granularity parallel neurocontroller. The proposed memory management allows for a significant memory reduction in training time and required memory size.

  8. File-access characteristics of parallel scientific workloads

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David; Purakayastha, Apratim; Best, Michael; Ellis, Carla Schlatter

    1995-01-01

    Phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in I/O system performance. This imbalance has resulted in I/O becoming a significant bottleneck for many scientific applications. One key to overcoming this bottleneck is improving the performance of parallel file systems. The design of a high-performance parallel file system requires a comprehensive understanding of the expected workload. Unfortunately, until recently, no general workload studies of parallel file systems have been conducted. The goal of the CHARISMA project was to remedy this problem by characterizing the behavior of several production workloads, on different machines, at the level of individual reads and writes. The first set of results from the CHARISMA project describe the workloads observed on an Intel iPSC/860 and a Thinking Machines CM-5. This paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences, isolating common trends and platform-dependent variances. Using this comparison, we are able to gain more insight into the general principles that should guide parallel file-system design.

  9. Parallelized reliability estimation of reconfigurable computer networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Das, Subhendu; Palumbo, Dan

    1990-01-01

    A parallelized system, ASSURE, for computing the reliability of embedded avionics flight control systems which are able to reconfigure themselves in the event of failure is described. ASSURE accepts a grammar that describes a reliability semi-Markov state-space. From this it creates a parallel program that simultaneously generates and analyzes the state-space, placing upper and lower bounds on the probability of system failure. ASSURE is implemented on a 32-node Intel iPSC/860, and has achieved high processor efficiencies on real problems. Through a combination of improved algorithms, exploitation of parallelism, and use of an advanced microprocessor architecture, ASSURE has reduced the execution time on substantial problems by a factor of one thousand over previous workstation implementations. Furthermore, ASSURE's parallel execution rate on the iPSC/860 is an order of magnitude faster than its serial execution rate on a Cray-2 supercomputer. While dynamic load balancing is necessary for ASSURE's good performance, it is needed only infrequently; the particular method of load balancing used does not substantially affect performance.

  10. Aggregation and Gelation of Aromatic Polyamides with Parallel and Anti-parallel Alignment of Molecular Dipole Along the Backbone

    NASA Astrophysics Data System (ADS)

    Zhu, Dan; Shang, Jing; Ye, Xiaodong; Shen, Jian

    2016-12-01

    The understanding of macromolecular structures and interactions is important but difficult, due to the facts that a macromolecules are of versatile conformations and aggregate states, which vary with environmental conditions and histories. In this work two polyamides with parallel or anti-parallel dipoles along the linear backbone, named as ABAB (parallel) and AABB (anti-parallel) have been studied. By using a combination of methods, the phase behaviors of the polymers during the aggregate and gelation, i.e., the forming or dissociation processes of nuclei and fibril, cluster of fibrils, and cluster-cluster aggregation have been revealed. Such abundant phase behaviors are dominated by the inter-chain interactions, including dispersion, polarity and hydrogen bonding, and correlatd with the solubility parameters of solvents, the temperature, and the polymer concentration. The results of X-ray diffraction and fast-mode dielectric relaxation indicate that AABB possesses more rigid conformation than ABAB, and because of that AABB aggregates are of long fibers while ABAB is of hairy fibril clusters, the gelation concentration in toluene is 1 w/v% for AABB, lower than the 3 w/v% for ABAB.

  11. Scalable Unix commands for parallel processors : a high-performance implementation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ong, E.; Lusk, E.; Gropp, W.

    2001-06-22

    We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.

  12. Ultrascalable petaflop parallel supercomputer

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  13. Parallel Anisotropic Tetrahedral Adaptation

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Darmofal, David L.

    2008-01-01

    An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic is described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000-1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.

  14. Xyce parallel electronic simulator users guide, version 6.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less

  15. Xyce Parallel Electronic Simulator Users' Guide Version 6.8

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows onemore » to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase$-$ a message passing parallel implementation $-$ which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less

  16. Xyce parallel electronic simulator users guide, version 6.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less

  17. A compositional reservoir simulator on distributed memory parallel computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rame, M.; Delshad, M.

    1995-12-31

    This paper presents the application of distributed memory parallel computes to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. Amore » portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straight forward. Results of the distributed memory computing performance of Parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for same problems on a vector supercomputer is also presented.« less

  18. Increasing morphological complexity in multiple parallel lineages of the Crustacea

    PubMed Central

    Adamowicz, Sarah J.; Purvis, Andy; Wills, Matthew A.

    2008-01-01

    The prospect of finding macroevolutionary trends and rules in the history of life is tremendously appealing, but very few pervasive trends have been found. Here, we demonstrate a parallel increase in the morphological complexity of most of the deep lineages within a major clade. We focus on the Crustacea, measuring the morphological differentiation of limbs. First, we show a clear trend of increasing complexity among 66 free-living, ordinal-level taxa from the Phanerozoic fossil record. We next demonstrate that this trend is pervasive, occurring in 10 or 11 of 12 matched-pair comparisons (across five morphological diversity indices) between extinct Paleozoic and related Recent taxa. This clearly differentiates the pattern from the effects of lineage sorting. Furthermore, newly appearing taxa tend to have had more types of limbs and a higher degree of limb differentiation than the contemporaneous average, whereas those going extinct showed higher-than-average limb redundancy. Patterns of contemporary species diversity partially reflect the paleontological trend. These results provide a rare demonstration of a large-scale and probably driven trend occurring across multiple independent lineages and influencing both the form and number of species through deep time and in the present day. PMID:18347335

  19. Methanol partial oxidation reformer

    DOEpatents

    Ahmed, S.; Kumar, R.; Krumpelt, M.

    1999-08-17

    A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.

  20. Methanol partial oxidation reformer

    DOEpatents

    Ahmed, S.; Kumar, R.; Krumpelt, M.

    1999-08-24

    A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.