Andre, J.B.; Zaharchuk, G.; Fischbein, N.J.; Augustin, M.; Skare, S.; Straka, M.; Rosenberg, J.; Lansberg, M.G.; Kemp, S.; Wijman, C.A.C.; Albers, G.W.; Schwartz, N.E.; Bammer, R.
2012-01-01
BACKGROUND AND PURPOSE PI improves routine EPI-based DWI by enabling higher spatial resolution and reducing geometric distortion, though it remains unclear which of these is most important. We evaluated the relative contribution of these factors and assessed their ability to increase lesion conspicuity and diagnostic confidence by using a GRAPPA technique. MATERIALS AND METHODS Four separate DWI scans were obtained at 1.5T in 48 patients with independent variation of in-plane spatial resolution (1.88 mm2 versus 1.25 mm2) and/or reduction factor (R = 1 versus R = 3). A neuroradiologist with access to clinical history and additional imaging sequences provided a reference standard diagnosis for each case. Three blinded neuroradiologists assessed scans for abnormalities and also evaluated multiple imaging-quality metrics by using a 5-point ordinal scale. Logistic regression was used to determine the impact of each factor on subjective image quality and confidence. RESULTS Reference standard diagnoses in the patient cohort were acute ischemic stroke (n = 30), ischemic stroke with hemorrhagic conversion (n = 4), intraparenchymal hemorrhage (n = 9), or no acute lesion (n = 5). While readers preferred both a higher reduction factor and a higher spatial resolution, the largest effect was due to an increased reduction factor (odds ratio, 47 ± 16). Small lesions were more confidently discriminated from artifacts on R = 3 images. The diagnosis changed in 5 of 48 scans, always toward the reference standard reading and exclusively for posterior fossa lesions. CONCLUSIONS PI improves DWI primarily by reducing geometric distortion rather than by increasing spatial resolution. This outcome leads to a more accurate and confident diagnosis of small lesions. PMID:22403781
NASA Astrophysics Data System (ADS)
Lee, Mike M.; Cho, Byung Lok
2001-11-01
In this paper, we propose a new First Partial product Addition (FPA) architecture with a new compressor (or parallel counter) for the CSA tree that adds the partial products in a fast parallel multiplier. The new compressor improves the speed of calculating the partial products by about 20% compared with an existing parallel counter built from full adders. The novel FPA architecture reduces the number of CLA bits needed to find the final sum by N/2. A multiplication time of 5.14 ns is obtained for the 16×16 multiplier using 0.25 µm CMOS technology. The multiplier architecture is easily adapted to pipelined design and demonstrates high-speed performance.
The Force Singularity for Partially Immersed Parallel Plates
NASA Astrophysics Data System (ADS)
Bhatnagar, Rajat; Finn, Robert
2016-12-01
In earlier work, we provided a general description of the forces of attraction and repulsion, encountered by two parallel vertical plates of infinite extent and of possibly differing materials, when partially immersed in an infinite liquid bath and subject to surface tension forces. In the present study, we examine some unusual details of the exotic behavior that can occur at the singular configuration separating infinite rise from infinite descent of the fluid between the plates, as the plates approach each other. In connection with this singular behavior, we present also some new estimates on meniscus height details.
Solution of partial differential equations on vector and parallel computers
NASA Technical Reports Server (NTRS)
Ortega, J. M.; Voigt, R. G.
1985-01-01
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Polarization Imaging Apparatus with Auto-Calibration
NASA Technical Reports Server (NTRS)
Zou, Yingyin Kevin (Inventor); Zhao, Hongzhi (Inventor); Chen, Qiushui (Inventor)
2013-01-01
A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned at 22.5°, a second variable phase retarder with its optical axis aligned at 45°, a linear polarizer, an imaging sensor for sensing the intensity images of the sample, a controller and a computer. The two variable phase retarders were controlled independently by the computer through a controller unit, which generates a sequence of voltages to control the phase retardations of the first and second variable phase retarders. An auto-calibration procedure was incorporated into the polarization imaging apparatus to correct the misalignment of the first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images of the sample, I_0, I_1, I_2 and I_3, was captured by the imaging sensor when the phase retardations of the VPRs were set at (0,0), (π,0), (π,π) and (π/2,π), respectively. The four Stokes components of a Stokes image, S_0, S_1, S_2 and S_3, were then calculated using the four intensity images.
Polarization imaging apparatus with auto-calibration
Zou, Yingyin Kevin; Zhao, Hongzhi; Chen, Qiushui
2013-08-20
A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned at 22.5°, a second variable phase retarder with its optical axis aligned at 45°, a linear polarizer, an imaging sensor for sensing the intensity images of the sample, a controller and a computer. The two variable phase retarders were controlled independently by the computer through a controller unit, which generates a sequence of voltages to control the phase retardations of the first and second variable phase retarders. An auto-calibration procedure was incorporated into the polarization imaging apparatus to correct the misalignment of the first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images of the sample, I_0, I_1, I_2 and I_3, was captured by the imaging sensor when the phase retardations of the VPRs were set at (0,0), (π,0), (π,π) and (π/2,π), respectively. The four Stokes components of a Stokes image, S_0, S_1, S_2 and S_3, were then calculated using the four intensity images.
Parallel Reconstruction Using Null Operations (PRUNO)
Zhang, Jian; Liu, Chunlei; Moseley, Michael E.
2011-01-01
A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated as linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD), and an iterative conjugate-gradient approach is proposed to efficiently solve for missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, and stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than generalized autocalibrating partially parallel acquisition (GRAPPA), especially at high acceleration rates. With the aid of PRUNO reconstruction, ultra-high-acceleration parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
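The SVD-based calibration idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a generic calibration matrix whose rows would be vectorized k-space neighbourhoods taken from ACS data; here a synthetic low-rank matrix stands in for real data. The function name `null_space_kernels` and the relative threshold `rel_tol` are illustrative choices, not taken from the paper.

```python
import numpy as np

def null_space_kernels(calib, rel_tol=1e-6):
    """Return the (approximate) null-space vectors of a calibration matrix.

    calib   : 2-D array, one vectorized neighbourhood per row (assumed layout)
    rel_tol : singular values below rel_tol * largest are treated as zero
    """
    # Full SVD so that every right singular vector is available.
    _, s, vh = np.linalg.svd(calib, full_matrices=True)
    # Pad the singular values with zeros if there are fewer rows than columns.
    s_full = np.zeros(vh.shape[0])
    s_full[: s.size] = s
    keep = s_full <= rel_tol * s_full.max()
    # Columns of the result are vectors v with calib @ v ~ 0.
    return vh[keep].conj().T

rng = np.random.default_rng(0)
# Synthetic rank-3 matrix with 8 columns, so 5 null vectors are expected.
calib = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 8))
kernels = null_space_kernels(calib)
residual = np.linalg.norm(calib @ kernels)
```

With measured data the singular values decay gradually rather than dropping to zero, so the threshold becomes a tuning choice rather than a numerical tolerance.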
Bankson, James A; Stafford, R Jason; Hazle, John D
2005-03-01
Magnetic resonance temperature imaging can be used to monitor the progress of thermal ablation therapies, increasing treatment efficacy and improving patient safety. High temporal resolution is important when therapies rapidly heat tissue, but many approaches to faster image acquisition compromise image resolution, slice coverage, or phase sensitivity. Partially parallel imaging techniques offer the potential for improved temporal resolution without forcing such concessions. Although these techniques perturb image phase, relative phase changes between dynamically acquired phase-sensitive images, such as those acquired for MR temperature imaging, can be reliably measured through partially parallel imaging techniques using reconstruction filters that remain constant across the series. Partially parallel and non-accelerated phase-difference-sensitive data can be obtained through arrays of surface coils using this method. Average phase differences measured through partially parallel and fully Fourier encoded images are virtually identical, while phase noise increases with g√L as in standard partially parallel image acquisitions.
Software Compression for Partially Parallel Imaging with Multi-channels.
Huang, Feng; Vijayakumar, Sathya; Akao, James
2005-01-01
In magnetic resonance imaging, multi-channel phased array coils offer a high signal-to-noise ratio (SNR) and better parallel imaging performance. But as the number of channels increases, reconstruction time and computer memory requirements become unavoidable problems. In this work, principal component analysis is applied to reduce the size of the data while preserving the performance of parallel imaging. Clinical data collected using a 32-channel cardiac coil are used in the experiments. Experimental results show that the proposed method dramatically reduces the processing time without much damage to the reconstructed image.
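A minimal sketch of PCA-based channel compression follows, assuming the multi-channel data are arranged as a (channels × samples) complex array. The function name `compress_channels`, the synthetic data, and the choice of four virtual channels are illustrative assumptions; the abstract does not specify the authors' implementation.

```python
import numpy as np

def compress_channels(data, n_virtual):
    """Project multi-channel data onto its dominant principal components.

    data      : (n_channels, n_samples) complex array (assumed layout)
    n_virtual : number of virtual channels to keep
    Returns the compressed data and the (n_channels, n_virtual) weights.
    """
    # Channel-by-channel covariance matrix (Hermitian, no mean removal).
    cov = data @ data.conj().T
    # eigh returns eigenvalues in ascending order; pick the largest ones.
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1][:n_virtual]
    weights = evecs[:, order]
    return weights.conj().T @ data, weights

rng = np.random.default_rng(0)
# Synthetic example: 32 channels that are mixtures of only 4 sources.
sources = rng.standard_normal((4, 1000)) + 1j * rng.standard_normal((4, 1000))
mixing = rng.standard_normal((32, 4)) + 1j * rng.standard_normal((32, 4))
data = mixing @ sources
compressed, weights = compress_channels(data, n_virtual=4)
# Fraction of signal energy kept by the 4 virtual channels.
retained = np.linalg.norm(compressed) ** 2 / np.linalg.norm(data) ** 2
```

Any downstream reconstruction then operates on 4 channels instead of 32, which is where the savings in time and memory would come from.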
Generating Parallel Execution Plans with a Partial Order Planner
1994-05-01
the atomic action assumption and they can be executed in parallel. The semantics [...] stems from the fact that the [...]-style repre...1976), O-PLAN (Currie & Tate 1991), MP, and MPI (Kambhampati 1994). The class [...] and only if, for all conditions that are relevant to achieving G, the
NASA Technical Reports Server (NTRS)
Toomarian, N.; Fijany, A.; Barhen, J.
1993-01-01
Evolutionary partial differential equations are usually solved by discretization in time and space, and by applying a time-marching procedure to data and algorithms that can potentially be parallelized in the spatial domain.
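As an illustration of time marching with a spatially parallel update, here is a minimal sketch for the 1-D heat equation u_t = alpha * u_xx with fixed boundary values. The equation, the explicit Euler scheme, and the function name `march_heat_1d` are assumptions chosen for the example; the abstract does not name a specific equation or scheme.

```python
import numpy as np

def march_heat_1d(u0, alpha, dx, dt, n_steps):
    """Explicit time marching for u_t = alpha * u_xx, boundary values held fixed."""
    u = np.asarray(u0, dtype=float).copy()
    r = alpha * dt / dx**2
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for alpha*dt/dx**2 > 0.5")
    for _ in range(n_steps):
        # Every interior point depends only on values from the previous step,
        # so the whole spatial update can be carried out in parallel.
        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u

x = np.linspace(0.0, 1.0, 51)
dx = x[1] - x[0]
dt = 1e-4
n_steps = 1000
u_num = march_heat_1d(np.sin(np.pi * x), alpha=1.0, dx=dx, dt=dt, n_steps=n_steps)
# Analytical solution for this initial condition, used as a reference.
u_exact = np.exp(-np.pi**2 * dt * n_steps) * np.sin(np.pi * x)
max_error = np.max(np.abs(u_num - u_exact))
```

The loop over time steps is sequential, while the slice update inside it is the part that maps onto the spatial domain decomposition.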
A parallel performance study of the Cartesian method for partial differential equations on a sphere
Drake, J.B.; Coddington, M.P.
1997-04-01
A 3-D Cartesian method for integration of partial differential equations on a spherical surface is developed for parallel computation. The target computer architectures are distributed-memory, message-passing computers such as the Intel Paragon. The parallel algorithms are described along with mesh partitioning strategies. Performance of the algorithms is considered for a standard test case of the shallow water equations on the sphere. The authors find that the computation time scales well with increasing numbers of processors.
NASA Technical Reports Server (NTRS)
Nguyen, Howard; Willacy, Karen; Allen, Mark
2012-01-01
KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.
NASA Astrophysics Data System (ADS)
Lyu, Jingyuan; Nakarmi, Ukash; Zhang, Chaoyi; Ying, Leslie
2016-05-01
This paper presents a new approach to highly accelerated dynamic parallel MRI using low-rank matrix completion and the partial separability (PS) model. In data acquisition, k-space data are moderately randomly undersampled at the central k-space navigator locations, but highly undersampled in the outer k-space for each temporal frame. In reconstruction, the navigator data are reconstructed from the undersampled data using structured low-rank matrix completion. After all the unacquired navigator data are estimated, the partially separable model is used to obtain partial k-t data. The parallel imaging method is then used to reconstruct the entire dynamic image series from the highly undersampled data. The proposed method has been shown to achieve high-quality reconstructions with reduction factors up to 31 and a temporal resolution of 29 ms, where the conventional PS method fails.
Analysis and Modeling of Parallel Photovoltaic Systems under Partial Shading Conditions
NASA Astrophysics Data System (ADS)
Buddala, Santhoshi Snigdha
Since the industrial revolution, fossil fuels such as petroleum, coal, oil and natural gas, along with other non-renewable energy sources, have been used as the primary energy source. The consumption of fossil fuels releases various harmful gases into the atmosphere as byproducts; these are hazardous, tend to deplete the protective layers and affect the overall environmental balance. Fossil fuels are also finite resources, and their rapid depletion has prompted the need to investigate alternative, renewable sources of energy. One such promising source of renewable energy is solar/photovoltaic energy. This work focuses on investigating a new solar array architecture with solar cells connected in a parallel configuration. Retaining the structural simplicity of the parallel architecture, a theoretical small-signal model of the solar cell is proposed and used to analyze the variations in the module parameters under partial shading conditions. Simulations were run in SPICE to validate the model implemented in Matlab. The voltage limitations of the proposed architecture are addressed by adopting a simple dc-dc boost converter, and the efficiency of the architecture is evaluated by comparing it with traditional architectures. SPICE simulations are used to compare the architectures and identify the best one in terms of power conversion efficiency under partial shading conditions.
NASA Technical Reports Server (NTRS)
Hunt, L. R.; Villarreal, Ramiro
1987-01-01
System theorists understand that the same mathematical objects which determine controllability for nonlinear control systems of ordinary differential equations (ODEs) also determine hypoellipticity for linear partial differential equations (PDEs). Moreover, almost any study of ODE systems begins with linear systems. It is remarkable that Hormander's paper on hypoellipticity of second-order linear PDEs starts with equations due to Kolmogorov, which are shown to be analogous to the linear PDEs. Eigenvalue placement by state feedback for a controllable linear system can be paralleled for a Kolmogorov equation if an appropriate type of feedback is introduced. Results concerning transformations of nonlinear systems to linear systems are similar to results for transforming a linear PDE to a Kolmogorov equation.
Parallelizing across time when solving time-dependent partial differential equations
Worley, P.H.
1991-09-01
The standard numerical algorithms for solving time-dependent partial differential equations (PDEs) are inherently sequential in the time direction. This paper describes algorithms for the time-accurate solution of certain classes of linear hyperbolic and parabolic PDEs that can be parallelized in both time and space and have serial complexities that are proportional to the serial complexities of the best known algorithms. The algorithms for parabolic PDEs are variants of the waveform relaxation multigrid method (WFMG) of Lubich and Ostermann where the scalar ordinary differential equations (ODEs) that make up the kernel of WFMG are solved using a cyclic reduction type algorithm. The algorithms for hyperbolic PDEs use the cyclic reduction algorithm to solve ODEs along characteristics. 43 refs.
NASA Astrophysics Data System (ADS)
Martin, I.; Tirado, F.; Vazquez, L.
We present a process for solving the two-dimensional nonlinear Schrödinger equation using a multigrid technique on a distributed-memory machine. Some features of the multigrid technique, such as its good convergence and parallel properties, are explained in this paper. These features make the multigrid method the optimal one for solving the systems of equations arising at each time step from an implicit numerical scheme. We give some experimental results on the parallel numerical simulation of this equation on a message-passing parallel machine.
Wang, Yilei; Pillai, Suresh Kumar Raman; Chan-Park, Mary B
2013-09-09
Single-walled carbon nanotubes (SWNTs) are widely thought to be a strong contender for next-generation printed electronic transistor materials. However, large-scale solution-based parallel assembly of SWNTs to obtain high-performance transistor devices is challenging. SWNTs have anisotropic properties and, although partial alignment of the nanotubes has been theoretically predicted to achieve optimum transistor device performance, thus far no parallel solution-based technique can achieve this. Herein a novel solution-based technique, the immersion-cum-shake method, is reported to achieve partially aligned SWNT networks using semiconductive (99% enriched) SWNTs (s-SWNTs). By immersing an aminosilane-treated wafer into a solution of nanotubes placed on a rotary shaker, the repetitive flow of the nanotube solution over the wafer surface during the deposition process orients the nanotubes toward the fluid flow direction. By adjusting the nanotube concentration in the solution, the nanotube density of the partially aligned network can be controlled; linear densities ranging from 5 to 45 SWNTs/μm are observed. Through control of the linear SWNT density and channel length, the optimum SWNT-based field-effect transistor devices achieve outstanding performance metrics (with an on/off ratio of ~3.2 × 10⁴ and mobility of 46.5 cm²/Vs). Atomic force microscopy shows that the partial alignment is uniform over an area of 20 × 20 mm² and confirms that the orientation of the nanotubes is mostly along the fluid flow direction, with a narrow orientation scatter characterized by a full width at half maximum (FWHM) of <15° for all but the densest film, which is 35°. This parallel process is large-scale applicable and exploits the anisotropic properties of the SWNTs, presenting a viable path forward for industrial adoption of SWNTs in printed, flexible, and large-area electronics.
NASA Astrophysics Data System (ADS)
Cikalova, Ulana; Schreiber, Jürgen; Hillmann, Susanne; Meyendorf, Norbert
2014-02-01
The magnetic Barkhausen Noise (BN) is well suited to evaluate the effects of mechanical stresses of ferromagnetic materials, e.g. the indirect detection of residual stress states. The most common causes for the occurrence of residual stresses are manufacturing processes, such as casting, welding, machining, forming, heat treatment, etc., consecutive repairs and design changes, and installation or assembly and overloads during the operating life of a construction. A significant calibration effort based on a set of reference values and/or test samples is needed for these measurements, which require a great deal of time and material resources. Additionally, it is impossible to determine the stress states of different components (σxx and σyy) at the surface. Therefore, a new auto-calibration method was developed to analyze two-dimensional stresses. A fixed calibration function based on defined parameters (determined experimentally) was applied. To adjust the auto-calibration function to the experimental reference values by varying functional parameters, a large number of measurement points were used. We present a method that can calculate, based on the multi-dimensional stress state at the measuring point, the stress components σxx and σyy for two perpendicular magnetization directions using the Barkhausen Noise effect.
Instrument Variables for Reducing Noise in Parallel MRI Reconstruction
Lin, Hong
2017-01-01
Generalized autocalibrating partially parallel acquisition (GRAPPA) is a widely used parallel MRI technique. However, noise degrades the reconstructed image as the reduction factor increases, or even at low reduction factors for some noisy datasets. Noise originating in the scanner propagates through the fitting and interpolation procedures of GRAPPA as noise-related errors and distorts the final reconstructed image. The basic idea we propose to improve GRAPPA is to remove noise from a system identification perspective. In this paper, we first analyze the GRAPPA noise problem from a noisy input-output system perspective; then, a new framework based on the errors-in-variables (EIV) model is developed for analyzing the noise generation mechanism in GRAPPA and designing a concrete method, instrument variables (IV) GRAPPA, to remove noise. The proposed EIV framework opens the possibility that noiseless GRAPPA reconstruction could be achieved by existing methods that solve the EIV problem other than the IV method. Experimental results show that the proposed reconstruction algorithm removes noise better than conventional GRAPPA, as validated with both phantom and in vivo brain data. PMID:28197419
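The errors-in-variables problem and the instrumental-variables remedy can be illustrated on a scalar toy model. This is only a sketch of the general statistical idea, not the IV-GRAPPA algorithm: the data are synthetic, the instrument is assumed to be a second, independently noisy observation of the same clean signal, and the names `least_squares_slope` and `iv_slope` are illustrative.

```python
import numpy as np

def least_squares_slope(x, y):
    """Ordinary least-squares slope through the origin."""
    return float(x @ y / (x @ x))

def iv_slope(z, x, y):
    """Instrumental-variables slope, using z as the instrument for x."""
    return float(z @ y / (z @ x))

rng = np.random.default_rng(0)
n = 20000
beta_true = 2.0
clean = rng.standard_normal(n)            # unobserved noise-free regressor
x = clean + rng.standard_normal(n)        # regressor observed with noise
z = clean + rng.standard_normal(n)        # instrument: independent noisy copy
y = beta_true * clean + 0.1 * rng.standard_normal(n)

beta_ls = least_squares_slope(x, y)       # biased toward zero by noise in x
beta_iv = iv_slope(z, x, y)               # noise in z is uncorrelated with noise in x
```

Because the noise in `x` inflates the denominator of the least-squares estimate, that estimate is pulled toward zero, while the cross-product with the instrument avoids the noise-squared term.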
Single-shot magnetic resonance spectroscopic imaging with partial parallel imaging.
Posse, Stefan; Otazo, Ricardo; Tsai, Shang-Yueh; Yoshimoto, Akio Ernesto; Lin, Fa-Hsuan
2009-03-01
A magnetic resonance spectroscopic imaging (MRSI) pulse sequence based on proton-echo-planar-spectroscopic-imaging (PEPSI) is introduced that measures two-dimensional metabolite maps in a single excitation. Echo-planar spatial-spectral encoding was combined with interleaved phase encoding and parallel imaging using SENSE to reconstruct absorption mode spectra. The symmetrical k-space trajectory compensates phase errors due to convolution of spatial and spectral encoding. Single-shot MRSI at short TE was evaluated in phantoms and in vivo on a 3-T whole-body scanner equipped with a 12-channel array coil. Four-step interleaved phase encoding and fourfold SENSE acceleration were used to encode a 16 x 16 spatial matrix with a 390-Hz spectral width. Comparison with conventional PEPSI and PEPSI with fourfold SENSE acceleration demonstrated comparable sensitivity per unit time when taking into account g-factor-related noise increases and differences in sampling efficiency. LCModel fitting enabled quantification of inositol, choline, creatine, and N-acetyl-aspartate (NAA) in vivo with concentration values in the ranges measured with conventional PEPSI and SENSE-accelerated PEPSI. Cramer-Rao lower bounds were comparable to those obtained with conventional SENSE-accelerated PEPSI at the same voxel size and measurement time. This single-shot MRSI method is therefore suitable for applications that require high temporal resolution to monitor temporal dynamics or to reduce sensitivity to tissue movement.
Sparse Auto-Calibration for Radar Coincidence Imaging with Gain-Phase Errors
Zhou, Xiaoli; Wang, Hongqiang; Cheng, Yongqiang; Qin, Yuliang
2015-01-01
Radar coincidence imaging (RCI) is a high-resolution staring imaging technique without the limitation of relative motion between target and radar. The sparsity-driven approaches are commonly used in RCI, while the prior knowledge of imaging models needs to be known accurately. However, as one of the major model errors, the gain-phase error exists generally, and may cause inaccuracies of the model and defocus the image. In the present report, the sparse auto-calibration method is proposed to compensate the gain-phase error in RCI. The method can determine the gain-phase error as part of the imaging process. It uses an iterative algorithm, which cycles through steps of target reconstruction and gain-phase error estimation, where orthogonal matching pursuit (OMP) and Newton’s method are used, respectively. Simulation results show that the proposed method can improve the imaging quality significantly and estimate the gain-phase error accurately. PMID:26528981
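The target-reconstruction step named in this abstract, orthogonal matching pursuit, can be sketched as below. The sketch covers OMP only; the gain-phase error estimation by Newton's method and the alternation between the two steps are not shown. The function name `omp`, the random sensing matrix, and the sparsity level are illustrative assumptions.

```python
import numpy as np

def omp(A, y, n_nonzero):
    """Greedy sparse recovery of x from y ~ A @ x with at most n_nonzero entries."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.conj().T @ residual)
        if support:
            correlations[support] = 0.0   # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=np.result_type(A, y))
    x[support] = coef
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
A /= np.linalg.norm(A, axis=0)            # unit-norm columns
x_true = np.zeros(60)
x_true[[5, 17, 42]] = [1.5, -2.0, 1.0]
x_hat = omp(A, A @ x_true, n_nonzero=3)
```

The joint least-squares refit is what distinguishes OMP from plain matching pursuit: after each selection the residual is orthogonal to every column chosen so far.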
Gerdes, Lee; Gerdes, Peter; Lee, Sung W; H Tegeler, Charles
2013-01-01
Disturbances of neural oscillation patterns have been reported with many disease states. We introduce methodology for HIRREM™ (high-resolution, relational, resonance-based electroencephalic mirroring), also known as Brainwave Optimization™, a noninvasive technology to facilitate relaxation and auto-calibration of neural oscillations. HIRREM is a precision-guided technology for allostatic therapeutics, intended to help the brain calibrate its own functional set points to optimize fitness. HIRREM technology collects electroencephalic data through two-channel recordings and delivers a series of audible musical tones in near real time. Choices of tone pitch and timing are made by mathematical algorithms, principally informed by the dominant frequency in successive instants of time, to permit resonance between neural oscillatory frequencies and the musical tones. Relaxation of neural oscillations through HIRREM appears to permit auto-calibration toward greater hemispheric symmetry and more optimized proportionation of regional spectral power. To illustrate an application of HIRREM, we present data from a randomized clinical trial of HIRREM as an intervention for insomnia (n = 19). On average, there was reduction of right-dominant temporal lobe high-frequency (23–36 Hz) EEG asymmetry over the course of eight successive HIRREM sessions. There was a trend for correlation between reduction of right temporal lobe dominance and magnitude of insomnia symptom reduction. Disturbances of neural oscillation have implications for both neuropsychiatric health and downstream peripheral (somatic) physiology. The possibility of noninvasive optimization for neural oscillatory set points through HIRREM suggests potentially multitudinous roles for this technology. Research is currently ongoing to further explore its potential applications and mechanisms of action. PMID:23532171
Auto-Calibration of SOL-ACES in the EUV Spectral Region
NASA Astrophysics Data System (ADS)
Schmidtke, G.; Brunner, R.; Eberhard, D.; Hofmann, A.; Klocke, U.; Knothe, M.; Konz, W.; Riedel, W.-J.; Wolf, H.
The Sol-ACES (SOLAR Auto-Calibrating EUV/UV Spectrometers) experiment is prepared to be flown with the ESA SOLAR payload to the International Space Station as planned for the Shuttle mission E1 in August 2006. Four grazing incidence spectrometers of planar geometry cover the wavelength range from 16-220 nm with a spectral resolution from 0.5-2.3 nm. These high-efficiency spectrometers will be re-calibrated routinely during the mission by two three-signal ionization chambers operated with 44 band pass filters. Re-measuring the filter transmissions with the spectrometers also allows a very accurate determination of the changing second (optical) order efficiencies of the spectrometers as well as the stray light contributions to the spectral recording in different wavelength ranges. In this context the primary requirements for measurements of high radiometric accuracy will be discussed in detail. The absorption gases of the ionization chambers are neon, xenon and a mixture of 10% nitric oxide and 90% xenon. Laboratory measurements show that with this method secondary effects can be determined to a high degree, resulting in very accurate irradiance measurements, ranging from 5 to 3% in absolute terms depending on the wavelength range.
NASA Astrophysics Data System (ADS)
Vijayalekshmy, S.; Rama Iyer, S.; Beevi, Bisharathu
2015-09-01
The output power from a photovoltaic (PV) array decreases and the array exhibits multiple peaks when it is subjected to partial shading (PS). The power loss in the PV array varies with the array configuration, physical location and the shading pattern. This paper compares the relative performance of a PV array consisting of a short string of three PV modules for two different configurations. The mismatch loss, shading loss, fill factor and the power loss due to failure to track the global maximum power point are analysed for a series string with bypass diodes and for a short parallel string using a MATLAB/Simulink model. The performance of the system is investigated for three different conditions of solar insolation for the same shading pattern. Results indicate that the power loss due to shading during PS is considerably greater in a series string than in a parallel string with the same number of modules.
NASA Astrophysics Data System (ADS)
Sheikhnejad, Yahya; Hosseini, Reza; Saffar Avval, Majid
2017-02-01
In this study, steady-state laminar ferroconvection through a circular horizontal tube partially filled with porous media under constant heat flux is experimentally investigated. Transverse magnetic fields were applied to the ferrofluid flow by two fixed parallel bar magnets positioned at a certain distance from the beginning of the test section. The results show a notable enhancement in heat transfer as a consequence of the partially filled porous media and the magnetic field: enhancements of up to 2.2-fold and 1.4-fold, respectively, were observed in the heat transfer coefficient. It was found that the simultaneous presence of porous media and magnetic field can improve heat transfer up to 2.4-fold. The porous media play the major role in this configuration. The application of the magnetic field and porous media also introduces a higher pressure loss along the pipe, to which the contribution of the porous media is again higher than that of the magnetic field.
Schiller, Ofer; Burns, Kristin M; Sinha, Pranava; Cummings, Susan D
2012-02-01
Cor triatriatum sinister is an uncommon congenital cardiac defect that has rarely been described in association with left-sided partial anomalous pulmonary venous return. We present a case of such rare anatomy with multilevel obstruction that presented in infancy as cardiogenic shock. The patient underwent staged treatment with extracorporeal membrane oxygenation stabilization, catheter-based balloon dilatation of the cor triatriatum and atrial septostomy, followed by definitive surgical repair, with excellent result.
Xie, Jingsi; Lai, Peng; Huang, Feng; Li, Yu; Li, Debiao
2010-05-01
Radial sampling has been demonstrated to be potentially useful in cardiac magnetic resonance imaging because it is less susceptible to motion than Cartesian sampling. Nevertheless, its capability of imaging acceleration remains limited by undersampling-induced streaking artifacts. In this study, a self-calibrated reconstruction method was developed to suppress streaking artifacts for highly accelerated parallel radial acquisitions in cardiac magnetic resonance imaging. Two- (2D) and three-dimensional (3D) radial k-space data were collected from a phantom and healthy volunteers. Images reconstructed using the proposed method and the conventional regridding method were compared based on statistical analysis on a four-point scale imaging scoring. It was demonstrated that the proposed method can effectively remove undersampling streaking artifacts and significantly improve image quality (P<.05). With the use of the proposed method, image score (1-4, 1=poor, 2=good, 3=very good, 4=excellent) was improved from 2.14 to 3.34 with the use of an undersampling factor of 4 and from 1.09 to 2.5 with the use of an undersampling factor of 8. Our study demonstrates that the proposed reconstruction method is effective for highly accelerated cardiac imaging applications using parallel radial acquisitions without calibration data.
NASA Astrophysics Data System (ADS)
Ma, Sangback
In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDEs (Partial Differential Equations) on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioners, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with a direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using the block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by a least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i.e., the real parts of their eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices is mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The
NASA Astrophysics Data System (ADS)
Pereira, Tiago M. D.; Uitenbroek, Han
2015-02-01
The emergence of three-dimensional magneto-hydrodynamic simulations of stellar atmospheres has sparked a need for efficient radiative transfer codes to calculate detailed synthetic spectra. We present RH 1.5D, a massively parallel code based on the RH code and capable of performing Zeeman polarised multi-level non-local thermodynamical equilibrium calculations with partial frequency redistribution for an arbitrary number of chemical species. The code calculates spectra from 3D, 2D or 1D atmospheric models on a column-by-column basis (or 1.5D). While the 1.5D approximation breaks down in the cores of very strong lines in an inhomogeneous environment, it is nevertheless suitable for a large range of scenarios and allows for faster convergence with finer control over the iteration of each simulation column. The code scales well to at least tens of thousands of CPU cores, and is publicly available. In the present work we briefly describe its inner workings, strategies for convergence optimisation, its parallelism, and some possible applications.
Kaufmann, Tobias; Völker, Stefan; Gunesch, Laura; Kübler, Andrea
2012-01-01
Brain–computer interfaces (BCI) based on event-related potentials (ERP) allow for selection of characters from a visually presented character-matrix and thus provide a communication channel for users with neurodegenerative disease. Although they have been a topic of research for more than 20 years and have repeatedly been proven to be a reliable communication method, BCIs are almost exclusively used in experimental settings, handled by qualified experts. This study investigates whether ERP–BCIs can be handled independently by laymen without expert support, which is essential for establishing BCIs in end-users' daily life situations. Furthermore, we compared the classic character-by-character text entry against a predictive text entry (PTE) that directly incorporates predictive text into the character-matrix. N = 19 BCI novices handled a user-centered ERP–BCI application on their own without expert support. The software individually adjusted classifier weights and control parameters in the background, invisible to the user (auto-calibration). All participants were able to operate the software on their own and to twice correctly spell a sentence with the auto-calibrated classifier (once with PTE, once without). Our PTE increased spelling speed and, importantly, did not reduce accuracy. In sum, this study demonstrates the feasibility of auto-calibrating ERP–BCI use, independently by laymen, and the strong benefit of integrating predictive text directly into the character-matrix. PMID:22833713
Automatic High-Bandwidth Calibration and Reconstruction of Arbitrarily Sampled Parallel MRI
Aelterman, Jan; Naeyaert, Maarten; Gutierrez, Shandra; Luong, Hiep; Goossens, Bart; Pižurica, Aleksandra; Philips, Wilfried
2014-01-01
Today, many MRI reconstruction techniques exist for undersampled MRI data. Regularization-based techniques inspired by compressed sensing allow for the reconstruction of undersampled data that would lead to an ill-posed reconstruction problem. Parallel imaging enables the reconstruction of MRI images from undersampled multi-coil data that leads to a well-posed reconstruction problem. Autocalibrating pMRI techniques encompass pMRI techniques where no explicit knowledge of the coil sensitivities is required. A first purpose of this paper is to derive a novel autocalibration approach for pMRI that allows for the estimation and use of smooth, but high-bandwidth coil profiles instead of a compactly supported kernel. These high-bandwidth models adhere more accurately to the physics of an antenna system. The second purpose of this paper is to demonstrate the feasibility of a parameter-free reconstruction algorithm that combines autocalibrating pMRI and compressed sensing. Therefore, we present several techniques for automatic parameter estimation in MRI reconstruction. Experiments show that a higher reconstruction accuracy can be had using high-bandwidth coil models and that the automatic parameter choices yield an acceptable result. PMID:24915203
NASA Astrophysics Data System (ADS)
Zhang, Y. Y.; Shao, Q. X.; Ye, A. Z.; Xing, H. T.; Xia, J.
2016-02-01
Integrated water system modeling is a feasible approach to understanding severe water crises in the world and promoting the implementation of integrated river basin management. In this study, a classic hydrological model (the time variant gain model: TVGM) was extended to an integrated water system model by coupling multiple water-related processes in hydrology, biogeochemistry, water quality, and ecology, and considering the interference of human activities. A parameter analysis tool, which included sensitivity analysis, autocalibration and model performance evaluation, was developed to improve modeling efficiency. To demonstrate the model performance, the Shaying River catchment, which is the largest highly regulated and heavily polluted tributary of the Huai River basin in China, was selected as the case study area. The model performance was evaluated on the key water-related components including runoff, water quality, diffuse pollution load (or nonpoint sources) and crop yield. Results showed that our proposed model simulated most components reasonably well. The simulated daily runoff at most regulated and less-regulated stations matched well with the observations. The average correlation coefficient and Nash-Sutcliffe efficiency were 0.85 and 0.70, respectively. Both the simulated low and high flows at most stations were improved when the dam regulation was considered. The daily ammonium-nitrogen (NH4-N) concentration was also well captured with the average correlation coefficient of 0.67. Furthermore, the diffuse source load of NH4-N and the corn yield were reasonably simulated at the administrative region scale. This integrated water system model is expected to improve the simulation performance with extension to more model functionalities, and to provide a scientific basis for the implementation of integrated river basin management.
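The two goodness-of-fit measures quoted in this abstract, the correlation coefficient and the Nash-Sutcliffe efficiency, have standard definitions that can be sketched in a few lines. The function names and the runoff values below are hypothetical and are not taken from the study; the sketch only shows how such scores are typically computed from paired observed and simulated series.

```python
from math import sqrt

def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) /
    (sum of squared deviations of the observations from their mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread

def pearson_r(observed, simulated):
    """Pearson correlation coefficient between the two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / sqrt(var_o * var_s)

# Hypothetical daily runoff values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.5, 2.0, 2.5, 4.5, 5.0]
print(nash_sutcliffe(obs, sim), pearson_r(obs, sim))
```

An efficiency of 1 means a perfect match, and 0 means the simulation is no better than predicting the observed mean at every time step.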
Carter, Shelly L.; Karanes, Chatchada; Costa, Luciano J.; Wu, Juan; Devine, Steven M.; Wingard, John R.; Aljitawi, Omar S.; Cutler, Corey S.; Jagasia, Madan H.; Ballen, Karen K.; Eapen, Mary; O'Donnell, Paul V.
2011-01-01
The Blood and Marrow Transplant Clinical Trials Network conducted 2 parallel multicenter phase 2 trials for individuals with leukemia or lymphoma and no suitable related donor. Reduced intensity conditioning (RIC) was used with either unrelated double umbilical cord blood (dUCB) or HLA-haploidentical related donor bone marrow (Haplo-marrow) transplantation. For both trials, the transplantation conditioning regimen incorporated cyclophosphamide, fludarabine, and 200 cGy of total body irradiation. The 1-year probabilities of overall and progression-free survival were 54% and 46%, respectively, after dUCB transplantation (n = 50) and 62% and 48%, respectively, after Haplo-marrow transplantation (n = 50). The day +56 cumulative incidence of neutrophil recovery was 94% after dUCB and 96% after Haplo-marrow transplantation. The 100-day cumulative incidence of grade II-IV acute GVHD was 40% after dUCB and 32% after Haplo-marrow transplantation. The 1-year cumulative incidences of nonrelapse mortality and relapse after dUCB transplantation were 24% and 31%, respectively, with corresponding results of 7% and 45%, respectively, after Haplo-marrow transplantation. These multicenter studies confirm the utility of dUCB and Haplo-marrow as alternative donor sources and set the stage for a multicenter randomized clinical trial to assess the relative efficacy of these 2 strategies. The trials are registered at www.clinicaltrials.gov under NCT00864227 (BMT CTN 0604) and NCT00849147 (BMT CTN 0603). PMID:21527516
Krogh, M.; Painter, J.; Hansen, C.
1996-10-01
Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the M.
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Dinç, Erdal; Ertekin, Zehra Ceren
2016-01-01
An application of parallel factor analysis (PARAFAC) and three-way partial least squares (3W-PLS1) regression models to ultra-performance liquid chromatography-photodiode array detection (UPLC-PDA) data with co-eluted peaks in the same wavelength and time regions is described for the multicomponent quantitation of hydrochlorothiazide (HCT) and olmesartan medoxomil (OLM) in tablets. A three-way dataset of HCT and OLM in their binary mixtures containing telmisartan (IS) as an internal standard was recorded with a UPLC-PDA instrument. Firstly, the PARAFAC algorithm was applied for the decomposition of three-way UPLC-PDA data into the chromatographic, spectral and concentration profiles to quantify the concerned compounds. Secondly, the 3W-PLS1 approach was applied to the decomposition of a tensor consisting of three-way UPLC-PDA data into a set of triads to build a 3W-PLS1 regression for the analysis of the same compounds in samples. For the proposed three-way analysis methods in the regression and prediction steps, the applicability and validity of the PARAFAC and 3W-PLS1 models were checked by analyzing the synthetic mixture samples, inter-day and intra-day samples, and standard addition samples containing HCT and OLM. Two different three-way analysis methods, PARAFAC and 3W-PLS1, were successfully applied to the quantitative estimation of the solid dosage form containing HCT and OLM. Regression and prediction results provided by the three-way analysis were compared with those obtained by the traditional UPLC method.
Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.
1995-05-01
Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.
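The abstract does not say which compositing method is used, so the following is only a generic sketch of the final step it describes: merging independently rendered partial images into one. It assumes each pixel carries a (depth, color) pair and that the nearer fragment wins, and it merges images pairwise in a tree. The names `composite` and `tree_composite` are hypothetical.

```python
INF = float("inf")  # depth of an empty pixel

def composite(a, b):
    """Merge two partial images pixel by pixel, keeping the nearer fragment."""
    return [pa if pa[0] <= pb[0] else pb for pa, pb in zip(a, b)]

def tree_composite(partials):
    """Pairwise (binary-tree) merge; the merges within one pass are independent."""
    images = list(partials)
    while len(images) > 1:
        merged = [composite(images[i], images[i + 1])
                  for i in range(0, len(images) - 1, 2)]
        if len(images) % 2:          # odd image out is carried to the next pass
            merged.append(images[-1])
        images = merged
    return images[0]

# Three hypothetical 4-pixel partial images, one per processor.
p0 = [(1.0, "r"), (INF, None), (5.0, "r"), (INF, None)]
p1 = [(2.0, "g"), (3.0, "g"), (INF, None), (INF, None)]
p2 = [(INF, None), (1.0, "b"), (4.0, "b"), (INF, None)]
final = tree_composite([p0, p1, p2])
print([color for _, color in final])  # → ['r', 'b', 'b', None]
```

Because the nearer-fragment rule is associative, the partial images can be merged in any grouping, which is what makes a tree-shaped merge possible.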
Wald, Ingo; Ize, Santiago
2015-07-28
Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
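The two-phase method described above can be sketched as follows. This is a simplified illustration, not the patented implementation: it assumes a one-dimensional grid of integer cells, objects given as inclusive cell ranges, and threads standing in for processors. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def portion_bounds(num_cells, n):
    """Split cells 0..num_cells-1 into n contiguous portions [start, stop)."""
    step = -(-num_cells // n)  # ceiling division
    return [(min(p * step, num_cells), min((p + 1) * step, num_cells))
            for p in range(n)]

def classify(object_set, portions):
    """Phase 1: record which portions at least partially bound each object."""
    hits = []
    for obj_id, (lo, hi) in object_set:
        for p, (start, stop) in enumerate(portions):
            if lo < stop and hi >= start:  # inclusive range [lo, hi] overlaps
                hits.append((p, obj_id, lo, hi))
    return hits

def populate(p, portion, hits):
    """Phase 2: fill the cells of portion p with the objects assigned to it."""
    start, stop = portion
    cells = {c: [] for c in range(start, stop)}
    for q, obj_id, lo, hi in hits:
        if q == p:
            for c in range(max(start, lo), min(stop, hi + 1)):
                cells[c].append(obj_id)
    return {c: sorted(ids) for c, ids in cells.items()}

def populate_grid(num_cells, objects, n):
    portions = portion_bounds(num_cells, n)
    items = sorted(objects.items())
    object_sets = [items[p::n] for p in range(n)]  # n distinct sets of objects
    with ThreadPoolExecutor(max_workers=n) as pool:
        phase1 = pool.map(lambda s: classify(s, portions), object_sets)
        hits = [h for worker_hits in phase1 for h in worker_hits]
        phase2 = list(pool.map(lambda p: populate(p, portions[p], hits), range(n)))
    grid = {}
    for cells in phase2:
        grid.update(cells)
    return grid

# Hypothetical objects: id -> (first cell, last cell), both inclusive.
objects = {0: (0, 1), 1: (3, 5), 2: (6, 7), 3: (2, 2)}
grid = populate_grid(8, objects, n=2)
print(grid)
```

In the second phase each worker writes only to its own grid portion, so no two workers ever touch the same cell and no locking is needed.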
Parallel machines: Parallel machine languages
Iannucci, R.A. )
1990-01-01
This book presents a framework for understanding the tradeoffs between the conventional view and the dataflow view with the objective of discovering the critical hardware structures which must be present in any scalable, general-purpose parallel computer to effectively tolerate latency and synchronization costs. The author presents an approach to scalable general-purpose parallel computation. Linguistic concerns, compiling issues, intermediate language issues, and hardware/technological constraints are presented as a combined approach to architectural development. This book presents the notion of a parallel machine language.
Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.
1995-09-01
In this paper the authors introduce the idea of parallel pipelining for water lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as $N = (R/r)^{\alpha}$, where $r$ and $R$ are the radii of the small and large pipes, respectively, and $\alpha = 4$ or $19/7$ when the lubricating water flow is laminar or turbulent, respectively.
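As a small worked example of the estimate quoted above (N equal to the radius ratio R/r raised to the power alpha, with alpha = 4 for laminar and 19/7 for turbulent lubricating flow), the sketch below evaluates it for a hypothetical radius ratio. The function name and the chosen radii are assumptions for illustration only.

```python
def pipes_needed(R, r, laminar=True):
    """Number of small pipes (radius r) matching the oil flux of one
    large pipe (radius R), using N = (R / r) ** alpha."""
    alpha = 4.0 if laminar else 19.0 / 7.0
    return (R / r) ** alpha

# Hypothetical case: the large pipe has twice the radius of the small ones.
n_laminar = pipes_needed(R=2.0, r=1.0, laminar=True)     # 2**4 = 16
n_turbulent = pipes_needed(R=2.0, r=1.0, laminar=False)  # 2**(19/7), about 6.6
print(n_laminar, n_turbulent)
```

Halving the radius therefore calls for 16 small pipes under laminar lubrication but only about 7 (rounding up) under turbulent lubrication.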
The physics of parallel machines
NASA Technical Reports Server (NTRS)
Chan, Tony F.
1988-01-01
The idea is considered that architectures for massively parallel computers must be designed to go beyond supporting a particular class of algorithms to supporting the underlying physical processes being modelled. Physical processes modelled by partial differential equations (PDEs) are discussed. Also discussed is the idea that an efficient architecture must go beyond nearest neighbor mesh interconnections and support global and hierarchical communications.
Parallel pivoting combined with parallel reduction
NASA Technical Reports Server (NTRS)
Alaghband, Gita
1987-01-01
Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
Tolerant (parallel) Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2^3 is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
Special parallel processing workshop
1994-12-01
This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts dealing with parallel processing.
Iterative algorithms for large sparse linear systems on parallel computers
NASA Technical Reports Server (NTRS)
Adams, L. M.
1982-01-01
Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
Parallel rendering techniques for massively parallel visualization
Hansen, C.; Krogh, M.; Painter, J.
1995-07-01
As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spherical, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderers use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.
Multilist Scheduling. A New Parallel Programming Model.
1993-07-30
fluid simulation [531; differential equation solving such as weather prediction [24, 25]; digital circuit simulation such as gate-level simulation [201...Champaign, 1986. [53] Johnson, C. Numerical Solutions of Partial Differential Equations by the Finite Element Method. Cambridge University Press, 1987. 131...Ortega, J. and Voigt, R. Solution of Partial Differential Equations on Vector and Parallel Computers. SIAM Review, vol. 27 (1985), pp. 149-240. [73
Parallel-In-Time For Moving Meshes
Falgout, R. D.; Manteuffel, T. A.; Southworth, B.; Schroder, J. B.
2016-02-04
With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Parallel flow diffusion battery
Yeh, Hsu-Chi; Cheng, Yung-Sung
1984-08-07
A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
Parallel flow diffusion battery
Yeh, H.C.; Cheng, Y.S.
1984-01-01
A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
Fan, W.C.; Halbleib, J.A. Sr.
1996-09-01
This report provides a user's guide for parallel processing ITS on a UNIX workstation network, a shared-memory multiprocessor or a massively-parallel processor. The parallelized version of ITS is based on a master/slave model with message passing. Parallel issues such as random number generation, load balancing, and communication software are briefly discussed. Timing results for example problems are presented for demonstration purposes.
Research in parallel computing
NASA Technical Reports Server (NTRS)
Ortega, James M.; Henderson, Charles
1994-01-01
This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Parallel algorithm development
Adams, T.F.
1996-06-01
Rapid changes in parallel computing technology are causing significant changes in the strategies being used for parallel algorithm development. One approach is simply to write computer code in a standard language like FORTRAN 77 or with the expectation that the compiler will produce executable code that will run in parallel. The alternatives are: (1) to build explicit message passing directly into the source code; or (2) to write source code without explicit reference to message passing or parallelism, but use a general communications library to provide efficient parallel execution. Application of these strategies is illustrated with examples of codes currently under development.
Parallel Atomistic Simulations
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Parallel Adaptive Mesh Refinement
Diachin, L; Hornung, R; Plassmann, P; WIssink, A
2005-03-04
As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of an impulsively-sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35]. Note the
Visualization and Tracking of Parallel CFD Simulations
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kremenetsky, Mark
1995-01-01
We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.
Parallel digital forensics infrastructure.
Liebrock, Lorie M.; Duggan, David Patrick
2009-10-01
This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the architecture and implementation of the parallel digital forensics (PDF) infrastructure.
Introduction to Parallel Computing
1992-05-01
Topology C, Ada, C++, Data-parallel FORTRAN, 2D mesh of node boards, each node FORTRAN-90 (late 1992) board has 1 application processor Devopment Tools ...parallel machines become the wave of the present, tools are increasingly needed to assist programmers in creating parallel tasks and coordinating...their activities. Linda was designed to be such a tool . Linda was designed with three important goals in mind: to be portable, efficient, and easy to use
Parallel Wolff Cluster Algorithms
NASA Astrophysics Data System (ADS)
Bae, S.; Ko, S. H.; Coddington, P. D.
The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.
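For orientation, the serial Wolff single-cluster update that the abstract sets out to parallelize can be sketched as below. This is the textbook serial step for a 2D Ising model with coupling J = 1 on a periodic lattice, not either of the paper's parallel implementations. The function name and the lattice representation are assumptions.

```python
import math
import random

def wolff_step(spins, L, beta, rng):
    """One Wolff single-cluster update on an L x L periodic Ising lattice.
    Grows a cluster of aligned spins from a random seed site, adding each
    aligned neighbor with probability 1 - exp(-2 * beta), then flips it."""
    p_add = 1.0 - math.exp(-2.0 * beta)
    seed = (rng.randrange(L), rng.randrange(L))
    s0 = spins[seed[0]][seed[1]]
    cluster = {seed}
    stack = [seed]
    while stack:
        i, j = stack.pop()
        neighbors = (((i + 1) % L, j), ((i - 1) % L, j),
                     (i, (j + 1) % L), (i, (j - 1) % L))
        for site in neighbors:
            ni, nj = site
            if site not in cluster and spins[ni][nj] == s0 and rng.random() < p_add:
                cluster.add(site)
                stack.append(site)
    for i, j in cluster:
        spins[i][j] = -s0  # flip the whole cluster at once
    return len(cluster)

L = 4
rng = random.Random(0)
spins = [[1] * L for _ in range(L)]
size = wolff_step(spins, L, beta=0.4, rng=rng)
print("cluster size:", size)
```

The cluster grows outward from a random seed and its final size and shape are unknown in advance, which is the irregularity the abstract identifies as the obstacle to efficient parallel implementation.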
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan
1994-01-01
A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C^3I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
Application Portable Parallel Library
NASA Technical Reports Server (NTRS)
Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott
1995-01-01
Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Parallel Algorithms and Patterns
Robey, Robert W.
2016-06-16
This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
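Two of the patterns named above, reductions and prefix scans, can be sketched in a few lines. This is an illustrative serial simulation only: the names `tree_reduce` and `inclusive_scan` are hypothetical, and the Hillis-Steele scan shown here is one common formulation, not necessarily the one used in the presentation.

```python
def tree_reduce(values):
    """Pairwise (tree) sum: each level halves the number of partial sums."""
    level = list(values)
    while len(level) > 1:
        # All pair sums within one level are independent of each other.
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # odd element carries over unchanged
        level = paired
    return level[0] if level else 0


def inclusive_scan(values):
    """Hillis-Steele inclusive prefix sum using about log2(n) sweeps."""
    out = list(values)
    step = 1
    while step < len(out):
        # Each new element depends only on the previous sweep, so every
        # update in a sweep could run concurrently.
        out = [out[i] + out[i - step] if i >= step else out[i]
               for i in range(len(out))]
        step *= 2
    return out
```

For example, `inclusive_scan([1, 2, 3, 4, 5])` yields the running totals of the input, and `tree_reduce` returns their final sum.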
Parallel preconditioning techniques for sparse CG solvers
Basermann, A.; Reichel, B.; Schelthoff, C.
1996-12-31
Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques of the CG method. In particular for very ill-conditioned matrices, sophisticated preconditioners are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the iterations of the simply diagonally scaled CG and are shown to be well suited for massively parallel machines.
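The baseline the abstract compares against, diagonally scaled CG, can be sketched as follows. This is a minimal illustration, assuming a small dense symmetric positive definite matrix stored as nested lists; the function name `pcg_diagonal` is hypothetical, and the polynomial and incomplete Cholesky preconditioners studied in the paper are not implemented here.

```python
def pcg_diagonal(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients with a diagonal (Jacobi) preconditioner.

    A: symmetric positive definite matrix as a list of rows; b: right-hand side.
    Returns (x, iterations_used).
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Swapping in a stronger preconditioner only changes the two lines that compute `z`; the matrix-vector product and the dot products are the parts that dominate parallel cost.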
Weening, J.S.
1988-05-01
CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.
Parallel and Distributed Computing.
1986-12-12
program was devoted to parallel and distributed computing. Support for this part of the program was obtained from the present Army contract and a...Umesh Vazirani. A workshop on parallel and distributed computing was held from May 19 to May 23, 1986 and drew 141 participants. Keywords: Mathematical programming; Protocols; Randomized algorithms. (Author)
Massively parallel mathematical sieves
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
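One way to see why the sieve parallelizes well is to split the integer range into segments that can be sieved independently once the small base primes are known. The sketch below is an assumption-laden illustration: it uses a contiguous block decomposition run serially, not the scattered decomposition of the paper, and the names `base_primes`, `sieve_segment`, and `segmented_sieve` are hypothetical.

```python
import math


def base_primes(limit):
    """Ordinary sieve returning all primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def sieve_segment(lo, hi, primes):
    """Primes in [lo, hi), given every prime up to sqrt(hi - 1)."""
    flags = [True] * (hi - lo)
    for p in primes:
        # First multiple of p inside the segment, never below p * p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            flags[m - lo] = False
    return [lo + i for i, is_prime in enumerate(flags) if is_prime and lo + i >= 2]


def segmented_sieve(n, workers):
    """All primes <= n, computed segment by segment (n >= 2 assumed)."""
    primes = base_primes(math.isqrt(n))
    chunk = -(-(n + 1) // workers)  # ceiling division
    result = []
    for w in range(workers):  # each segment needs no data from the others
        lo, hi = w * chunk, min((w + 1) * chunk, n + 1)
        if lo < hi:
            result.extend(sieve_segment(lo, hi, primes))
    return result
```

With a block decomposition the low segments do more marking work than the high ones, which is one of the load-balance trade-offs a scattered decomposition is meant to address.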
Totally parallel multilevel algorithms
NASA Technical Reports Server (NTRS)
Frederickson, Paul O.
1988-01-01
Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
Not Available
1991-10-23
An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
A parallel Lanczos method for symmetric generalized eigenvalue problems
Wu, K.; Simon, H.D.
1997-12-01
The Lanczos algorithm is a very effective method for finding extreme eigenvalues of symmetric matrices. It requires fewer arithmetic operations than similar algorithms, such as the Arnoldi method. In this paper, the authors present their parallel version of the Lanczos method for the symmetric generalized eigenvalue problem, PLANSO. PLANSO is based on a sequential package called LANSO which implements the Lanczos algorithm with partial re-orthogonalization. It is portable to all parallel machines that support MPI and easy to interface with most parallel computing packages. Through numerical experiments, they demonstrate that it achieves parallel efficiency similar to that of PARPACK, but uses considerably less time.
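The core recurrence can be sketched for the standard (not generalized) symmetric problem. This is a simplified illustration under stated assumptions: it uses full re-orthogonalization rather than the partial re-orthogonalization of LANSO/PLANSO, it is serial, and the function name `lanczos` and its interface are hypothetical rather than part of either package.

```python
import math


def lanczos(matvec, v0, steps):
    """Run up to `steps` Lanczos iterations for a symmetric operator.

    matvec(v) must return A @ v as a list. Returns (alphas, betas), the
    diagonal and off-diagonal of the tridiagonal matrix T whose extreme
    eigenvalues approximate those of A.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    basis = [[x / norm for x in v0]]
    alphas, betas = [], []
    for k in range(steps):
        w = matvec(basis[k])
        alphas.append(sum(wi * vi for wi, vi in zip(w, basis[k])))
        # Full re-orthogonalization: remove components along every earlier
        # Lanczos vector (simpler, but costlier, than partial schemes).
        for q in basis:
            c = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        beta = math.sqrt(sum(x * x for x in w))
        if k == steps - 1 or beta < 1e-12:
            break  # done, or an invariant subspace was found
        betas.append(beta)
        basis.append([x / beta for x in w])
    return alphas, betas
```

In a distributed setting the matrix-vector product and the inner products are the operations that require communication, which is why the cost of re-orthogonalization matters for parallel efficiency.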
Some fast elliptic solvers on parallel architectures and their complexities
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Youcef
1989-01-01
The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
Some fast elliptic solvers on parallel architectures and their complexities
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Y.
1989-01-01
The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. In this paper, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
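The idea behind cyclic reduction is easiest to follow in the scalar case. The sketch below is an illustration only: it solves an ordinary tridiagonal system rather than the block tridiagonal systems treated in the paper, it runs serially, and the function name `cyclic_reduction` and its argument layout are hypothetical.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with
    a[0] and c[-1] ignored. No pivoting is done, so the matrix is assumed
    to be diagonally dominant.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Eliminate the even-indexed unknowns from every odd-indexed row.
    # Each odd row is processed independently, which is the parallel step.
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_next else 0.0))
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)  # half-size system
    # Back-substitute for the even-indexed unknowns, again independently.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Each level halves the number of unknowns, so the recursion depth is about log2(n); in the block version the scalar divisions become solves with the diagonal blocks.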
Parallel adaptive wavelet collocation method for PDEs
Nejadmalayeri, Alireza; Vezolainen, Alexei; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using tree-like structure with tree roots starting at a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048³ using as many as 2048 CPU cores.
Parallel adaptive wavelet collocation method for PDEs
NASA Astrophysics Data System (ADS)
Nejadmalayeri, Alireza; Vezolainen, Alexei; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using tree-like structure with tree roots starting at a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048³ using as many as 2048 CPU cores.
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).
Foster, I.; Tuecke, S.
1991-09-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, a set of tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory at info.mcs.anl.gov.
ERIC Educational Resources Information Center
Rogers, Pat
1972-01-01
Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclid's parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth
NASA Technical Reports Server (NTRS)
Reif, John H.
1987-01-01
A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
Revisiting and parallelizing SHAKE
NASA Astrophysics Data System (ADS)
Weinbach, Yael; Elber, Ron
2005-10-01
An algorithm is presented for running SHAKE in parallel. SHAKE is a widely used approach to compute molecular dynamics trajectories with constraints. An essential step in SHAKE is the solution of a sparse linear problem of the type Ax = b, where x is a vector of unknowns. Conjugate gradient minimization (that can be done in parallel) replaces the widely used iteration process that is inherently serial. Numerical examples present good load balancing and are limited only by communication time.
... Jacksonian seizure; Seizure - partial (focal); Temporal lobe seizure; Epilepsy - partial seizures ... Abou-Khalil BW, Gallagher MJ, Macdonald RL. Epilepsies. In: Daroff ... Practice . 7th ed. Philadelphia, PA: Elsevier; 2016:chap 101. ...
NASA Technical Reports Server (NTRS)
Vranish, John M. (Inventor)
2010-01-01
A partial gear bearing including an upper half, comprising peak partial teeth, and a lower, or bottom, half, comprising valley partial teeth. The upper half also has an integrated roller section between each of the peak partial teeth with a radius equal to the gear pitch radius of the radially outwardly extending peak partial teeth. Conversely, the lower half has an integrated roller section between each of the valley half teeth with a radius also equal to the gear pitch radius of the peak partial teeth. The valley partial teeth extend radially inwardly from its roller section. The peak and valley partial teeth are exactly out of phase with each other, as are the roller sections of the upper and lower halves. Essentially, the end roller bearing of the typical gear bearing has been integrated into the normal gear tooth pattern.
Parallel algorithms for the spectral transform method
Foster, I.T.; Worley, P.H.
1997-05-01
The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, the authors describe these different parallel algorithms and report on computational experiments that they have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. The authors focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but they also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional fast Fourier transforms (FFTs) and other parallel transforms.
Parallel algorithms for the spectral transform method
Foster, I.T.; Worley, P.H.
1994-04-01
The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, we describe these different parallel algorithms and report on computational experiments that we have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. We focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional FFTs and other parallel transforms.
Parallel architectures for iterative methods on adaptive, block structured grids
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1983-01-01
A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Parallel architectures for vision
Maresca, M.; Lavin, M.A.; Li, H.
1988-08-01
Vision computing involves the execution of a large number of operations on large sets of structured data. Sequential computers cannot achieve the speed required by most of the current applications and therefore parallel architectural solutions have to be explored. In this paper the authors examine the options that drive the design of a vision oriented computer, starting with the analysis of the basic vision computation and communication requirements. They briefly review the classical taxonomy for parallel computers, based on the multiplicity of the instruction and data stream, and apply a recently proposed criterion, the degree of autonomy of each processor, to further classify fine-grain SIMD massively parallel computers. They identify three types of processor autonomy, namely operation autonomy, addressing autonomy, and connection autonomy. For each type they give the basic definitions and show some examples. They focus on the concept of connection autonomy, which they believe is a key point in the development of massively parallel architectures for vision. They show two examples of parallel computers featuring different types of connection autonomy - the Connection Machine and the Polymorphic-Torus - and compare their cost and benefit.
Sublattice parallel replica dynamics
NASA Astrophysics Data System (ADS)
Martínez, Enrique; Uberuaga, Blas P.; Voter, Arthur F.
2014-06-01
Exascale computing presents a challenge for the scientific community as new algorithms must be developed to take full advantage of the new computing paradigm. Atomistic simulation methods that offer full fidelity to the underlying potential, i.e., molecular dynamics (MD) and parallel replica dynamics, fail to use the whole machine speedup, leaving a region in time and sample size space that is unattainable with current algorithms. In this paper, we present an extension of the parallel replica dynamics algorithm [A. F. Voter, Phys. Rev. B 57, R13985 (1998), 10.1103/PhysRevB.57.R13985] by combining it with the synchronous sublattice approach of Shim and Amar [Y. Shim and J. G. Amar, Phys. Rev. B 71, 125432 (2005), 10.1103/PhysRevB.71.125432], thereby exploiting event locality to improve the algorithm scalability. This algorithm is based on a domain decomposition in which events happen independently in different regions in the sample. We develop an analytical expression for the speedup given by this sublattice parallel replica dynamics algorithm and compare it with parallel MD and traditional parallel replica dynamics. We demonstrate how this algorithm, which introduces a slight additional approximation of event locality, enables the study of physical systems unreachable with traditional methodologies and promises to better utilize the resources of current high performance and future exascale computers.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A
2014-05-20
An optical sampler includes first and second 1×n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.
Shumaker, Dana E.; Steefel, Carl I.
2016-06-21
The code CRUNCH_PARALLEL is a parallel version of the CRUNCH code. CRUNCH code version 2.0 was previously released by LLNL (UCRL-CODE-200063). Crunch is a general purpose reactive transport code developed by Carl Steefel and Yabusaki (Steefel and Yabusaki, 1996). The code handles non-isothermal transport and reaction in one, two, and three dimensions. The reaction algorithm is generic in form, handling an arbitrary number of aqueous and surface complexation reactions as well as mineral dissolution/precipitation. A standardized database is used containing thermodynamic and kinetic data. The code includes advective, dispersive, and diffusive transport.
Bailey, David H.
2009-11-15
The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Among the highly parallel computing architectures required for advanced scientific computation, those designated 'MIMD' and 'SIMD' have yielded the best results to date. The present evaluation of the development status of such architectures shows that neither has attained a decisive advantage in the treatment of most near-homogeneous problems; in the case of problems involving numerous dissimilar parts, however, such currently speculative architectures as 'neural networks' or 'data flow' machines may be required. Data flow computers are the most practical form of MIMD fine-grained parallel computers yet conceived; they automatically solve the problem of assigning virtual processors to the real processors in the machine.
Adaptive parallel logic networks
NASA Technical Reports Server (NTRS)
Martinez, Tony R.; Vidal, Jacques J.
1988-01-01
Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
ERIC Educational Resources Information Center
Friedlander, Alex; And Others
1982-01-01
Several methods of numerical mappings other than the usual cartesian coordinate system are considered. Some examples using parallel axes representation, which are seen to lead to aesthetically pleasing or interesting configurations, are presented. Exercises with alternative representations can stimulate pupil imagination and exploration in…
Foster, I.; Tuecke, S.
1993-01-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.
Parallel Dislocation Simulator
2006-10-30
ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.
Massively parallel processor computer
NASA Technical Reports Server (NTRS)
Fung, L. W. (Inventor)
1983-01-01
An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
High performance parallel architectures
Anderson, R.E.
1989-09-01
In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.
Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan
2010-01-01
We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N^2) time. The parallel time complexity estimates for our algorithms are O(N/n_p) for uniform point distributions and O((N/n_p) log(N/n_p) + n_p log n_p) for non-uniform distributions using n_p CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when an explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.
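For reference, the quadratic-cost direct evaluation that the abstract contrasts against can be sketched as below. This is only the naive baseline, not the authors' fast algorithm; the function name `direct_gauss_sum`, the restriction to 1-D points, and the bandwidth parameter `delta` are assumptions made for illustration.

```python
import math


def direct_gauss_sum(sources, weights, targets, delta=1.0):
    """Naive evaluation of G(x_i) = sum_j w_j * exp(-(x_i - y_j)^2 / delta).

    Hypothetical baseline: costs len(targets) * len(sources) kernel
    evaluations, i.e. O(N^2) when both sets have N points.
    """
    result = []
    for x in targets:
        total = 0.0
        for y, w in zip(sources, weights):
            total += w * math.exp(-((x - y) ** 2) / delta)
        result.append(total)
    return result
```

The double loop makes the quadratic scaling explicit: doubling both point sets quadruples the number of kernel evaluations.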
Parallel hierarchical radiosity rendering
Carter, M.
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Parallel hierarchical global illumination
Snell, Quinn O.
1997-10-08
Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
Parallel Multigrid Equation Solver
Adams, Mark
2001-09-07
Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved problems of up to 76 million degrees of freedom, problems in linear elasticity on the ASCI Blue Pacific and ASCI Red machines.
Adapting implicit methods to parallel processors
Reeves, L.; McMillin, B.; Okunbor, D.; Riggins, D.
1994-12-31
When numerically solving many types of partial differential equations, it is advantageous to use implicit methods because of their better stability and more flexible parameter choice (e.g., larger time steps). However, since implicit methods usually require simultaneous knowledge of the entire computational domain, these methods are difficult to implement directly on distributed memory parallel processors. This leads to infrequent use of implicit methods on parallel/distributed systems. The usual implementation of implicit methods is inefficient due to the nature of parallel systems where it is common to take the computational domain and distribute the grid points over the processors so as to maintain a relatively even workload per processor. This creates a problem at the locations in the domain where adjacent points are not on the same processor. In order for the values at these points to be calculated, messages have to be exchanged between the corresponding processors. Without special adaptation, this will result in idle processors during part of the computation, and the more processors that sit idle, the lower the effective speed improvement from using a parallel processor.
Extendability of parallel sections in vector bundles
NASA Astrophysics Data System (ADS)
Kirschner, Tim
2016-01-01
I address the following question: Given a differentiable manifold M, what are the open subsets U of M such that, for all vector bundles E over M and all linear connections ∇ on E, any ∇-parallel section in E defined on U extends to a ∇-parallel section in E defined on M? For simply connected manifolds M (among others) I describe the entirety of all such sets U which are, in addition, the complement of a C^1 submanifold, boundary allowed, of M. This delivers a partial positive answer to a problem posed by Antonio J. Di Scala and Gianni Manno (2014). Furthermore, in case M is an open submanifold of R^n, n ≥ 2, I prove that the complement of U in M, not required to be a submanifold now, can have arbitrarily large n-dimensional Lebesgue measure.
Parallel Anisotropic Tetrahedral Adaptation
NASA Technical Reports Server (NTRS)
Park, Michael A.; Darmofal, David L.
2008-01-01
An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000-1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.
Parallel Subconvolution Filtering Architectures
NASA Technical Reports Server (NTRS)
Gray, Andrew A.
2003-01-01
These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete-Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than by the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
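The overlap-and-save blocking that these architectures build on can be sketched as follows. This is a minimal single-filter illustration, not the report's sub-convolution architecture; the names `circular_convolve` and `overlap_save` are hypothetical, and a direct circular convolution stands in for the DFT, multiply, IDFT step to keep the sketch dependency-free.

```python
def circular_convolve(a, b):
    """Length-n circular convolution (stand-in for DFT * DFT -> IDFT)."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]


def overlap_save(x, h, block_len):
    """Filter x with taps h, processing fixed-size overlapping blocks.

    Each block reuses the last len(h) - 1 input samples of the previous
    block; the first len(h) - 1 outputs of every block are discarded
    because circular wrap-around corrupts them.
    """
    m = len(h)
    step = block_len - (m - 1)          # new samples consumed per block
    if step <= 0:
        raise ValueError("block_len must exceed len(h) - 1")
    h_padded = list(h) + [0] * (block_len - m)
    padded = [0] * (m - 1) + list(x) + [0] * block_len
    out = []
    pos = 0
    while pos < len(x):
        block = padded[pos:pos + block_len]
        y = circular_convolve(block, h_padded)
        out.extend(y[m - 1:])           # keep only the valid samples
        pos += step
    return out[:len(x)]
```

Because every block is independent once its overlapping input samples are available, the blocks could in principle be handed to separate processing units, which is the property the abstract's parallel architectures exploit.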
Parallel multilevel preconditioners
Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.
1989-01-01
In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Homology, convergence and parallelism
Ghiselin, Michael T.
2016-01-01
Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721
Parallel unstructured grid generation
NASA Technical Reports Server (NTRS)
Loehner, Rainald; Camberos, Jose; Merriam, Marshal
1991-01-01
A parallel unstructured grid generation algorithm is presented and implemented on the Hypercube. Different processor hierarchies are discussed, and the appropriate hierarchies for mesh generation and mesh smoothing are selected. A domain-splitting algorithm for unstructured grids which tries to minimize the surface-to-volume ratio of each subdomain is described. This splitting algorithm is employed both for grid generation and grid smoothing. Results obtained on the Hypercube demonstrate the effectiveness of the algorithms developed.
2013-09-01
Center Paul R. Eller, Jing-Ru C. Cheng, Aaron R. Byrd, Charles W. Downer, and Nawa Pradhan September 2013 Approved for public release...Program ERDC TR-13-8 September 2013 Development of Parallel GSSHA Paul R. Eller and Jing-Ru C. Cheng Information Technology Laboratory US Army Engineer...5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) Paul Eller, Ruth Cheng, Aaron Byrd, Chuck Downer, and Nawa Pradhan 5d. PROJECT NUMBER
Xyce parallel electronic simulator.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Shendure, Jay; Fields, Stanley
2016-06-01
Human genetics has historically depended on the identification of individuals whose natural genetic variation underlies an observable trait or disease risk. Here we argue that new technologies now augment this historical approach by allowing the use of massively parallel assays in model systems to measure the functional effects of genetic variation in many human genes. These studies will help establish the disease risk of both observed and potential genetic variants and to overcome the problem of "variants of uncertain significance."
Implementation of Parallel Algorithms
1993-06-30
their social relations or to achieve some goals. For example, we define a pair-wise force law of repulsion and attraction for a group of identical...quantization based compression schemes. Photo-refractive crystals, which provide high density recording in real time, are used as our holographic media. The...of Parallel Algorithms (J. Reif, ed.). Kluwer Academic Publishers, 1993. (4) "A Dynamic Separator Algorithm", D. Armon and J. Reif. To appear in
Trajectory optimization using parallel shooting method on parallel computer
Wirthman, D.J.; Park, S.Y.; Vadali, S.R.
1995-03-01
The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.
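To put the reported figure in context, a speedup of about 7 on 16 processors can be converted into parallel efficiency and an estimated serial fraction. The sketch below is an illustrative calculation only; the function names are hypothetical, the value 7 is the rounded figure from the abstract, and the serial-fraction estimate (the Karp-Flatt metric) assumes the run can be described by an Amdahl-style model.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup, processors):
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    s, p = 7.0, 16  # approximate values taken from the abstract
    print(f"efficiency      = {parallel_efficiency(s, p):.4f}")
    print(f"serial fraction = {karp_flatt_serial_fraction(s, p):.4f}")
```

Under these assumptions the efficiency is 7/16 = 0.4375, and the implied serial fraction is roughly 0.086, which is consistent with the abstract's suggestion that further parallelization (in the state domain) could improve performance.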
Evidence for parallel elongated structures in the mesosphere
NASA Technical Reports Server (NTRS)
Adams, G. W.; Brosnahan, J. W.; Walden, D. C.
1983-01-01
The physical cause of partial reflection from the mesosphere is of interest. Data are presented from an image-forming radar at Brighton, Colorado, that suggest that some of the radar scattering is caused by parallel elongated structures lying almost directly overhead. Possible physical sources for such structures include gravity waves and roll vortices.
The Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.
Asynchronous interpretation of parallel microprograms
Bandman, O.L.
1984-03-01
In this article, the authors demonstrate how to pass from a given synchronous interpretation of a parallel microprogram to an equivalent asynchronous interpretation, and investigate the cost associated with the rejection of external synchronization in parallel microprogram structures.
Status of TRANSP Parallel Services
NASA Astrophysics Data System (ADS)
Indireshkumar, K.; Andre, Robert; McCune, Douglas; Randerson, Lewis
2006-10-01
The PPPL TRANSP code suite has been used successfully over many years to carry out time dependent simulations of tokamak plasmas. However, accurately modeling certain phenomena such as RF heating and fast ion behavior using TRANSP requires extensive computational power and will benefit from parallelization. Parallelizing all of TRANSP is not required and parts will run sequentially while other parts run parallelized. To efficiently use a site's parallel services, the parallelized TRANSP modules are deployed to a shared "parallel service" on a separate cluster. The PPPL Monte Carlo fast ion module NUBEAM and the MIT RF module TORIC are the first TRANSP modules to be so deployed. This poster will show the performance scaling of these modules within the parallel server. Communications between the serial client and the parallel server will be described in detail, and measurements of startup and communications overhead will be shown. Physics modeling benefits for TRANSP users will be assessed.
Resistor Combinations for Parallel Circuits.
ERIC Educational Resources Information Center
McTernan, James P.
1978-01-01
To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
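A table of this kind can be generated with a short search. The sketch below is one possible way to build such a table, not the article's own method; the function name `whole_number_parallel_pairs` and the restriction to two whole-number resistors are assumptions. It uses the standard two-resistor formula R_total = R1 * R2 / (R1 + R2).

```python
def whole_number_parallel_pairs(max_ohms):
    """List (r1, r2, r_total) with r1 <= r2 <= max_ohms and whole-number r_total.

    Two resistors in parallel combine as r1 * r2 / (r1 + r2); a pair is
    kept only when that quotient is an exact integer.
    """
    pairs = []
    for r1 in range(1, max_ohms + 1):
        for r2 in range(r1, max_ohms + 1):
            product, total = r1 * r2, r1 + r2
            if product % total == 0:
                pairs.append((r1, r2, product // total))
    return pairs


if __name__ == "__main__":
    for r1, r2, r_total in whole_number_parallel_pairs(12):
        print(f"{r1:>3} ohm || {r2:>3} ohm = {r_total} ohm")
```

For example, 3 ohm in parallel with 6 ohm gives 18/9 = 2 ohm, so that pair appears in the table.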
NASA Astrophysics Data System (ADS)
Olmedo, Oscar; Zhang, J.
2010-05-01
Flux ropes are now generally accepted to be the magnetic configuration of Coronal Mass Ejections (CMEs), which may be formed prior to or during solar eruptions. In this study, we model the flux rope as a current-carrying partial torus loop with its two footpoints anchored in the photosphere, and investigate its instability in the context of the torus instability (TI). Previous studies on TI have focused on the configuration of a circular torus and revealed the existence of a critical decay index. Our study reveals that the critical index is a function of the fractional number of the partial torus, defined by the ratio between the arc length of the partial torus above the photosphere and the circumference of a circular torus of equal radius. We refer to this finding as the partial torus instability (PTI). It is found that a partial torus with a smaller fractional number has a smaller critical index, thus requiring a more gradually decreasing magnetic field to stabilize the flux rope. On the other hand, the partial torus with a larger fractional number has a larger critical index. In the limit of a circular torus when the fractional number approaches one, the critical index goes to a maximum value that depends on the distribution of the external magnetic field. We demonstrate that the partial torus instability helps us to understand the confinement, growth, and eventual eruption of a flux rope CME.
Parallel Debugging Using Graphical Views
1988-03-01
Voyeur, a prototype system for creating graphical views of parallel programs, provides a cost-effective way to construct such views for any parallel...programming system. We illustrate Voyeur by discussing four views created for debugging Poker programs. One is a general trace facility for any Poker...Graphical views are essential for debugging parallel programs because of the large quantity of state information contained in parallel programs. Voyeur
Parallel Pascal - An extended Pascal for parallel computers
NASA Technical Reports Server (NTRS)
Reeves, A. P.
1984-01-01
Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.
CSM parallel structural methods research
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1989-01-01
Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.
Roo: A parallel theorem prover
Lusk, E.L.; McCune, W.W.; Slaney, J.K.
1991-11-01
We describe a parallel theorem prover based on the Argonne theorem-proving system OTTER. The parallel system, called Roo, runs on shared-memory multiprocessors such as the Sequent Symmetry. We explain the parallel algorithm used and give performance results that demonstrate near-linear speedups on large problems.
Parallel Eclipse Project Checkout
NASA Technical Reports Server (NTRS)
Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.
2011-01-01
Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (XML file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to check out the plug-ins, this program makes the plug-in checkouts parallel. After the feature is parsed, a checkout request is inserted for each plug-in in the feature. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to check out now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer's home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer's home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any
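The pattern described in this abstract, one checkout request per plug-in served by a thread pool of configurable size, can be sketched generically as below. This is not PEPC's code (PEPC is an Eclipse tool); the names `checkout_all` and `checkout_one` and the default of four workers are hypothetical, and the per-plug-in checkout is passed in as a callable rather than implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def checkout_all(plugin_ids, checkout_one, max_workers=4):
    """Run checkout_one(plugin_id) for every plug-in using a thread pool.

    plugin_ids   -- identifiers parsed from a feature description (assumed)
    checkout_one -- caller-supplied function that checks out one plug-in
    max_workers  -- configurable number of pool threads
    Results are returned in the same order as plugin_ids.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(checkout_one, plugin_ids))


if __name__ == "__main__":
    # Stand-in for a real version-control checkout.
    def fake_checkout(plugin_id):
        return f"checked out {plugin_id}"

    for line in checkout_all(["core", "ui", "net"], fake_checkout, max_workers=2):
        print(line)
```

Threads suit this workload because each checkout spends most of its time waiting on the network, so several can overlap even with a small pool.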
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Partial knee replacement - slideshow
... page: //medlineplus.gov/ency/presentations/100225.htm Partial knee replacement - series—Normal anatomy To use the sharing ... A.M. Editorial team. Related MedlinePlus Health Topics Knee Replacement A.D.A.M., Inc. is accredited ...
Twisted partially pure spinors
NASA Astrophysics Data System (ADS)
Herrera, Rafael; Tellez, Ivan
2016-08-01
Motivated by the relationship between orthogonal complex structures and pure spinors, we define twisted partially pure spinors in order to characterize spinorially subspaces of Euclidean space endowed with a complex structure.
Parallelizing quantum circuit synthesis
NASA Astrophysics Data System (ADS)
Di Matteo, Olivia; Mosca, Michele
2016-03-01
Quantum circuit synthesis is the process in which an arbitrary unitary operation is decomposed into a sequence of gates from a universal set, typically one which a quantum computer can implement both efficiently and fault-tolerantly. As physical implementations of quantum computers improve, the need is growing for tools that can effectively synthesize components of the circuits and algorithms they will run. Existing algorithms for exact, multi-qubit circuit synthesis scale exponentially in the number of qubits and circuit depth, leaving synthesis intractable for circuits on more than a handful of qubits. Even modest improvements in circuit synthesis procedures may lead to significant advances, pushing forward the boundaries of not only the size of solvable circuit synthesis problems, but also in what can be realized physically as a result of having more efficient circuits. We present a method for quantum circuit synthesis using deterministic walks. Also termed pseudorandom walks, these are walks in which once a starting point is chosen, its path is completely determined. We apply our method to construct a parallel framework for circuit synthesis, and implement one such version performing optimal T-count synthesis over the Clifford+T gate set. We use our software to present examples where parallelization offers a significant speedup on the runtime, as well as directly confirm that the 4-qubit 1-bit full adder has optimal T-count 7 and T-depth 3.
Parallel ptychographic reconstruction
Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris
2014-01-01
Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by the scattering strength of the object and the detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174
Applied Parallel Metadata Indexing
Jacobi, Michael R
2012-08-01
The GPFS Archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and search the archive using standard and user-defined metadata, while maintaining security. Last summer, the author developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interface for the search tool. To meet security requirements, each database table is associated with a single user, stores only records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.
Partially coherent nonparaxial beams.
Duan, Kailiang; Lü, Baida
2004-04-15
The concept of a partially coherent nonparaxial beam is proposed. A closed-form expression for the propagation of nonparaxial Gaussian Schell model (GSM) beams in free space is derived and applied to study the propagation properties of nonparaxial GSM beams. It is shown that for partially coherent nonparaxial beams a new parameter f(sigma) has to be introduced, which together with the parameter f, determines the beam nonparaxiality.
Olmedo, Oscar; Zhang Jie
2010-07-20
Flux ropes are now generally accepted to be the magnetic configuration of coronal mass ejections (CMEs), which may be formed prior to or during solar eruptions. In this study, we model the flux rope as a current-carrying partial torus loop with its two footpoints anchored in the photosphere, and investigate its stability in the context of the torus instability (TI). Previous studies on TI have focused on the configuration of a circular torus and revealed the existence of a critical decay index of the overlying constraining magnetic field. Our study reveals that the critical index is a function of the fractional number of the partial torus, defined by the ratio between the arc length of the partial torus above the photosphere and the circumference of a circular torus of equal radius. We refer to this finding as the partial torus instability (PTI). It is found that a partial torus with a smaller fractional number has a smaller critical index, thus requiring a more gradually decreasing magnetic field to stabilize the flux rope. On the other hand, a partial torus with a larger fractional number has a larger critical index. In the limit of a circular torus when the fractional number approaches 1, the critical index goes to a maximum value. We demonstrate that the PTI helps us to understand the confinement, growth, and eventual eruption of a flux-rope CME.
NASA Astrophysics Data System (ADS)
Olmedo, Oscar; Zhang, Jie
2010-07-01
Flux ropes are now generally accepted to be the magnetic configuration of coronal mass ejections (CMEs), which may be formed prior to or during solar eruptions. In this study, we model the flux rope as a current-carrying partial torus loop with its two footpoints anchored in the photosphere, and investigate its stability in the context of the torus instability (TI). Previous studies on TI have focused on the configuration of a circular torus and revealed the existence of a critical decay index of the overlying constraining magnetic field. Our study reveals that the critical index is a function of the fractional number of the partial torus, defined by the ratio between the arc length of the partial torus above the photosphere and the circumference of a circular torus of equal radius. We refer to this finding as the partial torus instability (PTI). It is found that a partial torus with a smaller fractional number has a smaller critical index, thus requiring a more gradually decreasing magnetic field to stabilize the flux rope. On the other hand, a partial torus with a larger fractional number has a larger critical index. In the limit of a circular torus when the fractional number approaches 1, the critical index goes to a maximum value. We demonstrate that the PTI helps us to understand the confinement, growth, and eventual eruption of a flux-rope CME.
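The fractional number defined in the abstract above (arc length above the photosphere divided by the circumference of a circle of equal radius) can be illustrated with a short geometric sketch. It assumes, purely for illustration, that the photosphere is the flat line y = 0 and that the torus axis is a circle whose center sits at height `center_height`; the function name and parameters are hypothetical and do not come from the paper.

```python
import math


def fractional_number(radius, center_height):
    """Fraction of a circle's circumference lying above the line y = 0.

    Assumption (not from the paper): the photosphere is the flat line
    y = 0 and the torus axis is a circle of the given radius centered
    at y = center_height.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    if center_height >= radius:
        return 1.0  # whole circle above the line
    if center_height <= -radius:
        return 0.0  # whole circle below the line
    # Points on the circle are y = center_height + radius * cos(phi);
    # they lie above y = 0 for |phi| < acos(-center_height / radius).
    half_angle = math.acos(-center_height / radius)
    return half_angle / math.pi


if __name__ == "__main__":
    # Center on the photosphere: exactly half the torus is above it.
    print(fractional_number(1.0, 0.0))
    # Center half a radius above the photosphere.
    print(fractional_number(2.0, 1.0))
```

In this picture a semicircular loop has fractional number 0.5, and the value approaches 1 as the center rises toward one radius above the line, which matches the circular-torus limit mentioned in the abstract.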
A systolic array parallelizing compiler
Tseng, P.S.
1990-01-01
This book presents a completely new approach to the problem of building a systolic array parallelizing compiler. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler which can generate efficient parallel code for complete LINPACK routines. This book begins by analyzing the architectural strength of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. A complete listing of the source program and compiler-generated parallel code is given to clarify the overall picture of the compiler. The book concludes that a systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.
DeHart, Mark D; Williams, Mark L; Bowman, Stephen M
2010-01-01
The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement
Toward Parallel Document Clustering
Mogill, Jace A.; Haglin, David J.
2011-09-01
A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a distance metric that obeys the triangle inequality; the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program, and initial performance results of an end-to-end document processing workflow are reported.
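The way a triangle-inequality metric saves distance calculations can be shown with a minimal sketch. This is not the Anchors Hierarchy itself, only the pruning rule such methods rely on: if the current best pivot p is at distance d(x, p) from point x, and another pivot q satisfies d(p, q) >= 2 d(x, p), then d(x, q) >= d(p, q) - d(x, p) >= d(x, p), so q cannot be closer and its distance need not be computed. The function name `assign_to_pivots` and the Euclidean metric are assumptions made for illustration.

```python
import math


def assign_to_pivots(points, pivots):
    """Assign each point to its nearest pivot, skipping distance
    computations that the triangle inequality proves unnecessary.

    Returns (labels, number_of_point_to_pivot_distances_computed).
    Illustrative sketch only; Euclidean distance is an assumption.
    """
    # Pivot-to-pivot distances are computed once up front.
    pivot_dist = [[math.dist(p, q) for q in pivots] for p in pivots]
    labels = []
    computed = 0
    for x in points:
        best = 0
        best_d = math.dist(x, pivots[0])
        computed += 1
        for j in range(1, len(pivots)):
            # d(x, q) >= d(p, q) - d(x, p) >= d(x, p): q cannot win.
            if pivot_dist[best][j] >= 2.0 * best_d:
                continue
            d = math.dist(x, pivots[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.1), (9.8, 10.1), (5.0, -4.0)]
    pivs = [(0.0, 0.0), (10.0, 10.0), (5.0, -5.0)]
    labels, computed = assign_to_pivots(pts, pivs)
    print(labels, computed, "of", len(pts) * len(pivs), "distances")
```

The saving grows when points sit close to their pivot relative to the spacing between pivots, which is the regime a clustering of similar documents aims for.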
Parallel Polarization State Generation
NASA Astrophysics Data System (ADS)
She, Alan; Capasso, Federico
2016-05-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
Parallel Polarization State Generation
She, Alan; Capasso, Federico
2016-01-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813
A parallel programming environment supporting multiple data-parallel modules
Seevers, B.K.; Quinn, M.J.; Hatcher, P.J.
1992-10-01
We describe a system that allows programmers to take advantage of both control and data parallelism through multiple intercommunicating data-parallel modules. This programming environment extends C-type stream I/O to include intermodule communication channels. The programmer writes each module as a separate data-parallel program, then develops a channel linker specification describing how to connect the modules together. A channel linker we have developed loads the separate modules on the parallel machine and binds the communication channels together as specified. We present performance data demonstrating that a mixed control- and data-parallel solution can yield better performance than a strictly data-parallel solution. The system described currently runs on the Intel iWarp multicomputer.
Parallel imaging microfluidic cytometer.
Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching
2011-01-01
By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take.
Oxygen partial pressure sensor
Dees, D.W.
1994-09-06
A method for detecting oxygen partial pressure and an oxygen partial pressure sensor are provided. The method for measuring oxygen partial pressure includes contacting oxygen to a solid oxide electrolyte and measuring the subsequent change in electrical conductivity of the solid oxide electrolyte. A solid oxide electrolyte is utilized that contacts both a porous electrode and a nonporous electrode. The electrical conductivity of the solid oxide electrolyte is affected when oxygen from an exhaust stream permeates through the porous electrode to establish an equilibrium of oxygen anions in the electrolyte, thereby displacing electrons throughout the electrolyte to form an electron gradient. By adapting the two electrodes to sense a voltage potential between them, the change in electrolyte conductivity due to oxygen presence can be measured. 1 fig.
Oxygen partial pressure sensor
Dees, Dennis W.
1994-01-01
A method for detecting oxygen partial pressure and an oxygen partial pressure sensor are provided. The method for measuring oxygen partial pressure includes contacting oxygen to a solid oxide electrolyte and measuring the subsequent change in electrical conductivity of the solid oxide electrolyte. A solid oxide electrolyte is utilized that contacts both a porous electrode and a nonporous electrode. The electrical conductivity of the solid oxide electrolyte is affected when oxygen from an exhaust stream permeates through the porous electrode to establish an equilibrium of oxygen anions in the electrolyte, thereby displacing electrons throughout the electrolyte to form an electron gradient. By adapting the two electrodes to sense a voltage potential between them, the change in electrolyte conductivity due to oxygen presence can be measured.
Methanol partial oxidation reformer
Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael
1999-01-01
A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.
Methanol partial oxidation reformer
Ahmed, S.; Kumar, R.; Krumpelt, M.
1999-08-17
A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.
Methanol partial oxidation reformer
Ahmed, S.; Kumar, R.; Krumpelt, M.
1999-08-24
A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.
Methanol partial oxidation reformer
Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael
2001-01-01
A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.
Partial Arc Curvilinear Direct Drive Servomotor
NASA Technical Reports Server (NTRS)
Sun, Xiuhong (Inventor)
2014-01-01
A partial arc servomotor assembly has a curvilinear U-channel with two parallel rare earth permanent magnet plates facing each other and a pivoted ironless three-phase coil armature winding that moves between the plates. An encoder read head is fixed to a mounting plate above the coil armature winding, and a curvilinear encoder scale is curved to be coaxial with the curvilinear U-channel permanent magnet track formed by the permanent magnet plates. Driven by a set of miniaturized power electronics devices closely looped with a positioning feedback encoder, the angular position and velocity of the pivoted payload are programmable and precisely controlled.
Partially orthogonal resonators for magnetic resonance imaging
Chacon-Caldera, Jorge; Malzacher, Matthias; Schad, Lothar R.
2017-01-01
Resonators for signal reception in magnetic resonance are traditionally planar to restrict coil material and avoid coil losses. Here, we present a novel concept to model resonators partially in a plane with maximum sensitivity to the magnetic resonance signal and partially in an orthogonal plane with reduced signal sensitivity. Thus, properties of individual elements in coil arrays can be modified to optimize physical planar space and increase the sensitivity of the overall array. A particular case of the concept is implemented to decrease H-field destructive interferences in planar concentric in-phase arrays. An increase in signal to noise ratio of approximately 20% was achieved with two resonators placed over approximately the same planar area compared to common approaches at a target depth of 10 cm at 3 Tesla. Improved parallel imaging performance of this configuration is also demonstrated. The concept can be further used to increase coil density. PMID:28186135
Partially orthogonal resonators for magnetic resonance imaging
NASA Astrophysics Data System (ADS)
Chacon-Caldera, Jorge; Malzacher, Matthias; Schad, Lothar R.
2017-02-01
Resonators for signal reception in magnetic resonance are traditionally planar to restrict coil material and avoid coil losses. Here, we present a novel concept to model resonators partially in a plane with maximum sensitivity to the magnetic resonance signal and partially in an orthogonal plane with reduced signal sensitivity. Thus, properties of individual elements in coil arrays can be modified to optimize physical planar space and increase the sensitivity of the overall array. A particular case of the concept is implemented to decrease H-field destructive interferences in planar concentric in-phase arrays. An increase in signal to noise ratio of approximately 20% was achieved with two resonators placed over approximately the same planar area compared to common approaches at a target depth of 10 cm at 3 Tesla. Improved parallel imaging performance of this configuration is also demonstrated. The concept can be further used to increase coil density.
Partially strong WW scattering
Cheung Kingman; Chiang Chengwei; Yuan Tzuchiang
2008-09-01
What if only a light Higgs boson is discovered at the CERN LHC? Conventional wisdom tells us that the scattering of longitudinal weak gauge bosons would not grow strong at high energies. However, this is generally not true. In some composite models or general two-Higgs-doublet models, the presence of a light Higgs boson does not guarantee complete unitarization of the WW scattering. After partial unitarization by the light Higgs boson, the WW scattering becomes strongly interacting until it hits one or more heavier Higgs bosons or other strong dynamics. We analyze how LHC experiments can reveal this interesting possibility of partially strong WW scattering.
Parallel processor engine model program
NASA Technical Reports Server (NTRS)
Mclaughlin, P.
1984-01-01
The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Lau, Sonie
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Parallel Programming in the Age of Ubiquitous Parallelism
NASA Astrophysics Data System (ADS)
Pingali, Keshav
2014-04-01
Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs
Trajectories in parallel optics.
Klapp, Iftach; Sochen, Nir; Mendlovic, David
2011-10-01
In our previous work we showed the ability to improve the optical system's matrix condition by optical design, thereby improving its robustness to noise. It was shown that by using singular value decomposition, a target point-spread function (PSF) matrix can be defined for an auxiliary optical system, which works parallel to the original system to achieve such an improvement. In this paper, after briefly introducing the all optics implementation of the auxiliary system, we show a method to decompose the target PSF matrix. This is done through a series of shifted responses of auxiliary optics (named trajectories), where a complicated hardware filter is replaced by postprocessing. This process manipulates the pixel confined PSF response of simple auxiliary optics, which in turn creates an auxiliary system with the required PSF matrix. This method is simulated on two space variant systems and reduces their system condition number from 18,598 to 197 and from 87,640 to 5.75, respectively. We perform a study of the latter result and show significant improvement in image restoration performance, in comparison to a system without auxiliary optics and to other previously suggested hybrid solutions. Image restoration results show that in a range of low signal-to-noise ratio values, the trajectories method gives a significant advantage over alternative approaches. A third space invariant study case is explored only briefly, and we present a significant improvement in the matrix condition number from 1.9160 × 10^13 to 34,526.
High Performance Parallel Architectures
NASA Technical Reports Server (NTRS)
El-Ghazawi, Tarek; Kaewpijit, Sinthop
1998-01-01
Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments that can collect observations at hundreds of bands have become operational. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension reduction is a spectral transformation aimed at concentrating the vital information and discarding redundant data. One such transformation, which is widely used in remote sensing, is Principal Components Analysis (PCA). This report summarizes our progress on the development of a parallel PCA and its implementation on two Beowulf cluster configurations: one with a Fast Ethernet switch and the other with a Myrinet interconnection. Details of the implementation and performance results, for typical sets of multispectral and hyperspectral NASA remote sensing data, are presented and analyzed based on the algorithm requirements and the underlying machine configuration. It will be shown that the PCA application is quite challenging and hard to scale on Ethernet-based clusters. However, the measurements also show that a high-performance interconnection network, such as Myrinet, better matches the high communication demand of PCA and can lead to a more efficient PCA execution.
NASA Technical Reports Server (NTRS)
Title, A. M. (Inventor)
1978-01-01
A birefringent filter module comprises, seriatim: (1) an entrance polarizer, (2) a first birefringent crystal responsive to optical energy exiting the entrance polarizer, (3) a partial polarizer responsive to optical energy exiting the first polarizer, (4) a second birefringent crystal responsive to optical energy exiting the partial polarizer, and (5) an exit polarizer. The first and second birefringent crystals have fast axes disposed ±45 deg from the high transmissivity direction of the partial polarizer. Preferably, the second crystal has a length 1/2 that of the first crystal, and the transmissivity of the partial polarizer in its high transmissivity direction is nine times as great as in its low transmissivity direction. To provide tuning, the polarizations of the energy entering the first crystal and leaving the second crystal are varied either by rotating the entrance and exit polarizers, or by sandwiching the entrance and exit polarizers between pairs of half wave plates that are rotated relative to the polarizers. A plurality of the filter modules may be cascaded.
Dilemmas of partial cooperation.
Stark, Hans-Ulrich
2010-08-01
Related to the often applied cooperation models of social dilemmas, we deal with scenarios in which defection dominates cooperation, but an intermediate fraction of cooperators, that is, "partial cooperation," would maximize the overall performance of a group of individuals. Of course, such a solution comes at the expense of cooperators that do not profit from the overall maximum. However, because there are mechanisms accounting for mutual benefits after repeated interactions or through evolutionary mechanisms, such situations can constitute "dilemmas" of partial cooperation. Among the 12 ordinally distinct, symmetrical 2 x 2 games, three (barely considered) variants are correspondents of such dilemmas. Whereas some previous studies investigated particular instances of such games, we here provide the unifying framework and concisely relate it to the broad literature on cooperation in social dilemmas. Complementing our argumentation, we study the evolution of partial cooperation by deriving the respective conditions under which coexistence of cooperators and defectors, that is, partial cooperation, can be a stable outcome of evolutionary dynamics in these scenarios. Finally, we discuss the relevance of such models for research on the large biodiversity and variation in cooperative efforts both in biological and social systems.
Parallel Computational Protein Design
Zhou, Yichao; Donald, Bruce R.; Zeng, Jianyang
2016-01-01
Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that is guaranteed to find the global minimum energy solution (GMEC) combines dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computational bottleneck of a large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab [1] to implement a GPU-based massively parallel A* algorithm for improving the protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedups in large protein design cases with a small memory overhead compared with the traditional A* search algorithm implementation, while still guaranteeing optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle problems in which the conformation space is too large and the global optimal solution could not previously be computed. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with state-of-the-art rotamer pruning algorithms such as iMinDEE [2] and DEEPer [3] to also consider continuous backbone and side-chain flexibility. PMID:27914056
A Parallel Particle Swarm Optimizer
2003-01-01
by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based...concurrent computation. The parallelization of the Particle Swarm Optimization (PSO) algorithm is detailed and its performance and characteristics demonstrated for the biomechanical system identification problem as example.
The Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Parallel contingency statistics with Titan.
Thompson, David C.; Pebay, Philippe Pierre
2009-09-01
This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09], which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engine is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevents this new engine from exhibiting optimal parallel speed-up as the aforementioned engines do. This report therefore discusses the design trade-offs we made and studies performance with up to 200 processors.
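The general pattern behind a parallel contingency engine can be sketched as follows: each process counts (x, y) co-occurrences in its own share of the data, and the per-process tables are then added together. This is a sketch only, written in Python rather than the C++ of VTK/Titan; it does not use or reproduce the VTK/Titan API, and the function names and the three-way split are hypothetical.

```python
from collections import Counter

def local_table(pairs):
    """Count (x, y) co-occurrences in one process's share of the data."""
    return Counter(pairs)

def merge_tables(tables):
    """Reduce step: add the per-process tables entry by entry."""
    total = Counter()
    for t in tables:
        total.update(t)
    return total

def joint_probabilities(table):
    """Turn merged counts into an empirical joint distribution."""
    n = sum(table.values())
    return {key: count / n for key, count in table.items()}

data = [("a", 1), ("a", 2), ("b", 1), ("a", 1), ("b", 2), ("a", 1)]
shares = [data[0:2], data[2:4], data[4:6]]  # three hypothetical processes
table = merge_tables(local_table(s) for s in shares)
probs = joint_probabilities(table)
```

One plausible reading of the scalability remark is that, unlike a fixed number of moments, the table each process must communicate grows with the number of distinct value pairs it has seen, so the reduce step can dominate as processors are added.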
Partially coherent ultrafast spectrography
Bourassin-Bouchet, C.; Couprie, M.-E.
2015-01-01
Modern ultrafast metrology relies on the postulate that the pulse to be measured is fully coherent, that is, that it can be completely described by its spectrum and spectral phase. However, synthesizing fully coherent pulses is not always possible in practice, especially in the domain of emerging ultrashort X-ray sources where temporal metrology is strongly needed. Here we demonstrate how frequency-resolved optical gating (FROG), the first and one of the most widespread techniques for pulse characterization, can be adapted to measure partially coherent pulses even down to the attosecond timescale. No modification of experimental apparatuses is required; only the processing of the measurement changes. To do so, we take our inspiration from other branches of physics where partial coherence is routinely dealt with, such as quantum optics and coherent diffractive imaging. This will have important and immediate applications, such as enabling the measurement of X-ray free-electron laser pulses despite timing jitter. PMID:25744080
Laparoscopic partial splenic resection.
Uranüs, S; Pfeifer, J; Schauer, C; Kronberger, L; Rabl, H; Ranftl, G; Hauser, H; Bahadori, K
1995-04-01
Twenty domestic pigs with an average weight of 30 kg were subjected to laparoscopic partial splenic resection with the aim of determining the feasibility, reliability, and safety of this procedure. Unlike the human spleen, the pig spleen is perpendicular to the body's long axis, and it is long and slender. The parenchyma was severed through the middle third, where the organ is thickest. An 18-mm trocar with a 60-mm Endopath linear cutter was used for the resection. The tissue was removed with a 33-mm trocar. The operation was successfully concluded in all animals. No capsule tears occurred as a result of applying the stapler. Optimal hemostasis was achieved on the resected edges in all animals. Although these findings cannot be extended to human surgery without reservations, we suggest that diagnostic partial resection and minor cyst resections are ideal initial indications for this minimally invasive approach.
Hierarchical partial order ranking.
Carlsen, Lars
2008-09-01
Assessing the potential impact on environmental and human health from the production and use of chemicals or from polluted sites involves a multi-criteria evaluation scheme. A priori, several parameters are to be addressed, e.g., production tonnage, specific release scenarios, geographical and site-specific factors, in addition to various substance-dependent parameters. Further, socio-economic factors may be taken into consideration. The number of parameters to be included may well appear to be prohibitive for developing a sensible model. The study introduces hierarchical partial order ranking (HPOR), which remedies this problem. By HPOR the original parameters are initially grouped based on their mutual connection and a set of meta-descriptors is derived representing the ranking corresponding to the single groups of descriptors, respectively. A second partial order ranking is carried out based on the meta-descriptors, the final ranking being disclosed through average ranks. An illustrative example on the prioritization of polluted sites is given.
Partially integrated exhaust manifold
Hayman, Alan W; Baker, Rodney E
2015-01-20
A partially integrated manifold assembly is disclosed which improves performance, reduces cost and provides efficient packaging of engine components. The partially integrated manifold assembly includes a first leg extending from a first port and terminating at a mounting flange for an exhaust gas control valve. Multiple additional legs (depending on the total number of cylinders) are integrally formed with the cylinder head assembly and extend from the ports of the associated cylinder and terminate at an exit port flange. These additional legs are longer than the first leg such that the exit port flange is spaced apart from the mounting flange. This configuration provides increased packaging space adjacent the first leg for any valving that may be required to control the direction and destination of exhaust flow in recirculation to an EGR valve or downstream to a catalytic converter.
Activated partial thromboplastin time.
Ignjatovic, Vera
2013-01-01
Activated partial thromboplastin time (APTT) is a commonly used coagulation assay that is easy to perform, is affordable, and is therefore performed in most coagulation laboratories, both clinical and research, worldwide. The APTT is based on the principle that in citrated plasma, the addition of a platelet substitute, factor XII activator, and CaCl2 allows for formation of a stable clot. The time required for the formation of a stable clot is recorded in seconds and represents the actual APTT result.
Problem size, parallel architecture and optimal speedup
NASA Technical Reports Server (NTRS)
Nicol, David M.; Willard, Frank H.
1987-01-01
The communication and synchronization overhead inherent in parallel processing can lead to situations where adding processors to the solution method actually increases execution time. Problem type, problem size, and architecture type all affect the optimal number of processors to employ. The numerical solution of an elliptic partial differential equation is examined in order to study the relationship between problem size and architecture. The equation's domain is discretized into n^2 grid points, which are divided into partitions and mapped onto the individual processor memories. The relationships between grid size, stencil type, partitioning strategy, processor execution time, and communication network type are analytically quantified. In so doing, the optimal number of processors to assign to the solution was determined, and the following were identified: (1) the smallest grid size which fully benefits from using all available processors, (2) the leverage on performance given by increasing processor speed or communication network speed, and (3) the suitability of various architectures for large numerical problems.
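The trade-off described here can be made concrete with a toy cost model. The sketch below assumes, purely for illustration, that compute time per iteration is proportional to n^2 / p and that communication overhead grows linearly with the processor count p (a bus-like cost); neither the formula nor the constants come from the paper, and the function names are hypothetical.

```python
def execution_time(n, p, t_comp=1.0, t_msg=50.0):
    """Toy per-iteration time: n*n grid points split over p processors,
    plus an overhead that grows linearly with p (an assumed, bus-like cost)."""
    return t_comp * n * n / p + t_msg * p

def best_processor_count(n, max_p, **costs):
    """Processor count in 1..max_p that minimizes the toy execution time."""
    return min(range(1, max_p + 1),
               key=lambda p: execution_time(n, p, **costs))

for n in (16, 64, 256):
    p = best_processor_count(n, max_p=64)
    print(n, p, execution_time(n, p))
```

Under these assumed costs the best processor count grows with the grid size (2, 9, and 36 processors for n = 16, 64, and 256), and for the smallest grid using all 64 processors is far slower than using 2, which mirrors the abstract's point that adding processors can increase execution time.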
Laparoscopic partial adrenalectomy.
Ikeda, Y; Takami, H; Tajima, G; Sasaki, Y; Takayama, J; Kurihara, H; Niimi, M
2002-01-01
Since corticosteroids are indispensable hormones, partial or cortical-sparing adrenalectomies may be adopted for the surgical treatment of adrenal diseases. In this article, we describe the technique and results of these procedures. Laparoscopic partial or cortical-sparing adrenalectomy has been performed in 10 patients. Seven cases had an aldosterone-producing adenoma (APA) and three had a pheochromocytoma. Three cases with an APA and a case with a pheochromocytoma had tumors located far from the adrenal central vein, and the vein could be preserved. Four cases with an APA and two with a pheochromocytoma had tumors located close to the adrenal central vein, and it was necessary to section the central vein to resect them. All endoscopic procedures were performed successfully. There were no postoperative complications. At follow-up, adrenal 131I-adosterol scintigrams showed the preservation of remnant adrenal function in all patients. Laparoscopic partial or cortical-sparing adrenal surgery was safely performed, and adrenal function was preserved irrespective of whether the adrenal central vein could be preserved or not. We consider this to be a useful operative technique for selected cases.
Parallel NPARC: Implementation and Performance
NASA Technical Reports Server (NTRS)
Townsend, S. E.
1996-01-01
Version 3 of the NPARC Navier-Stokes code includes support for large-grain (block level) parallelism using explicit message passing between a heterogeneous collection of computers. This capability has the potential for significant performance gains, depending upon the block data distribution. The parallel implementation uses a master/worker arrangement of processes. The master process assigns blocks to workers, controls worker actions, and provides remote file access for the workers. The processes communicate via explicit message passing using an interface library which provides portability to a number of message passing libraries, such as PVM (Parallel Virtual Machine). A Bourne shell script is used to simplify the task of selecting hosts, starting processes, retrieving remote files, and terminating a computation. This script also provides a simple form of fault tolerance. An analysis of the computational performance of NPARC is presented, using data sets from an F/A-18 inlet study and a Rocket Based Combined Cycle Engine analysis. Parallel speedup and overall computational efficiency were obtained for various NPARC run parameters on a cluster of IBM RS6000 workstations. The data show that although NPARC performance compares favorably with the estimated potential parallelism, typical data sets used with previous versions of NPARC will often need to be reblocked for optimum parallel performance. In one of the cases studied, reblocking increased peak parallel speedup from 3.2 to 11.8.
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Lau, Sonie; Yan, Jerry C.
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Parallel Element Agglomeration Algebraic Multigrid and Upscaling Library
2015-02-19
ParFELAG is a parallel distributed memory C++ library for numerical upscaling of finite element discretizations. It provides optimal complexity algorithms to build multilevel hierarchies and solvers that can be used for solving a wide class of partial differential equations (elliptic, hyperbolic, saddle point problems) on general unstructured meshes (under the assumption that the topology of the agglomerated entities is correct). Additionally, a novel multilevel solver for saddle point problems with divergence constraint is implemented.
Parallel integer sorting with medium and fine-scale parallelism
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS
F. PETRINI; W. FENG
1999-09-01
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens; Inglett, Todd Alan
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
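The core idea of comparing a node's checkpoint against a stored template, block by block, can be sketched as below. This is an illustrative single-process sketch, not the patented method: the 4-byte block size, the choice of SHA-256 as the checksum, and all function names are assumptions, and the rsync-style rolling checksums and network broadcast mentioned in the abstract are not modeled.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only

def blocks(data, size=BLOCK):
    """Split a byte string into fixed-size blocks (last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def checksums(data):
    """Checksum of every block of the template checkpoint."""
    return [hashlib.sha256(b).hexdigest() for b in blocks(data)]

def delta_checkpoint(node_data, template_sums):
    """Keep only the blocks whose checksum differs from the template's."""
    delta = {}
    for i, b in enumerate(blocks(node_data)):
        digest = hashlib.sha256(b).hexdigest()
        if i >= len(template_sums) or digest != template_sums[i]:
            delta[i] = b
    return delta

def restore(template, delta, length):
    """Rebuild a node's checkpoint from the template plus its stored delta."""
    n_blocks = (length + BLOCK - 1) // BLOCK
    out = blocks(template)[:n_blocks]
    for i, b in delta.items():  # indices were inserted in ascending order
        if i < len(out):
            out[i] = b
        else:
            out.append(b)
    return b"".join(out)

template = b"AAAABBBBCCCCDDDD"   # previously produced template checkpoint
node = b"AAAAxxxxCCCCDDDD"       # this node's current state
delta = delta_checkpoint(node, checksums(template))
rebuilt = restore(template, delta, len(node))
```

Here only one of four blocks has to be stored for the node, which is the kind of saving the abstract attributes to template-based checkpointing when most nodes hold largely similar data.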
Parallel Architecture For Robotics Computation
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1990-01-01
Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.
Multigrid on massively parallel architectures
Falgout, R D; Jones, J E
1999-09-17
The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.
Solving unstructured grid problems on massively parallel computers
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1990-01-01
A highly parallel graph mapping technique that enables one to efficiently solve unstructured grid problems on massively parallel computers is presented. Many implicit and explicit methods for solving discretized partial differential equations require each point in the discretization to exchange data with its neighboring points every time step or iteration. The cost of this communication can negate the high performance promised by massively parallel computing. To eliminate this bottleneck, the graph of the irregular problem is mapped into the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. It is shown that using the heuristic mapping algorithm significantly reduces the communication time compared to a naive assignment of processes to processors.
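The objective described above, the sum of distances that messages travel, can be written down directly. The sketch below evaluates that cost for a placement of problem-graph vertices on a 2D processor mesh and then applies a simple pairwise-swap local search. The mesh topology, the ring-shaped test graph, and the swap heuristic are all assumptions chosen for illustration; the swap search is a stand-in and is not the authors' mapping algorithm.

```python
import itertools
import random

def hops(a, b, width):
    """Manhattan distance between processors a and b on a width-wide 2D mesh."""
    return abs(a % width - b % width) + abs(a // width - b // width)

def comm_cost(edges, placement, width):
    """Total hops travelled if every edge exchanges one message."""
    return sum(hops(placement[u], placement[v], width) for u, v in edges)

def improve_by_swaps(edges, placement, width):
    """Hypothetical heuristic: keep any pairwise swap that lowers the cost."""
    placement = list(placement)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(placement)), 2):
            before = comm_cost(edges, placement, width)
            placement[i], placement[j] = placement[j], placement[i]
            if comm_cost(edges, placement, width) < before:
                improved = True
            else:
                # Undo the swap: it did not reduce the total distance.
                placement[i], placement[j] = placement[j], placement[i]
    return placement

random.seed(1)
edges = [(i, (i + 1) % 16) for i in range(16)]  # a 16-vertex ring
start = list(range(16))
random.shuffle(start)                           # naive random assignment
better = improve_by_swaps(edges, start, 4)      # 4 x 4 processor mesh
```

Comparing `comm_cost(edges, start, 4)` with `comm_cost(edges, better, 4)` gives a feel for how much communication distance a mapping step can remove relative to a naive assignment, although a local search of this kind may stop at a placement that is not globally optimal.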
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations
NASA Technical Reports Server (NTRS)
Chrisochoides, Nikos
1995-01-01
We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDE's) on multiprocessors. Multithreading is used as a means of exploring concurrency at the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDE's. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability difficult, and increases software complexity.
IOPA: I/O-aware parallelism adaption for parallel programs
Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei
2017-01-01
With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
Partial Southwest Elevation Mill #5 West (Part 3), Partial ...
Partial Southwest Elevation - Mill #5 West (Part 3), Partial Southwest Elevation - Mill #5 West (with Section of Courtyard) (Parts 1 & 2) - Boott Cotton Mills, John Street at Merrimack River, Lowell, Middlesex County, MA
Appendix E: Parallel Pascal development system
NASA Technical Reports Server (NTRS)
1985-01-01
The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays which have dimensions larger than the hardware. Programs can be conveniently tested with small sized arrays on the conventional computer before attempting to run on a parallel system.
Paternalism and partial autonomy.
O'Neill, O
1984-01-01
A contrast is often drawn between standard adult capacities for autonomy, which allow informed consent to be given or withheld, and patients' reduced capacities, which demand paternalistic treatment. But patients may not be radically different from the rest of us, in that all human capacities for autonomous action are limited. An adequate account of paternalism and the role that consent and respect for persons can play in medical and other practice has to be developed within an ethical theory that does not impose an idealised picture of unlimited autonomy but allows for the variable and partial character of actual human autonomy. PMID:6520849
Experts' Understanding of Partial Derivatives Using the Partial Derivative Machine
ERIC Educational Resources Information Center
Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.
2015-01-01
Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of…
Parallel hierarchical method in networks
NASA Astrophysics Data System (ADS)
Malinochka, Olha; Tymchenko, Leonid
2007-09-01
This method of parallel-hierarchical Q-transformation offers a new approach to the creation of a computing medium, parallel-hierarchical (PH) networks, which are investigated in the form of a model of a neurolike data-processing scheme [1-5]. The approach has a number of advantages compared with other methods of forming neurolike media (for example, the already known methods of forming artificial neural networks). The main advantage of the approach is the use of multilevel parallel interaction dynamics of information signals at different hierarchy levels of computer networks, which makes it possible to use such known natural features of the organization of computation as: topographic nature of mapping, simultaneity (parallelism) of signal operation, inlaid cortex, structure, rough hierarchy of the cortex, and a spatially correlated in time mechanism of perception and training [5].
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
"Feeling" Series and Parallel Resistances.
ERIC Educational Resources Information Center
Morse, Robert A.
1993-01-01
Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)
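The combination rules that the straw demonstration is meant to convey are the standard ones: resistances in series add, while for resistances in parallel the reciprocals add. A minimal sketch, with hypothetical function names and example values:

```python
def series(*resistances):
    """Equivalent resistance of resistors connected end to end."""
    return sum(resistances)

def parallel(*resistances):
    """Equivalent resistance of resistors connected side by side."""
    return 1.0 / sum(1.0 / r for r in resistances)

r_series = series(6.0, 3.0)      # longer path, harder to push current through
r_parallel = parallel(6.0, 3.0)  # extra path, easier to push current through
```

For 6 ohm and 3 ohm resistors this gives 9 ohm in series and 2 ohm in parallel, so the parallel combination is smaller than either resistor alone.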
Demonstrating Forces between Parallel Wires.
ERIC Educational Resources Information Center
Baker, Blane
2000-01-01
Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)
Parallel programming of industrial applications
Heroux, M; Koniges, A; Simon, H
1998-07-21
In the introductory material, we overview the typical MPP environment for real application computing and the special tools available such as parallel debuggers and performance analyzers. Next, we draw from a series of real applications codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from these applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled: "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN l-55860-54).
Distinguishing serial and parallel parsing.
Gibson, E; Pearlmutter, N J
2000-03-01
This paper discusses ways of determining whether the human parser is serial, maintaining at most one structural interpretation at each parse state, or whether it is parallel, maintaining more than one structural interpretation in at least some circumstances. We make four points. The first two counter claims made by Lewis (2000): (1) that the availability of alternative structures should not vary as a function of the disambiguating material in some ranked parallel models; and (2) that parallel models predict a slowdown during the ambiguous region for more syntactically ambiguous structures. Our other points concern potential methods for seeking experimental evidence relevant to the serial/parallel question. We discuss effects of the plausibility of a secondary structure in the ambiguous region (Pearlmutter & Mendelsohn, 1999) and suggest examining the distribution of reaction times in the disambiguating region.
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Parallel Algorithms for Image Analysis.
1982-06-01
Report documentation page. Title: Parallel Algorithms for Image Analysis. Type of report: Technical. Performing organization report number: TR-1180. Author: Azriel Rosenfeld. Contract or grant number: AFOSR-77-3271. Key words: image processing; image analysis; parallel processing; cellular computers.
Debugging in a parallel environment
Wasserman, H.J.; Griffin, J.H.
1985-01-01
This paper describes the preliminary results of a project investigating approaches to dynamic debugging in parallel processing systems. Debugging programs in a multiprocessing environment is particularly difficult because of potential errors in synchronization of tasks, data dependencies, sharing of data among tasks, and irreproducibility of specific machine instruction sequences from one job to the next. The basic methodology involved in predicate-based debuggers is given as well as other desirable features of dynamic parallel debugging. 13 refs.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.
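The terms "speed-up" and "parallel efficiency" used in this abstract are conventionally defined as in the sketch below. The timings in the usage note are invented for illustration and are not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speed-up per processor (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / n_procs
```

With a hypothetical serial time of 160 s, a run that takes 10 s on 16 processors has efficiency 1.0. If 32 processors still need 10 s, the speed-up is unchanged and the efficiency drops to 0.5, which is the kind of plateau the abstract reports for branch swapping beyond 16 processors.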
Architectures for reasoning in parallel
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.
1989-01-01
The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
Efficiency of parallel direct optimization.
Janies, D A; Wheeler, W C
2001-03-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size.
Is Titan Partially Differentiated?
NASA Astrophysics Data System (ADS)
Mitri, G.; Pappalardo, R. T.; Stevenson, D. J.
2009-12-01
The recent measurement of the gravity coefficients from the Radio Doppler data of the Cassini spacecraft has improved our knowledge of the interior structure of Titan (Rappaport et al. 2008 AGU, P21A-1343). The measured gravity field of Titan is dominated by near hydrostatic quadrupole components. We have used the measured gravitational coefficients, thermal models and the hydrostatic equilibrium theory to derive Titan's interior structure. The axial moment of inertia gives us an indication of the degree of the interior differentiation. The inferred axial moment of inertia, calculated using the quadrupole gravitational coefficients and the Radau-Darwin approximation, indicates that Titan is partially differentiated. If Titan is partially differentiated then the interior must avoid melting of the ice during its evolution. This suggests a relatively late formation of Titan to avoid the presence of short-lived radioisotopes (Al-26). This also suggests the onset of convection after accretion to efficiently remove the heat from the interior. The outer layer is likely composed mainly of water in solid phase. Thermal modeling indicates that water could be present also in liquid phase forming a subsurface ocean between an outer ice I shell and a high pressure ice layer. Acknowledgments: This work was conducted at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.
Furnace brazing under partial vacuum
NASA Technical Reports Server (NTRS)
Mckown, R. D.
1979-01-01
A brazing furnace that uses a partial-vacuum technique reduces tooling requirements and produces a better bond. The partial vacuum helps to dissociate metal oxides that inhibit metal flow, and it eliminates the heavy tooling otherwise required to hold parts together during brazing.
Electrical conductivity anisotropy of partially molten peridotite under shear deformation
NASA Astrophysics Data System (ADS)
Zhang, B.; Yoshino, T.; Yamazaki, D.; Manthilake, G. M.; Katsura, T.
2013-12-01
Recent ocean bottom magnetotelluric investigations have revealed a high-conductivity layer (HCL) with high anisotropy characterized by higher conductivity values in the direction parallel to the plate motion beneath the southern East Pacific Rise (Evans et al., 2005) and beneath the edge of the Cocos plate at the Middle America trench offshore of Nicaragua (Naif et al., 2013). These geophysical observations have been attributed to either hydration (water) of mantle minerals or the presence of partial melt. Currently, aligned partial melt is regarded as the preferred candidate for explaining the conductivity anisotropy because of the implausibility of proton conduction (Yoshino et al., 2006). In this study, we report the development of conductivity anisotropy between the directions parallel and normal to the shear direction on the shear plane in partially molten peridotite as a function of time and shear strain. Starting samples were pre-synthesized partially molten peridotite showing homogeneous melt distribution. The partially molten peridotite samples were deformed in simple shear geometry at 1 GPa and 1723 K in a DIA-type apparatus with a uniaxial deformation facility. The conductivity difference between the directions parallel and normal to the shear direction reached one order of magnitude, which is equivalent to that observed beneath asthenosphere. In contrast, such anisotropic behavior was not found in the melt-free samples, suggesting that the conductivity anisotropy was generated under shear stress. The microstructure of the deformed partially molten peridotite shows that partial melt tends to be located preferentially at grain boundaries parallel to the shear direction and forms a continuous thin melt layer sub-parallel to the shear direction, whereas an apparently isolated distribution was observed on the section perpendicular to the shear direction. The resultant melt morphology can be approximated by a tube-like geometry parallel to the shear direction. This observation suggests that the development of
Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
The economics of parallel trade.
Danzon, P M
1998-03-01
The potential for parallel trade in the European Union (EU) has grown with the accession of low price countries and the harmonisation of registration requirements. Parallel trade implies a conflict between the principle of autonomy of member states to set their own pharmaceutical prices, the principle of free trade and the industrial policy goal of promoting innovative research and development (R&D). Parallel trade in pharmaceuticals does not yield the normal efficiency gains from trade because countries achieve low pharmaceutical prices by aggressive regulation, not through superior efficiency. In fact, parallel trade reduces economic welfare by undermining price differentials between markets. Pharmaceutical R&D is a global joint cost of serving all consumers worldwide; it accounts for roughly 30% of total costs. Optimal (welfare maximising) pricing to cover joint costs (Ramsey pricing) requires setting different prices in different markets, based on inverse demand elasticities. By contrast, parallel trade and regulation based on international price comparisons tend to force price convergence across markets. In response, manufacturers attempt to set a uniform 'euro' price. The primary losers from 'euro' pricing will be consumers in low income countries who will face higher prices or loss of access to new drugs. In the long run, even higher income countries are likely to be worse off with uniform prices, because fewer drugs will be developed. One policy option to preserve price differentials is to exempt on-patent products from parallel trade. An alternative is confidential contracting between individual manufacturers and governments to provide country-specific ex post discounts from the single 'euro' wholesale price, similar to rebates used by managed care in the US. This would preserve differentials in transactions prices even if parallel trade forces convergence of wholesale prices.
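The "inverse demand elasticities" rule mentioned in this abstract is usually written as the textbook Ramsey condition below. This is a sketch for independent demands, and the symbols are assumptions introduced here rather than notation from the paper: p_i is the price in market i, c_i the marginal cost of supplying it, epsilon_i the absolute value of the own-price demand elasticity, and k a constant common to all markets, set so that total margins just cover the joint R&D cost.

```latex
\frac{p_i - c_i}{p_i} = \frac{k}{\varepsilon_i}, \qquad 0 < k \le 1 .
```

Markets with less elastic demand carry a larger markup over marginal cost, which is why the rule yields different prices in different markets and why forced price convergence departs from it.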
Parallel Implicit Algorithms for CFD
NASA Technical Reports Server (NTRS)
Keyes, David E.
1998-01-01
The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.
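The Newton and Krylov layers described in this abstract can be illustrated with a small serial sketch. Everything in it is an assumption made for illustration: the model problem (a 1D Laplacian plus a cubic term), the function names, and the use of unpreconditioned conjugate gradients as the Krylov method. It omits the Schwarz preconditioner and all parallelism, and it is not the project's code or the PETSc API.

```python
import math


def residual(u, b):
    """Model problem F(u) = A u + u**3 - b, with A the 1D stencil (-1, 2, -1)."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * u[i] - left - right + u[i] ** 3 - b[i])
    return out


def dot(x, y):
    return sum(a * c for a, c in zip(x, y))


def jac_vec(u, v, b, eps=1e-7):
    """Matrix-free J(u) v from a finite difference of the residual."""
    norm_v = math.sqrt(dot(v, v))
    if norm_v == 0.0:
        return [0.0] * len(v)
    h = eps / norm_v  # keep the perturbation size independent of |v|
    f0 = residual(u, b)
    f1 = residual([ui + h * vi for ui, vi in zip(u, v)], b)
    return [(p - q) / h for p, q in zip(f1, f0)]


def cg(matvec, rhs, rel_tol=1e-6, max_iter=200):
    """Conjugate gradients; the operator is used only through matvec."""
    x = [0.0] * len(rhs)
    r = list(rhs)
    p = list(r)
    rr = dot(r, r)
    stop = rel_tol * math.sqrt(rr)
    for _ in range(max_iter):
        if math.sqrt(rr) <= stop:
            break
        ap = matvec(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def newton_krylov(b, tol=1e-8, max_newton=50):
    """Outer Newton iteration; each step solves J s = -F with the inner CG."""
    u = [0.0] * len(b)
    for _ in range(max_newton):
        f = residual(u, b)
        if math.sqrt(dot(f, f)) < tol:
            break
        step = cg(lambda v: jac_vec(u, v, b), [-fi for fi in f])
        u = [ui + si for ui, si in zip(u, step)]
    return u
```

Conjugate gradients is adequate here only because the model Jacobian is symmetric positive definite; a nonsymmetric CFD Jacobian would need a method such as GMRES. In a distributed setting the matrix-vector product and the dot products are the operations that would be spread across processors.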
Parallel, semiparallel, and serial processing of visual hyperacuity
NASA Astrophysics Data System (ADS)
Fahle, Manfred W.
1990-10-01
Humans can discriminate between certain elementary stimulus features in parallel, i.e., simultaneously over the visual field. I present evidence that, in man, vernier misalignments in the hyperacuity-range, i.e., below the photoreceptor diameter, can also be detected in parallel. This indicates that the visual system performs some form of spatial interpolation beyond the photoreceptor spacing simultaneously over the visual field. Vernier offsets are detected in parallel even when orientation cues are masked: deviation from straightness is an elementary feature of visual perception. However, the identification process, that classifies each vernier in a stimulus as being offset to the right (versus to the left) is serial and has to scan the visual field sequentially if orientation cues are masked. Therefore, reaction times and thresholds in vernier acuity tasks increase with the number of verniers presented simultaneously if classification of different features is required. Furthermore, when approaching vernier threshold, simple vernier detection is no longer parallel but becomes partially serial, or semi-parallel.
Partially segmented deformable mirror
Bliss, Erlan S.; Smith, James R.; Salmon, J. Thaddeus; Monjes, Julio A.
1991-01-01
A partially segmented deformable mirror is formed with a mirror plate having a smooth and continuous front surface and a plurality of actuators to its back surface. The back surface is divided into triangular areas which are mutually separated by grooves. The grooves are deep enough to make the plate deformable and the actuators for displacing the mirror plate in the direction normal to its surface are inserted in the grooves at the vertices of the triangular areas. Each actuator includes a transducer supported by a receptacle with outer shells having outer surfaces. The vertices have inner walls which are approximately perpendicular to the mirror surface and make planar contacts with the outer surfaces of the outer shells. The adhesive which is used on these contact surfaces tends to contract when it dries but the outer shells can bend and serve to minimize the tendency of the mirror to warp.
Partially segmented deformable mirror
Bliss, E.S.; Smith, J.R.; Salmon, J.T.; Monjes, J.A.
1991-05-21
A partially segmented deformable mirror is formed with a mirror plate having a smooth and continuous front surface and a plurality of actuators to its back surface. The back surface is divided into triangular areas which are mutually separated by grooves. The grooves are deep enough to make the plate deformable and the actuators for displacing the mirror plate in the direction normal to its surface are inserted in the grooves at the vertices of the triangular areas. Each actuator includes a transducer supported by a receptacle with outer shells having outer surfaces. The vertices have inner walls which are approximately perpendicular to the mirror surface and make planar contacts with the outer surfaces of the outer shells. The adhesive which is used on these contact surfaces tends to contract when it dries but the outer shells can bend and serve to minimize the tendency of the mirror to warp. 5 figures.
Krumpelt, Michael; Ahmed, Shabbir; Kumar, Romesh; Doshi, Rajiv
2000-01-01
A two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion. The dehydrogenation portion is a group VIII metal and the oxide-ion conducting portion is selected from a ceramic oxide crystallizing in the fluorite or perovskite structure. There is also disclosed a method of forming a hydrogen rich gas from a source of hydrocarbon fuel in which the hydrocarbon fuel contacts a two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion at a temperature not less than about 400 °C for a time sufficient to generate the hydrogen rich gas while maintaining CO content less than about 5 volume percent. There is also disclosed a method of forming partially oxidized hydrocarbons from ethanes in which ethane gas contacts a two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion for a time and at a temperature sufficient to form an oxide.
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)
NASA Technical Reports Server (NTRS)
Straeter, T. A.; Markos, A. T.
1975-01-01
A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions ensuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
Parallelizing Timed Petri Net simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1993-01-01
The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
Robot-assisted partial nephrectomy: Superiority over laparoscopic partial nephrectomy.
Shiroki, Ryoichi; Fukami, Naohiko; Fukaya, Kosuke; Kusaka, Mamoru; Natsume, Takahiro; Ichihara, Takashi; Toyama, Hiroshi
2016-02-01
Nephron-sparing surgery has been proven to positively impact the postoperative quality of life for the treatment of small renal tumors, possibly leading to functional improvements. Laparoscopic partial nephrectomy is still one of the most demanding procedures in urological surgery. Laparoscopic partial nephrectomy sometimes results in extended warm ischemic time and severe complications, such as open conversion, postoperative hemorrhage and urine leakage. Robot-assisted partial nephrectomy brings the advantages of the da Vinci Surgical System, which offers 3-D vision and greater degrees of freedom for surgical instruments, to laparoscopic partial nephrectomy. The introduction of the da Vinci Surgical System made nephron-sparing surgery, specifically robot-assisted partial nephrectomy, safe with promising results, leading to the shortening of warm ischemic time and a reduction in perioperative complications. Even for complex and challenging tumors, robotic assistance is expected to provide the benefit of minimally-invasive surgery with safe and satisfactory renal function. Warm ischemic time is the modifiable factor during robot-assisted partial nephrectomy that affects postoperative kidney function. We analyzed the predictive factors for extended warm ischemic time from our robot-assisted partial nephrectomy series. The surface area of the tumor attached to the kidney parenchyma was shown to significantly affect the extended warm ischemic time during robot-assisted partial nephrectomy. In cases with tumor-attached surface area more than 15 cm², we should consider switching robot-assisted partial nephrectomy to open partial nephrectomy under cold ischemia if it is imperative. In Japan, a nationwide prospective study has been carried out to show the superiority of robot-assisted partial nephrectomy to laparoscopic partial nephrectomy in improving warm ischemic time and complications. By facilitating robotic technology, robot-assisted partial nephrectomy
Visualizing Parallel Computer System Performance
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.
1988-01-01
Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkably adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized. Large, complex parallel systems pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eliding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.
Features in Continuous Parallel Coordinates.
Lehmann, Dirk J; Theisel, Holger
2011-12-01
Continuous Parallel Coordinates (CPC) are a contemporary visualization technique used to combine several scalar fields given over a common domain. They facilitate a continuous view for parallel coordinates by considering a smooth scalar field instead of a finite number of straight lines. We show that there are feature curves in CPC which appear to be the dominant structures of a CPC. We present methods to extract and classify them and demonstrate their usefulness to enhance the visualization of CPCs. In particular, we show that these feature curves are related to discontinuities in Continuous Scatterplots (CSP). We show this by exploiting a curve-curve duality between parallel and Cartesian coordinates, which is a generalization of the well-known point-line duality. Furthermore, we illustrate the theoretical considerations. In conclusion, we discuss relations and aspects of the features of CPCs and CSPs with respect to data analysis.
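The point-line duality that the abstract calls well known can be checked numerically with the sketch below. It assumes two vertical axes placed at horizontal positions 0 and d; the function names are illustrative and not taken from the paper, and the sketch covers only the classical duality, not the curve-curve generalization.

```python
def pc_line_height(a, b, x, d=1.0):
    """Height at horizontal position x of the line through (0, a) and (d, b),
    which is the parallel-coordinates image of the Cartesian point (a, b)."""
    return a + (b - a) * x / d


def dual_point(m, c, d=1.0):
    """Point shared by the images of all Cartesian points on y = m*x + c."""
    if m == 1:
        raise ValueError("slope 1 maps to a point at infinity")
    return d / (1.0 - m), c / (1.0 - m)
```

For the Cartesian line y = -2x + 3, `dual_point(-2.0, 3.0)` lies between the two axes, and the image of every point on that line passes through it. Lines with positive slope map to points outside the strip between the axes.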
PARAVT: Parallel Voronoi tessellation code
NASA Astrophysics Data System (ADS)
González, R. E.
2016-10-01
In this study, we present a new open source code for massively parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is aimed at astrophysical applications, where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes; however, no open source parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented with MPI, and the VT is computed using the Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes the neighbor list, Voronoi density, Voronoi cell volume, and density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
Parallel integrated frame synchronizer chip
NASA Technical Reports Server (NTRS)
Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)
2000-01-01
A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.
Fast data parallel polygon rendering
Ortega, F.A.; Hansen, C.D.
1993-09-01
This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted for applications which require very fast polygon rendering for extremely large sets of polygons such as is found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.
Massively Parallel MRI Detector Arrays
Keil, Boris; Wald, Lawrence L
2013-01-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758
Parallel Adaptive Mesh Refinement Library
NASA Technical Reports Server (NTRS)
Mac-Neice, Peter; Olson, Kevin
2005-01-01
Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.
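The block hierarchy described in this abstract can be pictured with a minimal two-dimensional sketch. It is written in Python rather than Fortran 90, and the class name `Block` and the methods `refine` and `leaves` are hypothetical illustrations, not part of the PARAMESH interface. Each block covers a square region; refining it creates four children at the next level, so the blocks form a quad-tree whose leaves tile the domain.

```python
class Block:
    """One square grid block in a quad-tree (illustrative, not the PARAMESH API)."""

    def __init__(self, x0, y0, size, level=0):
        self.x0, self.y0 = x0, y0   # lower-left corner of the block
        self.size = size            # edge length of the block
        self.level = level          # refinement level (root is 0)
        self.children = []          # empty list means this block is a leaf

    def refine(self):
        """Split this block into four children of half the edge length."""
        if not self.children:
            half = self.size / 2
            self.children = [
                Block(self.x0 + dx * half, self.y0 + dy * half, half, self.level + 1)
                for dy in (0, 1)
                for dx in (0, 1)
            ]
        return self.children

    def leaves(self):
        """Return the leaf blocks, which together cover the whole domain."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


root = Block(0.0, 0.0, 1.0)
first_level = root.refine()
first_level[0].refine()          # refine only the lower-left quadrant
leaf_blocks = root.leaves()
```

In this sketch the resolution varies by region because only one quadrant is refined a second time; a three-dimensional version would create eight children per block (an oct-tree).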
Hybrid parallel programming with MPI and Unified Parallel C.
Dinan, J.; Balaji, P.; Lusk, E.; Sadayappan, P.; Thakur, R.; Mathematics and Computer Science; The Ohio State Univ.
2010-01-01
The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical
Medipix2 parallel readout system
NASA Astrophysics Data System (ADS)
Fanti, V.; Marzeddu, R.; Randaccio, P.
2003-08-01
A fast parallel readout system based on a PCI board has been developed in the framework of the Medipix collaboration. The readout electronics consists of two boards: the motherboard directly interfacing the Medipix2 chip, and the PCI board with digital I/O ports 32 bits wide. The device driver and readout software have been developed at low level in Assembler to allow fast data transfer and image reconstruction. The parallel readout permits a transfer rate up to 64 Mbytes/s. http://medipix.web.cern.ch/MEDIPIX/
Gang scheduling a parallel machine
Gorda, B.C.; Brooks, E.D. III.
1991-03-01
Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processors. User programs and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.
Gang scheduling a parallel machine
Gorda, B.C.; Brooks, E.D. III.
1991-12-01
Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processes. User programs and their gangs of processes are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory.
The Complexity of Parallel Algorithms,
1985-11-01
Much of this work was done in collaboration with my advisor, Ernst Mayr. He was also supported in part by ONR contract N00014-85-C-0731. ... Table ... Helmbold and Mayr in their algorithm to compute an optimal two processor schedule [HM2]. One of the promising developments in parallel algorithms is that ... can be solved by a fast parallel algorithm if the numbers are small. Helmbold and Mayr [HM1] have shown that if the job times are
Removable partial denture occlusion.
Ivanhoe, John R; Plummer, Kevin D
2004-07-01
No single occlusal morphology, scheme, or material will successfully treat all patients. Many patients have been treated, both successfully and unsuccessfully, using widely varying theories of occlusion, choices of posterior tooth form, and restorative materials. Therefore, experience has demonstrated that there is no one right way to restore the occlusion of all patients. Partially edentulous patients have many and varied needs. Clinicians must understand the healthy physiologic gnathostomatic system and properly diagnose what is or may become pathologic. Henderson [3] stated that the occlusion of the successfully treated patient allows the masticating mechanism to carry out its physiologic functions while the temporomandibular joints, the neuromuscular mechanism, the teeth and their supporting structures remain in a good state of health. Skills in diagnosis and treatment planning are of utmost importance in treating these patients, for whom the clinician's goals are not only an esthetic and functional restoration but also a lasting harmonious state. Perhaps this was best stated by DeVan [55] more than 60 years ago in his often-quoted objective. "The patient's fundamental need is the continued meticulous restoration of what is missing, since what is lost is in a sense irretrievably lost." Because it is clear that there is no one method, no one occlusal scheme, or one material that guarantees success for all patients, recommendations for consideration when establishing or reestablishing occlusal schemes have been presented. These recommendations must be used in conjunction with other diagnostic and technical skills.
NASA Astrophysics Data System (ADS)
The evolution of magmas is a topic of considerable importance in geology and geophysics because it affects volcanology, igneous petrology, geothermal energy sources, mantle convection, and the thermal and chemical evolution of the earth. The dynamics and evolution of magmas are strongly affected by the presence of solid crystals that occur either in suspension in liquid or as a rigid porous matrix through which liquid magma can percolate. Such systems are physically complex and difficult to model mathematically. Similar physical situations are encountered by metallurgists who study the solidification of molten alloys, and applied mathematicians have long been interested in such moving boundary problems. Clearly, it would be of mutual benefit to bring together scientists, engineers, and mathematicians with a common interest in such systems. Such a meeting is being organized as a North Atlantic Treaty Organization (NATO) Advanced Research Workshop on the Structure and Dynamics of Partially Solidified Systems, to be held at Stanford University's Fallen Leaf Lodge at Tahoe, Calif., May 12-16, 1986. The invited speakers and their topics are
Partial disassembly of peroxisomes
1985-01-01
Rat liver peroxisomes were subjected to a variety of procedures intended to partially disassemble or damage them; the effects were analyzed by recentrifugation into sucrose gradients, enzyme analyses, electron microscopy, and SDS PAGE. Freezing and thawing or mild sonication released some matrix proteins and produced apparently intact peroxisomal "ghosts" with crystalloid cores and some fuzzy fibrillar content. Vigorous sonication broke open the peroxisomes but the membranes remained associated with cores and fibrillar and amorphous matrix material. The density of both ghosts and more severely damaged peroxisomes was approximately 1.23. Pyrophosphate (pH 9) treatment solubilized the fibrillar content, yielding ghosts that were empty except for cores. Some matrix proteins such as catalase and thiolase readily leak from peroxisomes. Other proteins were identified that remain in mechanically damaged peroxisomes but are neither core nor membrane proteins because they can be released by pyrophosphate treatment. These constitute a class of poorly soluble matrix proteins that appear to correspond to the fibrillar material observed morphologically. All of the peroxisomal beta-oxidation enzymes are located in the matrix, but they vary greatly in how easily they leak out. Palmitoyl coenzyme A synthetase is in the membrane, based on its co-distribution with the 22-kilodalton integral membrane polypeptide. PMID:2989301
Parallel multiscale simulations of a brain aneurysm
NASA Astrophysics Data System (ADS)
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in
Parallel multiscale simulations of a brain aneurysm.
Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
[A HPF application to parallelize a 2-D PDE model].
Contreras, Xiómara; Hernández, Emilio
2003-01-01
Many practical numerical applications require a parallel implementation in order to obtain a satisfactory response in a reasonable amount of time. This work presents a parallel implementation of an explicit finite difference (FD) scheme proposed by Kelly et al. to solve the Partial Differential Equation (PDE / EDDP) of the Wave Propagation problem in an elastic, homogeneous or heterogeneous, two-dimensional medium. High-Performance-Fortran (HPF) is used for this purpose. This report shows time measurements on a PC-Cluster using 1, 2, and 4 processors with different sizes of data grid. In addition, a comparative test is included in which the cluster was initially connected using a Fast-Ethernet card, and then connected by a Myrinet card, using a grid size of 2500 x 2500 in both cases. The execution time achieved with two processors was highly satisfactory for all cases. In analogous conditions, the performance obtained with a Myrinet interconnection was better than the one obtained with a Fast-Ethernet interconnection. The scheme mentioned above has shown excellent numerical results, as can be seen in the images included in this work. Key words: Partial differential equation, wave equation, explicit finite differences scheme, parallel scheme.
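To show why an explicit finite difference scheme parallelizes well, here is a minimal Python sketch for the scalar two-dimensional wave equation. It is an assumption-laden stand-in: the elastic scheme of Kelly et al. and the HPF code are not reproduced, and the function names `step_wave` and `simulate`, the grid size and the Courant number are all hypothetical. Each interior point is updated from its four neighbours at the current time level only, so rows or blocks of the grid can be distributed across processors.

```python
def step_wave(u_prev, u, courant2):
    """Advance the scalar 2-D wave equation by one explicit time step.

    u_prev, u : grids (lists of lists) at the previous and current time level
    courant2  : (c * dt / dx) ** 2; stability in 2-D needs c*dt/dx <= 1/sqrt(2)
    """
    rows, cols = len(u), len(u[0])
    u_next = [[0.0] * cols for _ in range(rows)]   # boundary values stay fixed at zero
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Five-point discrete Laplacian using only current-level neighbours
            lap = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            u_next[i][j] = 2.0 * u[i][j] - u_prev[i][j] + courant2 * lap
    return u_next


def simulate(n, steps, courant=0.5):
    """Propagate a unit pulse placed at the centre of an n x n grid."""
    u_prev = [[0.0] * n for _ in range(n)]
    u_prev[n // 2][n // 2] = 1.0
    u = [row[:] for row in u_prev]                 # zero initial velocity
    for _ in range(steps):
        u_prev, u = u, step_wave(u_prev, u, courant ** 2)
    return u


field = simulate(11, 5)
```

Because `step_wave` reads only the two previous time levels, a data-parallel language can assign contiguous grid sections to processors and exchange just the section edges between steps, which is where interconnect speed matters.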
File concepts for parallel I/O
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1989-01-01
The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, and organizations using multiple storage devices are suggested. Problem areas are also identified and discussed.
Matpar: Parallel Extensions for MATLAB
NASA Technical Reports Server (NTRS)
Springer, P. L.
1998-01-01
Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.
Parallel, Distributed Scripting with Python
Miller, P J
2002-05-24
Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc. These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.
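The password-checker example can be sketched as a chunked dictionary search with a worker pool. This is only an illustration under stated assumptions: SHA-256 from `hashlib` stands in for whatever password encryption the report had in mind, a standard-library thread pool stands in for distributed MPI processes, and the names `digest`, `search_chunk` and `parallel_check` are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(word):
    """Stand-in for the password encryption step (an assumption, not the original)."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def search_chunk(chunk, target):
    """Check one worker's share of the dictionary against the target digest."""
    for word in chunk:
        if digest(word) == target:
            return word
    return None


def parallel_check(words, target, workers=4):
    """Distribute the dictionary over workers and collect the first match."""
    # Round-robin split: worker k receives words k, k + workers, k + 2*workers, ...
    chunks = [words[k::workers] for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hit in pool.map(lambda chunk: search_chunk(chunk, target), chunks):
            if hit is not None:
                return hit
    return None


dictionary = ["word%d" % i for i in range(1000)]
found = parallel_check(dictionary, digest("word637"))
missing = parallel_check(dictionary, digest("not-in-dictionary"))
```

The two steps the abstract names are visible here: distributing the information (the round-robin split) and co-ordinating the work (collecting each worker's result). In a distributed-memory setting each chunk would be sent to a separate process instead of a thread.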
Fast, Massively Parallel Data Processors
NASA Technical Reports Server (NTRS)
Heaton, Robert A.; Blevins, Donald W.; Davis, ED
1994-01-01
Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid and with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Optical Interferometric Parallel Data Processor
NASA Technical Reports Server (NTRS)
Breckinridge, J. B.
1987-01-01
Image data processed faster than in present electronic systems. Optical parallel-processing system effectively calculates two-dimensional Fourier transforms in time required by light to travel from plane 1 to plane 8. Coherence interferometer at plane 4 splits light into parts that form double image at plane 6 if projection screen placed there.
Tutorial: Parallel Simulation on Supercomputers
Perumalla, Kalyan S
2012-01-01
This tutorial introduces typical hardware and software characteristics of extant and emerging supercomputing platforms, and presents issues and solutions in executing large-scale parallel discrete event simulation scenarios on such high performance computing systems. Covered topics include synchronization, model organization, example applications, and observed performance from illustrative large-scale runs.
Parallel distributed computing using Python
NASA Astrophysics Data System (ADS)
Dalcin, Lisandro D.; Paz, Rodrigo R.; Kler, Pablo A.; Cosimo, Alejandro
2011-09-01
This work presents two software components aimed at reducing the cost of accessing high-performance parallel computing resources within a Python programming environment: MPI for Python and PETSc for Python. MPI for Python is a general-purpose Python package that provides bindings for the Message Passing Interface (MPI) standard using any back-end MPI implementation. Its facilities allow parallel Python programs to easily exploit multiple processors using the message passing paradigm. PETSc for Python provides access to the Portable, Extensible Toolkit for Scientific Computation (PETSc) libraries. Its facilities allow sequential and parallel Python applications to exploit state of the art algorithms and data structures readily available in PETSc for the solution of large-scale problems in science and engineering. MPI for Python and PETSc for Python are fully integrated into PETSc-FEM, an MPI and PETSc based parallel, multiphysics, finite elements code developed at CIMEC laboratory. This software infrastructure supports research activities related to simulation of fluid flows with applications ranging from the design of microfluidic devices for biochemical analysis to modeling of large-scale stream/aquifer interactions.
Trigonometric Integrals via Partial Fractions
ERIC Educational Resources Information Center
Chen, H.; Fulford, M.
2005-01-01
Parametric differentiation is used to derive the partial fractions decompositions of certain rational functions. Those decompositions enable us to integrate some new combinations of trigonometric functions.
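As a hedged illustration of the idea (a standard textbook case, not necessarily one treated in the article), differentiating a simple decomposition with respect to a parameter yields the decomposition for a repeated factor. The symbols a and b are assumed to be distinct constants.

```latex
% Start from the elementary decomposition (a \neq b):
\frac{1}{(x+a)(x+b)} = \frac{1}{b-a}\left(\frac{1}{x+a} - \frac{1}{x+b}\right)

% Differentiate both sides with respect to the parameter a:
-\frac{1}{(x+a)^{2}(x+b)}
  = \frac{1}{(b-a)^{2}}\left(\frac{1}{x+a} - \frac{1}{x+b}\right)
    - \frac{1}{b-a}\cdot\frac{1}{(x+a)^{2}}

% Multiply by -1 to obtain the decomposition for the repeated factor:
\frac{1}{(x+a)^{2}(x+b)}
  = \frac{1}{b-a}\cdot\frac{1}{(x+a)^{2}}
    - \frac{1}{(b-a)^{2}}\cdot\frac{1}{x+a}
    + \frac{1}{(b-a)^{2}}\cdot\frac{1}{x+b}
```

No system of linear equations for the coefficients has to be solved; the coefficients appear directly from differentiating the known identity.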
PALM: a Parallel Dynamic Coupler
NASA Astrophysics Data System (ADS)
Thevenin, A.; Morel, T.
2008-12-01
In order to efficiently represent complex systems, numerical modeling has to rely on many physical models at a time: an ocean model coupled with an atmospheric model is at the basis of climate modeling. The continuity of the solution is guaranteed only if these models can constantly exchange information. PALM is a coupler allowing the concurrent execution and the intercommunication of programs that were not specifically designed for it. With PALM, the dynamic coupling approach is introduced: a coupled component can be launched and can release computers' resources upon termination at any moment during the simulation. In order to exploit as much as possible computers' possibilities, the PALM coupler handles two levels of parallelism. The first level concerns the components themselves. While managing the resources, PALM allocates the number of processes which are necessary to any coupled component. These models can be parallel programs based on domain decomposition with MPI or applications multithreaded with OpenMP. The second level of parallelism is a task parallelism: one can define a coupling algorithm allowing two or more programs to be executed in parallel. PALM applications are implemented via a Graphical User Interface called PrePALM. In this GUI, the programmer initially defines the coupling algorithm and then describes the actual communications between the models. PALM offers a very high flexibility for testing different coupling techniques and for reaching the best load balance in a high performance computer. The transformation of computationally independent codes is almost straightforward. The other qualities of PALM are its easy set-up, its flexibility, its performance, the simple updates and evolutions of the coupled application and the many side services and functions that it offers.
Experts' understanding of partial derivatives using the partial derivative machine
NASA Astrophysics Data System (ADS)
Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.
2015-12-01
[This paper is part of the Focused Collection on Upper Division Physics Courses.] Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of students in STEM disciplines. In this paper, we report on an initial study of expert understanding of partial derivatives across three disciplines: physics, engineering, and mathematics. We report on the central research question of how disciplinary experts understand partial derivatives, and how their concept images of partial derivatives differ, with a focus on experimentally measured quantities. Using the partial derivative machine (PDM), we probed expert understanding of partial derivatives in an experimental context without a known functional form. In particular, we investigated which representations were cued by the experts' interactions with the PDM. Whereas the physicists and engineers were quick to use measurements to find a numeric approximation for a derivative, the mathematicians repeatedly returned to speculation as to the functional form; although they were comfortable drawing qualitative conclusions about the system from measurements, they were reluctant to approximate the derivative through measurement. On a theoretical front, we found ways in which existing frameworks for the concept of derivative could be expanded to include numerical approximation.
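The numeric approximation that the physicists and engineers reached for can be sketched as a difference quotient on tabulated measurements. This is a minimal sketch under assumptions: the function name `difference_quotient` and the sample data are hypothetical, and the measurements are taken to come from a run in which the other control variable was held fixed, so the quotient estimates a partial derivative.

```python
def difference_quotient(xs, ys, k):
    """Estimate dy/dx at xs[k] from the two neighbouring measurements.

    xs, ys : measured values of the varied quantity and of the response,
             recorded while every other control variable was held fixed
    k      : index of an interior measurement
    """
    if not 0 < k < len(xs) - 1:
        raise ValueError("k must index an interior measurement")
    # Slope of the secant line through the neighbours on either side of xs[k]
    return (ys[k + 1] - ys[k - 1]) / (xs[k + 1] - xs[k - 1])


# Hypothetical measurements; the response here happens to follow y = x**2
positions = [0.0, 0.5, 1.0, 1.5, 2.0]
responses = [x * x for x in positions]
slope_at_one = difference_quotient(positions, responses, 2)
```

No functional form is needed for this estimate, which is the contrast the study draws with the mathematicians' preference for first guessing the underlying function.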
Vandewalle, S.
1994-12-31
Time-stepping methods for parabolic partial differential equations are essentially sequential. This prohibits the use of massively parallel computers unless the problem on each time-level is very large. This observation has led to the development of algorithms that operate on more than one time-level simultaneously; that is to say, on grids extending in space and in time. The so-called parabolic multigrid methods solve the time-dependent parabolic PDE as if it were a stationary PDE discretized on a space-time grid. The author has investigated the use of multigrid waveform relaxation, an algorithm developed by Lubich and Ostermann. The algorithm is based on a multigrid acceleration of waveform relaxation, a highly concurrent technique for solving large systems of ordinary differential equations. Another method of this class is the time-parallel multigrid method. This method was developed by Hackbusch and was recently the subject of further study by Horton. It extends the elliptic multigrid idea to the set of equations that is derived by discretizing a parabolic problem in space and in time.
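The underlying waveform relaxation idea, without the multigrid acceleration discussed in the abstract, can be sketched for a small linear system u' = A u. This is a plain Jacobi waveform relaxation with forward Euler time stepping, chosen here for brevity; the function names, the matrix and the step size are hypothetical. In each sweep every component is integrated over the whole time window using the other components' waveforms from the previous sweep, so the components can be processed concurrently.

```python
def euler_coupled(A, u0, h, steps):
    """Reference: forward Euler applied to the fully coupled system u' = A u."""
    n = len(u0)
    traj = [list(u0)]
    for _ in range(steps):
        u = traj[-1]
        traj.append([u[i] + h * sum(A[i][j] * u[j] for j in range(n)) for i in range(n)])
    return traj


def waveform_relaxation(A, u0, h, steps, sweeps):
    """Jacobi waveform relaxation: iterate on whole time histories (waveforms)."""
    n = len(u0)
    # Initial guess: every component stays at its initial value for all times
    waves = [[u0[i]] * (steps + 1) for i in range(n)]
    for _ in range(sweeps):
        new_waves = []
        for i in range(n):               # independent per component, so parallelizable
            w = [u0[i]]
            for t in range(steps):
                # Coupling terms use the other components' previous-sweep waveforms
                coupling = sum(A[i][j] * waves[j][t] for j in range(n) if j != i)
                w.append(w[t] + h * (A[i][i] * w[t] + coupling))
            new_waves.append(w)
        waves = new_waves
    return waves


A = [[-2.0, 1.0], [1.0, -2.0]]
reference = euler_coupled(A, [1.0, 0.0], 0.05, 20)
relaxed = waveform_relaxation(A, [1.0, 0.0], 0.05, 20, sweeps=20)
```

With this explicit discretization each sweep makes at least one more time step agree with the coupled solution, so 20 sweeps reproduce all 20 steps. That slow, step-by-step propagation is the kind of behaviour that acceleration techniques such as multigrid aim to improve.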
Bridging the gap between parallel file systems and local file systems : a case study with PVFS.
Gu, P.; Wang, J.; Ross, R.; Mathematics and Computer Science; Univ. of Central Florida
2008-09-01
Parallel I/O plays an increasingly important role in today's data intensive computing applications. While much attention has been paid to parallel read performance, most of this work has focused on the parallel file system, middleware, or application layers, ignoring the potential for improvement through more effective use of local storage. In this paper, we present the design and implementation of segment-structured on-disk data grouping and prefetching (SOGP), a technique that leverages additional local storage to boost the local data read performance for parallel file systems, especially for those applications with partially overlapped access patterns. Parallel virtual file system (PVFS) is chosen as an example. Our experiments show that an SOGP-enhanced PVFS prototype system can outperform a traditional Linux-Ext3-based PVFS for many applications and benchmarks, in some tests by as much as 230% in terms of I/O bandwidth.
Autocalibrating Tiled Projectors on Piecewise Smooth Vertically Extruded Surfaces.
Sajadi, Behzad; Majumder, Aditi
2011-09-01
In this paper, we present a novel technique to calibrate multiple casually aligned projectors on fiducial-free piecewise smooth vertically extruded surfaces using a single camera. Such surfaces include cylindrical displays and CAVEs, common in immersive virtual reality systems. We impose two priors on the display surface. We assume the surface is a piecewise smooth vertically extruded surface for which the aspect ratio of the rectangle formed by the four corners of the surface is known and the boundary is visible and segmentable. Using these priors, we can estimate the display's 3D geometry and camera extrinsic parameters using a nonlinear optimization technique from a single image without any explicit display to camera correspondences. Using the estimated camera and display properties, the intrinsic and extrinsic parameters of each projector are recovered using a single projected pattern seen by the camera. This in turn is used to register the images on the display from any arbitrary viewpoint making it appropriate for virtual reality systems. The fast convergence and robustness of this method are achieved via a novel dimension reduction technique for camera parameter estimation and a novel deterministic technique for projector property estimation. The simplicity, efficiency, and robustness of our method enable several coveted features for nonplanar projection-based displays. First, it allows fast recalibration in the face of projector, display or camera movements and even change in display shape. Second, this opens up, for the first time, the possibility of allowing multiple projectors to overlap on the corners of the CAVE, a popular immersive VR display system. Finally, this opens up the possibility of easily deploying multiprojector displays on aesthetic novel shapes for edutainment and digital signage applications.
How to: applying and interpreting the SWAT autocalibration tools
Technology Transfer Automated Retrieval System (TEKTRAN)
Watershed-level modelers have expressed a need, through ongoing discussions within the USDA-ARS Conservation Effects Assessment Program and the broader international research community, for a better understanding of uncertainty related to hard-to-measure input parameters and to the remaining interna...
Online camera-gyroscope autocalibration for cell phones.
Jia, Chao; Evans, Brian L
2014-12-01
The gyroscope plays a key role in helping estimate 3D camera rotation for various vision applications on cell phones, including video stabilization and feature tracking. Successful fusion of gyroscope and camera data requires that the camera, gyroscope, and their relative pose be calibrated. In addition, the timestamps of gyroscope readings and video frames are usually not well synchronized. Previous work performed camera-gyroscope calibration and synchronization offline after the entire video sequence had been captured, with restrictions on the camera motion, which is unnecessarily restrictive for everyday users running apps that directly use the gyroscope. In this paper, we propose an online method that estimates all the necessary parameters while a user is capturing video. Our contributions are: 1) simultaneous online camera self-calibration and camera-gyroscope calibration based on an implicit extended Kalman filter and 2) generalization of the multiple-view coplanarity constraint on camera rotation in a rolling shutter camera model for cell phones. The proposed method is able to estimate the needed calibration and synchronization parameters online with all kinds of camera motion and can be embedded in gyro-aided applications, such as video stabilization and feature tracking. Both Monte Carlo simulation and cell phone experiments show that the proposed online calibration and synchronization method converges quickly to the ground truth values.
Accuracy of different impression materials in parallel and nonparallel implants
Vojdani, Mahroo; Torabi, Kianoosh; Ansarifard, Elham
2015-01-01
Background: A precise impression is mandatory to obtain passive fit in implant-supported prostheses. The aim of this study was to compare the accuracy of three impression materials in both parallel and nonparallel implant positions. Materials and Methods: In this experimental study, two partially dentate maxillary acrylic models with four implant analogues in the canine and lateral incisor areas were used. One model simulated the parallel condition and the other the nonparallel condition, in which implants were tilted 30° buccally and 20° in either mesial or distal directions. Thirty stone casts were made from each model using polyether (Impregum), addition silicone (Monopren) and vinyl siloxanether (Identium), with the open tray technique. The distortion values in three dimensions (X, Y and Z-axis) were measured by a coordinate measuring machine. Two-way analysis of variance (ANOVA), one-way ANOVA and Tukey tests were used for data analysis (α = 0.05). Results: Under the parallel condition, all the materials showed comparable, accurate casts (P = 0.74). In the presence of angulated implants, while Monopren showed more accurate results compared to Impregum (P = 0.01), Identium yielded almost similar results to those produced by Impregum (P = 0.27) and Monopren (P = 0.26). Conclusion: Within the limitations of this study, in parallel conditions, the type of impression material cannot affect the accuracy of the implant impressions; however, in nonparallel conditions, polyvinyl siloxane is shown to be a better choice, followed by vinyl siloxanether and polyether respectively. PMID:26288620
Dynamic Load Balancing Strategies for Parallel Reacting Flow Simulations
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Meneses, Esteban; Givi, Peyman
2014-11-01
Load balancing in parallel computing aims at distributing the work as evenly as possible among the processors. This is a critical issue in the performance of parallel, time accurate, flow simulators. The constraint of time accuracy requires that all processes finish their calculation for a given time step before any process can begin calculation of the next time step. Thus, an unevenly balanced compute load will result in idle time for many processes in each iteration and therefore in increased walltimes for calculations. Two existing dynamic load balancing approaches are applied to the simplified case of a partially stirred reactor for methane combustion. The first is Zoltan, a parallel partitioning, load balancing, and data management library developed at the Sandia National Laboratories. The second is Charm++, which is its own machine independent parallel programming system developed at the University of Illinois at Urbana-Champaign. The performance of these two approaches is compared, and the prospects for their application to full 3D, reacting flow solvers are assessed.
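The cost of imbalance described in this abstract can be illustrated with a minimal Python sketch. It does not use Zoltan or Charm++; the function names and the greedy "largest cost first" heuristic are illustrative assumptions, not the method of the paper. Because every process must finish a time step before the next one starts, the walltime of a step is set by the most heavily loaded process.

```python
import heapq

def step_walltime(loads):
    """Walltime of one synchronized time step: the slowest process sets the pace."""
    return max(loads)

def idle_time(loads):
    """Total time processes spend waiting at the end-of-step barrier."""
    slowest = max(loads)
    return sum(slowest - load for load in loads)

def greedy_rebalance(cell_costs, n_procs):
    """Assign cells to processes, largest cost first, always to the currently
    least-loaded process (a hypothetical stand-in for a real load balancer)."""
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, process id)
    heapq.heapify(heap)
    loads = [0.0] * n_procs
    for cost in sorted(cell_costs, reverse=True):
        load, p = heapq.heappop(heap)
        load += cost
        loads[p] = load
        heapq.heappush(heap, (load, p))
    return loads

if __name__ == "__main__":
    # One chemically "stiff" cell (cost 8) among cheap ones, on two processes.
    costs = [8, 1, 1, 1, 1, 1, 1, 2]
    naive = [sum(costs[:4]), sum(costs[4:])]   # contiguous block split
    balanced = greedy_rebalance(costs, 2)
    print("naive   :", naive, "idle", idle_time(naive))
    print("balanced:", balanced, "idle", idle_time(balanced))
```

In a reacting flow solver the per-cell cost changes from step to step, which is why the abstract considers dynamic rather than one-off balancing; the sketch only shows a single redistribution.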
Time parallelization of plasma simulations using the parareal algorithm
Samaddar, D.; Houlberg, Wayne A; Berry, Lee A; Elwasif, Wael R; Huysmans, G; Batchelor, Donald B
2011-01-01
Simulation of fusion plasmas involves a broad range of timescales. In magnetically confined plasmas, such as in ITER, the timescale associated with the microturbulence responsible for transport and the confinement timescale differ by a factor of 10^6 to 10^9. Simulating this entire range of timescales is currently impossible, even on the most powerful supercomputers available. Space parallelization has so far been the most common approach to solving partial differential equations. Space parallelization alone has led to computational saturation for fluid codes, which means that the walltime for computation does not decrease linearly with the increasing number of processors used. The application of the parareal algorithm to simulations of fusion plasmas opens a new avenue of parallelization, namely temporal parallelization. The algorithm has been successfully applied to plasma turbulence simulations, prior to which it had been applied to other, relatively simpler problems. This work explores the extension of the applicability of the parareal algorithm to ITER relevant problems, starting with a diffusion-convection model.
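As an illustration of temporal parallelization, the following is a minimal Python sketch of the standard parareal iteration applied to the scalar test equation dy/dt = lam * y. The test problem, the explicit Euler propagators, and all names and parameter values are assumptions chosen for brevity; they are not the plasma models or solvers of the paper. A cheap coarse propagator G is applied serially, while the expensive fine propagator F is applied to every time slice independently, which is the part that would run on separate processors.

```python
def euler(y, t0, t1, lam, n_steps):
    """Explicit Euler integration of dy/dt = lam * y from t0 to t1."""
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        y = y + dt * lam * y
    return y

def serial_fine(y0, T, n_slices, lam=-1.0, fine_steps=100):
    """Reference solution: the fine propagator applied slice after slice."""
    y = y0
    for n in range(n_slices):
        y = euler(y, T * n / n_slices, T * (n + 1) / n_slices, lam, fine_steps)
    return y

def parareal(y0, T, n_slices, n_iters, lam=-1.0, fine_steps=100):
    """Return the slice-boundary values after n_iters parareal iterations."""
    ts = [T * n / n_slices for n in range(n_slices + 1)]

    def coarse(y, n):   # G: one Euler step per slice
        return euler(y, ts[n], ts[n + 1], lam, 1)

    def fine(y, n):     # F: many Euler steps per slice
        return euler(y, ts[n], ts[n + 1], lam, fine_steps)

    # Initial guess from a serial coarse sweep.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], n))

    for _ in range(n_iters):
        # These fine solves are independent of each other; in a real
        # implementation each slice would go to a different processor.
        f_old = [fine(u[n], n) for n in range(n_slices)]
        g_old = [coarse(u[n], n) for n in range(n_slices)]
        # Cheap serial correction: U_{n+1} = G(U_n_new) + F(U_n_old) - G(U_n_old)
        new = [y0]
        for n in range(n_slices):
            new.append(coarse(new[n], n) + f_old[n] - g_old[n])
        u = new
    return u

if __name__ == "__main__":
    ref = serial_fine(1.0, 2.0, 10)
    for k in (0, 1, 2, 3, 10):
        err = abs(parareal(1.0, 2.0, 10, k)[-1] - ref)
        print(f"iterations={k:2d}  error vs serial fine solve={err:.3e}")
```

The method pays off only when it converges in far fewer iterations than there are time slices, since running as many iterations as slices reproduces the serial fine solution at a higher cost.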
Toward an automated parallel computing environment for geosciences
NASA Astrophysics Data System (ADS)
Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping
2007-08-01
Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.
Task parallelism and high-performance languages
Foster, I.
1996-03-01
The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. The subject of this paper is to incorporate support for task parallelism. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with for example a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit to both data- and task-parallel solutions, with the better solution depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications, while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages.
A generalized parallel replica dynamics
NASA Astrophysics Data System (ADS)
Binder, Andrew; Lelièvre, Tony; Simpson, Gideon
2015-03-01
Metastability is a common obstacle to performing long molecular dynamics simulations. Many numerical methods have been proposed to overcome it. One method is parallel replica dynamics, which relies on the rapid convergence of the underlying stochastic process to a quasi-stationary distribution. Two requirements for applying parallel replica dynamics are knowledge of the time scale on which the process converges to the quasi-stationary distribution and a mechanism for generating samples from this distribution. By combining a Fleming-Viot particle system with convergence diagnostics to simultaneously identify when the process converges while also generating samples, we can address both points. This variation on the algorithm is illustrated with various numerical examples, including those with entropic barriers and the 2D Lennard-Jones cluster of seven atoms.
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Parallel supercomputing with commodity components
Warren, M.S.; Goda, M.P.; Becker, D.J.
1997-09-01
We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.
ASP: a parallel computing technology
NASA Astrophysics Data System (ADS)
Lea, R. M.
1990-09-01
ASP modules constitute the basis of a parallel computing technology platform for the rapid development of a broad range of numeric and symbolic information processing systems. Based on off-the-shelf general-purpose hardware and software modules, ASP technology is intended to increase productivity in the development (and competitiveness in the marketing) of cost-effective low-MIMD/high-SIMD Massively Parallel Processors (MPPs). The paper discusses ASP module philosophy and demonstrates how ASP modules can satisfy the market, algorithmic, architectural and engineering requirements of such MPPs. In particular, two specific ASP modules based on VLSI and WSI technologies are studied as case examples of ASP technology, the latter reporting 1 TOPS/ft3, 1 GOPS/W and 1 MOPS/$ as ball-park figures-of-merit of cost-effectiveness.
Parallel processing spacecraft communication system
NASA Technical Reports Server (NTRS)
Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)
1998-01-01
An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used, and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID. Each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data is further handled using special double buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.
A generalized parallel replica dynamics
Binder, Andrew; Lelièvre, Tony; Simpson, Gideon
2015-03-01
Metastability is a common obstacle to performing long molecular dynamics simulations. Many numerical methods have been proposed to overcome it. One method is parallel replica dynamics, which relies on the rapid convergence of the underlying stochastic process to a quasi-stationary distribution. Two requirements for applying parallel replica dynamics are knowledge of the time scale on which the process converges to the quasi-stationary distribution and a mechanism for generating samples from this distribution. By combining a Fleming–Viot particle system with convergence diagnostics to simultaneously identify when the process converges while also generating samples, we can address both points. This variation on the algorithm is illustrated with various numerical examples, including those with entropic barriers and the 2D Lennard-Jones cluster of seven atoms.
Parallel supercomputing with commodity components
NASA Technical Reports Server (NTRS)
Warren, M. S.; Goda, M. P.; Becker, D. J.
1997-01-01
We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.
Parallel multiplex laser feedback interferometry
Zhang, Song; Tan, Yidong; Zhang, Shulian
2013-12-15
We present a parallel multiplex laser feedback interferometer based on spatial multiplexing which avoids the signal crosstalk in the former feedback interferometer. The interferometer outputs two close parallel laser beams, whose frequencies are shifted by two acousto-optic modulators by 2Ω simultaneously. A static reference mirror is inserted into one of the optical paths as the reference optical path. The other beam impinges on the target as the measurement optical path. Phase variations of the two feedback laser beams are simultaneously measured through heterodyne demodulation with two different detectors. Their subtraction accurately reflects the target displacement. Under typical room conditions, experimental results show a resolution of 1.6 nm and accuracy of 7.8 nm within the range of 100 μm.
Parallelism in Manipulator Dynamics. Revision.
1983-12-01
excessive, and a VLSI implementation architecture is suggested. We indicate possible applications to incorporating dynamical considerations into...Inverse Dynamics problem. It investigates the high degree of parallelism inherent in the computations, and presents two "mathematically exact" formulations...OVERVIEW The Inverse Dynamics problem consists (loosely) of computing the motor torques necessary to
Parallel Symmetric Eigenvalue Problem Solvers
2015-05-01
graduate school. Debugging somebody else’s MPI code is an immensely frustrating experience, but he would regularly stay late at the office to assist me...cessfully. In addition, I will describe the parallel kernels required by my code. The next sections will describe my Fortran-based implementations of...Sandia’s publicly available TraceMin code. Each of the methods has its own unique advantages and disadvantages, summarized in table 3.1. In short, I
Parallel Algorithms for Computer Vision.
1987-01-01
PARALLEL ALGORITHMS FOR COMPUTER VISION (U) MASSACHUSETTS INST OF TECH CAMBRIDGE T P00010 ET AL. JAN 8? ETL-0456 DACA7-05-C-8IIO...regularization principles, such as edge detection, stereo, motion, surface interpolation and shape from shading. The basic members of class I are convolution...them in collaboration with Thinking Machines Corporation): * Parallel convolution * Zero-crossing detection * Stereo-matching * Surface reconstruction
Lightweight Specifications for Parallel Correctness
2012-12-05
...George Necula Professor David Wessel Fall 2012 Abstract Lightweight Specifications for Parallel Correctness by Jacob Samuels Burnim Doctor of Philosophy...enthusiasm and endless flow of ideas, and for his keen research sense. I would also like to thank George Necula for chairing my qualifying exam committee and
National Combustion Code: Parallel Performance
NASA Technical Reports Server (NTRS)
Babrauckas, Theresa
2001-01-01
This report discusses the National Combustion Code (NCC). The NCC is an integrated system of codes for the design and analysis of combustion systems. The advanced features of the NCC meet designers' requirements for model accuracy and turn-around time. The fundamental features at the inception of the NCC were parallel processing and unstructured mesh. The design and performance of the NCC are discussed.
Parallel Algorithms for Computer Vision.
1989-01-01
demonstrated the Vision Machine system processing images and recognizing objects through the integration of several visual cues. The first version of the...achievements. 2.1 The Vision Machine The overall organization of the Vision Machine system is based on parallel processing of the images by independent...smoothed and made dense by exploiting known constraints within each process (for example, that disparity is smooth). This is the stage of approximation
Parallel strategies for SAR processing
NASA Astrophysics Data System (ADS)
Segoviano, Jesus A.
2004-12-01
This article proposes a series of strategies for improving the computer processing of Synthetic Aperture Radar (SAR) signals, following the three usual lines of action for speeding up the execution of any computer program. On the one hand, it studies the optimization of both the data structures and the application architecture. On the other hand, it considers hardware improvement. For the former, it studies both the data structures usually employed in SAR processing, proposing the use of parallel ones, and the way the parallelization of the algorithms employed in the process is implemented. In addition, the parallel application architecture classifies processes as fine or coarse grained. These are assigned to individual processors or divided among processors, each in its corresponding architecture. For the latter, it studies the hardware employed for parallel SAR processing. The improvement here refers to several kinds of platforms on which the SAR process is implemented: shared memory multicomputers and distributed memory multiprocessors. A comparison between them gives some guidelines for obtaining maximum throughput with minimum latency and maximum effectiveness with minimum cost, all with limited complexity. It is concluded that processing the algorithms in a GNU/Linux environment on a Beowulf cluster platform offers, under certain conditions, the best compromise between performance and cost, and promises the greatest development in the coming years for computationally demanding SAR applications.
Parallel Power Grid Simulation Toolkit
Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol
2015-09-14
ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The library code included in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete event parallel simulations. The code is designed using modern object oriented C++ methods, utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.
Parallel processing of genomics data
NASA Astrophysics Data System (ADS)
Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario
2016-10-01
The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, and the analysis of this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data that is able to handle high-dimensional data and achieves good response time. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
Parallelism in integrated fluidic circuits
NASA Astrophysics Data System (ADS)
Bousse, Luc J.; Kopf-Sill, Anne R.; Parce, J. W.
1998-04-01
Many research groups around the world are working on integrated microfluidics. The goal of these projects is to automate and integrate the handling of liquid samples and reagents for measurement and assay procedures in chemistry and biology. Ultimately, it is hoped that this will lead to a revolution in chemical and biological procedures similar to that caused in electronics by the invention of the integrated circuit. The optimal size scale of channels for liquid flow is determined by basic constraints to be somewhere between 10 and 100 micrometers. In larger channels, mixing by diffusion takes too long; in smaller channels, the number of molecules present is so low it makes detection difficult. At Caliper, we are making fluidic systems in glass chips with channels in this size range, based on electroosmotic flow and fluorescence detection. One application of this technology is rapid assays for drug screening, such as enzyme assays and binding assays. A further challenge in this area is to perform multiple functions on a chip in parallel, without a large increase in the number of inputs and outputs. A first step in this direction is a fluidic serial-to-parallel converter. Fluidic circuits will be shown with the ability to distribute an incoming serial sample stream to multiple parallel channels.
Highly parallel sparse Cholesky factorization
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
Parallel Environment for Quantum Computing
NASA Astrophysics Data System (ADS)
Tabakin, Frank; Diaz, Bruno Julia
2009-03-01
To facilitate numerical study of noise and decoherence in QC algorithms, and of the efficacy of error correction schemes, we have developed a Fortran 90 quantum computer simulator with parallel processing capabilities. It permits rapid evaluation of quantum algorithms for a large number of qubits and for various "noise" scenarios. State vectors are distributed over many processors to accommodate a large number of qubits. Parallel processing is implemented with the Message-Passing Interface protocol. A description of how to spread the wave function components over many processors, along with how to efficiently describe the action of general one- and two-qubit operators on these state vectors, will be delineated. Grover's search and Shor's factoring algorithms with noise will be discussed as examples. A major feature of this work is that concurrent versions of the algorithms can be evaluated with each version subject to diverse noise effects, corresponding to solving a stochastic Schrödinger equation. The density matrix for the ensemble of such noise cases is constructed using parallel distribution methods to evaluate its associated entropy. Applications of this powerful tool are made to delineate the stability and correction of QC processes using Hamiltonian based dynamics.
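The action of a one-qubit operator on a distributed state vector, as mentioned in this abstract, can be sketched in a few lines of Python. This is not the Fortran 90 simulator itself: the function names, the convention that qubit q corresponds to bit q of the amplitude index, and the contiguous block distribution over a power-of-two number of ranks are all assumptions made for illustration. The gate mixes pairs of amplitudes whose indices differ only in bit q, so the pair is local to one processor only when that index distance is smaller than the block each processor holds.

```python
import math

def apply_one_qubit_gate(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state vector of 2**n amplitudes.

    Assumed convention: qubit q corresponds to bit q of the amplitude index.
    """
    out = list(state)
    stride = 1 << q
    for i in range(len(state)):
        if (i & stride) == 0:                 # i has bit q clear; partner has it set
            a0, a1 = state[i], state[i | stride]
            out[i] = gate[0][0] * a0 + gate[0][1] * a1
            out[i | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return out

def needs_communication(q, n_qubits, n_ranks):
    """True if a gate on qubit q pairs amplitudes stored on different ranks.

    Assumes contiguous blocks of equal size and a power-of-two rank count.
    """
    block = (1 << n_qubits) // n_ranks        # amplitudes held by each rank
    return (1 << q) >= block

if __name__ == "__main__":
    h = 1.0 / math.sqrt(2.0)
    hadamard = [[h, h], [h, -h]]
    state = [1.0, 0.0, 0.0, 0.0]              # two qubits, basis state with index 0
    print(apply_one_qubit_gate(state, hadamard, 0))
    for q in range(4):
        print("qubit", q, "needs communication:", needs_communication(q, 4, 4))
```

Under these assumptions, gates on low-index qubits are purely local updates, while gates on high-index qubits require an exchange of amplitudes between partner ranks, which is where a message-passing layer would come in.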
Parallel Markov chain Monte Carlo simulations.
Ren, Ruichao; Orkoulas, G
2007-06-07
With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.
Parallel Markov chain Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Ren, Ruichao; Orkoulas, G.
2007-06-01
With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.
Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring
NASA Technical Reports Server (NTRS)
Padovan, Joe; Kwang, Abel
1994-01-01
This paper develops a parallelizable multilevel multiple constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure, sequential, as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort due both to updating and inversion.
Zhdanov, V. M.; Stepanenko, A. A.
2013-12-15
The influence of resonant charge exchange for ion-atom interaction on the viscosity of partially ionized plasma embedded in the magnetic field is investigated. The general system of equations used to derive the viscosity coefficients for an arbitrary plasma component in the 21-moment approximation of Grad’s method is presented. The expressions for the coefficients of total and partial viscosities of a multicomponent partially ionized plasma in the magnetic field are obtained. As an example, the coefficients of the parallel and transverse viscosities for the ionic and neutral components of the partially ionized hydrogen plasma are calculated. It is shown that the account for resonant charge exchange can lead to a substantial change of the parallel and transverse viscosity of the plasma components in the region of low degrees of ionization on the order of 0.1.
Experimental generating the partially coherent and partially polarized electromagnetic source.
Ostrovsky, Andrey S; Rodríguez-Zurita, Gustavo; Meneses-Fabián, Cruz; Olvera-Santamaría, Miguel A; Rickenstorff-Parrao, Carolina
2010-06-07
The technique for generating the partially coherent and partially polarized source starting from the completely coherent and completely polarized laser source is proposed and analyzed. This technique differs from the known ones by the simplicity of its physical realization. The efficiency of the proposed technique is illustrated with the results of physical experiment in which an original technique for characterizing the coherence and polarization properties of the generated source is employed.
Computationally efficient implementation of combustion chemistry in parallel PDF calculations
Lu, Liuyan; Lantz, Steven R.; Ren, Zhuyin; Pope, Stephen B.
2009-08-20
In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive
Computationally efficient implementation of combustion chemistry in parallel PDF calculations
NASA Astrophysics Data System (ADS)
Lu, Liuyan; Lantz, Steven R.; Ren, Zhuyin; Pope, Stephen B.
2009-08-01
In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
Parallelizing alternating direction implicit solver on GPUs
Technology Transfer Automated Retrieval System (TEKTRAN)
We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
Implementing clips on a parallel computer
NASA Technical Reports Server (NTRS)
Riley, Gary
1987-01-01
The C language integrated production system (CLIPS) is a forward chaining rule based language to provide training and delivery for expert systems. Conceptually, rule based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism also can be employed for use with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.
Parallel molecular dynamics: Communication requirements for massively parallel machines
NASA Astrophysics Data System (ADS)
Taylor, Valerie E.; Stevens, Rick L.; Arnold, Kathryn E.
1995-05-01
Molecular mechanics and dynamics are becoming widely used to perform simulations of molecular systems from large-scale computations of materials to the design and modeling of drug compounds. In this paper we address two major issues: a good decomposition method that can take advantage of future massively parallel processing systems for modest-sized problems in the range of 50,000 atoms and the communication requirements needed to achieve 30 to 40% efficiency on MPPs. We analyzed a scalable benchmark molecular dynamics program executing on the Intel Touchstone Delta parallelized with an interaction decomposition method. Using a validated analytical performance model of the code, we determined that for an MPP with a four-dimensional mesh topology and 400 MHz processors the communication startup time must be at most 30 clock cycles and the network bandwidth must be at least 2.3 GB/s. This configuration results in 30 to 40% efficiency of the MPP for a problem with 50,000 atoms executing on 50,000 processors.
Force user's manual: A portable, parallel FORTRAN
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.
1990-01-01
The use of Force, a parallel, portable FORTRAN, on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.
The electron signature of parallel electric fields
NASA Astrophysics Data System (ADS)
Burch, J. L.; Gurgiolo, C.; Menietti, J. D.
1990-12-01
Dynamics Explorer I High-Altitude Plasma Instrument electron data are presented. The electron distribution functions have characteristics expected of a region of parallel electric fields. The data are consistent with previous test-particle simulations for observations within parallel electric field regions which indicate that typical hole, bump, and loss-cone electron distributions, which contain evidence for parallel potential differences both above and below the point of observation, are not expected to occur in regions containing actual parallel electric fields.
Debugging Parallel Programs with Instant Replay.
1986-09-01
produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel programs, termed Instant Replay...Instant Replay on the BBN Butterfly Parallel Processor, and discuss how it can be incorporated into the debugging cycle for parallel programs. This...program often do not produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel
Parallel machine architecture and compiler design facilities
NASA Technical Reports Server (NTRS)
Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex
1990-01-01
The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.
Global Arrays Parallel Programming Toolkit
Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel
2011-01-01
The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message-passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message-passing or one-sided communication, offer performance and scalability but they are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving
Partial-Payload Support Structure
NASA Technical Reports Server (NTRS)
Mitchell, R.; Freeman, M.
1984-01-01
Partial-payload support structure (PPSS) is modular, bridgelike structure supporting experiments weighing up to 2 tons. PPSS handles such experiments more economically than standard Spacelab pallet system.
High Performance Parallel Computational Nanotechnology
NASA Technical Reports Server (NTRS)
Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. Inasmuch as it seems clear that the application of such methods in nanotechnology will require powerful, highly parallel systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to
Partial arthrodeses of the wrist.
Marcuzzi, A; Cristiani, G; Castagnini, L; Castagnetti, C; Caroli, A
1995-01-01
The authors report 16 cases of partial arthrodeses of the wrist for the treatment of Kienboeck's disease, pseudarthrosis of the scaphoid, rotatory subluxation of the scaphoid, rheumatoid arthritis, etc. Based on the good results obtained (76.6%) the authors believe that partial arthrodeses constitute the type of treatment indicated for the treatment of pathologies that involve only some of the carpal bones, and they also emphasize that this type of surgery represents a valid alternative to total arthrodesis of the wrist.
Parallel Computing Using Web Servers and "Servlets".
ERIC Educational Resources Information Center
Lo, Alfred; Bloor, Chris; Choi, Y. K.
2000-01-01
Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism
ERIC Educational Resources Information Center
Agarwal, Mayank
2009-01-01
The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…
Parallel Processing at the High School Level.
ERIC Educational Resources Information Center
Sheary, Kathryn Anne
This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…
Coordination in serial-parallel image processing
NASA Astrophysics Data System (ADS)
Wójcik, Waldemar; Dubovoi, Vladymyr M.; Duda, Marina E.; Romaniuk, Ryszard S.; Yesmakhanova, Laura; Kozbakova, Ainur
2015-12-01
Serial-parallel systems are used to convert images. Controlling their operation requires solving a coordination problem. The paper summarizes a model for coordinating resource allocation as applied to the task of synchronizing parallel processes; a genetic algorithm for coordination is developed, and its adequacy is verified for parallel image processing.
Scalable Parallel Algebraic Multigrid Solvers
Bank, R; Lu, S; Tong, C; Vassilevski, P
2005-03-23
The authors propose a parallel algebraic multilevel algorithm (AMG), which has the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved; and the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.
Parallel Assembly of LIGA Components
Christenson, T.R.; Feddema, J.T.
1999-03-04
In this paper, a prototype robotic workcell for the parallel assembly of LIGA components is described. A Cartesian robot is used to press 386 and 485 micron diameter pins into a LIGA substrate and then place a 3-inch diameter wafer with LIGA gears onto the pins. Upward and downward looking microscopes are used to locate holes in the LIGA substrate, pins to be pressed in the holes, and gears to be placed on the pins. This vision system can locate parts within 3 microns, while the Cartesian manipulator can place the parts within 0.4 microns.
True Shear Parallel Plate Viscometer
NASA Technical Reports Server (NTRS)
Ethridge, Edwin; Kaukler, William
2010-01-01
This viscometer (which can also be used as a rheometer) is designed for use with liquids over a large temperature range. The device consists of horizontally disposed, similarly sized, parallel plates with a precisely known gap. The lower plate is driven laterally with a motor to apply shear to the liquid in the gap. The upper plate is freely suspended from a double-arm pendulum with a sufficiently long radius to reduce height variations during the swing to negligible levels. A sensitive load cell measures the shear force applied by the liquid to the upper plate. Viscosity is measured by taking the ratio of shear stress to shear rate.
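The ratio named in the abstract above can be written out as a short calculation. This is a minimal sketch, assuming a uniform gap and a linear velocity profile between the plates, so that shear stress is force over wetted plate area and shear rate is plate speed over gap. The function name, parameter names, and sample values are hypothetical and are not taken from the report.

```python
def viscosity_parallel_plate(force_n, plate_area_m2, plate_speed_m_s, gap_m):
    """Return dynamic viscosity in Pa*s for an idealized parallel-plate setup.

    Assumptions (not from the report): uniform gap, linear velocity
    profile, and the load cell force acting over the full plate area.
    """
    if plate_area_m2 <= 0 or plate_speed_m_s <= 0 or gap_m <= 0:
        raise ValueError("area, speed and gap must be positive")
    shear_stress = force_n / plate_area_m2   # Pa
    shear_rate = plate_speed_m_s / gap_m     # 1/s
    return shear_stress / shear_rate         # Pa*s


# Hypothetical reading: 2 mN on a 100 cm^2 plate, 1 mm/s drive, 0.5 mm gap.
eta = viscosity_parallel_plate(0.002, 0.01, 0.001, 0.0005)
print(f"viscosity = {eta:.3f} Pa*s")
```

Because the gap appears in the numerator of the final ratio, any error in the "precisely known gap" carries over proportionally to the computed viscosity.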
Scheduling Tasks In Parallel Processing
NASA Technical Reports Server (NTRS)
Price, Camille C.; Salama, Moktar A.
1989-01-01
Algorithms sought to minimize time and cost of computation. Report describes research on scheduling of computations tasks in system of multiple identical data processors operating in parallel. Computational intractability requires use of suboptimal heuristic algorithms. First algorithm called "list heuristic", variation of classical list scheduling. Second algorithm called "cluster heuristic" applied to tightly coupled tasks and consists of four phases. Third algorithm called "exchange heuristic", iterative-improvement algorithm beginning with initial feasible assignment of tasks to processors and periods of time. Fourth algorithm is iterative one for optimal assignment of tasks and based on concept called "simulated annealing" because of mathematical resemblance to aspects of physical annealing processes.
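For orientation, classical list scheduling (the baseline that the report's "list heuristic" is said to vary) can be sketched in a few lines. This is an illustrative sketch only: it assumes independent tasks with known durations, no precedence constraints, and no communication cost, and it is not the report's variant. The function name and the sample task list are hypothetical.

```python
import heapq


def list_schedule(durations, n_processors):
    """Classical list scheduling on identical processors.

    Tasks are taken in list order and each is given to the processor
    that becomes free earliest. Returns (makespan, assignment), where
    assignment holds (task_index, processor, start_time) tuples.
    """
    # Heap of (time_processor_becomes_free, processor_id).
    free_at = [(0, p) for p in range(n_processors)]
    heapq.heapify(free_at)
    assignment = []
    for task, duration in enumerate(durations):
        start, proc = heapq.heappop(free_at)
        assignment.append((task, proc, start))
        heapq.heappush(free_at, (start + duration, proc))
    makespan = max(t for t, _ in free_at)
    return makespan, assignment


makespan, plan = list_schedule([3, 2, 2, 1], n_processors=2)
print(makespan, plan)
```

The result depends on the order of the task list, which is why heuristics of this family usually sort or prioritize the list first.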
Heart Fibrillation and Parallel Supercomputers
NASA Technical Reports Server (NTRS)
Kogan, B. Y.; Karplus, W. J.; Chudin, E. E.
1997-01-01
The Luo and Rudy 3 cardiac cell mathematical model is implemented on the parallel supercomputer CRAY T3D. The splitting algorithm combined with variable time step and an explicit method of integration provide reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by intracellular calcium dynamics and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.
NASA Astrophysics Data System (ADS)
Galo, J. R.; Albarreal, I. I.; Calzada, M. C.; Cruz, J. L.; Fernández-Cara, E.; Marín, M.
2008-12-01
For the solution of elliptic problems, fractional step methods and in particular alternating directions (ADI) methods are iterative methods where fractional steps are sequential. Therefore, they only accept parallelization at low level. In [T. Lu, P. Neittaanmäki, X.C. Tai, A parallel splitting-up method for partial differential equations and its applications to Navier-Stokes equations, RAIRO Modél. Math. Anal. Numér. 26 (6) (1992) 673-708], Lu et al. proposed a method where the fractional steps can be performed in parallel. We can thus speak of parallel fractional step (PFS) methods and, in particular, simultaneous directions (SDI) methods. In this paper, we perform a detailed analysis of the convergence and optimization of PFS and SDI methods, complementing what was done in [T. Lu, P. Neittaanmäki, X.C. Tai, A parallel splitting-up method for partial differential equations and its applications to Navier-Stokes equations, RAIRO Modél. Math. Anal. Numér. 26 (6) (1992) 673-708]. We describe the behavior of the method and we specify the good choice of the parameters. We also study the efficiency of the parallelization. Some 2D, 3D and high-dimensional tests confirm our results.
Xyce parallel electronic simulator design.
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Parallel job-scheduling algorithms
Rodger, S.H.
1989-01-01
In this thesis, we consider solving job scheduling problems on the CREW PRAM model. We show how to adapt Cole's pipeline merge technique to yield several efficient parallel algorithms for a number of job scheduling problems and one optimal parallel algorithm for the following job scheduling problem: Given a set of n jobs defined by release times, deadlines and processing times, find a schedule that minimizes the maximum lateness of the jobs and allows preemption when the jobs are scheduled to run on one machine. In addition, we present the first NC algorithm for the following job scheduling problem: Given a set of n jobs defined by release times, deadlines and unit processing times, determine if there is a schedule of jobs on one machine, and calculate the schedule if it exists. We identify the notion of a canonical schedule, which is the type of schedule our algorithm computes if there is a schedule. Our algorithm runs in O((log n)^2) time and uses O(n^2 k^2) processors, where k is the minimum number of distinct offsets of release times or deadlines.
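To make the first problem in the abstract above concrete, here is a serial sketch of the standard preemptive earliest-deadline rule, which is known to minimize maximum lateness on one machine when jobs have release times and preemption is allowed. It is not the thesis's parallel CREW PRAM algorithm and does not use Cole's pipeline merge; the function name, the job tuple layout, and the sample jobs are assumptions made for illustration.

```python
import heapq


def preemptive_edd(jobs):
    """Serial preemptive earliest-deadline scheduling on one machine.

    jobs: list of (release, deadline, processing) with processing > 0.
    Returns (max_lateness, segments), where segments is a list of
    (job_index, start, end) pieces in time order.
    """
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    remaining = [p for _, _, p in jobs]
    ready = []                      # heap of (deadline, job_index)
    segments = []
    max_lateness = float("-inf")
    t, i, n = 0, 0, len(jobs)
    while i < n or ready:
        if not ready:               # machine idle: jump to next release
            t = max(t, jobs[order[i]][0])
        while i < n and jobs[order[i]][0] <= t:
            j = order[i]
            heapq.heappush(ready, (jobs[j][1], j))
            i += 1
        deadline, j = ready[0]      # most urgent released job
        next_release = jobs[order[i]][0] if i < n else float("inf")
        if remaining[j] <= next_release - t:
            end = t + remaining[j]  # job finishes before next release
            remaining[j] = 0
        else:
            end = next_release      # pause here; a new job arrives
            remaining[j] -= next_release - t
        segments.append((j, t, end))
        t = end
        if remaining[j] == 0:
            heapq.heappop(ready)
            max_lateness = max(max_lateness, t - deadline)
    return max_lateness, segments


# Hypothetical jobs: (release, deadline, processing).
print(preemptive_edd([(0, 10, 4), (1, 3, 2), (2, 9, 1)]))
```

In the sample, job 0 starts first, is preempted at time 1 by the more urgent job 1, and resumes only after jobs 1 and 2 have finished.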
Parallel approach to incorporating face image information into dialogue processing
NASA Astrophysics Data System (ADS)
Ren, Fuji
2000-10-01
There are many kinds of so-called irregular expressions in natural dialogues. Even if the words of a conversation are the same, different meanings can be conveyed by a person's feelings or facial expression. To understand dialogues well, a flexible dialogue processing system must infer the speaker's view properly. However, it is difficult to obtain the meaning of the speaker's sentences in various scenes using traditional methods. In this paper, a new approach for dialogue processing that incorporates information from the speaker's face is presented. We first divide conversation statements into several simple tasks. Second, we process each simple task using an independent processor. Third, we use information from the speaker's face to estimate the speaker's view and resolve ambiguities in dialogues. The approach presented in this paper can work efficiently, because independent processors run in parallel, writing partial results to a shared memory, incorporating partial results at appropriate points, and complementing each other. A parallel algorithm and a method for employing the face information in a dialogue machine translation will be discussed, and some results will be included in this paper.
Partitioning in parallel processing of production systems
Oflazer, K.
1987-01-01
This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.
Partial synchronization and partial amplitude death in mesoscale network motifs.
Poel, Winnie; Zakharova, Anna; Schöll, Eckehard
2015-02-01
We study the interplay between network topology and complex space-time patterns and introduce a concept to analytically predict complex patterns in networks of Stuart-Landau oscillators with linear symmetric and instantaneous coupling based solely on the network topology. These patterns consist of partial amplitude death and partial synchronization and are found to exist in large variety for all undirected networks of up to 5 nodes. The underlying concept is proved to be robust with respect to frequency mismatch and can also be extended to larger networks. In addition it directly links the stability of complete in-phase synchronization to only a small subset of topological eigenvalues of a network.
[Parallel PLS algorithm using MapReduce and its application in spectral modeling].
Yang, Hui-Hua; Du, Ling-Ling; Li, Ling-Qiao; Tang, Tian-Biao; Guo, Tuo; Liang, Qiong-Lin; Wang, Yi-Ming; Luo, Guo-An
2012-09-01
Partial least squares (PLS) has been widely used in spectral analysis and modeling, but it is computation-intensive and time-consuming when dealing with massive data. To solve this problem effectively, a novel parallel PLS using MapReduce is proposed, which consists of two procedures: the parallelization of data standardizing and the parallelization of principal component computing. Using NIR spectral modeling as an example, experiments were conducted on a Hadoop cluster, which is a collection of ordinary computers. The experimental results demonstrate that the proposed parallel PLS algorithm can handle massive spectra, significantly cuts down the modeling time, achieves nearly linear speedup, and can be easily scaled up.
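The first of the two procedures, parallel data standardizing, fits the map/reduce pattern naturally: each data chunk emits partial sums, and a reducer combines them into column means and standard deviations. The sketch below simulates that pattern in plain Python on one machine. It is not the authors' Hadoop implementation; the chunk layout, the function names, and the use of the population standard deviation are assumptions.

```python
from functools import reduce
from math import sqrt


def map_chunk(rows):
    """Map step: per-chunk count, column sums and column sums of squares."""
    n_cols = len(rows[0])
    sums = [0.0] * n_cols
    squares = [0.0] * n_cols
    for row in rows:
        for c, x in enumerate(row):
            sums[c] += x
            squares[c] += x * x
    return len(rows), sums, squares


def reduce_stats(a, b):
    """Reduce step: combine two partial results element-wise."""
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [x + y for x, y in zip(a[2], b[2])])


def column_mean_std(chunks):
    """Column means and population standard deviations over all chunks."""
    n, sums, squares = reduce(reduce_stats, map(map_chunk, chunks))
    means = [s / n for s in sums]
    stds = [sqrt(max(q / n - m * m, 0.0)) for q, m in zip(squares, means)]
    return means, stds


def standardize(rows, means, stds):
    """Second map pass: center and scale; constant columns map to 0."""
    return [[(x - m) / s if s > 0 else 0.0
             for x, m, s in zip(row, means, stds)] for row in rows]


# Two hypothetical chunks of a tiny spectra matrix (rows = samples).
chunks = [[[1.0, 2.0], [3.0, 2.0]], [[5.0, 2.0]]]
means, stds = column_mean_std(chunks)
print(means, stds, standardize(chunks[0], means, stds))
```

The sum-of-squares form is chosen because it combines associatively across chunks; it is less numerically stable than a two-pass or Welford-style update when values are large relative to their spread.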
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1987-01-01
The development of numerical methods and software tools for parallel processors can be aided through the use of a hardware test-bed. The test-bed architecture must be flexible enough to support investigations into architecture-algorithm interactions. One way to implement a test-bed is to use a commercial parallel processor. Unfortunately, most commercial parallel processors are fixed in their interconnection and/or processor architecture. In this paper, we describe a modified n cube architecture, called the hypercluster, which is a superset of many other processor and interconnection architectures. The hypercluster is intended to support research into parallel processing of computational fluid and structural mechanics problems which may require a number of different architectural configurations. An example of how a typical partial differential equation solution algorithm maps on to the hypercluster is given.
Domain decomposition methods for the parallel computation of reacting flows
NASA Technical Reports Server (NTRS)
Keyes, David E.
1988-01-01
Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1991-01-01
The main contribution of the effort in the last two years is the introduction of the MOPPS system. After an extensive literature search, we introduced the system, which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.
Hybrid Optimization Parallel Search PACKage
2009-11-10
HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.
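As a rough illustration of a derivative-free search combined with a cache of saved evaluations, the following sketch implements a basic compass (coordinate) search. It is not HOPSPACK's API or its GSS implementation: compass_search, the sphere test function, and all parameters are hypothetical, and the poll points are evaluated serially here, whereas the framework described above would evaluate them in parallel.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Minimize f without derivatives by polling +/- step along each coordinate."""
    cache = {}  # saved evaluations, keyed by the point as a tuple

    def evaluate(x):
        key = tuple(x)
        if key not in cache:
            cache[key] = f(x)  # only points not seen before cost an evaluation
        return cache[key]

    x = list(x0)
    fx = evaluate(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                ft = evaluate(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
            if improved:
                break
        if not improved:
            step /= 2.0  # contract the step when no poll point improves
    return x, fx, len(cache)

def sphere(x):
    """Hypothetical objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

best_x, best_f, n_evals = compass_search(sphere, [0.0, 0.0])
```

The dictionary plays the role of the saved-evaluation cache: revisited points are looked up instead of recomputed, which matters most when each function evaluation is an expensive simulation.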
Parallel Performance Characterization of Columbia
NASA Technical Reports Server (NTRS)
Biswas, Rupak
2004-01-01
Using a collection of benchmark problems of increasing levels of realism and computational effort, we will characterize the strengths and limitations of the 10,240 processor Columbia system to deliver supercomputing value to application scientists. Scientists need to be able to determine if and how they can utilize Columbia to carry extreme workloads, either in terms of ultra-large applications that cannot be run otherwise (capability), or in terms of very large ensembles of medium-scale applications to populate response matrices (capacity). We select existing application benchmarks that scale from a small number of processors to the entire machine, and that highlight different issues in running supercomputing-class applications, such as the various types of memory access, file I/O, inter- and intra-node communications and parallelization paradigms. http://www.nas.nasa.gov/Software/NPB/
Information hiding in parallel programs
Foster, I.
1992-01-30
A fundamental principle in program design is to isolate difficult or changeable design decisions. Application of this principle to parallel programs requires identification of decisions that are difficult or subject to change, and the development of techniques for hiding these decisions. We experiment with three complex applications, and identify mapping, communication, and scheduling as areas in which decisions are particularly problematic. We develop computational abstractions that hide such decisions, and show that these abstractions can be used to develop elegant solutions to programming problems. In particular, they allow us to encode common structures, such as transforms, reductions, and meshes, as software cells and templates that can be reused in different applications. An important characteristic of these structures is that they do not incorporate mapping, communication, or scheduling decisions: these aspects of the design are specified separately, when composing existing structures to form applications. This separation of concerns allows the same cells and templates to be reused in different contexts.
Embodied and Distributed Parallel DJing.
Cappelen, Birgitta; Andersson, Anders-Petter
2016-01-01
Everyone has a right to take part in cultural events and activities, such as music performances and music making. Enforcing that right, within Universal Design, is often limited to a focus on physical access to public areas, hearing aids etc., or groups of persons with special needs performing in traditional ways. The latter might be people with disabilities, being musicians playing traditional instruments, or actors playing theatre. In this paper we focus on the innovative potential of including people with special needs, when creating new cultural activities. In our project RHYME our goal was to create health promoting activities for children with severe disabilities, by developing new musical and multimedia technologies. Because of the users' extreme demands and rich contribution, we ended up creating both a new genre of musical instruments and a new art form. We call this new art form Embodied and Distributed Parallel DJing, and the new genre of instruments Empowering Multi-Sensorial Things.
Parallel spinors on flat manifolds
NASA Astrophysics Data System (ADS)
Sadowski, Michał
2006-05-01
Let p(M) be the dimension of the vector space of parallel spinors on a closed spin manifold M. We prove that every finite group G is the holonomy group of a closed flat spin manifold M(G) such that p(M(G))>0. If the holonomy group Hol(M) of M is cyclic, then we give an explicit formula for p(M) different from that given in [R.J. Miatello, R.A. Podesta, The spectrum of twisted Dirac operators on compact flat manifolds, Trans. Am. Math. Soc., in press]. We answer the question when p(M)>0 if Hol(M) is a cyclic group of prime order or dimM≤4.
Device for balancing parallel strings
Mashikian, Matthew S.
1985-01-01
A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.
Parallel network simulations with NEURON.
Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L
2006-10-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.
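The integration interval described above can be sketched as follows. This is an illustrative outline only and does not use NEURON's API: exchange_interval, simulate, the owner mapping, and the delay values are all hypothetical. The point it shows is that processors only need to exchange spikes once per minimum interprocessor connection delay, because no spike generated within that interval can be due for delivery on another processor before the interval ends.

```python
def exchange_interval(connections, owner):
    """Smallest delay (ms) among connections whose source and target cells
    live on different processors."""
    delays = [d for src, dst, d in connections if owner[src] != owner[dst]]
    return min(delays) if delays else float("inf")

def simulate(t_stop, connections, owner):
    """Count the spike-exchange steps needed to reach t_stop."""
    dt_exchange = exchange_interval(connections, owner)
    t, exchanges = 0.0, 0
    while t < t_stop:
        # Each processor would integrate its own subnet independently here.
        t = min(t + dt_exchange, t_stop)
        # Then all processors would exchange the spikes generated in the interval.
        exchanges += 1
    return dt_exchange, exchanges

# Hypothetical network: cells 0 and 1 on processor 0, cell 2 on processor 1.
owner = {0: 0, 1: 0, 2: 1}
connections = [(0, 1, 0.5), (0, 2, 2.0), (2, 1, 1.0)]  # (source, target, delay in ms)
dt_exchange, n_exchanges = simulate(10.0, connections, owner)
```

The 0.5 ms connection does not shorten the interval because both of its cells sit on the same processor; only the two interprocessor connections are considered.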
Parallel computing in enterprise modeling.
Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.
2008-08-01
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Designing successful removable partial dentures.
Daher, Tony; Hall, Dan; Goodacre, Charles J
2006-03-01
In today's busy dental offices, removable partial denture design is often abdicated by dentists, both as a result of a lack of experience and consensus of design and because of educational failure on the part of dental schools. The result is delegation of the clinical design process to the lab technician. The lack of clinical data provided to the dental technician jeopardizes the quality of care. This article will focus on a logical and simple approach to this problem, making removable partial denture design simple and predictably achievable. The clinical evidence related to removable partial denture design will be described, along with a checklist to simplify the process and make it practical and applicable to everyday clinical practice.
Partial Priapism Treated with Pentoxifylline
Cooper, Meghan A.; Carrion, Rafael E.; Yang, Christopher
2015-01-01
ABSTRACT Main findings: A 26-year-old man suffering from partial priapism was successfully treated with a regimen including pentoxifylline, a nonspecific phosphodiesterase inhibitor that is often used to conservatively treat Peyronie's disease. Case hypothesis: Partial priapism is an extremely rare urological condition that is characterized by thrombosis within the proximal segment of a single corpus cavernosum. There have only been 36 reported cases to date. Although several factors have been associated with this unusual disorder, such as trauma or bicycle riding, the etiology is still not completely understood. Treatment is usually conservative and consists of a non-steroidal anti-inflammatory and anti-thrombotic. Promising future implications: This case report supports the utilization of pentoxifylline in patients with partial priapism due to its anti-fibrogenic and anti-thrombotic properties. PMID:26401875
Landsliding in partially saturated materials
Godt, J.W.; Baum, R.L.; Lu, N.
2009-01-01
Rainfall-induced landslides are pervasive in hillslope environments around the world and among the most costly and deadly natural hazards. However, capturing their occurrence with scientific instrumentation in a natural setting is extremely rare. The prevailing thinking on landslide initiation, particularly for those landslides that occur under intense precipitation, is that the failure surface is saturated and has positive pore-water pressures acting on it. Most analytic methods used for landslide hazard assessment are based on the above perception and assume that the failure surface is located beneath a water table. By monitoring the pore water and soil suction response to rainfall, we observed shallow landslide occurrence under partially saturated conditions for the first time in a natural setting. We show that the partially saturated shallow landslide at this site is predictable using measured soil suction and water content and a novel unified effective stress concept for partially saturated earth materials. Copyright 2009 by the American Geophysical Union.
Integrated Task and Data Parallel Programming
NASA Technical Reports Server (NTRS)
Grimshaw, A. S.
1998-01-01
This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated
Computer-Aided Parallelizer and Optimizer
NASA Technical Reports Server (NTRS)
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Parallel processing considerations for image recognition tasks
NASA Astrophysics Data System (ADS)
Simske, Steven J.
2011-01-01
Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows (as diverse as optical character recognition [OCR], document classification and barcode reading) to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
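Parallel processing by image region, as described above, can be sketched in map-reduce form. The example below is only an assumption-laden illustration, not the paper's method: split_rows, count_dark, dark_pixels, the threshold, and the synthetic image are all hypothetical, and a thread pool is used for brevity (a process pool would be the usual choice for CPU-bound work in CPython).

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(image, n_regions):
    """Split a list-of-rows image into roughly equal horizontal bands."""
    size = max(1, -(-len(image) // n_regions))  # ceiling division, at least 1 row
    return [image[i:i + size] for i in range(0, len(image), size)]

def count_dark(region, threshold=128):
    """Map step: the same task applied to one region of the image."""
    return sum(1 for row in region for px in row if px < threshold)

def dark_pixels(image, n_regions=4):
    """Apply count_dark to each band concurrently, then combine the results."""
    regions = split_rows(image, n_regions)
    with ThreadPoolExecutor(max_workers=n_regions) as pool:
        partial = list(pool.map(count_dark, regions))
    return sum(partial)  # reduce step

# Hypothetical 8 x 8 grayscale image with values in 0..255.
image = [[(r * 37 + c * 11) % 256 for c in range(8)] for r in range(8)]
total = dark_pixels(image)
```

Every pipeline runs the same function on a different band, and the reduce step is a plain sum; tasks such as skew detection would need a more elaborate way of combining per-region results.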
Partially coherent vectorial nonparaxial beams.
Duan, Kailiang; Lü, Baida
2004-10-01
Generalized vectorial Rayleigh-Sommerfeld diffraction integrals are developed for the cross-spectral-density matrices of spatially partially coherent beams. Using the Gaussian Schell-model (GSM) beam as an example, we derive the expressions for the propagation of cross-spectral-density matrices and intensity of partially coherent vectorial nonparaxial beams, and the corresponding far-field asymptotic forms, beyond the paraxial approximation. The propagation of the vectorial nonparaxial GSM beams is evaluated and analyzed. It is shown that a 3 x 3 cross-spectral-density matrix or a vector theory is required for the exact description of nonparaxial GSM beams.
Apparatus for generating partially coherent radiation
Naulleau, Patrick P.
2005-02-22
Techniques for generating partially coherent radiation and particularly for converting effectively coherent radiation from a synchrotron to partially coherent EUV radiation suitable for projection lithography.
A Time-Optimal On-the-Fly Parallel Algorithm for Model Checking of Weak LTL Properties
NASA Astrophysics Data System (ADS)
Barnat, Jiří; Brim, Luboš; Ročkai, Petr
One of the most important open problems of parallel LTL model-checking is to design an on-the-fly scalable parallel algorithm with linear time complexity. Such an algorithm would give the optimality we have in sequential LTL model-checking. In this paper we give a partial solution to the problem. We propose an algorithm that has the required properties for a very rich subset of LTL properties, namely those expressible by weak Büchi automata.
Series-connected shaded modules to address partial shading conditions in SPV systems
NASA Astrophysics Data System (ADS)
Pareek, Smita; Dahiya, Ratna
2016-03-01
With the progress of technology and the reduced cost of PV cells, PV systems are being installed in many countries, including India. Although this method of power generation has sufficient potential, its effective utilization is still lacking. This is because the output power of PV cells depends on many factors, such as insolation, temperature, prevailing local climate conditions, aging, the use of modules from different technologies/manufacturers, and partial shading conditions. Among these factors, partial shading causes a major reduction in output power regardless of the size of the PV system. As a result, the produced power is lower than the expected value. The way modules are connected to each other has a great impact on output power if they are prone to partial shading conditions. In this paper, PV arrays are investigated under partial shading conditions. The results show that partial shading losses can be minimized by connecting shaded modules in series rather than in parallel.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary. Program title: SWsolver. Catalogue identifier: AEGY_v1_0. Program summary URL
Towards Distributed Memory Parallel Program Analysis
Quinlan, D; Barany, G; Panas, T
2008-06-17
This paper presents a parallel attribute evaluation for distributed memory parallel computer architectures where previously only shared memory parallel support for this technique has been developed. Attribute evaluation is a part of how attribute grammars are used for program analysis within modern compilers. Within this work, we have extended ROSE, an open compiler infrastructure, with a distributed memory parallel attribute evaluation mechanism to support user defined global program analysis required for some forms of security analysis which cannot be addressed by a file by file view of large scale applications. As a result, user defined security analyses may now run in parallel without the user having to specify the way data is communicated between processors. The automation of communication enables an extensible open-source parallel program analysis infrastructure.
Runtime volume visualization for parallel CFD
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu
1995-01-01
This paper discusses some aspects of design of a data distributed, massively parallel volume rendering library for runtime visualization of parallel computational fluid dynamics simulations in a message-passing environment. Unlike the traditional scheme in which visualization is a postprocessing step, the rendering is done in place on each node processor. Computational scientists who run large-scale simulations on a massively parallel computer can thus perform interactive monitoring of their simulations. The current library provides an interface to handle volume data on rectilinear grids. The same design principles can be generalized to handle other types of grids. For demonstration, we run a parallel Navier-Stokes solver making use of this rendering library on the Intel Paragon XP/S. The interactive visual response achieved is found to be very useful. Performance studies show that the parallel rendering process is scalable with the size of the simulation as well as with the parallel computer.
Parallel reactor systems for bioprocess development.
Weuster-Botz, Dirk
2005-01-01
Controlled parallel bioreactor systems allow fed-batch operation at early stages of process development. The characteristics of shaken bioreactors operated in parallel (shake flask, microtiter plate), sparged bioreactors (small-scale bubble column) and stirred bioreactors (stirred-tank, stirred column) are briefly summarized. Parallel fed-batch operation is achieved with an intermittent feeding and pH-control system for up to 16 bioreactors operated in parallel on a scale of 100 ml. Examples of the scale-up and scale-down of pH-controlled microbial fed-batch processes demonstrate that controlled parallel reactor systems can result in more effective bioprocess development. Future developments are also outlined, including units of 48 parallel stirred-tank reactors with individual pH- and pO2-controls and automation as well as a liquid handling system, operated on a scale of ml.
Linearly exact parallel closures for slab geometry
NASA Astrophysics Data System (ADS)
Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun
2013-08-01
Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).
Design considerations for parallel graphics libraries
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Partially molten magma ocean model
Shirley, D.N.
1983-02-15
The properties of the lunar crust and upper mantle can be explained if the outer 300-400 km of the moon was initially only partially molten rather than fully molten. The top of the partially molten region contained about 20% melt and decreased to 0% at 300-400 km depth. Nuclei of anorthositic crust formed over localized bodies of magma segregated from the partial melt, then grew peripherally until they covered the moon. Throughout most of its growth period the anorthosite crust floated on a layer of magma a few km thick. The thickness of this layer is regulated by the opposing forces of loss of material by fractional crystallization and addition of magma from the partial melt below. Concentrations of Sr, Eu, and Sm in pristine ferroan anorthosites are found to be consistent with this model, as are trends for the ferroan anorthosites and Mg-rich suites on a diagram of An in plagioclase vs. mg in mafics. Clustering of Eu, Sr, and mg values found among pristine ferroan anorthosites is predicted by this model.
Covert Reinforcement: A Partial Replication.
ERIC Educational Resources Information Center
Ripstra, Constance C.; And Others
A partial replication of an investigation of the effect of covert reinforcement on a perceptual estimation task is described. The study was extended to include an extinction phase. There were five treatment groups: covert reinforcement, neutral scene reinforcement, noncontingent covert reinforcement, and two control groups. Each subject estimated…
Leadership in Partially Distributed Teams
ERIC Educational Resources Information Center
Plotnick, Linda
2009-01-01
Inter-organizational collaboration is becoming more common. When organizations collaborate they often do so in partially distributed teams (PDTs). A PDT is a hybrid team that has at least one collocated subteam and at least two subteams that are geographically distributed and communicate primarily through electronic media. While PDTs share many…
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.
Inverse Kinematics for a Parallel Myoelectric Elbow
2001-10-25
Inverse Kinematics for a Parallel Myoelectric Elbow A. Z. Escudero, Ja. Álvarez, L. Leija. Center of Research and Advanced Studies of the IPN...replacement above elbow are serial mechanisms driven by a DC motor and they include only one active articulation for the elbow [1]. Parallel mechanisms...are rather scarce [2]. The inverse kinematics model of a 3-degree of freedom parallel prosthetic elbow mechanism is reported. The mathematical
Parallel computations and control of adaptive structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)
1991-01-01
The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction to the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the approach presented for applications in disciplines other than the aerospace industry is assessed.
Parallel auto-correlative statistics with VTK.
Pebay, Philippe Pierre; Bennett, Janine Camille
2013-08-01
This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.
Parallel programming in Split-C
Culler, D.E.; Dusseau, A.; Goldstein, S.C.; Krishnamurthy, A.; Lumetta, S.; Eicken, T. von; Yelick, K.
1993-12-31
The authors introduce the Split-C language, a parallel extension of C intended for high performance programming on distributed memory multiprocessors, and demonstrate the use of the language in optimizing parallel programs. Split-C provides a global address space with a clear concept of locality and unusual assignment operators. These are used as tools to reduce the frequency and cost of remote access. The language allows a mixture of shared memory, message passing, and data parallel programming styles while providing efficient access to the underlying machine. They demonstrate the basic language concepts using regular and irregular parallel programs and give performance results for various stages of program optimization.
Shared-memory parallel programming in C++
Beck, B.
1990-07-01
This paper discusses how researchers have produced a set of portable parallel-programming constructs for C, implemented in M4 macros. These parallel-programming macros are available under the name Parmacs. The Parmacs macros let one write parallel C programs for shared-memory, distributed-memory, and mixed-memory (shared and distributed) systems. They have been implemented on several machines. Because Parmacs offers useful parallel-programming features, the author has considered how these problems might be overcome or avoided. The author thought that using C++, rather than C, would address these problems adequately, and describes the C++ features exploited. The work described addresses shared-memory constructs.
Parallel Algorithms for the Exascale Era
Robey, Robert W.
2016-10-19
New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
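The reproducibility issue with global sums mentioned in this abstract can be illustrated with a small sketch. It is not the LANL method, only a toy demonstration under stated assumptions: `naive_sum` and `chunked_sum` are hypothetical helper names, the chunks stand in for per-processor partial sums, and `math.fsum` from the Python standard library is used as a correctly rounded reference.

```python
import math

def naive_sum(values):
    """Left-to-right floating-point accumulation (explicit loop, no compensation)."""
    acc = 0.0
    for v in values:
        acc += v
    return acc

def chunked_sum(values, nchunks):
    """Mimic a parallel reduction: each 'rank' sums its own chunk, then partials are combined."""
    size = -(-len(values) // nchunks)  # ceiling division
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)

values = [1e16, 1.0, -1e16, 1.0]

# The same data gives different answers depending on how it is partitioned.
print(chunked_sum(values, 1))  # serial order
print(chunked_sum(values, 2))  # two "ranks"
print(math.fsum(values))       # correctly rounded, independent of order
```

Because floating-point addition is not associative, the result depends on the number of chunks, which is why a run on a different processor count may not reproduce an earlier answer bit for bit.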
A parallel algorithm for global routing
NASA Technical Reports Server (NTRS)
Brouwer, Randall J.; Banerjee, Prithviraj
1990-01-01
A Parallel Hierarchical algorithm for Global Routing (PHIGURE) is presented. The router is based on the work of Burstein and Pelavin, but has many extensions for general global routing and parallel execution. Main features of the algorithm include structured hierarchical decomposition into separate independent tasks which are suitable for parallel execution and adaptive simplex solution for adding feedthroughs and adjusting channel heights for row-based layout. Alternative decomposition methods and the various levels of parallelism available in the algorithm are examined closely. The algorithm is described and results are presented for a shared-memory multiprocessor implementation.
Conformal pure radiation with parallel rays
NASA Astrophysics Data System (ADS)
Leistner, Thomas; Nurowski, Paweł
2012-03-01
We define pure radiation metrics with parallel rays to be n-dimensional pseudo-Riemannian metrics that admit a parallel null line bundle K and whose Ricci tensor vanishes on vectors that are orthogonal to K. We give necessary conditions in terms of the Weyl, Cotton and Bach tensors for a pseudo-Riemannian metric to be conformal to a pure radiation metric with parallel rays. Then, we derive conditions in terms of the tractor calculus that are equivalent to the existence of a pure radiation metric with parallel rays in a conformal class. We also give analogous results for n-dimensional pseudo-Riemannian pp-waves.
Parallel Genetic Algorithm for Alpha Spectra Fitting
NASA Astrophysics Data System (ADS)
García-Orellana, Carlos J.; Rubio-Montero, Pilar; González-Velasco, Horacio
2005-01-01
We present a performance study of alpha-particle spectra fitting using a parallel Genetic Algorithm (GA). The method uses a two-step approach. In the first step we run the parallel GA to find an initial solution for the second step, in which we use the Levenberg-Marquardt (LM) method for a precise final fit. GA is a resource-intensive method, so we use a Beowulf cluster for parallel simulation. The relationship between simulation time (and parallel efficiency) and the number of processors is studied using several alpha spectra, with the aim of obtaining a method to estimate the optimal number of processors to use in a simulation.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1995-01-01
The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
Parallel Computing for Brain Simulation.
Pastur-Romay, L A; Porto-Pazos, A B; Cedrón, F; Pazos, A
2016-11-04
The human brain is the most complex system in the known universe, yet it is also the least understood. It gives human beings extraordinary capacities, but we do not yet understand how and why most of these capacities arise. For decades, researchers have tried to make computers reproduce these capacities, on the one hand to help understand the nervous system and on the other to process data more efficiently than before. The aim is to make computers process information the way the brain does. Important technological developments and large multidisciplinary projects have made it possible to create the first simulation with a number of neurons similar to that of the human brain. This paper presents an updated review of the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. The review covers the current applications of these works as well as future trends. We have reviewed some works that seek a step forward in Neuroscience and others that seek a breakthrough in Computer Science (neuromorphic hardware, machine learning techniques). We summarize their most outstanding characteristics and present the latest advances and future plans. In addition, this review stresses the importance of considering not only neurons: computational models of the brain should include glial cells, given the proven importance of astrocytes in information processing.
NASA Astrophysics Data System (ADS)
Zemlyanaya, E. V.; Bashashin, M. V.; Rahmonov, I. R.; Shukrinov, Yu. M.; Atanasova, P. Kh.; Volokhova, A. V.
2016-10-01
We consider a model of a system of long Josephson junctions (LJJ) with inductive and capacitive coupling. The corresponding system of nonlinear partial differential equations is solved by means of the standard three-point finite-difference approximation in the spatial coordinate and the Runge-Kutta method for the resulting Cauchy problem. A parallel algorithm is developed and implemented on the basis of MPI (Message Passing Interface) technology. The effect of the coupling between the JJs on the properties of the LJJ system is demonstrated. Numerical results are discussed from the viewpoint of the effectiveness of the parallel implementation.
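The numerical scheme named in this abstract (three-point finite differences in space, Runge-Kutta in time) can be sketched as follows. This is a serial toy version under loud assumptions: it uses a single, uncoupled, undamped sine-Gordon equation phi_tt = phi_xx - sin(phi) with zero-flux ends as a stand-in for the coupled LJJ system, the classical fourth-order Runge-Kutta variant is assumed, and all function names are hypothetical. The MPI parallelization described in the abstract is not shown.

```python
import math

def laplacian(phi, dx):
    """Three-point central difference for phi_xx with mirrored (zero-flux) end points."""
    n = len(phi)
    out = []
    for i in range(n):
        left = phi[i - 1] if i > 0 else phi[1]
        right = phi[i + 1] if i < n - 1 else phi[n - 2]
        out.append((left - 2.0 * phi[i] + right) / (dx * dx))
    return out

def rhs(phi, vel, dx):
    """First-order form: phi_t = vel, vel_t = phi_xx - sin(phi) (assumed model equation)."""
    lap = laplacian(phi, dx)
    return list(vel), [l - math.sin(p) for l, p in zip(lap, phi)]

def shifted(base, slope, h):
    """Return base + h * slope, element by element."""
    return [b + h * s for b, s in zip(base, slope)]

def rk4_step(phi, vel, dx, dt):
    """Advance (phi, vel) by one classical fourth-order Runge-Kutta step."""
    k1p, k1v = rhs(phi, vel, dx)
    k2p, k2v = rhs(shifted(phi, k1p, dt / 2), shifted(vel, k1v, dt / 2), dx)
    k3p, k3v = rhs(shifted(phi, k2p, dt / 2), shifted(vel, k2v, dt / 2), dx)
    k4p, k4v = rhs(shifted(phi, k3p, dt), shifted(vel, k3v, dt), dx)
    new_phi = [p + dt / 6 * (a + 2 * b + 2 * c + d)
               for p, a, b, c, d in zip(phi, k1p, k2p, k3p, k4p)]
    new_vel = [v + dt / 6 * (a + 2 * b + 2 * c + d)
               for v, a, b, c, d in zip(vel, k1v, k2v, k3v, k4v)]
    return new_phi, new_vel
```

In a parallel version, each process would typically own a contiguous block of grid points and exchange only the boundary values needed by `laplacian` before each stage; that decomposition is an assumption here, not a detail taken from the abstract.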
Parallel methods for the flight simulation model
Xiong, Wei Zhong; Swietlik, C.
1994-06-01
The Advanced Computer Applications Center (ACAC) has been involved in evaluating advanced parallel architecture computers and the applicability of these machines to computer simulation models. The advanced systems investigated include parallel machines with shared memory and distributed architectures consisting of an eight processor Alliant FX/8, a twenty-four processor Sequent Symmetry, Cray XMP, IBM RISC 6000 model 550, and the Intel Touchstone eight processor Gamma and 512 processor Delta machines. Since parallelizing a truly efficient application program for the parallel machine is a difficult task, the implementation for these machines in a realistic setting has been largely overlooked. The ACAC has developed considerable expertise in optimizing and parallelizing application models on a collection of advanced multiprocessor systems. One such application model is the Flight Simulation Model, which uses a set of differential equations to describe the flight characteristics of a launched missile by means of a trajectory. The Flight Simulation Model was written in the FORTRAN language with approximately 29,000 lines of source code. Depending on the number of trajectories, the computation can require several hours to a full day of CPU time on a DEC/VAX 8650 system. There is an impetus to reduce the execution time and utilize the advanced parallel architecture computing environment available. ACAC researchers developed a parallel method that allows the Flight Simulation Model to run in parallel on the multiprocessor system. For the benchmark data tested, the parallel Flight Simulation Model implemented on the Alliant FX/8 has achieved nearly linear speedup. In this paper, we describe a parallel method for the Flight Simulation Model. We believe the method presented in this paper provides a general concept for the design of parallel applications. This concept, in most cases, can be adapted to many other sequential application programs.
Recent Improvements to HST Parallel Scheduling
NASA Astrophysics Data System (ADS)
Henry, Ronald; Butschky, Mike
The Hubble Space Telescope (HST) has several scientific instruments (SIs) that may be used at any given time. Most primary visits submitted by HST observers only use one SI, leaving the other SIs free to be requested by "pure parallel" observing programs. In order to accomplish this, separate scheduling units (SUs) for each parallel SI must be created and then scheduled by the Science Planning and Scheduling System (SPSS), taking into account numerous orbital and scientific constraints. The Parallel Observation Matching System (POMS) has the task of matching parallel visits to primary observations and "crafting" appropriate parallel SUs at each opportunity, taking scientific criteria and orbital constraints into account. The process for planning and scheduling parallel observations is thus quite different from the process for primary science. In the past, custom crafting rules for each parallel program were necessary, requiring full-time support from a software developer. In addition, because POMS ran as a standalone system, its ability to model how long parallel SUs would take was limited, especially with the flexible buffer-management schemes used for the second-generation SIs. A new version of POMS was developed in 1997. This version uses a formal proposal syntax (the same used for primary observations) for parallels, so that different proposals can be handled uniformly and without the need for customized "crafting rules." In addition, POMS is integrated with the Transformation (TRANS) planning system in order to give it full knowledge of overheads within an SU, eliminating the need for ad hoc modeling. The power and versatility of this approach has paid off in improved utilization of parallel opportunities, greatly reduced maintenance costs, and an ability to gracefully handle new parallel proposals and new SIs with minimal software effort. This paper discusses the requirements, design, and operational results of the new POMS.
ERIC Educational Resources Information Center
von Davier, Matthias
2016-01-01
This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
NASA Astrophysics Data System (ADS)
Herrera, I.; Herrera, G. S.
2015-12-01
Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized super-computers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers using 'non-overlapping discretizations' have produced the DVS-Software which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM). REFERENCES [1]. Herrera, I. and Pinder, G. F. "Mathematical Modelling in Science and Engineering: An axiomatic approach". John Wiley, 243p., 2012. [2]. Herrera, I., de la Cruz, L.M. and Rosas-Medina, A. "Non Overlapping Discretization Methods for Partial Differential Equations". NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num.21852. (Open source) [3]. Herrera, I. and Contreras, I. "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)
The language parallel Pascal and other aspects of the massively parallel processor
NASA Technical Reports Server (NTRS)
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Partial polarization by quantum distinguishability
NASA Astrophysics Data System (ADS)
Lahiri, Mayukh; Hochrainer, Armin; Lapkiewicz, Radek; Lemos, Gabriela Barreto; Zeilinger, Anton
2017-03-01
We establish that a connection exists between wave-particle duality of photons and partial polarization of a light beam. We perform a two-path lowest-order (single photon) interference experiment and demonstrate both theoretically and experimentally that the degree of polarization of the light beam emerging from an output of the interferometer depends on path distinguishability. In our experiment, we are able to change the quantum state of the emerging photon from a pure state to a fully mixed state without any direct interaction with the photon. Although most lowest-order interference experiments can be explained by classical theory, our experiment has no genuine classical analog. Our results show that a case exists where the cause of partial polarization is beyond the scope of classical theory.
Wettability of partially suspended graphene
Ondarçuhu, Thierry; Thomas, Vincent; Nuñez, Marc; Dujardin, Erik; Rahman, Atikur; Black, Charles T.; Checco, Antonio
2016-01-01
The dependence of the wettability of graphene on the nature of the underlying substrate remains only partially understood. Here, we systematically investigate the role of liquid-substrate interactions on the wettability of graphene by varying the area fraction of suspended graphene from 0 to 95% by means of nanotextured substrates. We find that completely suspended graphene exhibits the highest water contact angle (85° ± 5°) compared to partially suspended or supported graphene, regardless of the hydrophobicity (hydrophilicity) of the substrate. Further, 80% of the long-range water-substrate interactions are screened by the graphene monolayer, the wettability of which is primarily determined by short-range graphene-liquid interactions. By its well-defined chemical and geometrical properties, supported graphene therefore provides a model system to elucidate the relative contribution of short and long range interactions to the macroscopic contact angle. PMID:27072195
Wettability of partially suspended graphene
Ondarçuhu, Thierry; Thomas, Vincent; Nuñez, Marc; ...
2016-04-13
The dependence of the wettability of graphene on the nature of the underlying substrate remains only partially understood. We systematically investigate the role of liquid-substrate interactions on the wettability of graphene by varying the area fraction of suspended graphene from 0 to 95% by means of nanotextured substrates. We find that completely suspended graphene exhibits the highest water contact angle (85° ± 5°) compared to partially suspended or supported graphene, regardless of the hydrophobicity (hydrophilicity) of the substrate. Moreover, 80% of the long-range water-substrate interactions are screened by the graphene monolayer, the wettability of which is primarily determined by short-range graphene-liquid interactions. By its well-defined chemical and geometrical properties, supported graphene therefore provides a model system to elucidate the relative contribution of short and long range interactions to the macroscopic contact angle.
Wettability of partially suspended graphene
Ondarçuhu, Thierry; Thomas, Vincent; Nuñez, Marc; Dujardin, Erik; Rahman, Atikur; Black, Charles T.; Checco, Antonio
2016-04-13
The dependence of the wettability of graphene on the nature of the underlying substrate remains only partially understood. We systematically investigate the role of liquid-substrate interactions on the wettability of graphene by varying the area fraction of suspended graphene from 0 to 95% by means of nanotextured substrates. We find that completely suspended graphene exhibits the highest water contact angle (85° ± 5°) compared to partially suspended or supported graphene, regardless of the hydrophobicity (hydrophilicity) of the substrate. Moreover, 80% of the long-range water-substrate interactions are screened by the graphene monolayer, the wettability of which is primarily determined by short-range graphene-liquid interactions. By its well-defined chemical and geometrical properties, supported graphene therefore provides a model system to elucidate the relative contribution of short and long range interactions to the macroscopic contact angle.
Matching games with partial information
NASA Astrophysics Data System (ADS)
Laureti, Paolo; Zhang, Yi-Cheng
2003-06-01
We analyze different ways of pairing agents in a bipartite matching problem, with regard to its scaling properties and to the distribution of individual “satisfactions”. Then we explore the role of partial information and bounded rationality in a generalized Marriage Problem, comparing the benefits obtained by self-searching and by a matchmaker. Finally we propose a modified matching game intended to mimic the way consumers’ information drives firms to enhance the quality of their products in a competitive market.
Tree reconstruction from partial orders
Kannan, S.K.; Warnow, T.J.
1993-01-01
The problem of constructing trees given a matrix of interleaf distances is motivated by applications in computational evolutionary biology and linguistics. The general problem is to find an edge-weighted tree which most closely approximates the distance matrix. Although the construction problem is easy when the tree exactly fits the distance matrix, optimization problems under all popular criteria are either known or conjectured to be NP-complete. In this paper we consider the related problem where we are given a partial order on the pairwise distances, and wish to construct (if possible) an edge-weighted tree realizing the partial order. In particular we are interested in partial orders which arise from experiments on triples of species, which determine either a linear ordering of the three pairwise distances (called Total Order Model or TOM experiments) or only the pair(s) of minimum distance apart (called Partial Order Model or POM experiments). The POM and TOM experimental model is inspired by the model proposed by Kannan, Lawler, and Warnow for constructing trees from experiments which determine the rooted topology for any triple of species. We examine issues of construction of trees and consistency of TOM and POM experiments, where the trees may either be weighted or unweighted. Using these experiments to construct unweighted trees without nodes of degree two is motivated by a similar problem studied by Winkler, called the Discrete Metric Realization problem, which he showed to be strongly NP-hard. We have the following results: Determining consistency of a set of TOM or POM experiments is NP-complete whether the tree is weighted or constrained to be unweighted and without degree two nodes. We can construct unweighted trees without degree two nodes from TOM experiments in optimal O(n^3) time and from POM experiments in O(n^4) time.
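To make the two experiment types in this abstract concrete, the sketch below shows what a TOM and a POM experiment would report for one triple of species when the pairwise distances are known. It only illustrates the definitions given in the abstract, not the authors' tree-construction algorithms; the function names, the dictionary-of-frozensets distance format, and the handling of ties in the TOM case are assumptions.

```python
from itertools import combinations

def pairs_of(triple):
    """The three unordered pairs of a triple, in a fixed (sorted) order."""
    return list(combinations(sorted(triple), 2))

def tom_experiment(dist, triple):
    """Total Order Model: the three pairs ordered by increasing distance."""
    return sorted(pairs_of(triple), key=lambda p: dist[frozenset(p)])

def pom_experiment(dist, triple):
    """Partial Order Model: only the pair(s) at minimum distance."""
    pairs = pairs_of(triple)
    dmin = min(dist[frozenset(p)] for p in pairs)
    return [p for p in pairs if dist[frozenset(p)] == dmin]

# Hypothetical leaf-to-leaf distances for three species.
dist = {
    frozenset(("a", "b")): 2,
    frozenset(("a", "c")): 5,
    frozenset(("b", "c")): 4,
}

print(tom_experiment(dist, ("a", "b", "c")))
print(pom_experiment(dist, ("a", "b", "c")))
```

A POM experiment carries strictly less information than a TOM experiment on the same triple, which is consistent with the higher construction cost the abstract reports for POM.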
Tree reconstruction from partial orders
Kannan, S.K.; Warnow, T.J.
1993-03-01
The problem of constructing trees given a matrix of interleaf distances is motivated by applications in computational evolutionary biology and linguistics. The general problem is to find an edge-weighted tree which most closely approximates the distance matrix. Although the construction problem is easy when the tree exactly fits the distance matrix, optimization problems under all popular criteria are either known or conjectured to be NP-complete. In this paper we consider the related problem where we are given a partial order on the pairwise distances, and wish to construct (if possible) an edge-weighted tree realizing the partial order. In particular we are interested in partial orders which arise from experiments on triples of species, which determine either a linear ordering of the three pairwise distances (called Total Order Model or TOM experiments) or only the pair(s) of minimum distance apart (called Partial Order Model or POM experiments). The POM and TOM experimental model is inspired by the model proposed by Kannan, Lawler, and Warnow for constructing trees from experiments which determine the rooted topology for any triple of species. We examine issues of construction of trees and consistency of TOM and POM experiments, where the trees may either be weighted or unweighted. Using these experiments to construct unweighted trees without nodes of degree two is motivated by a similar problem studied by Winkler, called the Discrete Metric Realization problem, which he showed to be strongly NP-hard. We have the following results: Determining consistency of a set of TOM or POM experiments is NP-complete whether the tree is weighted or constrained to be unweighted and without degree two nodes. We can construct unweighted trees without degree two nodes from TOM experiments in optimal O(n^3) time and from POM experiments in O(n^4) time.
Microflora of partially processed lettuce.
Magnuson, J A; King, A D; Török, T
1990-12-01
Bacteria, yeasts, and molds isolated from partially processed iceberg lettuce were taxonomically classified. The majority of bacterial isolates were gram-negative rods. Pseudomonas, Erwinia, and Serratia species were commonly found. Yeasts most frequently isolated from lettuce included members of the genera Candida, Cryptococcus, Pichia, Torulaspora, and Trichosporon. Comparatively few molds were isolated; members of the genera Rhizopus, Cladosporium, Phoma, Aspergillus, and Penicillium were identified.
Prototyping Parallel and Distributed Programs in Proteus
1990-10-01
Cole90, Gibb89]. Highly-parallel processors - Applications for highly-parallel machines such as the CM-2 or the iPSC are programmed using data...Programming, (Prentice-Hall, Englewood Cliffs, NJ) 1990. [Gibb89] Gibbons, P.B., "A more practical PRAM model", in: Proceedings of the First ACM
Simulation Exploration through Immersive Parallel Planes: Preprint
Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny; Smith, Steve
2016-03-01
We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest; a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
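The parallel-planes mapping and the brushing operation described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' system: pairing consecutive dimensions (0 with 1, 2 with 3, and so on) is an assumption, as are the function names and the representation of a brushed region as axis-aligned ranges.

```python
def to_polyline(observation):
    """Map an even-length observation to one (x, y, plane_index) vertex per plane."""
    if len(observation) % 2 != 0:
        raise ValueError("expected an even number of dimensions")
    return [(observation[i], observation[i + 1], i // 2)
            for i in range(0, len(observation), 2)]

def brush(observations, plane, x_range, y_range):
    """Keep the observations whose vertex on `plane` lies inside the brushed rectangle."""
    selected = []
    for obs in observations:
        x, y, _ = to_polyline(obs)[plane]
        if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]:
            selected.append(obs)
    return selected

# Three hypothetical four-dimensional observations, drawn on two planes.
data = [
    (0.1, 0.2, 0.9, 0.8),
    (0.5, 0.5, 0.1, 0.1),
    (0.2, 0.1, 0.4, 0.6),
]

print(to_polyline(data[0]))
print(brush(data, plane=0, x_range=(0.0, 0.3), y_range=(0.0, 0.3)))
```

In the workflow the abstract describes, the selection returned by a brush would both highlight existing polylines and define the parameter region for launching further simulations.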
Parallel computation with the spectral element method
Ma, Hong
1995-12-01
Spectral element models for the shallow water equations and the Navier-Stokes equations have been successfully implemented on a data parallel supercomputer, the Connection Machine model CM-5. The nonstaggered grid formulations for both models are described and are shown to be especially efficient in a data parallel computing environment.
Predicting Protein Structure Using Parallel Genetic Algorithms.
1994-12-01
Predicting Protein Structure Using Parallel Genetic Algorithms. Thesis. George H... Approved for public release; distribution unlimited. AFIT/GCS/ENG/94D-03. Contents (partial): 1.2 Genetic Algorithms; 1.3 The Protein Folding Problem
Parallel Activation in Bilingual Phonological Processing
ERIC Educational Resources Information Center
Lee, Su-Yeon
2011-01-01
In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…
Serial Order: A Parallel Distributed Processing Approach.
ERIC Educational Resources Information Center
Jordan, Michael I.
Human behavior shows a variety of serially ordered action sequences. This paper presents a theory of serial order which describes how sequences of actions might be learned and performed. In this theory, parallel interactions across time (coarticulation) and parallel interactions across space (dual-task interference) are viewed as two aspects of a…
MULTIOBJECTIVE PARALLEL GENETIC ALGORITHM FOR WASTE MINIMIZATION
In this research we have developed an efficient multiobjective parallel genetic algorithm (MOPGA) for waste minimization problems. This MOPGA integrates PGAPack (Levine, 1996) and NSGA-II (Deb, 2000) with novel modifications. PGAPack is a master-slave parallel implementation of a...
Parallel Narrative Structure in Paul Harding's "Tinkers"
ERIC Educational Resources Information Center
Çirakli, Mustafa Zeki
2014-01-01
The present paper explores the implications of parallel narrative structure in Paul Harding's "Tinkers" (2009). Besides primarily recounting the two sets of parallel narratives, "Tinkers" also comprises seemingly unrelated fragments such as excerpts from clock repair manuals and diaries. The main stories, however, told…
Parallel Computing Strategies for Irregular Algorithms
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Parallel unstructured grid generation for computational aerosciences
NASA Technical Reports Server (NTRS)
Shephard, Mark S.
1993-01-01
The objective of this research project is to develop efficient parallel automatic grid generation procedures for use in computational aerosciences. This effort is focused on a parallel version of the Finite Octree grid generator. Progress made during the first six months is reported.
Differences Between Distributed and Parallel Systems
Brightwell, R.; Maccabe, A.B.; Rissen, R.
1998-10-01
Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.
Configuration space representation in parallel coordinates
NASA Technical Reports Server (NTRS)
Fiorini, Paolo; Inselberg, Alfred
1989-01-01
By means of a system of parallel coordinates, a nonprojective mapping from R^N to R^2 is obtained for any positive integer N. In this way multivariate data and relations can be represented in the Euclidean plane (embedded in the projective plane). Basically, R^2 with Cartesian coordinates is augmented by N parallel axes, one for each variable. The N joint variables of a robotic device can be represented graphically by using parallel coordinates. It is pointed out that some properties of the relation are better perceived visually from the parallel coordinate representation, and that new algorithms and data structures can be obtained from this representation. The main features of parallel coordinates are described, and an example is presented of their use for configuration space representation of a mechanical arm (where Cartesian coordinates cannot be used).
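The mapping summarized above can be illustrated with a minimal sketch, assuming evenly spaced vertical axes (the `spacing` parameter and all function names are hypothetical, not taken from the report). A point in R^N becomes a polyline with one vertex per axis. For N = 2, every point on a line x2 = m*x1 + b with m != 1 maps to a segment whose supporting line passes through the single point (spacing/(1-m), b/(1-m)), which is an example of a relation that is easy to see in parallel coordinates.

```python
def to_polyline(point, spacing=1.0):
    """Map a point in R^N to N polyline vertices, one per parallel axis."""
    return [(i * spacing, value) for i, value in enumerate(point)]


def dual_point(m, b, spacing=1.0):
    """Common intersection of the segments of all points on x2 = m*x1 + b."""
    if m == 1:
        raise ValueError("slope 1 gives parallel segments with no finite intersection")
    return (spacing / (1 - m), b / (1 - m))


def y_on_line_through(p0, p1, x):
    """Evaluate the straight line through vertices p0 and p1 at position x."""
    (x0, y0), (x1, y1) = p0, p1
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    m, b = -1.0, 2.0
    px, py = dual_point(m, b)
    for x1 in (-2.0, 0.0, 3.5):
        v0, v1 = to_polyline((x1, m * x1 + b))
        # Each polyline segment passes through the same dual point.
        print(x1, y_on_line_through(v0, v1, px), py)
```

The dual point lies between the two axes only for negative slopes; for other slopes the supporting lines must be extended, which is why the helper evaluates the full line rather than the segment.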
Implementation and performance of parallel Prolog interpreter
Wei, S.; Kale, L.V.; Balkrishna, R. (Dept. of Computer Science)
1988-01-01
In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent because it runs on top of the chare-kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.
Parallel Algebraic Multigrid Methods - High Performance Preconditioners
Yang, U M
2004-11-11
The development of high performance, massively parallel computers and the increasing demands of computationally challenging applications have necessitated the development of scalable solvers and preconditioners. One of the most effective ways to achieve scalability is the use of multigrid or multilevel techniques. Algebraic multigrid (AMG) is a very efficient algorithm for solving large problems on unstructured grids. While much of it can be parallelized in a straightforward way, some components of the classical algorithm, particularly the coarsening process and some of the most efficient smoothers, are highly sequential, and require new parallel approaches. This chapter presents the basic principles of AMG and gives an overview of various parallel implementations of AMG, including descriptions of parallel coarsening schemes and smoothers, some numerical results as well as references to existing software packages.
A parallel variable metric optimization algorithm
NASA Technical Reports Server (NTRS)
Straeter, T. A.
1973-01-01
An algorithm designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariate minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
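The cycle described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the rank-one formula, so the symmetric rank-one (SR1) update of an inverse-Hessian estimate is assumed here, the line minimization is exact only for quadratics, and `pvm_cycle`, `example_grad`, and the test problem are hypothetical.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pvm_cycle(grad, x0, directions):
    """One cycle: p gradient evaluations, p rank-one corrections, one line search."""
    n = len(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # initial metric
    g0 = grad(x0)
    # Step 1: gradients at p trial points. The evaluations are independent,
    # so this is the part that could run in parallel.
    trial_grads = [grad([xi + di for xi, di in zip(x0, d)]) for d in directions]
    # Step 2: p rank-one corrections (assumed SR1 form) of the metric H.
    for s, g in zip(directions, trial_grads):
        y = [gi - g0i for gi, g0i in zip(g, g0)]
        r = [si - hyi for si, hyi in zip(s, mat_vec(H, y))]
        denom = dot(r, y)
        if abs(denom) < 1e-12:
            continue  # skip an ill-defined correction
        for i in range(n):
            for j in range(n):
                H[i][j] += r[i] * r[j] / denom
    # Step 3: one univariate minimization along the Newton-like direction.
    d = [-v for v in mat_vec(H, g0)]
    slope0 = dot(g0, d)
    slope1 = dot(grad([xi + di for xi, di in zip(x0, d)]), d)
    if slope1 == slope0:
        return list(x0)
    alpha = -slope0 / (slope1 - slope0)  # exact minimizer for a quadratic
    return [xi + alpha * di for xi, di in zip(x0, d)]


def example_grad(x):
    """Gradient of the quadratic 0.5*x^T A x - b^T x with A=[[4,1],[1,3]], b=[1,2]."""
    return [4.0 * x[0] + 1.0 * x[1] - 1.0, 1.0 * x[0] + 3.0 * x[1] - 2.0]


if __name__ == "__main__":
    # p = 2 equals the dimension, so one cycle should reach the minimizer.
    print(pvm_cycle(example_grad, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

With p equal to the dimension and linearly independent trial directions, the sketch reaches the minimizer of the example quadratic in a single cycle, which mirrors the one-cycle convergence property stated in the abstract.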
National Combustion Code: Parallel Implementation and Performance
NASA Technical Reports Server (NTRS)
Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.
2000-01-01
The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.
Parallelization of a Compositional Reservoir Simulator
NASA Astrophysics Data System (ADS)
Reme, Hilde; Åge Øye, Geir; Espedal, Magne S.; Fladmark, Gunnar E.
A finite volume discretization has been used to solve compositional flow in porous media. Secondary migration in fractured rocks has been the main motivation for the work. Multipoint flux approximation has been implemented and adaptive local grid refinement, based on domain decomposition, is used at fractures and faults. The parallelization method, which is described in this paper, strongly promotes code reuse and gives a very high level of parallelization despite low implementation costs. The programming framework is also portable to other platforms or other applications. We have presented computer experiments to examine the parallel efficiency of the implemented parallel simulator with respect to scalability and speedup. Keywords: porous media, multipoint flux approximation, domain decomposition, parallelization
Genetic Parallel Programming: design and implementation.
Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong
2006-01-01
This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Parallel hypergraph partitioning for scientific computing.
Heaphy, Robert; Devine, Karen Dragon; Catalyurek, Umit; Bisseling, Robert; Hendrickson, Bruce Alan; Boman, Erik Gunnar
2005-07-01
Graph partitioning is often used for load balancing in parallel computing, but it is known that hypergraph partitioning has several advantages. First, hypergraphs more accurately model communication volume, and second, they are more expressive and can better represent nonsymmetric problems. Hypergraph partitioning is particularly suited to parallel sparse matrix-vector multiplication, a common kernel in scientific computing. We present a parallel software package for hypergraph (and sparse matrix) partitioning developed at Sandia National Labs. The algorithm is a variation on multilevel partitioning. Our parallel implementation is novel in that it uses a two-dimensional data distribution among processors. We present empirical results that show our parallel implementation achieves good speedup on several large problems (up to 33 million nonzeros) with up to 64 processors on a Linux cluster.
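The claim that hypergraphs model communication volume more accurately can be illustrated with a small sketch. It assumes a row-wise partition of a square sparse matrix for y = A*x, with x[j] owned by the part that holds row j, and uses the column-net hypergraph metric (for each column, the number of parts touching it minus one). The function names and the example matrix are hypothetical, and this is not the Sandia package's API.

```python
def edge_cut(rows, part):
    """Graph-style estimate: off-diagonal nonzeros whose row and column lie in different parts."""
    return sum(
        1
        for i, cols in rows.items()
        for j in cols
        if i != j and part[i] != part[j]
    )


def communication_volume(rows, part):
    """Column-net hypergraph metric: sum over columns of (parts touched - 1)."""
    parts_per_column = {}
    for i, cols in rows.items():
        for j in cols:
            # The owner of x[j] (assumed: the part holding row j) always touches column j.
            parts_per_column.setdefault(j, {part[j]}).add(part[i])
    return sum(len(parts) - 1 for parts in parts_per_column.values())


if __name__ == "__main__":
    # rows[i] lists the column indices of the nonzeros in row i.
    rows = {0: [0, 1], 1: [1], 2: [1, 2], 3: [1, 3]}
    part = {0: 0, 1: 0, 2: 1, 3: 1}
    print("edge cut:", edge_cut(rows, part))
    print("communication volume:", communication_volume(rows, part))
```

In this example both rows 2 and 3 need x[1], but part 1 only has to receive it once, so the hypergraph metric reports one word while the edge cut counts two.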
Broadcasting a message in a parallel computer
Berg, Jeremy E.; Faraj, Ahmad A.
2011-08-02
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
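A minimal sketch of the idea, assuming a single plane modeled as a rows-by-columns mesh without wraparound links and a "snake" (boustrophedon) ordering as the Hamiltonian path. The disclosure does not specify how the path is constructed, so the ordering, the function names, and the step count are illustrative assumptions only.

```python
def snake_path(rows, cols):
    """Visit every node of a rows x cols mesh, reversing direction on each row."""
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def is_hamiltonian_path(path, rows, cols):
    """Check that the path covers every node once and only uses mesh links."""
    all_nodes = {(r, c) for r in range(rows) for c in range(cols)}
    covers_all = len(path) == len(all_nodes) and set(path) == all_nodes
    uses_links = all(
        abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1 for a, b in zip(path, path[1:])
    )
    return covers_all and uses_links


def broadcast(path, root, message):
    """Forward the root's message hop by hop in both directions along the path."""
    k = path.index(root)
    received = {root: message}
    for node in path[k + 1:]:          # toward the end of the path
        received[node] = message
    for node in reversed(path[:k]):    # toward the start of the path
        received[node] = message
    steps = max(k, len(path) - 1 - k)  # both directions proceed concurrently
    return received, steps


if __name__ == "__main__":
    path = snake_path(3, 4)
    print(is_hamiltonian_path(path, 3, 4))
    received, steps = broadcast(path, (1, 2), "hello")
    print(len(received), steps)
```

Every hop in this scheme is a point to point transfer between neighboring nodes, which is the kind of traffic the described network is optimized for.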
Sequential bioequivalence approaches for parallel designs.
Fuglsang, Anders
2014-05-01
Regulators in the EU, USA and Canada allow the use of two-stage approaches for evaluation of bioequivalence. The purpose of this paper is to evaluate such designs for parallel groups using trial simulations. The methods developed by Diane Potvin and co-workers were adapted to parallel designs. Trials were simulated and evaluated on the basis of either equal or unequal variances between treatment groups. Methods B and C of Potvin et al., when adapted for parallel designs, protected well against type I error rate inflation under all of the simulated scenarios. Performance characteristics of the new parallel design methods showed little dependence on the assumption of equality of the test and reference variances. This is the first paper to describe the performance of two-stage approaches for parallel designs used to evaluate bioequivalence. The results may prove useful to sponsors developing formulations where crossover designs for bioequivalence evaluation are undesirable.
Solving the Cauchy-Riemann equations on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first-order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
Parallel preconditioning for the solution of nonsymmetric banded linear systems
Amodio, P.; Mazzia, F.
1994-12-31
Many computational techniques require the solution of banded linear systems. Common examples derive from the solution of partial differential equations and of boundary value problems. In particular the authors are interested in the parallel solution of block Hessenberg linear systems Gx = f, arising from the solution of ordinary differential equations by means of boundary value methods (BVMs), even if the considered preconditioning may be applied to any block banded linear system. BVMs have been extensively investigated in the last few years and their stability properties give promising results. A new class of BVMs called Reverse Adams, which are BV-A-stable for orders up to 6, and BV-A_0-stable for orders up to 9, have been studied.
Parallel fast Fourier transforms for non power of two data
Semeraro, B.D.
1994-09-01
This report deals with parallel algorithms for computing discrete Fourier transforms of real sequences of length N not equal to a power of two. The method described is an extension of existing power of two transforms to sequences with N a product of small primes. In particular, this implementation requires N = 2^p 3^q 5^r. The communication required is the same as for a transform of length N = 2^p. The algorithm presented is intended for use in the solution of partial differential equations, or in any situation in which a large number of forward and backward transforms must be performed and in which the Fourier coefficients need not be ordered. This implementation is a one dimensional FFT but the techniques are applicable to multidimensional transforms as well. The algorithm has been implemented on a 128 node Intel iPSC/860.
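The length restriction N = 2^p 3^q 5^r can be checked with a few lines of code. The sketch below is only a helper for that condition, not part of the FFT itself; the function names are hypothetical.

```python
def exponents_235(n):
    """Return (p, q, r) with n == 2**p * 3**q * 5**r, or None if n has another prime factor."""
    if n < 1:
        return None
    exponents = []
    for prime in (2, 3, 5):
        count = 0
        while n % prime == 0:
            n //= prime
            count += 1
        exponents.append(count)
    return tuple(exponents) if n == 1 else None


def next_supported_length(n):
    """Smallest length >= n that satisfies the 2^p 3^q 5^r restriction."""
    candidate = max(n, 1)
    while exponents_235(candidate) is None:
        candidate += 1
    return candidate


if __name__ == "__main__":
    print(exponents_235(360))          # 360 = 2^3 * 3^2 * 5
    print(exponents_235(7))            # 7 is not of the required form
    print(next_supported_length(97))   # candidate padded length for a 97-point sequence
```

A helper like `next_supported_length` is one way a caller might choose a padded grid size, for example when setting up a PDE solver that performs many forward and backward transforms.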
Code Parallelization with CAPO: A User Manual
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2001-01-01
A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.
Gooding, Thomas Michael; McCarthy, Patrick Joseph
2010-03-02
A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, with nodes having an identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
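The grouping step described above can be sketched as follows. The dictionary layout, the choice of the lowest node id as representative, and the `fetch_full_traceback` callback are all hypothetical; the record only states that nodes with an identical instruction address share a subset and that one node per subset supplies the full traceback.

```python
from collections import defaultdict


def group_by_instruction_address(partial):
    """Group node ids by the instruction address each node reported."""
    groups = defaultdict(list)
    for node, address in partial.items():
        groups[address].append(node)
    return dict(groups)


def collect_tracebacks(partial, fetch_full_traceback):
    """Fetch one full traceback per subset instead of one per node."""
    result = {}
    for address, nodes in group_by_instruction_address(partial).items():
        representative = min(nodes)  # assumed selection rule
        result[address] = {
            "nodes": sorted(nodes),
            "representative": representative,
            "traceback": fetch_full_traceback(representative),
        }
    return result


if __name__ == "__main__":
    partial = {0: 0x4008F0, 1: 0x4008F0, 2: 0x400A10, 3: 0x4008F0}
    report = collect_tracebacks(partial, lambda node: ["main", "frame_on_node_%d" % node])
    for address, entry in report.items():
        print(hex(address), entry)
```

The expensive full-stack retrieval then runs once per distinct address rather than once per node, which is where the savings come from when many nodes are stopped at the same instruction.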
The delayed coupling method: An algorithm for solving banded diagonal matrix problems in parallel
Mattor, N.; Williams, T.J.; Hewett, D.W.; Dimits, A.M.
1997-09-01
We present a new algorithm for solving banded diagonal matrix problems efficiently on distributed-memory parallel computers, designed originally for use in dynamic alternating-direction implicit partial differential equation solvers. The algorithm optimizes efficiency with respect to the number of numerical operations and to the amount of interprocessor communication. This is called the "delayed coupling method" because the communication is deferred until needed. We focus here on tridiagonal and periodic tridiagonal systems.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As hardware and software technologies have progressed, the performance of parallel programs with compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Xyce parallel electronic simulator: users' guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique
AMR++: Object-Oriented Parallel Adaptive Mesh Refinement
Quinlan, D.; Philip, B.
2000-02-02
Adaptive mesh refinement (AMR) computations are complicated by their dynamic nature. The development of solvers for realistic applications is complicated by both the complexity of the AMR and the geometry of realistic problem domains. The additional complexity of distributed memory parallelism within such AMR applications most commonly exceeds the level of complexity that can be reasonably maintained with traditional approaches toward software development. This paper will present the details of our object-oriented work on the simplification of the use of adaptive mesh refinement on applications with complex geometries for both serial and distributed memory parallel computation. We will present an independent set of object-oriented abstractions (C++ libraries) well suited to the development of such seemingly intractable scientific computations. As an example of the use of this object-oriented approach we will present recent results of an application modeling fluid flow in the eye. Within this example, the geometry is too complicated for a single curvilinear coordinate grid and so a set of overlapping curvilinear coordinate grids is used. Adaptive mesh refinement and the required grid generation work to support the refinement process are coupled together in the solution of essentially elliptic equations within this domain. This paper will focus on the management of complexity within development of the AMR++ library which forms a part of the Overture object-oriented framework for the solution of partial differential equations within scientific computing.
On combining computational differentiation and toolkits for parallel scientific computing.
Bischof, C. H.; Buecker, H. M.; Hovland, P. D.
2000-06-08
Automatic differentiation is a powerful technique for evaluating derivatives of functions given in the form of a high-level programming language such as Fortran, C, or C++. The program is treated as a potentially very long sequence of elementary statements to which the chain rule of differential calculus is applied over and over again. Combining automatic differentiation and the organizational structure of toolkits for parallel scientific computing provides a mechanism for evaluating derivatives by exploiting mathematical insight on a higher level. In these toolkits, algorithmic structures such as BLAS-like operations, linear and nonlinear solvers, or integrators for ordinary differential equations can be identified by their standardized interfaces and recognized as high-level mathematical objects rather than as a sequence of elementary statements. In this note, the differentiation of a linear solver with respect to some parameter vector is taken as an example. Mathematical insight is used to reformulate this problem into the solution of multiple linear systems that share the same coefficient matrix but differ in their right-hand sides. The experiments reported here use ADIC, a tool for the automatic differentiation of C programs, and PETSc, an object-oriented toolkit for the parallel solution of scientific problems modeled by partial differential equations.
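The reformulation mentioned above can be made explicit with a short derivation. It assumes (the abstract does not spell this out) that the linear system has the form A(p) x(p) = b(p) with an invertible matrix A and a parameter vector p = (p_1, ..., p_k); differentiating with respect to each p_i by the product rule gives:

```latex
\begin{aligned}
A(p)\,x(p) &= b(p) \\
\frac{\partial A}{\partial p_i}\,x + A\,\frac{\partial x}{\partial p_i} &= \frac{\partial b}{\partial p_i} \\
A\,\frac{\partial x}{\partial p_i} &= \frac{\partial b}{\partial p_i} - \frac{\partial A}{\partial p_i}\,x,
\qquad i = 1,\dots,k
\end{aligned}
```

Each of the k derivative systems has the same coefficient matrix A as the original problem and differs only in its right-hand side, so a factorization or preconditioner built for A could be reused across all of them.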
Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU
Xia, Yong; Wang, Kuanquan; Zhang, Henggui
2015-01-01
Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This poses a major challenge for traditional CPU-based computing resources, which either cannot meet the full computational demand or are not easily available because of their high cost. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations. PMID:26581957
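The decoupling into a per-cell ODE and a diffusion PDE can be illustrated with a one-dimensional operator-splitting sketch. It runs on the CPU in plain Python and uses a generic cubic reaction term as a stand-in for the cell model; the function names, the parameter `a`, the explicit Euler updates, and the no-flux boundaries are assumptions for illustration and not the authors' atrial model or GPU code.

```python
def reaction_step(v, dt, a=0.1):
    """Per-cell ODE update (forward Euler) with a cubic stand-in for the cell model.

    Each cell is updated independently, so on a GPU this loop would map to
    one thread per cell.
    """
    return [vi + dt * vi * (1.0 - vi) * (vi - a) for vi in v]


def diffusion_step(v, dt, dx, D):
    """Explicit finite-difference update of dv/dt = D * d2v/dx2 with no-flux ends."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else v[i]        # mirrored value at the boundary
        right = v[i + 1] if i < n - 1 else v[i]
        out.append(v[i] + dt * D * (left - 2.0 * v[i] + right) / (dx * dx))
    return out


def split_step(v, dt, dx, D):
    """One time step: reaction (ODE) part first, then diffusion (PDE) part."""
    return diffusion_step(reaction_step(v, dt), dt, dx, D)


if __name__ == "__main__":
    v = [1.0] * 5 + [0.0] * 15     # excited region on the left
    for _ in range(200):
        v = split_step(v, dt=0.1, dx=1.0, D=1.0)   # dt*D/dx^2 = 0.1 keeps the explicit step stable
    print([round(x, 3) for x in v])
```

Because the reaction update touches only one cell and the diffusion update touches only nearest neighbors, both halves are data parallel, which is what makes this kind of splitting attractive for GPUs.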
Super-resolved Parallel MRI by Spatiotemporal Encoding
Schmidt, Rita; Baishya, Bikash; Ben-Eliezer, Noam; Seginer, Amir; Frydman, Lucio
2016-01-01
Recent studies described an alternative “ultrafast” scanning method based on spatiotemporal (SPEN) principles. SPEN demonstrates numerous potential advantages over EPI-based alternatives, at no additional expense in experimental complexity. An important aspect that SPEN still needs to achieve for providing a competitive acquisition alternative entails exploiting parallel imaging algorithms, without compromising its proven capabilities. The present work introduces a combination of multi-band frequency-swept pulses simultaneously encoding multiple, partial fields-of-view, together with a new algorithm merging a Super-Resolved SPEN image reconstruction and SENSE multiple-receiving methods. The ensuing approach enables one to reduce both the excitation and acquisition times of ultrafast SPEN acquisitions by the customary acceleration factor R, without compromising the ensuing spatial resolution, SAR deposition, or the capability to operate in multi-slice mode. The performance of these new single-shot imaging sequences and their ancillary algorithms was explored on phantoms and human volunteers at 3T. The gains of the parallelized approach were particularly evident when dealing with heterogeneous systems subject to major T2/T2* effects, as is the case upon single-scan imaging near tissue/air interfaces. PMID:24120293
Laparoscopic radical and partial cystectomy
Challacombe, Ben J.; Rose, Kristen; Dasgupta, Prokar
2005-01-01
Radical cystectomy remains the standard treatment for muscle-invasive, organ-confined bladder carcinoma. Laparoscopic radical cystoprostatectomy (LRC) is an advanced laparoscopic procedure that places significant demands on the patient and the surgeon alike. It is a prolonged procedure which includes several technical steps and requires highly developed laparoscopic skills including intra-corporeal suturing. Here we review the development of the technique, the indications, complications and outcomes. We also examine the potential benefits of robotic-assisted LRC and explore the indications and technique of laparoscopic partial cystectomy. PMID:21206662
Partial coalescence of soap bubbles
NASA Astrophysics Data System (ADS)
Harris, Daniel M.; Pucci, Giuseppe; Bush, John W. M.
2015-11-01
We present the results of an experimental investigation of the merger of a soap bubble with a planar soap film. When gently deposited onto a horizontal film, a bubble may interact with the underlying film in such a way as to decrease in size, leaving behind a smaller daughter bubble with approximately half the radius of its progenitor. The process repeats up to three times, with each partial coalescence event occurring over a time scale comparable to the inertial-capillary time. Our results are compared to the recent numerical simulations of Martin and Blanchette and to the coalescence cascade of droplets on a fluid bath.
A simple hyperbolic model for communication in parallel processing environments
NASA Technical Reports Server (NTRS)
Stoica, Ion; Sultan, Florin; Keyes, David
1994-01-01
We introduce a model for communication costs in parallel processing environments called the 'hyperbolic model,' which generalizes two-parameter dedicated-link models in an analytically simple way. Dedicated interprocessor links parameterized by a latency and a transfer rate that are independent of load are assumed by many existing communication models; such models are unrealistic for workstation networks. The communication system is modeled as a directed communication graph in which terminal nodes represent the application processes that initiate the sending and receiving of the information and in which internal nodes, called communication blocks (CBs), reflect the layered structure of the underlying communication architecture. The direction of graph edges specifies the flow of the information carried through messages. Each CB is characterized by a two-parameter hyperbolic function of the message size that represents the service time needed for processing the message. The parameters are evaluated in the limits of very large and very small messages. Rules are given for reducing a communication graph consisting of many CBs to an equivalent two-parameter form, while maintaining an approximation for the service time that is exact in both large and small limits. The model is validated on a dedicated Ethernet network of workstations by experiments with communication subprograms arising in scientific applications, for which a tight fit of the model predictions with actual measurements of the communication and synchronization time between end processes is demonstrated. The model is then used to evaluate the performance of two simple parallel scientific applications from partial differential equations: domain decomposition and time-parallel multigrid. In an appropriate limit, we also show the compatibility of the hyperbolic model with the recently proposed LogP model.
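For orientation, the two-parameter dedicated-link model that the abstract says the hyperbolic model generalizes can be sketched as a fixed latency plus message size divided by transfer rate. This is only a minimal illustration: the function name and the parameter values are hypothetical, and the hyperbolic service-time function itself is not given in the abstract, so it is not reproduced here.

```python
def dedicated_link_time(message_bytes, latency_s, rate_bytes_per_s):
    """Two-parameter dedicated-link cost: fixed latency plus size / transfer rate."""
    if message_bytes < 0 or rate_bytes_per_s <= 0:
        raise ValueError("message size must be >= 0 and rate must be > 0")
    return latency_s + message_bytes / rate_bytes_per_s


# Hypothetical parameters chosen only for illustration.
LATENCY_S = 1e-3           # 1 ms per message
RATE_BYTES_PER_S = 1e6     # 1 MB/s

# Small-message limit: cost is dominated by latency.
small = dedicated_link_time(0, LATENCY_S, RATE_BYTES_PER_S)
# Large-message limit: cost is dominated by size / rate.
large = dedicated_link_time(10_000_000, LATENCY_S, RATE_BYTES_PER_S)
```

The two limits shown are the same regimes (very small and very large messages) in which the abstract says the model parameters are evaluated.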
Parallelization of the Implicit RPLUS Algorithm
NASA Technical Reports Server (NTRS)
Orkwis, Paul D.
1997-01-01
The multiblock reacting Navier-Stokes flow solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.
Parallelization of the Implicit RPLUS Algorithm
NASA Technical Reports Server (NTRS)
Orkwis, Paul D.
1994-01-01
The multiblock reacting Navier-Stokes flow-solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.
Knowledge representation into Ada parallel processing
NASA Technical Reports Server (NTRS)
Masotto, Tom; Babikyan, Carol; Harper, Richard
1990-01-01
The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.
Time-parallel multiscale/multiphysics framework
Frantziskonis, G.; Muralidharan, Krishna; Deymier, Pierre; Simunovic, Srdjan; Nukala, Phani K; Pannala, Sreekanth
2009-01-01
We introduce the time-parallel compound wavelet matrix method (tpCWM) for modeling the temporal evolution of multiscale and multiphysics systems. The method couples time parallel (TP) and CWM methods operating at different spatial and temporal scales. We demonstrate the efficiency of our approach on two examples: a chemical reaction kinetic system and a non-linear predator-prey system. Our results indicate that the tpCWM technique is capable of accelerating time-to-solution by 2-3 orders of magnitude and is amenable to efficient parallel implementation.
Language constructs for modular parallel programs
Foster, I.
1996-03-01
We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrency, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.
Distributed parallel messaging for multiprocessor systems
Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka
2013-06-04
A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.
Parallel path aspects of transmission modeling
Kavicky, J.A.; Shahidehpour, S.M.
1996-11-01
This paper examines the present methods and modeling techniques available to address the effects of parallel flows resulting from various firm and short-term energy transactions. A survey of significant methodologies is conducted to determine the present status of parallel flow transaction modeling. The strengths and weaknesses of these approaches are identified to suggest areas of further modeling improvements. The motivating force behind this research is to improve transfer capability assessment accuracy by suggesting a real-time modeling environment that adequately represents the influences of parallel flows while recognizing operational constraints and objectives.
Fast combinatorial optimization with parallel digital computers.
Kakeya, H; Okabe, Y
2000-01-01
This paper presents an algorithm which realizes fast search for the solutions of combinatorial optimization problems with parallel digital computers. With the standard weight matrices designed for combinatorial optimization, many iterations are required before convergence to a quasi-optimal solution even when many digital processors can be used in parallel. By removing the components of the eigenvectors with eminent negative eigenvalues of the weight matrix, the proposed algorithm avoids oscillation and realizes energy reduction under synchronous discrete dynamics, which enables parallel digital computers to obtain quasi-optimal solutions in much less time than the conventional algorithm.
Synchronization Of Parallel Discrete Event Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Heterogeneous parallel programming capability. Final report
Flower, J.W.; Kolawa, A.
1990-11-30
In creating a heterogeneous parallel processing capability we are really trying to approach three basic problems with current systems: (1) Supercomputer and parallel computer hardware architectures vary widely but need to support one or two fairly standard programming languages and programming models. A particularly important issue concerns the short life cycle of individual hardware designs; (2) Many algorithms require capabilities beyond the reach of single supercomputers but could be approached by several machines working together; and (3) Performing a given task requires integration of a system that may contain many components in addition to the super or parallel computer itself. Peripherals from many different manufacturers must be incorporated.
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation
NASA Technical Reports Server (NTRS)
Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.
1996-01-01
We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
Formation of parallel two-phase flow in nanochannel and application to solvent extraction
NASA Astrophysics Data System (ADS)
Kazoe, Yutaka; Ugajin, Takuya; Ohta, Ryoichi; Mawatari, Kazuma; Kitamori, Takehiko; The University of Tokyo Team
2015-11-01
Micro chemical systems have realized high-throughput analysis in ultra small volumes. Our group has established unit operations such as extraction, separation and reaction, and a concept of integration of chemical processes using parallel multi-phase flows in microchannels. Recently, the research field has been extended to 10-1000 nm space (extended-nanospace). Exploiting extended-nanospace, we developed ultra high performance chemical operations such as aL-chromatography and single molecule immunoassay. However, formation of parallel multi-phase flow in nanochannels has been difficult. The challenge is to control liquid-liquid/gas-liquid interfaces at the 100 nm scale. For this purpose, this study developed a partial surface modification method for nanochannels and verified formation of parallel two-phase flow. We achieved partial hydrophobic modification using focused ion beam (FIB). Using this method, parallel water/dodecane two-phase flow was successfully formed in a nanochannel of 1500 nm width and 890 nm depth. Solvent extraction of lipid, which is a basic separation in bioanalysis, was achieved in a 25 fL volume, much smaller than a single cell. This study will greatly contribute to the development of novel nanofluidic devices for chemical analysis and chemical synthesis. This work was supported by Japan Science and Technology Agency, Core Research for Evolutional Science and Technology.
Social Problems and Deviance: Some Parallel Issues
ERIC Educational Resources Information Center
Kitsuse, John I.; Spector, Malcolm
1975-01-01
Explores parallel developments in labeling theory and in the value conflict approach to social problems. Similarities in their critiques of functionalism and etiological theory as well as their emphasis on the definitional process are noted. (Author)
Data parallel sorting for particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
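The O(N) sequential bound mentioned above is the kind of cost a counting sort on integer cell indices achieves. The sketch below illustrates only that sequential baseline, not the data parallel merging algorithm of the paper; the function name, the cell-index representation, and the sample data are assumptions made for the example.

```python
def counting_sort_by_cell(cell_ids, num_cells):
    """Return particle indices ordered by integer cell id.

    Runs in O(N + num_cells) time and is stable: particles in the same
    cell keep their original relative order.
    """
    # counts[c + 1] accumulates how many particles fall in cell c.
    counts = [0] * (num_cells + 1)
    for c in cell_ids:
        counts[c + 1] += 1
    # Prefix sum: counts[c] becomes the first output slot for cell c.
    for c in range(num_cells):
        counts[c + 1] += counts[c]
    # Scatter each particle index into its slot.
    order = [0] * len(cell_ids)
    for particle, c in enumerate(cell_ids):
        order[counts[c]] = particle
        counts[c] += 1
    return order


# Hypothetical data: the cell index of each of five particles.
cells = [2, 0, 1, 2, 0]
order = counting_sort_by_cell(cells, num_cells=3)
sorted_cells = [cells[i] for i in order]
```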
Runtime support for parallelizing data mining algorithms
NASA Astrophysics Data System (ADS)
Jin, Ruoming; Agrawal, Gagan
2002-03-01
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.
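Of the techniques listed, full replication is the simplest to sketch: each worker updates a private copy of the reduction object, and the copies are merged at the end, so no locking is needed during the updates. The example below is a minimal illustration under assumptions, not the authors' runtime system or interface: item counting stands in for a data mining reduction, and the function names and data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(chunk):
    """Update a private reduction object (a Counter) for one chunk; no locks needed."""
    local = Counter()
    for transaction in chunk:
        local.update(transaction)
    return local


def full_replication_count(transactions, num_workers=4):
    """Give each worker its own reduction object, then merge the replicas."""
    # Round-robin split of the input across workers.
    chunks = [transactions[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_items, chunks))
    # Merge step: combine the private copies into the global result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical transactions, each a list of items.
data = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
totals = full_replication_count(data, num_workers=2)
```

In CPython the threads here show the structure of the technique rather than a real speedup; the locking variants named in the abstract would instead share a single reduction object and guard its elements.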
Parallel processing of a rotating shaft simulation
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.
1989-01-01
A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parallelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.
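The critical-path reasoning described above can be sketched as a longest-path computation over a task dependency graph, with the parallel time taken as the critical-path time plus an overhead term. The task names, durations, overhead value, and the way overhead enters the estimate are all assumptions for illustration; the report's own processor characteristics and its definition of "improvement" are not reproduced.

```python
def critical_path_time(durations, deps):
    """Length of the longest dependency chain in a task graph.

    durations: {task: time}; deps: {task: [prerequisite tasks]}.
    The graph is assumed to be acyclic.
    """
    finish = {}

    def finish_time(task):
        # Earliest finish = latest prerequisite finish + own duration.
        if task not in finish:
            start = max((finish_time(p) for p in deps.get(task, [])), default=0.0)
            finish[task] = start + durations[task]
        return finish[task]

    return max(finish_time(task) for task in durations)


# Hypothetical task graph: c needs a and b; d needs a.
durations = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 4.0}
deps = {"c": ["a", "b"], "d": ["a"]}

serial_time = sum(durations.values())            # every task run one after another
critical = critical_path_time(durations, deps)   # lower bound on parallel time
overhead = 0.5                                   # assumed parallel-processing overhead
speedup = serial_time / (critical + overhead)
```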
NAS Parallel Benchmarks, Multi-Zone Versions
NASA Technical Reports Server (NTRS)
vanderWijngaart, Rob F.; Haopiang, Jin
2003-01-01
We describe an extension of the NAS Parallel Benchmarks (NPB) suite that involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy, which is common among structured-mesh production flow solver codes in use at NASA Ames and elsewhere, provides relatively easily exploitable coarse-grain parallelism between meshes. Since the individual application benchmarks also allow fine-grain parallelism themselves, this NPB extension, named NPB Multi-Zone (NPB-MZ), is a good candidate for testing hybrid and multi-level parallelization tools and strategies.
Asynchronous parallel pattern search for nonlinear optimization
P. D. Hough; T. G. Kolda; V. J. Torczon
2000-01-01
Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10--50) and by expensive objective function evaluations such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machines, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault tolerant parallel pattern search (AAPS) method and demonstrate its effectiveness on both simple test problems as well as some engineering optimization problems.
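To make the synchronization point concrete, here is a minimal serial sketch of a basic coordinate pattern search: at each iteration all trial points are evaluated, the best one is accepted if it improves, and otherwise the step is halved. In a synchronous parallel version the trial evaluations would be farmed out and the iteration would wait for all of them, which is the penalty the abstract refers to. This is a generic textbook-style illustration, not the authors' asynchronous method; the function names, step rule, and test objective are assumptions.

```python
def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Minimize f by polling +/- step along each coordinate direction."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        # Build the 2n trial points around the current iterate.
        trials = []
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                trials.append(trial)
        # In a synchronous parallel version these evaluations run concurrently
        # and the iteration waits for the slowest one.
        values = [f(t) for t in trials]
        best = min(range(len(trials)), key=values.__getitem__)
        if values[best] < fx:
            x, fx = trials[best], values[best]   # accept the best trial point
        else:
            step *= 0.5                          # no improvement: contract the pattern
    return x, fx


def quadratic(p):
    """Hypothetical cheap objective with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


x_min, f_min = pattern_search(quadratic, [0.0, 0.0])
```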
Parallel programming with PCN. Revision 1
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).
A Nomograph for Resistors in Parallel
NASA Astrophysics Data System (ADS)
Greenslade, Thomas B.
2002-11-01
The author has a large collection of 19th century textbooks, and found a quotation in one of them concerning a construction for finding the equivalent resistance of two resistors in parallel. This note discusses the equations.
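For reference, the standard relation that such a construction evaluates is 1/R = 1/R1 + 1/R2, or equivalently R = R1 R2 / (R1 + R2). The short sketch below computes it numerically; the function name and sample values are illustrative, and the geometric construction from the textbook is not described in the abstract, so it is not reproduced.

```python
def parallel_resistance(r1, r2):
    """Equivalent resistance of two resistors in parallel: R = R1*R2 / (R1 + R2)."""
    if r1 <= 0 or r2 <= 0:
        raise ValueError("resistances must be positive")
    return (r1 * r2) / (r1 + r2)


# Illustrative values in ohms.
r_eq = parallel_resistance(6.0, 3.0)
```

The result is always smaller than either individual resistance, which is a quick sanity check on any graphical reading.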
Parallel Implementation of the Discontinuous Galerkin Method
NASA Technical Reports Server (NTRS)
Baggag, Abdalkader; Atkins, Harold; Keyes, David
1999-01-01
This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.
Feature Clustering for Accelerating Parallel Coordinate Descent
Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.
2012-12-06
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
Parallel algorithms for dynamically partitioning unstructured grids
Diniz, P.; Plimpton, S.; Hendrickson, B.; Leland, R.
1994-10-01
Grid partitioning is the method of choice for decomposing a wide variety of computational problems into naturally parallel pieces. In problems where computational load on the grid or the grid itself changes as the simulation progresses, the ability to repartition dynamically and in parallel is attractive for achieving higher performance. We describe three algorithms suitable for parallel dynamic load-balancing which attempt to partition unstructured grids so that computational load is balanced and communication is minimized. The execution time of algorithms and the quality of the partitions they generate are compared to results from serial partitioners for two large grids. The integration of the algorithms into a parallel particle simulation is also briefly discussed.
The PISCES 2 parallel programming environment
NASA Technical Reports Server (NTRS)
Pratt, Terrence W.
1987-01-01
PISCES 2 is a programming environment for scientific and engineering computations on MIMD parallel computers. It is currently implemented on a flexible FLEX/32 at NASA Langley, a 20 processor machine with both shared and local memories. The environment provides an extended Fortran for applications programming, a configuration environment for setting up a run on the parallel machine, and a run-time environment for monitoring and controlling program execution. This paper describes the overall design of the system and its implementation on the FLEX/32. Emphasis is placed on several novel aspects of the design: the use of a carefully defined virtual machine, programmer control of the mapping of virtual machine to actual hardware, forces for medium-granularity parallelism, and windows for parallel distribution of data. Some preliminary measurements of storage use are included.
Massively Parallel Computing: A Sandia Perspective
Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.
1999-05-06
The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
Improved chopper circuit uses parallel transistors
NASA Technical Reports Server (NTRS)
1966-01-01
Parallel transistor chopper circuit operates with one transistor in the forward mode and the other in the inverse mode. By using this method, it acts as a single, symmetrical, bidirectional transistor, and reduces and stabilizes the offset voltage.
Modulated heat pulse propagation and partial transport barriers in chaotic magnetic fields
del-Castillo-Negrete, Diego; Blazevski, Daniel
2016-04-01
Direct numerical simulations of the time dependent parallel heat transport equation modeling heat pulses driven by power modulation in 3-dimensional chaotic magnetic fields are presented. The numerical method is based on the Fourier formulation of a Lagrangian-Green's function method that provides an accurate and efficient technique for the solution of the parallel heat transport equation in the presence of harmonic power modulation. The numerical results presented provide conclusive evidence that even in the absence of magnetic flux surfaces, chaotic magnetic field configurations with intermediate levels of stochasticity exhibit transport barriers to modulated heat pulse propagation. In particular, high-order islands and remnants of destroyed flux surfaces (Cantori) act as partial barriers that slow down or even stop the propagation of heat waves at places where the magnetic field connection length exhibits a strong gradient. The key parameter is $\gamma=\sqrt{\omega/2 \chi_\parallel}$ that determines the length scale, $1/\gamma$, of the heat wave penetration along the magnetic field line. For large perturbation frequencies, $\omega \gg 1$, or small parallel thermal conductivities, $\chi_\parallel \ll 1$, parallel heat transport is strongly damped and the magnetic field partial barriers act as robust barriers where the heat wave amplitude vanishes and its phase speed slows down to a halt. On the other hand, in the limit of small $\gamma$, parallel heat transport is largely unimpeded, global transport is observed and the radial amplitude and phase speed of the heat wave remain finite. Results on modulated heat pulse propagation in fully stochastic fields and across magnetic islands are also presented. In qualitative agreement with recent experiments in LHD and DIII-D, it is shown that the elliptic (O) and hyperbolic (X) points of magnetic islands have a direct impact on the spatio-temporal dependence of the amplitude and the time delay of modulated heat
Molecular dynamics on hypercube parallel computers
NASA Astrophysics Data System (ADS)
Smith, W.
1991-03-01
The implementation of molecular dynamics on parallel computers is described, with particular reference to hypercube computers. Three particular algorithms are described: replicated data (RD), systolic loop (SLS-G), and parallelised link-cells (PLC), all of which have good load balancing. The performance characteristics of each algorithm and the factors affecting their scaling properties are discussed. The article is pedagogic in intent, to introduce a novice to the main aspects of parallel computing in molecular dynamics.
Graphics-Based Parallel Programming Tools
1991-09-01
more general context by implementing perspective views within the Voyeur system [12]. Voyeur is a more conventional tool for displaying application...Varadaraju. Interfacing Belvedere with Voyeur. Master's Thesis, COINS Department, University of Massachusetts (June 1991). 13 David Socha and Mary L...Bailey and David Notkin, "Voyeur: Graphical Views of Parallel Programs", SIGPLAN Workshop on Parallel and Distributed Debugging, pp. 206-215 (1988). 14
Graphics-Based Parallel Programming Tools
1992-01-01
the Voyeur system [12]. Voyeur is a more conventional tool for displaying application-specific visualizations of parallel programs [13] and it provides...Department, University of Massachusetts (June 1991). 13 David Socha and Mary L. Bailey and David Notkin. "Voyeur: Graphical Views of Parallel Programs...Massachusetts (September 1991). Nandakumar Varadaraju. Interfacing Belvedere with Voyeur. Master's Thesis. COINS Department. University of Massachusetts
Voyeur: Graphical Views of Parallel Programs
1988-04-01
visualization, parallel debugging, monitoring...Voyeur is a prototype...Voyeur: Graphical Views of Parallel Programs. David Socha, Mary Bailey and David Notkin, Department of Computer Science, FR-35, University...of Washington, Seattle, Washington 98195. TR 88-04-03, April 1988. Voyeur is a prototype system that facilitates the construction of application-specific
Partitioning And Packing Equations For Parallel Processing
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.; Milner, Edward J.
1989-01-01
Algorithm developed to identify parallelism in set of coupled ordinary differential equations that describe physical system and to divide set into parallel computational paths, along which parts of solution proceed independently of others during at least part of time. Path-identifying algorithm creates number of paths consisting of equations that must be computed serially and table that gives dependent and independent arguments and "can start," "can end," and "must end" times of each equation. "Must end" time used subsequently by packing algorithm.
Enhancing Scalability of Parallel Structured AMR Calculations
Wissink, A M; Hysom, D; Hornung, R D
2003-02-10
This paper discusses parallel scaling performance of large scale parallel structured adaptive mesh refinement (SAMR) calculations in SAMRAI. Previous work revealed that poor scaling qualities in the adaptive gridding operations in SAMR calculations cause them to become dominant for cases run on up to 512 processors. This work describes algorithms we have developed to enhance the efficiency of the adaptive gridding operations. Performance of the algorithms is evaluated for two adaptive benchmarks run on up to 512 processors of an IBM SP system.
Joint Experimentation on Scalable Parallel Processors (JESPP)
2006-04-01
SCALABLE PARALLEL PROCESSORS (JESPP) 6. AUTHOR(S) Dan M. Davis, Robert F. Lucas, Ke-Thia Yao, Gene Wagenbreth 5. FUNDING NUMBERS C...List of Papers • Robert J. Graebener, Gregory Rafuse, Robert Miller & Ke-Thia Yao, “The Road to Successful Joint Experimentation Starts at the...2003. • Robert F. Lucas & Dan M. Davis, “Joint Experimentation on Scalable Parallel Processors“, Interservice/Industry Training, Simulation, and
LDV Measurement of Confined Parallel Jet Mixing
R.F. Kunz; S.W. D'Amico; P.F. Vassallo; M.A. Zaccaria
2001-01-31
Laser Doppler Velocimetry (LDV) measurements were taken in a confinement, bounded by two parallel walls, into which issues a row of parallel jets. Two-component measurements were taken of two mean velocity components and three Reynolds stress components. As observed in isolated three dimensional wall bounded jets, the transverse diffusion of the jets is quite large. The data indicate that this rapid mixing process is due to strong secondary flows, transport of large inlet intensities and Reynolds stress anisotropy effects.
A survey of parallel programming tools
NASA Technical Reports Server (NTRS)
Cheng, Doreen Y.
1991-01-01
This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
HOPSPACK: Hybrid Optimization Parallel Search Package.
Gray, Genetha Anne.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica L.
2008-12-01
In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel Search Package), a new software platform which facilitates combining multiple optimization routines into a single, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The framework is designed such that existing optimization source code can be easily incorporated with minimal code modification. By maintaining the integrity of each individual solver, the strengths and code sophistication of the original optimization package are retained and exploited.
Parallel algorithms for contour extraction and coding
NASA Astrophysics Data System (ADS)
Dinstein, Its'hak; Landau, Gad M.
1990-07-01
A parallel approach to contour extraction and coding on an Exclusive Read Exclusive Write (EREW) Parallel Random Access Machine (PRAM) is presented and analyzed. The algorithm is intended for binary images. The labeled contours can be represented by lists of coordinates, and/or chain codes, and/or any other user designed codes. Using O(n^2/log n) processors, the algorithm runs in O(log n) time, where n by n is the size of the processed binary image.
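As a quick consistency check on the stated bounds, one can multiply the processor count by the running time. This assumes the usual definition of parallel work as processors times time; the abstract itself does not state the product.

```latex
W(n) \;=\; P(n)\,T(n)
     \;=\; O\!\left(\frac{n^{2}}{\log n}\right)\cdot O(\log n)
     \;=\; O\!\left(n^{2}\right)
```

Under that assumption the total work is of the same order as the n^2 pixels of the image, so the bound is consistent with an algorithm that does only a constant amount of work per pixel.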
Parallel and Distributed Computing Combinatorial Algorithms
1993-10-01
PARALLEL AND DISTRIBUTED COMPUTING COMBINATORIAL ALGORITHMS 6. AUTHOR(S) 2304/DS F49620-92-J-0125 DR. LEIGHTON 7 PERFORMING ORGANIZATION NAME...on several problems involving parallel and distributed computing and combinatorial optimization. This research is reported in the numerous papers that...network decomposition. In Proceedings of the Eleventh Annual ACM Symposium on Principles of Distributed Computing, August 1992. [15] B. Awerbuch, B
Computational electromagnetics and parallel dense matrix computations
Forsman, K.; Kettunen, L.; Gropp, W.; Levine, D.
1995-06-01
We present computational results using CORAL, a parallel, three-dimensional, nonlinear magnetostatic code based on a volume integral equation formulation. A key feature of CORAL is the ability to solve, in parallel, the large, dense systems of linear equations that are inherent in the use of integral equation methods. Using the Chameleon and PSLES libraries ensures portability and access to the latest linear algebra solution technology.
Fast Parallel Computation Of Multibody Dynamics
NASA Technical Reports Server (NTRS)
Fijany, Amir; Kwan, Gregory L.; Bagherzadeh, Nader
1996-01-01
Constraint-force algorithm fast, efficient, parallel-computation algorithm for solving forward dynamics problem of multibody system like robot arm or vehicle. Solves problem in minimum time proportional to log(N) by use of optimal number of processors proportional to N, where N is number of dynamical degrees of freedom: in this sense, constraint-force algorithm both time-optimal and processor-optimal parallel-processing algorithm.
Parallel computer methods for eigenvalue extraction
NASA Technical Reports Server (NTRS)
Akl, Fred
1988-01-01
A new numerical algorithm for the solution of large-order eigenproblems typically encountered in linear elastic finite element systems is presented. The architecture of parallel processing is used in the algorithm to achieve increased speed and efficiency of calculations. The algorithm is based on the frontal technique for the solution of linear simultaneous equations and the modified subspace eigenanalysis method for the solution of the eigenproblem. The advantages of this new algorithm in parallel computer architecture are discussed.
Parallel algorithms for unconstrained optimizations by multisplitting
He, Qing
1994-12-31
In this paper a new parallel iterative algorithm for unconstrained optimization using the idea of multisplitting is proposed. This algorithm uses the existing sequential algorithms without any parallelization. Some convergence and numerical results for this algorithm are presented. The experiments are performed on an Intel iPSC/860 Hyper Cube with 64 nodes. It is interesting that the sequential implementation on one node shows that if the problem is split properly, the algorithm converges much faster than one without splitting.
Computational electromagnetics and parallel dense matrix computations
Forsman, K.; Kettunen, L.; Gropp, W.
1995-12-01
We present computational results using CORAL, a parallel, three-dimensional, nonlinear magnetostatic code based on a volume integral equation formulation. A key feature of CORAL is the ability to solve, in parallel, the large, dense systems of linear equations that are inherent in the use of integral equation methods. Using the Chameleon and PSLES libraries ensures portability and access to the latest linear algebra solution technology.
A parallel PCG solver for MODFLOW.
Dong, Yanhui; Li, Guomin
2009-01-01
In order to simulate large-scale ground water flow problems more efficiently with MODFLOW, the OpenMP programming paradigm was used to parallelize the preconditioned conjugate-gradient (PCG) solver in this study. Incremental parallelization, the significant advantage supported by OpenMP on a shared-memory computer, let the solver transition to a parallel program smoothly, one block of code at a time. The parallel PCG solver, suitable for both MODFLOW-2000 and MODFLOW-2005, is verified using an 8-processor computer. Both the impact of compilers and different model domain sizes were considered in the numerical experiments. Based on the timing results, execution times using the parallel PCG solver are typically about 1.40 to 5.31 times faster than those using the serial one. In addition, the simulation results are exactly the same as those of the original PCG solver, because the majority of serial codes were not changed. It is worth noting that this parallelizing approach reduces cost in terms of software maintenance because only a single source PCG solver code needs to be maintained in the MODFLOW source tree.
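For readers unfamiliar with the method, a minimal preconditioned conjugate-gradient iteration is sketched below. It is not the MODFLOW solver: it is serial Python on a small dense matrix, and it uses a Jacobi (diagonal) preconditioner as a stand-in assumption. The function names are hypothetical. The comments mark the vector loops and reductions that a loop-level scheme such as OpenMP would typically target.

```python
def matvec(A, x):
    """Dense matrix-vector product; each row is independent, so rows can be split across threads."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product; in a threaded code this is a reduction."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A with Jacobi-preconditioned CG."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner (assumed choice)
    x = [0.0] * n
    r = b[:]                                      # residual b - A x for the zero initial guess
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged on the residual norm
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]     # element-wise update
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]    # element-wise update
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Every step inside the loop is either an element-wise vector update, a matrix-vector product, or a reduction. That structure is what makes it plausible to parallelize such a solver one block of code at a time.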
The Xyce Parallel Electronic Simulator - An Overview
HUTCHINSON,SCOTT A.; KEITER,ERIC R.; HOEKSTRA,ROBERT J.; WATTS,HERMAN A.; WATERS,ARLON J.; SCHELLS,REGINA L.; WIX,STEVEN D.
2000-12-08
The Xyce™ Parallel Electronic Simulator has been written to support the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on providing the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). In addition, they are providing improved performance for numerical kernels using state-of-the-art algorithms, support for modeling circuit phenomena at a variety of abstraction levels and using object-oriented and modern coding practices that ensure the code will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows.
NASA Astrophysics Data System (ADS)
Laundy, David; Alcock, Simon G.; Alianelli, Lucia; Sutter, John P.; Sawhney, Kawal J. S.; Chubar, Oleg
2014-09-01
A full wave propagation of X-rays from source to sample at a storage ring beamline requires simulation of the electron beam source and optical elements in the beamline. The finite emittance source causes the appearance of partial coherence in the wave field. Consequently, the wavefront cannot be treated exactly with fully coherent wave propagation or fully incoherent ray tracing. We have used the wavefront code Synchrotron Radiation Workshop (SRW) to perform partially coherent wavefront propagation using a parallel computing cluster at the Diamond Light Source. Measured mirror profiles have been used to correct the wavefront for surface errors.
Partial coalescence of soap bubbles
NASA Astrophysics Data System (ADS)
Pucci, G.; Harris, D. M.; Bush, J. W. M.
2015-06-01
We present the results of an experimental investigation of the merger of a soap bubble with a planar soap film. When gently deposited onto a horizontal film, a bubble may interact with the underlying film in such a way as to decrease in size, leaving behind a smaller daughter bubble with approximately half the radius of its progenitor. The process repeats up to three times, with each partial coalescence event occurring over a time scale comparable to the inertial-capillary time. Our results are compared to the recent numerical simulations of Martin and Blanchette ["Simulations of surfactant effects on the dynamics of coalescing drops and bubbles," Phys. Fluids 27, 012103 (2015)] and to the coalescence cascade of droplets on a fluid bath.
Modeling Partial Attacks with Alloy
NASA Astrophysics Data System (ADS)
Lin, Amerson; Bond, Mike; Clulow, Jolyon
The automated and formal analysis of cryptographic primitives, security protocols and Application Programming Interfaces (APIs) up to date has been focused on discovering attacks that completely break the security of a system. However, there are attacks that do not immediately break a system but weaken the security sufficiently for the adversary. We term these attacks partial attacks and present the first methodology for the modeling and automated analysis of this genre of attacks by describing two approaches. The first approach reasons about entropy and was used to simulate and verify an attack on the ECB|ECB|OFB triple-mode DES block-cipher. The second approach reasons about possibility sets and was used to simulate and verify an attack on the personal identification number (PIN) derivation algorithm used in the IBM 4758 Common Cryptographic Architecture.
Partial Dynamical Symmetry in Molecules
NASA Astrophysics Data System (ADS)
Ping, Jia-Lun; Chen, Jin-Quan
1997-03-01
It is shown that any Hamiltonian involving only one- and two-bond interactions for a molecule with n bonds and having a point group P as its symmetry group may have the Sn ⊃ P partial dynamical symmetry, i.e., the Hamiltonian can be solved analytically for a part of the states, called the unique states. For example, the XY6 molecule has the S6 ⊃ Oh partial dynamical symmetry. The model of Iachello and Oss for n coupled anharmonic oscillators is revisited in terms of the partial dynamical symmetry. The energies are obtained analytically for the nine unique levels of the XY6 molecule and the structures of the eigenstates are disclosed for the first time, while for non-unique states they are obtained by diagonalizing the Hamiltonian in the S6 ⊃ Oh symmetry adapted basis with greatly reduced dimension.
Partial masslessness and conformal gravity
NASA Astrophysics Data System (ADS)
Deser, S.; Joung, E.; Waldron, A.
2013-05-01
We use conformal, but ghostful, Weyl gravity to study its ghost-free, second derivative, partially massless (PM) spin-2 component in the presence of Einstein gravity with positive cosmological constant. Specifically, we consider both gravitational- and self-interactions of PM via the fully nonlinear factorization of conformal gravity’s Bach tensor into Einstein times Schouten operators. We find that extending PM beyond linear order suffers from familiar higher spin consistency obstructions: it propagates only in Einstein backgrounds, and the conformal gravity route generates only the usual safe, Noether, cubic order vertices. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Higher spin theories and holography’.
Witte H.; Plate, S
2013-05-03
The international Muon Ionization Cooling Experiment (MICE) is a large scale experiment which is presently being assembled at the Rutherford Appleton Laboratory in Didcot, UK. The purpose of MICE is to demonstrate the concept of ionization cooling experimentally. Ionization cooling is an important accelerator concept which will be essential for future HEP experiments such as a potential Muon Collider or a Neutrino Factory. The MICE experiment will house up to 18 superconducting solenoids, all of which produce a substantial amount of magnetic flux. Recently it was realized that this magnetic flux leads to a considerable stray magnetic field in the MICE hall. This is a concern as technical equipment in the MICE hall may be compromised by this. In July 2012 a concept called partial return yoke was presented to the MICE community, which reduces the stray field in the MICE hall to a safe level. This report summarizes the general concept, engineering considerations and the expected shielding performance.
Start/Pat; A parallel-programming toolkit
Appelbe, B.; Smith, K.; McDowell, C.
1989-07-01
How can you make Fortran code parallel without isolating the programmer from learning to understand and exploit parallelism effectively? With an interactive toolkit that automates parallelization as it educates. This paper discusses the Start/Pat toolkit.
Partial Synchronization of Interconnected Boolean Networks.
Chen, Hongwei; Liang, Jinling; Lu, Jianquan
2017-01-01
This paper addresses the partial synchronization problem for the interconnected Boolean networks (BNs) via the semi-tensor product (STP) of matrices. First, based on an algebraic state space representation of BNs, a necessary and sufficient criterion is presented to ensure the partial synchronization of the interconnected BNs. Second, by defining an induced digraph of the partial synchronized states set, an equivalent graphical description for the partial synchronization of the interconnected BNs is established. Consequently, the second partial synchronization criterion is derived in terms of adjacency matrix of the induced digraph. Finally, two examples (including an epigenetic model) are provided to illustrate the efficiency of the obtained results.
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi
1989-01-01
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
Parallel phase model : a programming model for high-end parallel machines with manycores.
Wu, Junfeng; Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian
2009-04-01
This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.
Parallel PWMs Based Fully Digital Transmitter with Wide Carrier Frequency Range
Zhou, Bo; Zhang, Kun; Zhou, Wenbiao; Zhang, Yanjun; Liu, Dake
2013-01-01
The carrier-frequency (CF) and intermediate-frequency (IF) pulse-width modulators (PWMs) based on delay lines are proposed, where baseband signals are conveyed by both positions and pulse widths or densities of the carrier clock. By combining IF-PWM and precorrected CF-PWM, a fully digital transmitter with unit-delay autocalibration is implemented in 180 nm CMOS for high reconfiguration. The proposed architecture achieves wide CF range of 2 M–1 GHz, high power efficiency of 70%, and low error vector magnitude (EVM) of 3%, with spectrum purity of 20 dB optimized in comparison to the existing designs. PMID:24223503
High-performance parallel image reconstruction for the New Vacuum Solar Telescope
NASA Astrophysics Data System (ADS)
Li, Xue-Bao; Liu, Zhong; Wang, Feng; Jin, Zhen-Yu; Xiang, Yong-Yuan; Zheng, Yan-Fang
2015-06-01
Many technologies have been developed to help improve spatial resolution of observational images for ground-based solar telescopes, such as adaptive optics (AO) systems and post-processing reconstruction. As any AO system correction is only partial, it is indispensable to use post-processing reconstruction techniques. In the New Vacuum Solar Telescope (NVST), a speckle-masking method is used to achieve the diffraction-limited resolution of the telescope. Although the method is very promising, the computation is quite intensive, and the amount of data is tremendous, requiring several months to reconstruct observational data of one day on a high-end computer. To accelerate image reconstruction, we parallelize the program package on a high-performance cluster. We describe parallel implementation details for several reconstruction procedures. The code is written in the C language using the Message Passing Interface (MPI) and is optimized for parallel processing in a multiprocessor environment. We show the excellent performance of parallel implementation, and the whole data processing speed is about 71 times faster than before. Finally, we analyze the scalability of the code to find possible bottlenecks, and propose several ways to further improve the parallel performance. We conclude that the presented program is capable of executing reconstruction applications in real-time at NVST.
An object-oriented approach for parallel self adaptive mesh refinement on block structured grids
NASA Technical Reports Server (NTRS)
Lemke, Max; Witsch, Kristian; Quinlan, Daniel
1993-01-01
Self-adaptive mesh refinement dynamically matches the computational demands of a solver for partial differential equations to the activity in the application's domain. In this paper we present two C++ class libraries, P++ and AMR++, which significantly simplify the development of sophisticated adaptive mesh refinement codes on (massively) parallel distributed memory architectures. The development is based on our previous research in this area. The C++ class libraries provide abstractions to separate the issues of developing parallel adaptive mesh refinement applications into those of parallelism, abstracted by P++, and adaptive mesh refinement, abstracted by AMR++. P++ is a parallel array class library to permit efficient development of architecture independent codes for structured grid applications, and AMR++ provides support for self-adaptive mesh refinement on block-structured grids of rectangular non-overlapping blocks. Using these libraries, the application programmers' work is greatly simplified to primarily specifying the serial single grid application and obtaining the parallel and self-adaptive mesh refinement code with minimal effort. Initial results for simple singular perturbation problems solved by self-adaptive multilevel techniques (FAC, AFAC), being implemented on the basis of prototypes of the P++/AMR++ environment, are presented. Singular perturbation problems frequently arise in large applications, e.g. in the area of computational fluid dynamics. They usually have solutions with layers which require adaptive mesh refinement and fast basic solvers in order to be resolved efficiently.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-10-11
... AGENCY 40 CFR Part 52 Partial Approval and Partial Disapproval of Air Quality Implementation Plans for... partially disapprove revisions to the State Implementation Plans (SIPs) for Florida, Mississippi, and South... address the 2006 PM 2.5 NAAQS or any requirements related to that NAAQS. Today's partial disapproval...
Federal Register 2010, 2011, 2012, 2013, 2014
2011-01-26
... AGENCY 40 CFR Part 52 RIN 2060-AQ66 Determinations Concerning Need for Error Correction, Partial Approval and Partial Disapproval, and Federal Implementation Plan Regarding Texas Prevention of Significant... Determination Concerning the Need for Error Correction, Partial Approval and Partial Disapproval, and...
An observational and thermodynamic investigation of carbonate partial melting
NASA Astrophysics Data System (ADS)
Floess, David; Baumgartner, Lukas P.; Vonlanthen, Pierre
2015-01-01
Melting experiments available in the literature show that carbonates and pelites melt at similar conditions in the crust. While partial melting of pelitic rocks is common and well-documented, reports of partial melting in carbonates are rare and ambiguous, mainly because of intensive recrystallization and the resulting lack of criteria for unequivocal identification of melting. Here we present microstructural, textural, and geochemical evidence for partial melting of calcareous dolomite marbles in the contact aureole of the Tertiary Adamello Batholith. Petrographic observations and X-ray micro-computed tomography (X-ray μCT) show that calcite crystallized either in cm- to dm-scale melt pockets, or as an interstitial phase forming an interconnected network between dolomite grains. Calcite-dolomite thermometry yields a temperature of at least 670 °C, which is well above the minimum melting temperature of ∼600 °C reported for the CaO-MgO-CO2-H2O system. Rare-earth element (REE) partition coefficients (KDcc/do) range between 9-35 for adjacent calcite-dolomite pairs. These KD values are 3-10 times higher than equilibrium values between dolomite and calcite reported in the literature. They suggest partitioning of incompatible elements into a melt phase. The δ18O and δ13C isotopic values of calcite and dolomite support this interpretation. Crystallographic orientations measured by electron backscattered diffraction (EBSD) show a clustering of c-axes for dolomite and interstitial calcite normal to the foliation plane, a typical feature for compressional deformation, whereas calcite crystallized in pockets shows a strong clustering of c-axes parallel to the pocket walls, suggesting that it crystallized after deformation had stopped. All this together suggests the formation of partial melts in these carbonates. A Schreinemaker analysis of the experimental data for a CO2-H2O fluid-saturated system indeed predicts formation of calcite-rich melt between 650-880 °C, in
PISCES: An environment for parallel scientific computation
NASA Technical Reports Server (NTRS)
Pratt, T. W.
1985-01-01
The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
Equalizer: a scalable parallel rendering framework.
Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato
2009-01-01
Continuing improvements in CPU and GPU performances as well as increasing multi-core processor and cluster-based parallelism demand for flexible and scalable parallel rendering solutions that can exploit multipipe hardware accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture, the basic API, discuss its advantages over previous approaches, present example configurations and usage scenarios as well as scalability results.
A parallel algorithm for random searches
NASA Astrophysics Data System (ADS)
Wosniack, M. E.; Raposo, E. P.; Viswanathan, G. M.; da Luz, M. G. E.
2015-11-01
We discuss a parallelization procedure for a two-dimensional random search of a single individual, a typical sequential process. To assure the same features of the sequential random search in the parallel version, we analyze the former spatial patterns of the encountered targets for different search strategies and densities of homogeneously distributed targets. We identify a lognormal tendency for the distribution of distances between consecutively detected targets. Then, by assigning the distinct mean and standard deviation of this distribution for each corresponding configuration in the parallel simulations (constituted by parallel random walkers), we are able to recover important statistical properties, e.g., the target detection efficiency, of the original problem. The proposed parallel approach presents a speedup of nearly one order of magnitude compared with the sequential implementation. This algorithm can be easily adapted to different instances, such as searches in three dimensions. Its possible range of applicability covers problems in areas as diverse as automated computer searchers in high-capacity databases and animal foraging.
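The step of assigning a mean and standard deviation to a lognormal distance distribution can be illustrated with a short sketch. It assumes that the quoted mean and standard deviation refer to the distances themselves rather than to their logarithms, which the abstract does not specify. The function names and the fixed seed are hypothetical.

```python
import math
import random


def lognormal_params(mean, std):
    """Convert the mean and standard deviation of a lognormal variable
    into the (mu, sigma) of the underlying normal distribution."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def sample_target_distances(mean, std, count, seed=0):
    """Draw `count` distances between consecutive targets for one walker."""
    mu, sigma = lognormal_params(mean, std)
    rng = random.Random(seed)  # one generator per walker keeps parallel streams separate
    return [rng.lognormvariate(mu, sigma) for _ in range(count)]
```

In a parallel run each walker would call `sample_target_distances` with its own seed and the mean and standard deviation fitted for its search configuration.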
Iteration schemes for parallelizing models of superconductivity
Gray, P.A.
1996-12-31
The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T_c superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.
IMPAIR: massively parallel deconvolution on the GPU
NASA Astrophysics Data System (ADS)
Sherry, Michael; Shearer, Andy
2013-02-01
The IMPAIR software is a high throughput image deconvolution tool for processing large out-of-core datasets of images, varying from large images with spatially varying PSFs to large numbers of images with spatially invariant PSFs. IMPAIR implements a parallel version of the tried and tested Richardson-Lucy deconvolution algorithm regularised via a custom wavelet thresholding library. It exploits the inherently parallel nature of the convolution operation to achieve quality results on consumer grade hardware: through the NVIDIA Tesla GPU implementation, the multi-core OpenMP implementation, and the cluster computing MPI implementation of the software. IMPAIR aims to address the problem of parallel processing in both top-down and bottom-up approaches: by managing the input data at the image level, and by managing the execution at the instruction level. These combined techniques will lead to a scalable solution with minimal resource consumption and maximal load balancing. IMPAIR is being developed as both a stand-alone tool for image processing, and as a library which can be embedded into non-parallel code to transparently provide parallel high throughput deconvolution.
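For orientation, the Richardson-Lucy update that the abstract refers to can be sketched in a few lines. This is a one-dimensional, pure-Python illustration with circular boundaries and an odd-length, normalised PSF; it omits the wavelet regularisation and is not IMPAIR's code. All names and values are assumptions made for the example.

```python
def convolve_circular(signal, kernel):
    """Circular 1D convolution; the kernel (odd length) is centred on its
    middle element."""
    n, half = len(signal), len(kernel) // 2
    return [
        sum(kernel[j] * signal[(i + half - j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Multiplicative Richardson-Lucy iteration:
    estimate <- estimate * K^T(observed / K(estimate))."""
    flipped = psf[::-1]  # convolving with the flipped PSF applies K^T
    estimate = [sum(observed) / len(observed)] * len(observed)
    for _ in range(iterations):
        blurred = convolve_circular(estimate, psf)
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = convolve_circular(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Hypothetical test signal: a single point source blurred by a 3-tap PSF.
psf = [0.25, 0.5, 0.25]
truth = [0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
observed = convolve_circular(truth, psf)
restored = richardson_lucy(observed, psf)
```

Every pixel of `blurred` and `correction` is computed independently of the others, which is the property a GPU, OpenMP, or MPI implementation can exploit.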
Toward a science of parallel computation
Worlton, W.J.
1986-01-01
The evolution of parallel processing over the past several decades can be viewed as the development of a new scientific discipline. Parallel processing has been, and is, undergoing the same evolutionary stages that are common to the development of scientific disciplines in general: exploration, focusing, and maturity. That parallel processing is not yet a science can readily be appreciated by its lack of some of the characteristics typical of mature sciences, such as prescriptive terminology, comprehensive taxonomies, and authoritative fundamental principles. A great deal of outstanding work has been done and the field is experiencing the beginnings of its "focusing" phase, i.e., support is being concentrated in a set of the more promising approaches selected from among the larger set of exploratory projects. However, the possible set of parallel-processing concepts is so extensive that exploratory work will probably continue for one or two more decades. In the meantime, the growing maturity of the field will be reflected in the increasing clarity and precision of the terminology, the development of systematic classification of the domain of discourse, the development of basic principles, and the growing number of commercial products that are the outcome of the research and development projects on which support is being focused. In this paper we develop some generalizations of taxonomies and use basic principles to draw conclusions about the extensibility of parallel processor architectures. 7 refs., 5 figs., 2 tabs.
Linear Bregman algorithm implemented in parallel GPU
NASA Astrophysics Data System (ADS)
Li, Pengyan; Ke, Jue; Sui, Dong; Wei, Ping
2015-08-01
At present, most compressed sensing (CS) algorithms converge slowly and are therefore difficult to run on a PC. To deal with this issue, we use a parallel GPU to implement a widely used compressed sensing algorithm, the linear Bregman algorithm. The linear iterative Bregman algorithm is a reconstruction algorithm proposed by Osher and Cai. Compared with other CS reconstruction algorithms, the linear Bregman algorithm involves only vector and matrix multiplication and a thresholding operation, and is simpler and more efficient to program. We use C as the development language and adopt CUDA (Compute Unified Device Architecture) as the parallel computing architecture. In this paper, we compare the parallel Bregman algorithm with a traditional CPU implementation of the Bregman algorithm. In addition, we compare the parallel Bregman algorithm with other CS reconstruction algorithms, such as the OMP and TwIST algorithms. The results show that the parallel Bregman algorithm needs less time than these two algorithms and is thus more convenient for real-time object reconstruction, which is important given the fast-growing demand for information technology.
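The remark that the method needs only matrix-vector products and thresholding can be made concrete with a small sketch. It assumes the commonly cited form of the linearized Bregman iteration, v <- v + A^T(b - A u), u <- delta * shrink(v, mu); whether the paper uses exactly this variant is not stated in the abstract. The matrix, the parameters `mu` and `delta`, and the function names are illustrative assumptions, and the code runs serially on the CPU rather than in CUDA.

```python
def soft_threshold(x, mu):
    """Shrinkage operator: sign(x) * max(|x| - mu, 0)."""
    if x > mu:
        return x - mu
    if x < -mu:
        return x + mu
    return 0.0


def linearized_bregman(A, b, mu, delta, iterations=200):
    """Linearized Bregman iteration for a sparse solution of A u = b."""
    m, n = len(A), len(A[0])
    u = [0.0] * n
    v = [0.0] * n
    for _ in range(iterations):
        # Residual b - A u: one matrix-vector product.
        residual = [b[i] - sum(A[i][j] * u[j] for j in range(n)) for i in range(m)]
        # v += A^T residual: a second matrix-vector product.
        for j in range(n):
            v[j] += sum(A[i][j] * residual[i] for i in range(m))
        # Element-wise thresholding, independent for each component.
        u = [delta * soft_threshold(vj, mu) for vj in v]
    return u


# Hypothetical 2 x 3 system whose sparsest solution is [0, 0, 1].
A = [[1.0, 0.0, 0.8],
     [0.0, 1.0, 0.8]]
b = [0.8, 0.8]
# The largest eigenvalue of A A^T is 2.28 here, so delta = 0.4 stays below 1/2.28.
u = linearized_bregman(A, b, mu=5.0, delta=0.4)
```

Both matrix-vector products and the thresholding step act on each output component independently, which is what makes the iteration a natural fit for one GPU thread per component.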
A feasibility study of multiplexing parallel beam.
Ma, Jiayi; Zhao, Jingwu; Shi, Xiaodong; Huang, Runshen
2013-05-01
Single-photon emission computed tomography (SPECT) is a suitable tool for clinically localizing deep-seated tumors; SPECT with high spatial resolution can localize deep-seated tumors precisely. However, because of its poor sensitivity, SPECT currently plays only a complementary role in China. To improve the sensitivity of the parallel beam collimator mainly used in China, a multiplexing parallel beam collimator is proposed; theoretical prediction and Monte Carlo simulation indicate that it can improve sensitivity while maintaining higher spatial resolution. The improved sensitivity-to-spatial resolution ratio has an optimal value. In addition, a set of gamma ray channels, introduced only in the transverse direction, did not have any effect in the axial direction. In the transverse direction, the projection data are the sum of the parallel beam and two oblique parallel beams. In a visual assessment of computer simulations with equal sensitivity, the reconstructed image at deep-seated locations was noticeably better than that obtained with the high-sensitivity parallel beam.
Relative Debugging of Automatically Parallelized Programs
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)
2002-01-01
We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
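The core comparison in relative debugging, finding where a serial array and its distributed counterpart begin to differ, can be sketched as follows. The sketch assumes a simple contiguous block distribution, with blocks listed in process order; the function name and tolerance are hypothetical and do not describe the system's actual interface.

```python
def first_divergence(serial, distributed_blocks, tolerance=1e-12):
    """Return the global index of the first element at which the serial array
    and the concatenated per-process blocks differ, or None if they agree."""
    offset = 0
    for block in distributed_blocks:
        for local_index, value in enumerate(block):
            if abs(serial[offset + local_index] - value) > tolerance:
                return offset + local_index
        offset += len(block)
    return None


# Hypothetical data: process 1 holds a wrong value at its local index 0.
serial_values = [1.0, 2.0, 3.0, 4.0]
parallel_blocks = [[1.0, 2.0], [3.5, 4.0]]
where = first_divergence(serial_values, parallel_blocks)  # global index 2
```

Knowing how arrays are distributed, as supplied by the parallelization tool in the abstract, is what allows the local index on a process to be mapped back to a global index.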
Support for Debugging Automatically Parallelized Programs
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)
2001-01-01
We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
A parallel adaptive mesh refinement algorithm
NASA Technical Reports Server (NTRS)
Quirk, James J.; Hanebutte, Ulf R.
1993-01-01
Over recent years, Adaptive Mesh Refinement (AMR) algorithms which dynamically match the local resolution of the computational grid to the numerical solution being sought have emerged as powerful tools for solving problems that contain disparate length and time scales. In particular, several workers have demonstrated the effectiveness of employing an adaptive, block-structured hierarchical grid system for simulations of complex shock wave phenomena. Unfortunately, from the parallel algorithm developer's viewpoint, this class of scheme is quite involved; these schemes cannot be distilled down to a small kernel upon which various parallelizing strategies may be tested. However, because of their block-structured nature such schemes are inherently parallel, so all is not lost. In this paper we describe the method by which Quirk's AMR algorithm has been parallelized. This method is built upon just a few simple message passing routines and so it may be implemented across a broad class of MIMD machines. Moreover, the method of parallelization is such that the original serial code is left virtually intact, and so we are left with just a single product to support. The importance of this fact should not be underestimated given the size and complexity of the original algorithm.
Partial fasciectomy for Dupuytren's contractures.
Mavrogenis, Andreas F; Spyridonos, Sarantis G; Ignatiadis, Ioannis A; Antonopoulos, Dimitrios; Papagelopoulos, Panayiotis J
2009-01-01
One hundred ninety-six patients with Dupuytren's contractures were treated by partial fasciectomy and adequate postoperative rehabilitation. All patients had flexion contracture of the proximal interphalangeal joint of >20 degrees; 93 patients had flexion contracture of the associated metacarpophalangeal joint of >30 degrees; 143 patients had risk factors for Dupuytren's disease. Primary skin closure and splinting were done in all patients. Range of motion was begun by the 1st week. Splinting was discontinued by the 2nd week, followed by night-time splinting until the 8th week. The mean follow-up was 6.6 years (range, 2-9 years). At the latest examination, 72.5% of the patients had complete range of motion of the metacarpophalangeal and proximal interphalangeal joints; 20.2% had 5-10 degrees of extension deficit and 7.3% had recurrent contractures of >20 degrees at the proximal interphalangeal joint and were subjected to reoperation. Complications included digital neurovascular injury in 5%, complex regional pain syndrome in 10.1%, and wound-healing problems and superficial infections in 15.1%.
Channeled partial Mueller matrix polarimetry
NASA Astrophysics Data System (ADS)
Alenin, Andrey S.; Tyo, J. S.
2015-09-01
In prior work [1,2], we introduced methods to treat channeled systems in a way that is similar to the Data Reduction Method (DRM), by focusing attention on the Fourier content of the measurement conditions. Introduction of Q enabled us to more readily extract the performance of the system and thereby optimize it to obtain reconstruction with the least noise. The analysis tools developed for that exercise can be expanded to be applicable to partial Mueller Matrix Polarimeters (pMMPs), which were a topic of prior discussion as well. In this treatment, we combine the principles involved in both of those research trajectories and identify a set of channeled pMMP families. As a result, the measurement structure of such systems is completely known and the design of a channeled pMMP intended for any given task becomes a search over a finite set of possibilities, with the additional channel rotation allowing for a more desirable Mueller element mixing.
The future of partial nephrectomy.
Malthouse, Theo; Kasivisvanathan, Veeru; Raison, Nicholas; Lam, Wayne; Challacombe, Ben
2016-12-01
Innovation in recent times has accelerated due to factors such as the globalization of communication; but there are also more barriers/safeguards in place than ever before as we strive to streamline this process. From the first planned partial nephrectomy completed in 1887, it took over a century to become recommended practice for small renal tumours. At present, identified areas for improvement/innovation are 1) to preserve renal parenchyma, 2) to optimise pre-operative eGFR and 3) to reduce global warm ischaemia time. All 3 of these are statistically significant predictors of post-operative renal function. Urologists have a proud history of embracing innovation and have experimented with different clamping techniques of the renal vasculature, image guidance in robotics, renal hypothermia, lasers and new robots under development. The DaVinci model may soon no longer have a monopoly on this market, as it loses its stranglehold with novel technology emerging, including added features such as haptic feedback, at reduced cost. As ever, our predictions of the future may well fall wide of the mark, but in order to progress, one must open the mind to the possibilities that already exist, as evolution of existing technology often appears to be a revolution in hindsight.
Simulating Billion-Task Parallel Programs
Perumalla, Kalyan S; Park, Alfred J
2014-01-01
In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.
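The headline figure follows from the two numbers quoted in the abstract. The short check below assumes one real task per core, which the abstract does not state explicitly; the variable names are illustrative.

```python
cores = 216_000            # Cray XT5 cores quoted in the abstract
multiplexing_ratio = 1024  # simulated tasks per real task
# Assumption: one real task runs on each core.
virtual_processes = cores * multiplexing_ratio
virtual_billions = virtual_processes / 1e9
```

Under that assumption the product is 221,184,000 virtual MPI processes, consistent with the "over 0.22 billion" stated above.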
A Parallel Rendering Algorithm for MIMD Architectures
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.; Orloff, Tobias
1991-01-01
Applications such as animation and scientific visualization demand high performance rendering of complex three dimensional scenes. To deliver the necessary rendering rates, highly parallel hardware architectures are required. The challenge is then to design algorithms and software which effectively use the hardware parallelism. A rendering algorithm targeted to distributed memory MIMD architectures is described. For maximum performance, the algorithm exploits both object-level and pixel-level parallelism. The behavior of the algorithm is examined both analytically and experimentally. Its performance for large numbers of processors is found to be limited primarily by communication overheads. An experimental implementation for the Intel iPSC/860 shows increasing performance from 1 to 128 processors across a wide range of scene complexities. It is shown that minimal modifications to the algorithm will adapt it for use on shared memory architectures as well.
Improved CDMA Performance Using Parallel Interference Cancellation
NASA Technical Reports Server (NTRS)
Simon, Marvin; Divsalar, Dariush
1995-01-01
This report considers a general parallel interference cancellation scheme that significantly reduces the degradation effect of user interference but with a lesser implementation complexity than the maximum-likelihood technique. The scheme operates on the fact that parallel processing simultaneously removes from each user the interference produced by the remaining users accessing the channel in an amount proportional to their reliability. The parallel processing can be done in multiple stages. The proposed scheme uses tentative decision devices with different optimum thresholds at the multiple stages to produce the most reliably received data for generation and cancellation of user interference. The 1-stage interference cancellation is analyzed for three types of tentative decision devices, namely, hard, null zone, and soft decision, and two types of user power distribution, namely, equal and unequal powers. Simulation results are given for a multitude of different situations, in particular, those cases for which the analysis is too complex.
Single-agent parallel window search
NASA Technical Reports Server (NTRS)
Powley, Curt; Korf, Richard E.
1991-01-01
Parallel window search is applied to single-agent problems by having different processes simultaneously perform iterations of Iterative-Deepening-A* (IDA*) on the same problem but with different cost thresholds. This approach is limited by the time to perform the goal iteration. To overcome this disadvantage, the authors consider node ordering. They discuss how global node ordering by minimum h among nodes with equal f = g + h values can reduce the time complexity of serial IDA* by reducing the time to perform the iterations prior to the goal iteration. Finally, the two ideas of parallel window search and node ordering are combined to eliminate the weaknesses of each approach while retaining the strengths. The resulting approach, called simply parallel window search, can be used to find a near-optimal solution quickly, improve the solution until it is optimal, and then finally guarantee optimality, depending on the amount of time available.
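The role of the cost threshold can be seen in a minimal serial sketch. `bounded_dfs` is one iteration of iterative-deepening A* for a given threshold; parallel window search would hand different thresholds to different processes, which is imitated here by a plain loop. The graph, heuristic values, and function names are hypothetical, and node ordering is not implemented.

```python
import math


def bounded_dfs(graph, h, node, goal, g, threshold, path):
    """Depth-first search pruned at f = g + h > threshold.
    Returns (solution cost or None, smallest f value that exceeded the bound)."""
    f = g + h[node]
    if f > threshold:
        return None, f
    if node == goal:
        return g, f
    next_threshold = math.inf
    for child, cost in graph[node]:
        if child in path:  # avoid cycles along the current path
            continue
        path.append(child)
        found, t = bounded_dfs(graph, h, child, goal, g + cost, threshold, path)
        path.pop()
        if found is not None:
            return found, t
        next_threshold = min(next_threshold, t)
    return None, next_threshold


def ida_star(graph, h, start, goal):
    """Serial IDA*: repeat bounded searches with increasing thresholds."""
    threshold = h[start]
    while threshold < math.inf:
        found, threshold = bounded_dfs(graph, h, start, goal, 0, threshold, [start])
        if found is not None:
            return found
    return None


# Hypothetical weighted graph with an admissible heuristic.
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 1)], "G": []}
h = {"S": 3, "A": 2, "B": 1, "G": 0}
optimal_cost = ida_star(graph, h, "S", "G")
# One "window" per threshold; each could run in its own process.
windows = {t: bounded_dfs(graph, h, "S", "G", 0, t, ["S"])[0] for t in (3, 4, 5)}
```

A window whose threshold is above the optimal cost can return a solution before the goal iteration finishes, which is how the approach can supply a near-optimal answer early and refine it later.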
On the parallelization of molecular dynamics codes
NASA Astrophysics Data System (ADS)
Trabado, G. P.; Plata, O.; Zapata, E. L.
2002-08-01
Molecular dynamics (MD) codes present a high degree of spatial data locality and a significant amount of independent computations. However, most of the parallelization strategies are usually based on the manual transformation of sequential programs either by completely rewriting the code with message passing routines or using specific libraries intended for writing new MD programs. In this paper we propose a new library-based approach (DDLY) which supports parallelization of existing short-range MD sequential codes. The novelty of this approach is that it can directly handle the distribution of common data structures used in MD codes to represent data (arrays, Verlet lists, link cells), using domain decomposition. Thus, the insertion of run-time support for distribution and communication in an MD program does not imply significant changes to its structure. The method is simple, efficient and portable. It may also be used to extend existing parallel programming languages, such as HPF.
New parallel SOR method by domain partitioning
Xie, D.; Adams, L.
1999-07-01
In this paper the authors propose and analyze a new parallel SOR method, the PSOR method, formulated by using domain partitioning and interprocessor data communication techniques. They prove that the PSOR method has the same asymptotic rate of convergence as the Red/Black (R/B) SOR method for the five-point stencil on both strip and block partitions, and as the four-color (R/B/G/O) SOR method for the nine-point stencil on strip partitions. They also demonstrate the parallel performance of the PSOR method on four different MIMD multiprocessors (a KSR1, an Intel Delta, a Paragon, and an IBM SP2). Finally, they compare the parallel performance of PSOR, R/B SOR, and R/B/G/O SOR. Numerical results on the Paragon indicate that PSOR is more efficient than R/B SOR and R/B/G/O SOR in both computation and interprocessor data communication.
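As background for the comparison, the Red/Black SOR baseline for the five-point stencil can be sketched as below. This is not the PSOR method itself, whose details the abstract does not give; it is a serial illustration for the Laplace equation with fixed boundary values, and the grid size, relaxation factor, and function name are assumptions.

```python
def red_black_sor(grid, omega=1.5, sweeps=100):
    """Red/Black SOR for the 2D Laplace equation on a rectangular grid.
    Boundary entries of `grid` are held fixed; interior entries are updated
    in place, one colour at a time."""
    rows, cols = len(grid), len(grid[0])
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 != colour:
                        continue
                    neighbours = (grid[i - 1][j] + grid[i + 1][j]
                                  + grid[i][j - 1] + grid[i][j + 1])
                    gauss_seidel = 0.25 * neighbours
                    grid[i][j] += omega * (gauss_seidel - grid[i][j])
    return grid


# Hypothetical 6 x 6 grid: boundary fixed at 1.0, interior started at 0.0.
size = 6
grid = [[1.0 if i in (0, size - 1) or j in (0, size - 1) else 0.0
         for j in range(size)] for i in range(size)]
solution = red_black_sor(grid)
```

With the five-point stencil, every point of one colour depends only on points of the other colour, so all points of a colour can be updated concurrently; this is the property that makes the ordering attractive on multiprocessors.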
Parallel Harness for Informatic Stream Hashing
Steve Plimpton, Tim Shead
2012-09-11
PHISH is a lightweight framework which a set of independent processes can use to exchange data as they run on the same desktop machine, on processors of a parallel machine, or on different machines across a network. This enables them to work in a coordinated parallel fashion to perform computations on either streaming, archived, or self-generated data. The PHISH distribution includes a simple, portable library for performing data exchanges in useful patterns either via MPI message-passing or ZMQ sockets. PHISH input scripts are used to describe a data-processing algorithm, and additional tools provided in the PHISH distribution convert the script into a form that can be launched as a parallel job.
Optimal expression evaluation for data parallel architectures
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1991-01-01
A data parallel machine represents an array or other composite data structure by allocating one processor per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum cost way to evaluate an expression, for several different data parallel architectures. The algorithm applies to any architecture in which the metric describing the cost of moving an array has a property called robustness. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes.
Optimal expression evaluation for data parallel architectures
NASA Technical Reports Server (NTRS)
Gilbert, J. R.; Schreiber, R.
1990-01-01
A data parallel machine represents an array or other composite data structure by allocating one processor per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum cost way to evaluate an expression, for several different data parallel architectures. The algorithm applies to any architecture in which the metric describing the cost of moving an array has a property called robustness. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes.
Globality and speed of optical parallel processors.
Lohmann, A W; Marathay, A S
1989-09-15
The chances of optical computing are probably best if a large number of processing elements act in parallel. The efficiency of parallel processors depends, among other things, on the time it takes to communicate signals from one processor to any other processor. In an optical parallel processor one hopes to be able to transmit a signal from one processor to any other processor within only one cycle period, no matter how far apart the processors are. Such a global communications network is desirable especially for algorithms with global interactions. The fast Fourier algorithm is an example. We define a degree of globality and we show how speed and globality are related. Our result applies to a specific architecture based on spatial filtering.
PADRE: a parallel asynchronous data routing environment
Gunney, B; Quinlan, D
2001-01-08
Increasingly in industry, software design and implementation is object-oriented, developed in C++ or Java, and relies heavily on pre-existing software libraries (e.g. the Microsoft Foundation Classes for C++, the Java API for Java). A similar but more tentative trend is developing in high-performance parallel scientific computing. The transition from serial to parallel application development considerably increases the need for library support: task creation and management, data distribution and dynamic redistribution, and inter-process and inter-processor communication and synchronization must be supported. PADRE is a library to support the interoperability of parallel applications. We feel there is significant need for just such a tool to complement the many domain-specific application frameworks presently available, which are generally not interoperable.
Parallelization of the Lagrangian Particle Dispersion Model
Buckley, R.L.; O'Steen, B.L.
1997-08-01
An advanced stochastic Lagrangian Particle Dispersion Model (LPDM) is used by the Atmospheric Technologies Group (ATG) to simulate contaminant transport. The model uses time-dependent three-dimensional fields of wind and turbulence to determine the location of individual particles released into the atmosphere. This report describes modifications to LPDM using the Message Passing Interface (MPI) which allows for execution in a parallel configuration on the Cray Supercomputer facility at the SRS. Use of a parallel version allows for many more particles to be released in a given simulation, with little or no increase in computational time. This significantly lowers (greater than an order of magnitude) the minimum resolvable concentration levels without ad hoc averaging schemes and/or without reducing spatial resolution. The general changes made to LPDM are discussed and a series of tests are performed comparing the serial (single processor) and parallel versions of the code.
Extending HPF for advanced data parallel applications
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Zima, Hans
1994-01-01
The stated goal of High Performance Fortran (HPF) was to 'address the problems of writing data parallel programs where the distribution of data affects performance'. After examining the current version of the language we are led to the conclusion that HPF has not fully achieved this goal. While the basic distribution functions offered by the language - regular block, cyclic, and block cyclic distributions - can support regular numerical algorithms, advanced applications such as particle-in-cell codes or unstructured mesh solvers cannot be expressed adequately. We believe that this is a major weakness of HPF, significantly reducing its chances of becoming accepted in the numeric community. The paper discusses the data distribution and alignment issues in detail, points out some flaws in the basic language, and outlines possible future paths of development. Furthermore, we briefly deal with the issue of task parallelism and its integration with the data parallel paradigm of HPF.
National Combustion Code Parallel Performance Enhancements
NASA Technical Reports Server (NTRS)
Quealy, Angela; Benyo, Theresa (Technical Monitor)
2002-01-01
The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. The unstructured grid, reacting flow code uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC code to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This report describes recent parallel processing modifications to NCC that have improved the parallel scalability of the code, enabling a two hour turnaround for a 1.3 million element fully reacting combustion simulation on an SGI Origin 2000.
Computing association probabilities using parallel Boltzmann machines.
Iltis, R A; Ting, P Y
1993-01-01
A new computational method is presented for solving the data association problem using parallel Boltzmann machines. It is shown that the association probabilities can be computed with arbitrarily small errors if a sufficient number of parallel Boltzmann machines are available. The probability beta(i)(j) that the ith measurement emanated from the jth target can be obtained simply by observing the relative frequency with which neuron v(i,j) in a two-dimensional network is on throughout the layers. Some simple tracking examples comparing the performance of the Boltzmann algorithm to the exact data association solution and with the performance of an alternative parallel method using the Hopfield neural network are also presented.
ERIC Educational Resources Information Center
Bluemel, Brody
2014-01-01
This article illustrates the pedagogical value of incorporating parallel corpora in foreign language education. It explores the development of a Chinese/English parallel corpus designed specifically for pedagogical application. The corpus tool was created to aid language learners in reading comprehension and writing development by making foreign…
Flow of a Rarefied Gas between Parallel and Almost Parallel Plates
2005-07-13
Flow of a Rarefied Gas between Parallel and Almost Parallel Plates. Carlo Cercignani, Maria Lampis and Silvia Lorenzani, Dipartimento di Matematica, Politecnico di Milano, Milano, Italy 20133
Aerodynamic simulation on massively parallel systems
NASA Technical Reports Server (NTRS)
Haeuser, Jochem; Simon, Horst D.
1992-01-01
This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative that is both cost effective and able to provide the necessary TeraFlops needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedral grids. A fully structured grid best represents the flow physics, while the unstructured grid gives the best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple data stream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good test case for complex fluid dynamics codes, since the same data structures are used. All systems provided good speedups, but
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2013-11-12
Data communications in a parallel active messaging interface ('PAMI') of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.