Science.gov

Sample records for parallel blade-vortex interaction

  1. Vortex dynamics during blade-vortex interactions

    NASA Astrophysics Data System (ADS)

    Peng, Di; Gregory, James W.

    2015-05-01

    Vortex dynamics during parallel blade-vortex interactions (BVIs) were investigated in a subsonic wind tunnel using particle image velocimetry (PIV). Vortices were generated by applying a rapid pitch-up motion to an airfoil through a pneumatic system, and the subsequent interactions with a downstream, unloaded target airfoil were studied. The blade-vortex interactions may be classified into three categories in terms of vortex behavior: close interaction, very close interaction, and collision. For each type of interaction, the vortex trajectory and strength variation were obtained from phase-averaged PIV data. The PIV results revealed the mechanisms of vortex decay and the effects of several key parameters on vortex dynamics, including separation distance (h/c), Reynolds number, and vortex sense. Generally, BVI has two main stages: interaction between vortex and leading edge (vortex-LE interaction) and interaction between vortex and boundary layer (vortex-BL interaction). Vortex-LE interaction, with its small separation distance, is dominated by inviscid decay of vortex strength due to pressure gradients near the leading edge. Therefore, the decay rate is determined by separation distance and vortex strength, but it is relatively insensitive to Reynolds number. Vortex-LE interaction will become a viscous-type interaction if there is enough separation distance. Vortex-BL interaction is inherently dominated by viscous effects, so the decay rate is dependent on Reynolds number. Vortex sense also has great impact on vortex-BL interaction because it changes the velocity field and shear stress near the surface.

  2. An adaptive mesh method for the simulation of Blade Vortex Interaction 

    E-print Network

    Kim, Kyu-Sup

    1998-01-01

    An adaptive mesh method for the simulation of Blade Vortex Interaction (BVI) with an active Trailing Edge Flap (TEF) is presented. The two-dimensional unsteady problem is solved by a higher-order upwind Euler method...

  3. Rotating hot-wire investigation of the vortex responsible for blade-vortex interaction noise

    NASA Technical Reports Server (NTRS)

    Fontana, Richard Remo

    1988-01-01

    The distribution of the circumferential velocity of the vortex responsible for blade-vortex interaction noise was measured using a rotating hot-wire rake synchronously meshed with a model helicopter rotor at the blade passage frequency. Simultaneous far-field acoustic data and blade differential pressure measurements were obtained. Results show that the shape of the measured far-field acoustic blade-vortex interaction signature depends on the blade-vortex interaction geometry. The experimental results are compared with the Widnall-Wolf model for blade-vortex interaction noise.

  4. Rotor blade system with reduced blade-vortex interaction noise

    NASA Technical Reports Server (NTRS)

    Leishman, John G. (Inventor); Han, Yong Oun (Inventor)

    2005-01-01

    A rotor blade system with reduced blade-vortex interaction noise includes a plurality of tube members embedded in proximity to a tip of each rotor blade. The inlets of the tube members are arrayed at the leading edge of the blade slightly above the chord plane, while the outlets are arrayed at the blade tip face. Such a design rapidly diffuses the vorticity contained within the concentrated tip vortex because of enhanced flow mixing in the inner core, which prevents the development of a laminar core region.

  5. A Novel Method for Reducing Rotor Blade-Vortex Interaction

    NASA Technical Reports Server (NTRS)

    Glinka, A. T.

    2000-01-01

    One of the major hindrances to expansion of the rotorcraft market is the high-amplitude noise rotorcraft produce, especially during low-speed descent, where blade-vortex interactions frequently occur. In an attempt to reduce the noise levels caused by blade-vortex interactions, the flip-tip rotor blade concept was devised. The flip-tip rotor increases the miss distance between the shed vortices and the rotor blades, reducing BVI noise. The distance is increased by rotating an outboard portion of the rotor tip either up or down depending on the flight condition. The proposed plan for the grant consisted of a computational simulation of the rotor aerodynamics and its wake geometry to determine the effectiveness of the concept, coupled with a series of wind tunnel experiments exploring the value of the device and validating the computer model. The computational model did in fact show that the miss distance could be increased, giving a measure of the effectiveness of the flip-tip rotor. However, the wind tunnel experiments could not be conducted. Increased outside demand for the 7- by 10-foot wind tunnel at NASA Ames and the low priority assigned to this project at Ames forced numerous postponements of the tests, eventually pushing them beyond the life of the grant. A design for the rotor blades to be tested in the wind tunnel was completed, and an analysis of the strength of the model blades based on predicted loads, including dynamic forces, was performed.

  6. A parametric study of transonic blade-vortex interaction noise

    NASA Technical Reports Server (NTRS)

    Lyrintzis, A. S.

    1991-01-01

    Several parameters of transonic blade-vortex interactions (BVI) are studied and some ideas for noise reduction are introduced and tested using numerical simulation. The model used is the two-dimensional high frequency transonic small disturbance equation with regions of distributed vorticity (VTRAN2 code). The far-field noise signals are obtained by using the Kirchhoff method, which extends the numerical 2-D near-field aerodynamic results to the linear acoustic 3-D far-field. The BVI noise mechanisms are explained and the effects of vortex type and strength, and angle of attack are studied. In particular, airfoil shape modifications which lead to noise reduction are investigated. The results presented are expected to be helpful for better understanding of the nature of BVI noise and for better blade design.

  7. Effects of trailing edge flap dynamic deployment on blade-vortex interactions 

    E-print Network

    Nelson, Carter T.

    1997-01-01

    A theoretical and experimental investigation is undertaken to determine the effects of an actively deployable trailing edge flap on the disturbances created during blade-vortex interactions (BVI). The theoretical model consists of an unsteady panel...

  8. Transonic blade-vortex interactions noise: A parametric study

    NASA Technical Reports Server (NTRS)

    Lyrintzis, A. S.; Xue, Y.

    1990-01-01

    Transonic Blade-Vortex Interactions (BVI) are simulated numerically and the noise mechanisms are investigated. The 2-D high frequency transonic small disturbance equation is solved numerically (VTRAN2 code). An Alternating Direction Implicit (ADI) scheme with monotone switches is used; viscous effects are included on the boundary and the vortex is simulated by the cloud-in-cell method. The Kirchhoff method is used for the extension of the numerical 2-D near field aerodynamic results to the linear acoustic 3-D far field. The viscous effect (shock/boundary layer interaction) on BVI is investigated. The different types of shock motion are identified and compared. Two important disturbances with different directivity exist in the pressure signal and are believed to be related to the fluctuating lift and drag forces. Noise directivity for different cases is shown. The maximum radiation occurs at an angle between 60 and 90 deg below the horizontal for an airfoil-fixed coordinate system and depends on the details of the airfoil shape. Different airfoil shapes are studied and classified according to the BVI noise produced.

  9. HART-II: Prediction of Blade-Vortex Interaction Loading

    NASA Technical Reports Server (NTRS)

    Lim, Joon W.; Tung, Chee; Yu, Yung H.; Burley, Casey L.; Brooks, Thomas; Boyd, Doug; vanderWall, Berend; Schneider, Oliver; Richard, Hugues; Raffel, Markus

    2003-01-01

    During the HART-I data analysis, the need was identified for comprehensive wake data, including vortex creation and aging and the re-development of the wake after blade-vortex interaction. In October 2001, US Army AFDD, NASA Langley, German DLR, French ONERA and Dutch DNW performed the HART-II test as an international joint effort. The main objective was to focus on rotor wake measurement using a PIV technique along with comprehensive data on blade deflections, airloads, and acoustics. Three prediction teams made preliminary correlation efforts with HART-II data: a joint US team of US Army AFDD and NASA Langley, German DLR, and French ONERA. The predicted results showed significant improvements over the HART-I predictions computed several years earlier, indicating improved understanding of the complicated wake modeling in comprehensive rotorcraft analyses. All three teams demonstrated satisfactory prediction capabilities in general, though prediction accuracy deviated slightly across the various disciplines.

  10. Euler solutions for self-generated rotor blade-vortex interactions

    NASA Technical Reports Server (NTRS)

    Hassan, A. A.; Tung, C.; Sankar, L. N.

    1990-01-01

    A finite-difference procedure was developed, on the basis of the conservation form of the unsteady three-dimensional Euler equations, for the prediction of rotor blade-vortex interactions (BVIs). Numerical solution procedures were obtained for the analysis of the model parallel BVIs and the more realistic helicopter self-generated-rotor BVIs. It was found that, for self-generated subcritical interactions, the accuracy of the predicted leading edge pressures relied heavily on the user-specified vortex core radius and on the CAMRAD-code-predicted geometry of the interaction vortex elements and their relative orientation with respect to the blade. It was also found that the free-wake model used in CAMRAD to predict the tip vortex trajectory for use in the Euler solution yields lower streamwise and higher axial wake convective velocities than those inferred from the experimental data.

  11. Reduction of Helicopter Blade-Vortex Interaction Noise by Active Rotor Control Technology

    NASA Technical Reports Server (NTRS)

    Yu, Yung H.; Gmelin, Bernd; Splettstoesser, Wolf; Brooks, Thomas F.; Philippe, Jean J.; Prieur, Jean

    1997-01-01

    Helicopter blade-vortex interaction noise is one of the most severe noise sources and is very important both for community annoyance and for military detection. Research over the decades has substantially improved basic physical understanding of the mechanisms generating rotor blade-vortex interaction noise and of techniques for controlling it, particularly using active rotor control technology. This paper reviews active rotor control techniques currently available for rotor blade-vortex interaction noise reduction, including higher harmonic pitch control, individual blade control, and on-blade control technologies. The basic physical mechanisms of each active control technique are reviewed in terms of its noise reduction mechanism and the aerodynamic or structural parameters of the blade being controlled. Active rotor control techniques using smart structures/materials are discussed, including distributed smart actuators to induce local torsional or flapping deformations.

  12. Helicopter blade-vortex interaction locations: Scale-model acoustics and free-wake analysis results

    NASA Technical Reports Server (NTRS)

    Hoad, Danny R.

    1987-01-01

    The results of a model rotor acoustic test in the Langley 4- by 7-Meter Tunnel are used to evaluate a free-wake analytical technique. An acoustic triangulation technique is used to locate the position in the rotor disk where the blade-vortex interaction noise originates. These locations, along with results of the rotor free-wake analysis, are used to define the geometry of the blade-vortex interaction noise phenomena as well as to determine whether the free-wake analysis is a capable diagnostic tool. Data from tests of two teetering rotor systems are used in these analyses.

  13. Helicopter Blade-Vortex Interaction Noise with Comparisons to CFD Calculations

    NASA Technical Reports Server (NTRS)

    McCluer, Megan S.

    1996-01-01

    A comparison of experimental acoustics data and computational predictions was performed for a helicopter rotor blade interacting with a parallel vortex. The experiment was designed to examine the aerodynamics and acoustics of parallel Blade-Vortex Interaction (BVI) and was performed in the Ames Research Center (ARC) 80- by 120-Foot Subsonic Wind Tunnel. An independently generated vortex interacted with a small-scale, nonlifting helicopter rotor at the 180 deg azimuth angle to create the interaction in a controlled environment. Computational Fluid Dynamics (CFD) was used to calculate near-field pressure time histories. The CFD code, called Transonic Unsteady Rotor Navier-Stokes (TURNS), was used to make comparisons with the acoustic pressure measurement at two microphone locations and several test conditions. The test conditions examined included hover tip Mach numbers of 0.6 and 0.7, advance ratio of 0.2, positive and negative vortex rotation, and the vortex passing above and below the rotor blade by 0.25 rotor chords. The results show that the CFD qualitatively predicts the acoustic characteristics very well, but quantitatively overpredicts the peak-to-peak sound pressure level by 15 percent in most cases. There also exists a discrepancy in the phasing (about 4 deg) of the BVI event in some cases. Additional calculations were performed to examine the effects of vortex strength, thickness, time accuracy, and directionality. This study validates the TURNS code for prediction of near-field acoustic pressures of controlled parallel BVI.

  14. Full-Potential Modeling of Blade-Vortex Interactions. Degree awarded by George Washington Univ., Feb. 1987

    NASA Technical Reports Server (NTRS)

    Jones, Henry E.

    1997-01-01

    A study of the full-potential modeling of a blade-vortex interaction was made. A primary goal of this study was to investigate the effectiveness of the various methods of modeling the vortex. The model problem restricts the interaction to that of an infinite wing with an infinite line vortex moving parallel to its leading edge. This problem provides a convenient testing ground for the various methods of modeling the vortex while retaining the essential physics of the full three-dimensional interaction. A full-potential algorithm specifically tailored to solve the blade-vortex interaction (BVI) was developed to solve this problem. The basic algorithm was modified to include the effect of a vortex passing near the airfoil. Four different methods of modeling the vortex were used: (1) the angle-of-attack method, (2) the lifting-surface method, (3) the branch-cut method, and (4) the split-potential method. A side-by-side comparison of the four models was conducted. These comparisons included comparing generated velocity fields, a subcritical interaction, and a critical interaction. The subcritical and critical interactions are compared with experimentally generated results. The split-potential model was used to make a survey of some of the more critical parameters which affect the BVI.

  15. An Euler code calculation of blade-vortex interaction noise

    NASA Technical Reports Server (NTRS)

    Hardin, J. C.; Lamkin, S. L.

    1987-01-01

    An Euler code has been developed for calculation of noise radiation due to the interaction of a distributed vortex with a Joukowski airfoil. The time-dependent incompressible flow field is first determined and then integrated to yield the resulting sound production through use of the elegant low-frequency Green's function approach. This code has several interesting numerical features involved in the vortex motion and in continuous satisfaction of the Kutta condition. In addition, it removes the limitations on Reynolds number and is much more efficient than an earlier Navier-Stokes code. Results indicate that the noise production is due to the deceleration and subsequent acceleration of the vortex as it approaches and passes the airfoil. Predicted acoustic levels and frequencies agree with measured data, although a precise comparison would require the strength, size, and position of the incoming vortex to be known.

  16. Mach number scaling of helicopter rotor blade/vortex interaction noise

    NASA Technical Reports Server (NTRS)

    Leighton, Kenneth P.; Harris, Wesley L.

    1985-01-01

    A parametric study of model helicopter rotor blade slap due to blade vortex interaction (BVI) was conducted in a 5- by 7.5-foot anechoic wind tunnel using model helicopter rotors with two, three, and four blades. The results were compared with a previously developed Mach number scaling theory. Three- and four-bladed rotor configurations were found to show very good agreement with the Mach number to the sixth power law for all conditions tested. The range of conditions for which BVI blade slap was detected was reduced for three-bladed rotors compared with the two-bladed baseline. The advance ratio boundaries of the four-bladed rotor exhibited an angular dependence not present for the two-bladed configuration. The upper limits of the advance ratio boundaries of the four-bladed rotors increased with increasing rotational speed.
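
    As a rough illustration of the sixth-power law referenced above (the dimensional prefactor and reference quantities below are assumptions for illustration, not taken from the paper), the peak BVI blade-slap pressure is taken to scale with the hover tip Mach number as

        p'_{\mathrm{peak}} \;\propto\; \rho_0 \, c_0^2 \, M_T^6, \qquad M_T = \frac{\Omega R}{c_0},

    where \rho_0 and c_0 are the ambient density and sound speed, \Omega is the rotor rotational speed, and R is the rotor radius. Under this scaling, raising M_T from 0.4 to 0.5 would increase the peak pressure by a factor of roughly (0.5/0.4)^6 ≈ 3.8, consistent with the strong tip-speed sensitivity reported in such parametric studies.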

  17. Reduction of Blade-Vortex Interaction (BVI) noise through X-force control

    NASA Technical Reports Server (NTRS)

    Schmitz, Fredric H.

    1995-01-01

    Momentum theory and the longitudinal force balance equations of a single rotor helicopter are used to develop simple expressions to describe tip-path-plane tilt and uniform inflow to the rotor. The uniform inflow is adjusted to represent the inflow at certain azimuthal locations where strong Blade-Vortex Interaction (BVI) is likely to occur. This theoretical model is then used to describe the flight conditions where BVI is likely to occur and to explore those flight variables that can be used to minimize BVI noise radiation. A new X-force control is introduced to help minimize BVI noise. Several methods of generating the X-force are presented that can be used to alter the inflow to the rotor and thus increase the likelihood of avoiding BVI during approaches to landing.

  18. Prediction of blade-vortex interaction noise using measured blade pressures

    NASA Technical Reports Server (NTRS)

    Joshi, Mahendra C.; Liu, Sandy R.; Boxwell, Donald A.

    1987-01-01

    In the study reported here, blade-vortex interaction noise was predicted using a simplified model of blade pressures measured on a one-seventh scale model AH-1/OLS main rotor. The methods used for the acoustic prediction are based on the acoustic analogy and have been developed by Nakamura (1981) and by Brentner, Nystrom, and Farassat (referred to as the WOPWOP method). The waveforms predicted by the two methods are in good agreement with each other and with the measurements in terms of the number of pulses, the pulse widths, and the separation times between the pulses. The peak amplitude of the dominant pulse may, however, be underpredicted by up to 40 percent, depending on flight conditions. Ways of improving the accuracy of the prediction methods are suggested.

  19. Blade-vortex interaction noise predictions using measured blade surface pressures

    NASA Technical Reports Server (NTRS)

    Ziegenbein, Perry R.; Oh, Byung K.

    1987-01-01

    The generation of helicopter noise by blade-vortex interactions during descent under impulsive conditions is investigated analytically. A noise-prediction technique is developed on the basis of the dipole source term of the Ffowcs-Williams/Hawkings equation and applied to data from simultaneous blade-pressure and acoustic measurements obtained by Cowan et al. (1986) on a 10-ft-diameter 4-blade rotor model in a wind tunnel. Preliminary results show that input-blade-airload azimuth resolution of 1 deg or better and computational azimuth step size of 2 deg or less are required to achieve good agreement between predicted and recorded acoustic time histories. The need for more sophisticated methods to model chordwise input data and for a more extensive experimental data base is indicated.

  20. Effect of wake structure on blade-vortex interaction phenomena: Acoustic prediction and validation

    NASA Technical Reports Server (NTRS)

    Gallman, Judith M.; Tung, Chee; Schultz, Klaus J.; Splettstoesser, Wolf; Buchholz, Heino

    1995-01-01

    During the Higher Harmonic Control Aeroacoustic Rotor Test, extensive measurements of the rotor aerodynamics, the far-field acoustics, the wake geometry, and the blade motion for powered descent flight conditions were made. These measurements have been used to validate and improve the prediction of blade-vortex interaction (BVI) noise. The improvements made to the BVI modeling after the evaluation of the test data are discussed. The effects of these improvements on the acoustic-pressure predictions are shown. These improvements include restructuring the wake, modifying the core size, incorporating the measured blade motion into the calculations, and attempting to improve the dynamic blade response. A comparison of four different implementations of the Ffowcs Williams and Hawkings equation is presented. A common set of aerodynamic inputs was used for this comparison.

  1. Helicopter Model Rotor-Blade Vortex Interaction Impulsive Noise: Scalability and Parametric Variations

    NASA Technical Reports Server (NTRS)

    Boxwell, D. A.; Schmitz, F. H.; Splettstoesser, W. R.; Schultz, K. J.

    1987-01-01

    Acoustic data taken in the anechoic Deutsch-Niederlaendischer Windkanal (DNW) have documented the blade-vortex interaction (BVI) impulsive noise radiated from a 1/7-scale model main rotor of the AH-1 series helicopter. Averaged model-scale data were compared with averaged full-scale, in-flight acoustic data under similar non-dimensional test conditions using an improved data analysis technique. At low advance ratios (mu = 0.164 - 0.194), the BVI impulsive noise data scale remarkably well in level, waveform, and directivity patterns. At moderate advance ratios (mu = 0.224 - 0.270), the scaling deteriorates, suggesting that the model-scale rotor is not adequately simulating the full-scale BVI noise. Presently, no proved explanation of this discrepancy exists. Measured BVI noise radiation is highly sensitive to all of the four governing nondimensional parameters--hover tip Mach number, advance ratio, local inflow ratio, and thrust coefficient.

  2. Wake Geometry Effects on Rotor Blade-Vortex Interaction Noise Directivity

    NASA Technical Reports Server (NTRS)

    Martin, R. M.; Marcolini, Michael A.; Splettstoesser, W. R.; Schultz, K.-J.

    1990-01-01

    Acoustic measurements from a model rotor wind tunnel test are presented which show that the directionality of rotor blade vortex interaction (BVI) noise is strongly dependent on the rotor advance ratio and disk attitude. A rotor free wake analysis is used to show that the general locus of interactions on the rotor disk is also strongly dependent on advance ratio and disk attitude. A comparison of the changing directionality of the BVI noise with changes in the interaction locations shows that the strongest noise radiation occurs in the direction of motion normal to the blade span at the time of interaction, for both advancing and retreating side BVI. For advancing side interactions, the BVI radiation angle down from the tip-path plane appears relatively insensitive to rotor operating condition and is typically between 40 and 55 deg below the disk. However, the azimuthal radiation direction shows a clear trend with descent speed, moving towards the right of the flight path with increasing descent speed. The movement of the strongest radiation direction is attributed to the movement of the interaction locations on the rotor disk with increasing descent speed.

  3. Acoustic measurements from a rotor blade-vortex interaction noise experiment in the German-Dutch Wind Tunnel (DNW)

    NASA Technical Reports Server (NTRS)

    Martin, Ruth M.; Splettstoesser, W. R.; Elliott, J. W.; Schultz, K.-J.

    1988-01-01

    Acoustic data are presented from a 40 percent scale model of the 4-bladed BO-105 helicopter main rotor, measured in the large European aeroacoustic wind tunnel, the DNW. Rotor blade-vortex interaction (BVI) noise data in the low speed flight range were acquired using a traversing in-flow microphone array. The experimental apparatus, testing procedures, calibration results, and experimental objectives are fully described. A large representative set of averaged acoustic signals is presented.

  4. Rotor blade-vortex interaction impulsive noise source identification and correlation with rotor wake predictions

    NASA Astrophysics Data System (ADS)

    Splettstoesser, W. R.; Schultz, K. J.; Martin, Ruth M.

    1987-10-01

    An acoustic source localization scheme applicable to noncompact moving sources is developed and applied to the blade-vortex interaction (BVI) noise data of a 40-percent scale BO-105 model rotor. A generalized rotor wake code is employed to predict possible BVI locations on the rotor disk and is found quite useful in interpreting the acoustic localization results. The highly varying directivity patterns of different BVI impulses generated at the same test condition are explained by both the localization results and predicted tip vortex trajectories. The effects of rotor tip-path-plane angle and advance ratio on the BVI source positions are studied. Decreasing tip-path-plane angle (at constant advance ratio) moves the general interaction region upwind on the rotor disk, significantly changing the interaction geometry. Increasing advance ratio (at constant tip-path-plane angle) shifts the general source region downwind on the rotor disk with the increased convection of the vortices until about 60 deg azimuth, where the BVI sources appear to become acoustically less effective. The region of strongest BVI sources lies between 60 and 70 deg azimuth and 80 and 90 percent radius for the moderate range of advance ratios studied.

  5. A study of the noise mechanisms of transonic blade-vortex interactions

    NASA Technical Reports Server (NTRS)

    Lyrintzis, Anastasios S.; Xue, Y.

    1990-01-01

    Transonic blade-vortex interactions (BVI) are simulated numerically and the noise mechanisms are investigated. The two-dimensional high frequency transonic small disturbance equation is solved numerically (VTRAN2 code). An ADI scheme with monotone switches is used; viscous effects are included on the boundary, and the vortex is simulated by the cloud-in-cell method. The Kirchhoff method is used for the extension of the numerical two-dimensional near-field aerodynamic results to the linear acoustic three-dimensional far field. The viscous effects (shock/boundary layer interactions) on BVI are investigated. The different types of shock motion are identified and compared. Two important disturbances with different directivity exist in the pressure signal and are believed to be related to the fluctuating lift and drag forces. Noise directivity for different cases is shown. The maximum radiation occurs at an angle between 60 and 90 degrees below the horizontal for an airfoil-fixed coordinate system and depends on the details of the airfoil shape. Different airfoil shapes are studied and classified according to the BVI noise produced.

  6. Helicopter model rotor-blade vortex interaction impulsive noise: Scalability and parametric variations

    NASA Technical Reports Server (NTRS)

    Splettstoesser, W. R.; Schultz, K. J.; Boxwell, D. A.; Schmitz, F. H.

    1984-01-01

    Acoustic data taken in the anechoic Deutsch-Niederlaendischer Windkanal (DNW) have documented the blade vortex interaction (BVI) impulsive noise radiated from a 1/7-scale model main rotor of the AH-1 series helicopter. Averaged model scale data were compared with averaged full scale, inflight acoustic data under similar nondimensional test conditions. At low advance ratios (mu = 0.164 to 0.194), the data scale remarkably well in level and waveform shape, and also duplicate the directivity pattern of BVI impulsive noise. At moderate advance ratios (mu = 0.224 to 0.270), the scaling deteriorates, suggesting that the model scale rotor is not adequately simulating the full scale BVI noise; presently, no proved explanation of this discrepancy exists. Carefully performed parametric variations over a complete matrix of testing conditions have shown that BVI noise radiation is highly sensitive to all four governing nondimensional parameters - hover tip Mach number, advance ratio, local inflow ratio, and thrust coefficient.

  7. Reduction of blade-vortex interaction noise through higher harmonic pitch control

    NASA Technical Reports Server (NTRS)

    Brooks, Thomas F.; Booth, Earl R., Jr.; Jolly, J. Ralph, Jr.; Yeager, William T., Jr.; Wilbur, Matthew L.

    1990-01-01

    An acoustics test using an aeroelastically scaled rotor was conducted to examine the effectiveness of higher harmonic blade pitch control for the reduction of impulsive blade-vortex interaction (BVI) noise. A four-bladed, 110 in. diameter, articulated rotor model was tested in a heavy gas (Freon-12) medium in Langley's Transonic Dynamics Tunnel. Noise and vibration measurements were made for a range of matched flight conditions, where prescribed (open-loop) higher harmonic pitch was superimposed on the normal (baseline) collective and cyclic trim pitch. For the inflow-microphone noise measurements, advantage was taken of the reverberance in the hard walled tunnel by using a sound power determination approach. Initial findings from on-line data processing for three of the test microphones are reported for a 4/rev (4P) collective pitch control for a range of input amplitudes and phases. By comparing these results to corresponding baseline (no control) conditions, significant noise reductions (4 to 5 dB) were found for low-speed descent conditions, where helicopter BVI noise is most intense. For other rotor flight conditions, the overall noise was found to increase. All cases show increased vibration levels.

  8. Reduction of blade-vortex interaction noise using higher harmonic pitch control

    NASA Technical Reports Server (NTRS)

    Brooks, Thomas F.; Booth, Earl R., Jr.; Jolly, J. Ralph, Jr.; Yeager, William T., Jr.; Wilbur, Matthew L.

    1989-01-01

    An acoustics test using an aeroelastically scaled rotor was conducted to examine the effectiveness of higher harmonic blade pitch control for the reduction of impulsive blade-vortex interaction (BVI) noise. A four-bladed, 110 in. diameter, articulated rotor model was tested in a heavy gas (Freon-12) medium in Langley's Transonic Dynamics Tunnel. Noise and vibration measurements were made for a range of matched flight conditions, where prescribed (open-loop) higher harmonic pitch was superimposed on the normal (baseline) collective and cyclic trim pitch. For the inflow-microphone noise measurements, advantage was taken of the reverberance in the hard walled tunnel by using a sound power determination approach. Initial findings from on-line data processing for three of the test microphones are reported for a 4/rev (4P) collective pitch control for a range of input amplitudes and phases. By comparing these results to corresponding baseline (no control) conditions, significant noise reductions (4 to 5 dB) were found for low-speed descent conditions, where helicopter BVI noise is most intense. For other rotor flight conditions, the overall noise was found to increase. All cases show increased vibration levels.

  9. Advancing-side directivity and retreating-side interactions of model rotor blade-vortex interaction noise

    NASA Technical Reports Server (NTRS)

    Martin, R. M.; Splettstoesser, W. R.; Elliott, J. W.; Schultz, K.-J.

    1988-01-01

    Acoustic data are presented from a 40 percent scale model of the four-bladed BO-105 helicopter main rotor, tested in a large aerodynamic wind tunnel. Rotor blade-vortex interaction (BVI) noise data in the low-speed flight range were acquired using a traversing in-flow microphone array. The acoustic results presented are used to assess the acoustic far field of BVI noise, to map the directivity and temporal characteristics of BVI impulsive noise, and to show the existence of retreating-side BVI signals. The characteristics of the acoustic radiation patterns, which can often be strongly focused, are found to be very dependent on rotor operating condition. The acoustic signals exhibit multiple blade-vortex interactions per blade with broad impulsive content at lower speeds, while at higher speeds they exhibit fewer interactions per blade, with much sharper, higher amplitude acoustic signals. Moderate-amplitude BVI acoustic signals measured under the aft retreating quadrant of the rotor are shown to originate from the retreating side of the rotor.

  10. New techniques for experimental generation of two-dimensional blade-vortex interaction at low Reynolds numbers

    NASA Technical Reports Server (NTRS)

    Booth, E., Jr.; Yu, J. C.

    1986-01-01

    An experimental investigation of two-dimensional blade-vortex interaction was conducted at NASA Langley Research Center. The first phase was a flow visualization study to document the approach process of a two-dimensional vortex as it encountered a loaded blade model. To accomplish the flow visualization study, a method for generating two-dimensional vortex filaments was required. The numerical study used to define a new vortex generation process and the use of this process in the flow visualization study are documented. Additionally, the photographic techniques and data analysis methods used in the flow visualization study are examined.

  11. Flow structure generated by perpendicular blade-vortex interaction and implications for helicopter noise prediction. Volume 1: Measurements

    NASA Technical Reports Server (NTRS)

    Wittmer, Kenneth S.; Devenport, William J.

    1996-01-01

    The perpendicular interaction of a streamwise vortex with an infinite span helicopter blade was modeled experimentally in incompressible flow. Three-component velocity and turbulence measurements were made using a sub-miniature four sensor hot-wire probe. Vortex core parameters (radius, peak tangential velocity, circulation, and centerline axial velocity deficit) were determined as functions of blade-vortex separation, streamwise position, blade angle of attack, vortex strength, and vortex size. The downstream development of the flow shows that the interaction of the vortex with the blade wake is the primary cause of the changes in the core parameters. The blade sheds negative vorticity into its wake as a result of the induced angle of attack generated by the passing vortex. Instability in the vortex core due to its interaction with this negative vorticity region appears to be the catalyst for the magnification of the size and intensity of the turbulent flowfield downstream of the interaction. In general, the core radius increases while peak tangential velocity decreases with the effect being greater for smaller separations. These effects are largely independent of blade angle of attack; and if these parameters are normalized on their undisturbed values, then the effects of the vortex strength appear much weaker. Two theoretical models were developed to aid in extending the results to other flow conditions. An empirical model was developed for core parameter prediction which has some rudimentary physical basis, implying usefulness beyond a simple curve fit. An inviscid flow model was also created to estimate the vorticity shed by the interaction blade, and to predict the early stages of its incorporation into the interacting vortex.

  12. Evaluation of helicopter noise due to blade-vortex interaction for five tip configurations [conducted in the Langley V/STOL tunnel]

    NASA Technical Reports Server (NTRS)

    Hoad, D. R.

    1979-01-01

    The effect of tip shape modification on blade vortex interaction induced helicopter blade slap noise was investigated. Simulated flight and descent velocities which have been shown to produce blade slap were tested. Aerodynamic performance parameters of the rotor system were monitored to ensure properly matched flight conditions among the tip shapes. The tunnel was operated in the open throat configuration with treatment to improve the acoustic characteristics of the test chamber. Four promising tips were used along with a standard square tip as a baseline configuration. A detailed acoustic evaluation on the same rotor system of the relative applicability of the various tip configurations for blade slap noise reduction is provided.

  13. Lift distributions for a 3-dimensional steady blade-vortex interaction

    NASA Technical Reports Server (NTRS)

    Dunagan, Stephen E.; Norman, Thomas R.

    1987-01-01

    The interaction of a horizontally mounted V23010-1.58 semispan airfoil (simulating a helicopter rotor blade) with a tip vortex shed by a vertically mounted upstream vortex-generating wing (VGW) is investigated experimentally at 60 m/s (dynamic pressure 2.2 kPa, Reynolds number 850,000, and Mach number 0.17) in the NASA Ames 7 x 10-ft wind tunnel. The velocity field near the blade is determined using a three-dimensional zoom LDV; the spanwise lift distribution is measured by strain gages; and the results are compared with the predictions of the panel computer code VSAERO (Maskew, 1982) in graphs. Features noted include localized loss of lift due to the presence of residual VGW wake, loss of lift far inboard on the blade (indicating the large domain of VGW vorticity), little change in total lift with variations in vortex strength, and good agreement between VSAERO and experiment in overall lift distribution but not in all geometric variations.

  14. A parametric study of blade vortex interaction noise for two, three, and four-bladed model rotors at moderate tip speeds: Theory and experiment

    NASA Technical Reports Server (NTRS)

    Leighton, K. P.; Harris, W. L.

    1984-01-01

    An investigation of blade slap due to blade vortex interaction (BVI) has been conducted. This investigation consisted of an examination of BVI blade slap for two, three, and four-bladed model rotors at tip Mach numbers ranging from 0.20 to 0.50. Blade slap contours have been obtained for each configuration tested. Differences in blade slap contours, peak sound pressure level, and directivity for each configuration tested are noted. Additional fundamental differences, such as multiple interaction BVI, are observed and occur for only specific rotor blade configurations. The effect of increasing the Mach number on the BVI blade slap for various rotor blade combinations has been quantified. A peak blade slap Mach number scaling law is proposed. Comparison of measured BVI blade slap with theory is made.

  15. A reduced Blade-Vortex Interaction rotor 

    E-print Network

    Mani, Somnath

    1996-01-01

    This research work aims at mapping the BVI azimuthal locations using a model rotor. A model rotor was first developed. An experimental investigation was then carried out to determine the possible BVI locations. The results ...

  16. Rotor system having alternating length rotor blades for reducing blade-vortex interaction (BVI) noise

    NASA Technical Reports Server (NTRS)

    Moffitt, Robert C. (Inventor); Visintainer, Joseph A. (Inventor)

    1997-01-01

    A rotor system (4) having odd and even blade assemblies (O_b, E_b) mounting to and rotating with a rotor hub assembly (6), wherein the odd blade assemblies (O_b) define a radial length R_O and the even blade assemblies (E_b) define a radial length R_E, and wherein the radial length R_E is between about 70% and about 95% of the radial length R_O. Other embodiments of the invention are directed to a Variable Diameter Rotor system (4) which may be configured for operating in various modes to optimize aerodynamic and acoustic performance. The Variable Diameter Rotor system (4) includes odd and even blade assemblies (O_b, E_b) having inboard and outboard blade sections (10, 12), wherein the outboard blade sections (12) telescopically mount to the inboard blade sections (10). The outboard blade sections (12) are positioned with respect to the inboard blade sections (10) such that the radial length R_E of the even blade assemblies (E_b) is equal to the radial length R_O of the odd blade assemblies (O_b) in a first operating mode, and such that the radial length R_E is between about 70% and about 95% of the length R_O in a second operating mode.

  17. Interactions between flames on parallel solid surfaces

    NASA Technical Reports Server (NTRS)

    Urban, David L.

    1995-01-01

    The interactions between flames spreading over parallel solid sheets of paper are being studied in normal gravity and in microgravity. This geometry is of practical importance since in most heterogeneous combustion systems, the condensed phase is non-continuous and spatially distributed. This spatial distribution can strongly affect burning and/or spread rate. This is due to radiant and diffusive interactions between the surface and the flames above the surfaces. Tests were conducted over a variety of pressures and separation distances to expose the influence of the parallel sheets on oxidizer transport and on radiative feedback.

  18. Massively parallel fluid-structure interaction simulation of blast and explosions impacting on realistic building structures

    E-print Network

    Deiterding, Ralf

    Massively parallel fluid-structure interaction simulation of blast and explosions impacting on realistic building structures with a block-structured AMR method. Recoverable slide headings: verification and validation configurations; blast-driven deformation; detonation-driven deformations.

  19. Parallel Simulation of Electron-Solid Interactions for Electron Microscopy Modeling

    E-print Network

    Plimpton, Steve

    Keywords: Monte Carlo, electron, microscopy, random number generation. A parallel implementation of electron-solid interaction simulation for electron microscopy modeling is presented. Analytical electron microscopy (AEM) is a tool for characterizing the spatial distribution of elements ...

  20. Parallelized Stochastic Cutoff Method for Long-Range Interacting Systems

    NASA Astrophysics Data System (ADS)

    Endo, Eishin; Toga, Yuta; Sasaki, Munetaka

    2015-07-01

    We present a method of parallelizing the stochastic cutoff (SCO) method, which is a Monte-Carlo method for long-range interacting systems. After interactions are eliminated by the SCO method, we subdivide a lattice into noninteracting interpenetrating sublattices. This subdivision enables us to parallelize the Monte-Carlo calculation in the SCO method. Such subdivision is found by numerically solving the vertex coloring of a graph created by the SCO method. We use an algorithm proposed by Kuhn and Wattenhofer to solve the vertex coloring by parallel computation. This method was applied to a two-dimensional magnetic dipolar system on an L × L square lattice to examine its parallelization efficiency. The result showed that, in the case of L = 2304, the speed of computation increased about 102 times by parallel computation with 288 processors.
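
    The parallelization idea described above can be illustrated with a short sketch (Python, illustrative only): after the stochastic cutoff step has reduced the long-range couplings to a sparse interaction graph, a vertex coloring splits the sites into noninteracting groups that can be updated concurrently. The paper uses the distributed Kuhn-Wattenhofer coloring algorithm; the simple greedy coloring and the toy interaction graph below are stand-ins.

        # Illustrative sketch: partition a sparse interaction graph into
        # noninteracting sublattices via vertex coloring, so that all sites
        # of one color can receive Monte Carlo updates in parallel.
        from collections import defaultdict

        def greedy_coloring(neighbors):
            """Assign each vertex the smallest color unused by its neighbors."""
            colors = {}
            for v in sorted(neighbors):
                used = {colors[u] for u in neighbors[v] if u in colors}
                c = 0
                while c in used:
                    c += 1
                colors[v] = c
            return colors

        def independent_sublattices(neighbors):
            """Group vertices by color; each group is a noninteracting sublattice."""
            groups = defaultdict(list)
            for v, c in greedy_coloring(neighbors).items():
                groups[c].append(v)
            return list(groups.values())

        if __name__ == "__main__":
            # Toy sparse interaction graph on six sites (edges chosen arbitrarily).
            neighbors = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
            for group in independent_sublattices(neighbors):
                # In the parallel scheme, every site in `group` could be updated
                # concurrently, since no two of them interact.
                print("update in parallel:", group)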

  1. Investigation of helicopter rotor blade/wake interactive impulsive noise

    NASA Technical Reports Server (NTRS)

    Miley, S. J.; Hall, G. F.; Vonlavante, E.

    1987-01-01

    An analysis of the Tip Aerodynamic/Aeroacoustic Test (TAAT) data was performed to identify possible aerodynamic sources of blade/vortex interaction (BVI) impulsive noise. The identification is based on correlation of measured blade pressure time histories with predicted blade/vortex intersections for the flight condition(s) where impulsive noise was detected. Due to the location of the recording microphones, only noise signatures associated with the advancing blade were available, and the analysis was accordingly restricted to the first and second azimuthal quadrants. The results show that the blade tip region is operating transonically in the azimuthal range where previous BVI experiments indicated the impulsive noise to be. No individual blade/vortex encounter is identifiable in the pressure data; however, there is indication of multiple intersections in the roll-up region which could be the origin of the noise. Discrete blade/vortex encounters are indicated in the second quadrant; however, if impulsive noise were produced here, the directivity pattern would be such that it was not recorded by the microphones. It is demonstrated that the TAAT data base is a valuable resource in the investigation of rotor aerodynamic/aeroacoustic behavior.

  2. An interactive parallel programming environment applied in atmospheric science

    NASA Technical Reports Server (NTRS)

    vonLaszewski, G.

    1996-01-01

    This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.

  3. An interactive parallel programming environment applied in atmospheric science

    SciTech Connect

    Laszewski, G. von

    1996-12-31

    This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.

  4. IPython: components for interactive and parallel computing across disciplines. (Invited)

    NASA Astrophysics Data System (ADS)

    Perez, F.; Bussonnier, M.; Frederic, J. D.; Froehle, B. M.; Granger, B. E.; Ivanov, P.; Kluyver, T.; Patterson, E.; Ragan-Kelley, B.; Sailer, Z.

    2013-12-01

    Scientific computing is an inherently exploratory activity that requires constantly cycling between code, data and results, each time adjusting the computations as new insights and questions arise. To support such a workflow, good interactive environments are critical. The IPython project (http://ipython.org) provides a rich architecture for interactive computing with: 1. Terminal-based and graphical interactive consoles. 2. A web-based Notebook system with support for code, text, mathematical expressions, inline plots and other rich media. 3. Easy to use, high performance tools for parallel computing. Despite its roots in Python, the IPython architecture is designed in a language-agnostic way to facilitate interactive computing in any language. This allows users to mix Python with Julia, R, Octave, Ruby, Perl, Bash and more, as well as to develop native clients in other languages that reuse the IPython clients. In this talk, I will show how IPython supports all stages in the lifecycle of a scientific idea: 1. Individual exploration. 2. Collaborative development. 3. Production runs with parallel resources. 4. Publication. 5. Education. In particular, the IPython Notebook provides an environment for "literate computing" with a tight integration of narrative and computation (including parallel computing). These Notebooks are stored in a JSON-based document format that provides an "executable paper": notebooks can be version controlled, exported to HTML or PDF for publication, and used for teaching.
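
    A minimal sketch of the parallel-computing layer mentioned above, using the ipyparallel package that grew out of IPython's parallel tools (the cluster launch command and engine count are assumptions; this is an illustration, not code from the talk):

        # Assumes an IPython/ipyparallel cluster is already running,
        # e.g. started with: ipcluster start -n 4
        import ipyparallel as ipp

        rc = ipp.Client()      # connect to the running cluster
        view = rc[:]           # DirectView over all engines

        def slow_square(x):
            import time
            time.sleep(0.1)    # stand-in for a real computation
            return x * x

        # Distribute the work across the engines and gather the results.
        results = view.map_sync(slow_square, range(16))
        print(results)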

  5. Interaction of a turbulent vortex with a lifting surface

    NASA Technical Reports Server (NTRS)

    Lee, D. J.; Roberts, L.

    1985-01-01

    The impulsive noise due to blade-vortex interaction is analyzed in the time domain for the extreme case when the blade cuts through the center of the vortex core, with the assumptions of no distortion of the vortex path or of the vortex core. An analytical turbulent vortex core model, described in terms of the tip aerodynamic parameters, is used and its effects on the unsteady loading and maximum acoustic pressure during the interaction are determined.

  6. Framework for Interactive Parallel Dataset Analysis on the Grid

    SciTech Connect

    Alexander, David A.; Ananthan, Balamurali; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and construct professional-quality visualizations of the results.

  7. Nanoparticle-Target Interactions Parallel Antibody-Protein Interactions

    PubMed Central

    Koh, Isaac; Hong, Rui; Weissleder, Ralph; Josephson, Lee

    2009-01-01

    Magnetic particles can act as magnetic relaxation switches (MRSw's) when they bind to target analytes, and switch between their dispersed and aggregated states resulting in changes in the spin-spin relaxation time (T2) of their surrounding water protons. Both nanoparticles (NPs, 10-100 nm) and micron-sized particles (MPs) have been employed as MRSw's, to sense drugs, metabolites, oligonucleotides, proteins, bacteria and mammalian cells. To better understand how NPs or MPs interact with targets, we employed as a molecular recognition system the reaction between the Tag peptide of the influenza virus hemagglutinin and a monoclonal antibody to that peptide (anti-Tag). To obtain targets of different size and valency, we attached the Tag peptide to BSA (Mw = 65000 Daltons, diameter = 8 nm) and to Latex spheres (diameter = 900 nm). To obtain magnetic probes of very different sizes, anti-Tag was conjugated to 40 nm NPs and 1 μm MPs. MP and NP probes reacted with Tag peptide targets in a manner similar to antibody/antigen reactions in solution, exhibiting so-called prozone effects. MPs detected all types of targets with higher sensitivity than NPs, with targets of higher valency being better detected than those of lower valency. The Tag/anti-tag recognition system can be used to synthesize combinations of molecular targets and magnetic probes, to more fully understand the aggregation reaction that occurs when probes bind targets in solution and the ensuing changes in water relaxation times that result. PMID:19323458

  8. Modeling Interacting Galaxies Using a Parallel Genetic Algorithm

    E-print Network

    Christian Theis; Stefan Harfst

    1999-10-10

    Modeling of interacting galaxies suffers from an extended parameter space prohibiting traditional grid based search strategies. As an alternative approach a combination of a Genetic Algorithm (GA) with fast restricted N-body simulations can be applied. A typical fit takes about 3-6 CPU-hours on a PentiumII processor. Here we present a parallel implementation of our GA which reduces the CPU-requirement of a parameter determination to a few minutes on 100 nodes of a CRAY T3E.
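
    The structure of such a parallel GA can be sketched as follows (Python, illustrative only): the expensive fitness evaluation, which in the paper is a fast restricted N-body simulation compared against observations, is the part that gets farmed out to many processors. Here a toy fitness function and Python's multiprocessing pool stand in for the N-body model and the CRAY T3E message passing.

        # Illustrative parallel genetic algorithm: only the fitness evaluation
        # is parallelized, mirroring the strategy described in the abstract.
        import random
        from multiprocessing import Pool

        def toy_fitness(params):
            # Placeholder for "run a restricted N-body model and score the fit".
            x, y = params
            return -((x - 1.0) ** 2 + (y + 2.0) ** 2)

        def evolve(pop_size=32, generations=20, workers=4):
            pop = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(pop_size)]
            with Pool(workers) as pool:
                for _ in range(generations):
                    scores = pool.map(toy_fitness, pop)   # parallel evaluation
                    ranked = [p for _, p in sorted(zip(scores, pop), reverse=True)]
                    parents = ranked[: pop_size // 2]
                    children = []
                    while len(children) < pop_size - len(parents):
                        a, b = random.sample(parents, 2)
                        children.append(((a[0] + b[0]) / 2 + random.gauss(0, 0.1),
                                         (a[1] + b[1]) / 2 + random.gauss(0, 0.1)))
                    pop = parents + children
            return max(pop, key=toy_fitness)

        if __name__ == "__main__":
            print("best parameters found:", evolve())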

  9. Long-range interactions & parallel scalability in molecular simulations

    E-print Network

    Michael Patra; Marja T. Hyvonen; Emma Falck; Mohsen Sabouri-Ghomi; Ilpo Vattulainen; Mikko Karttunen

    2006-06-21

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modelling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes - we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e., communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

  10. Long-range interactions and parallel scalability in molecular simulations

    NASA Astrophysics Data System (ADS)

    Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

    2007-01-01

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
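
    The reason the particle-mesh Ewald method discussed above scales well can be seen from the standard Ewald splitting of the Coulomb interaction (a textbook identity included here for illustration; the splitting parameter \beta is chosen by the code, not specified in the paper):

        \frac{1}{r} \;=\; \underbrace{\frac{\operatorname{erfc}(\beta r)}{r}}_{\text{short-ranged, summed directly in real space}} \;+\; \underbrace{\frac{\operatorname{erf}(\beta r)}{r}}_{\text{smooth, evaluated in reciprocal space}}

    The short-ranged term is truncated at a cutoff radius, while the smooth term is interpolated onto a mesh and evaluated with FFTs, so the total cost grows roughly as O(N log N) rather than the O(N^2) of a bare pairwise sum.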

  11. A multimodal parallel architecture: A cognitive framework for multimodal interactions.

    PubMed

    Cohn, Neil

    2016-01-01

    Human communication is naturally multimodal, and substantial research has examined the semantic correspondences in speech-gesture and text-image relationships. However, visual narratives, like those in comics, provide an interesting challenge to multimodal communication because the words and/or images can guide the overall meaning, and both modalities can appear in complicated "grammatical" sequences: sentences use a syntactic structure and sequential images use a narrative structure. These dual structures create complexity beyond that typically addressed by theories of multimodality, where only a single form uses combinatorial structure, and also pose challenges for models of the linguistic system that focus on single modalities. This paper outlines a broad theoretical framework for multimodal interactions by expanding on Jackendoff's (2002) parallel architecture for language. Multimodal interactions are characterized in terms of their component cognitive structures: whether a particular modality (verbal, bodily, visual) is present, whether it uses a grammatical structure (syntax, narrative), and whether it "dominates" the semantics of the overall expression. Altogether, this approach integrates multimodal interactions into an existing framework of language and cognition, and characterizes interactions between varying complexity in the verbal, bodily, and graphic domains. The resulting theoretical model presents an expanded consideration of the boundaries of the "linguistic" system and its involvement in multimodal interactions, with a framework that can benefit research on corpus analyses, experimentation, and the educational benefits of multimodality. PMID:26491835

  12. Parallel Force Assay for Protein-Protein Interactions

    PubMed Central

    Aschenbrenner, Daniela; Pippig, Diana A.; Klamecka, Kamila; Limmer, Katja; Leonhardt, Heinrich; Gaub, Hermann E.

    2014-01-01

    Quantitative proteome research is greatly promoted by high-resolution parallel format assays. A characterization of protein complexes based on binding forces offers an unparalleled dynamic range and allows for the effective discrimination of non-specific interactions. Here we present a DNA-based Molecular Force Assay to quantify protein-protein interactions, namely the bond between different variants of GFP and GFP-binding nanobodies. We present different strategies to adjust the maximum sensitivity window of the assay by influencing the binding strength of the DNA reference duplexes. The binding of the nanobody Enhancer to the different GFP constructs is compared at high sensitivity of the assay. Whereas the binding strengths to wild-type and enhanced GFP are equal within experimental error, stronger binding to superfolder GFP is observed. This difference in binding strength is attributed to alterations in the amino acids that form contacts according to the crystal structure of the initial wild-type GFP-Enhancer complex. Moreover, we outline the potential for large-scale parallelization of the assay. PMID:25546146

  13. STAR/MPI: Binding a Parallel Library to Interactive Symbolic Algebra Systems

    E-print Network

    Cooperman, Gene

    STAR/MPI ("*/MPI") binds a parallel MPI-based library to interactive symbolic algebra systems. These implementations are examples that extend bindings of MPI to interactive languages, so that STAR/MPI can immediately take advantage of parallel architectures.

  14. Interactive animation of fault-tolerant parallel algorithms

    SciTech Connect

    Apgar, S.W.

    1992-02-01

    Animation of algorithms makes understanding them intuitively easier. This paper describes the software tool Raft (Robust Animator of Fault Tolerant Algorithms). The Raft system allows the user to animate a number of parallel algorithms that achieve fault-tolerant execution. In particular, we use it to illustrate the key Write-All problem. It has an extensive user interface that allows a choice of the number of processors, the number of elements in the Write-All array, and the adversary that controls the processor failures. The novelty of the system is that the interface allows the user to create new on-line adversaries as the algorithm executes.
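    For readers unfamiliar with the Write-All problem that Raft animates, the toy simulation below captures its essence: P processors must set every cell of an N-element array to 1 while an adversary crashes processors between rounds. The crash schedule and the one-cell-per-round work rule are arbitrary illustrations, not Raft's algorithms.

        # Toy round-based simulation of the Write-All problem (illustrative only).
        # P processors try to set every cell of an N-element array to 1 while an
        # adversary may crash processors between rounds.
        import random

        def write_all(n_cells=32, n_procs=8, crash_prob=0.2, seed=1):
            random.seed(seed)
            cells = [0] * n_cells
            alive = set(range(n_procs))
            rounds = 0
            while 0 in cells and alive:
                rounds += 1
                todo = [i for i, c in enumerate(cells) if c == 0]
                # Each surviving processor writes one remaining cell this round.
                for proc, idx in zip(sorted(alive), todo):
                    cells[idx] = 1
                # Adversary crashes each surviving processor with some probability.
                alive = {p for p in alive if random.random() > crash_prob}
            return rounds, all(cells)

        rounds, done = write_all()
        print(f"finished={done} after {rounds} rounds")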

  15. An Interactive Parallel Coordinates Technique Applied to a Tropical Cyclone Climate

    E-print Network

    Swan II, J. Edward

    A tropical cyclone is defined as a warm-core, non-frontal, synoptic-scale cyclone originating over tropical or subtropical waters, with organized thunderstorms and a closed surface wind circulation.

  16. Parallel implementation of molecular dynamics simulation for short-ranged interaction

    NASA Astrophysics Data System (ADS)

    Wu, Jong-Shinn; Hsu, Yu-Lin; Lee, Yun-Min

    2005-08-01

    A parallel molecular dynamics simulation method, designed for large-scale problems and employing dynamic spatial domain decomposition for short-ranged molecular interactions, is proposed. In this parallel cellular molecular dynamics (PCMD) simulation method, the link-cell data structure is used to reduce the searching time required for forming the cut-off neighbor list as well as for domain decomposition, which utilizes the multi-level graph-partitioning technique. A simple threshold scheme (STS), in which workload imbalance is monitored and compared with a threshold value during the runtime, is proposed to decide the proper time for repartitioning the domain. The simulation code is implemented and tested on a distributed-memory parallel machine, e.g., a PC-cluster system. Parallel performance is studied using approximately one million L-J atoms in the condensed, vaporized and supercritical states. Results show that fairly good parallel efficiency at 49 processors can be obtained for the condensed and supercritical states (˜60%), while it is somewhat lower for the vaporized state (˜40%).
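    The link-cell idea at the heart of the method can be illustrated with a short serial Python sketch that builds a cut-off neighbor list in roughly O(N) time; the box size, cut-off, and random particle positions are made up, and the PCMD code's actual data structures and parallel decomposition are not reproduced here.

        # Minimal serial illustration of the link-cell idea for building a cut-off
        # neighbor list (toy example; not the PCMD implementation).
        import itertools
        import random

        def link_cell_pairs(positions, box, rcut):
            ncell = max(1, int(box // rcut))          # cells at least rcut wide
            size = box / ncell
            cells = {}
            for i, (x, y, z) in enumerate(positions):
                key = (int(x / size) % ncell, int(y / size) % ncell, int(z / size) % ncell)
                cells.setdefault(key, []).append(i)

            pairs = []
            for (cx, cy, cz), members in cells.items():
                # Search this cell and its neighbors (periodic wrap-around).
                for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
                    key = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
                    for i in members:
                        for j in cells.get(key, []):
                            if j <= i:
                                continue
                            d = [min(abs(positions[i][k] - positions[j][k]),
                                     box - abs(positions[i][k] - positions[j][k])) for k in range(3)]
                            if sum(c * c for c in d) < rcut * rcut:
                                pairs.append((i, j))
            return sorted(set(pairs))

        random.seed(0)
        box, rcut = 10.0, 2.5
        pos = [tuple(random.uniform(0, box) for _ in range(3)) for _ in range(200)]
        print(len(link_cell_pairs(pos, box, rcut)), "pairs within cutoff")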

  17. Parallel implementation of three-dimensional molecular dynamic simulation for laser-cluster interaction

    SciTech Connect

    Holkundkar, Amol R.

    2013-11-15

    The objective of this article is to report the parallel implementation of a 3D molecular dynamics simulation code for laser-cluster interactions. The code has been benchmarked by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time are established by varying the number of processor cores and the number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. The executable version of the code is available from the author for studying the dynamics of laser-cluster interactions.

  18. Formation of electron kappa distributions due to interactions with parallel propagating whistler waves

    SciTech Connect

    Tao, X. Lu, Q.; Mengcheng National Geophysical Observatory, School of Earth and Space Sciences, University of Science and Technology of China, Hefei, Anhui 230026

    2014-02-15

    In space plasmas, charged particles are frequently observed to possess a high-energy tail, which is often modeled by a kappa-type distribution function. In this work, the formation of the electron kappa distribution during the generation of parallel-propagating whistler waves is investigated using fully nonlinear particle-in-cell (PIC) simulations. Previous research concluded that the bi-Maxwellian character of electron distributions is preserved in PIC simulations. We now demonstrate that for interactions between electrons and parallel-propagating whistler waves, a non-Maxwellian high-energy tail can be formed, and a kappa distribution can be used to fit the electron distribution in the time-asymptotic limit. The κ parameter is found to decrease with increasing initial temperature anisotropy or decreasing ratio of electron plasma frequency to cyclotron frequency. The results might be helpful for understanding the origin of electron kappa distributions observed in space plasmas.
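    To make the κ parameter concrete, the sketch below fits a standard kappa-type form to synthetic electron data with SciPy; the functional form is the usual kappa distribution, but the data and fit settings are invented rather than taken from the PIC simulations.

        # Fit a kappa-type distribution to synthetic electron speeds (illustration only;
        # the data here are generated, not taken from the PIC simulations).
        import numpy as np
        from scipy.optimize import curve_fit

        def kappa_pdf(v, theta, kappa, amp):
            """Standard kappa-type form: amp * (1 + v^2/(kappa*theta^2))^-(kappa+1)."""
            return amp * (1.0 + v**2 / (kappa * theta**2)) ** (-(kappa + 1.0))

        rng = np.random.default_rng(0)
        v = np.linspace(0.0, 5.0, 80)
        truth = kappa_pdf(v, theta=1.0, kappa=4.0, amp=1.0)
        data = truth * (1.0 + 0.05 * rng.standard_normal(v.size))   # add 5% noise

        popt, _ = curve_fit(kappa_pdf, v, data, p0=(0.5, 2.0, 0.5))
        print("fitted theta, kappa, amp:", np.round(popt, 3))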

  19. Engineering of parallel plasmonic-photonic interactions for on-chip refractive index sensors

    NASA Astrophysics Data System (ADS)

    Lin, Linhan; Zheng, Yuebing

    2015-07-01

    Ultra-narrow linewidth in the extinction spectrum of noble metal nanoparticle arrays induced by the lattice plasmon resonances (LPRs) is of great significance for applications in plasmonic lasers and plasmonic sensors. However, the challenge of sustaining LPRs in an asymmetric environment greatly restricts their practical applications, especially for high-performance on-chip plasmonic sensors. Herein, we fully study the parallel plasmonic-photonic interactions in both the Au nanodisk arrays (NDAs) and the core/shell SiO2/Au nanocylinder arrays (NCAs). Different from the dipolar interactions in the conventionally studied orthogonal coupling, the horizontal propagating electric field introduces the out-of-plane "hot spots" and results in electric field delocalization. Through controlling the aspect ratio to manipulate the "hot spot" distributions of the localized surface plasmon resonances (LSPRs) in the NCAs, we demonstrate a high-performance refractive index sensor with a wide dynamic range of refractive indexes ranging from 1.0 to 1.5. Both high figure of merit (FOM) and high signal-to-noise ratio (SNR) can be maintained under these detectable refractive indices. Furthermore, the electromagnetic field distributions confirm that the high FOM in the wide dynamic range is attributed to the parallel coupling between the superstrate diffraction orders and the height-induced LSPR modes. Our study on the near-field "hot-spot" engineering and far-field parallel coupling paves the way towards improved understanding of the parallel LPRs and the design of high-performance on-chip refractive index sensors. Electronic supplementary information (ESI) available. See DOI: 10.1039/c5nr03159a

  20. Engineering of parallel plasmonic-photonic interactions for on-chip refractive index sensors.

    PubMed

    Lin, Linhan; Zheng, Yuebing

    2015-07-28

    Ultra-narrow linewidth in the extinction spectrum of noble metal nanoparticle arrays induced by the lattice plasmon resonances (LPRs) is of great significance for applications in plasmonic lasers and plasmonic sensors. However, the challenge of sustaining LPRs in an asymmetric environment greatly restricts their practical applications, especially for high-performance on-chip plasmonic sensors. Herein, we fully study the parallel plasmonic-photonic interactions in both the Au nanodisk arrays (NDAs) and the core/shell SiO2/Au nanocylinder arrays (NCAs). Different from the dipolar interactions in the conventionally studied orthogonal coupling, the horizontal propagating electric field introduces the out-of-plane "hot spots" and results in electric field delocalization. Through controlling the aspect ratio to manipulate the "hot spot" distributions of the localized surface plasmon resonances (LSPRs) in the NCAs, we demonstrate a high-performance refractive index sensor with a wide dynamic range of refractive indexes ranging from 1.0 to 1.5. Both high figure of merit (FOM) and high signal-to-noise ratio (SNR) can be maintained under these detectable refractive indices. Furthermore, the electromagnetic field distributions confirm that the high FOM in the wide dynamic range is attributed to the parallel coupling between the superstrate diffraction orders and the height-induced LSPR modes. Our study on the near-field "hot-spot" engineering and far-field parallel coupling paves the way towards improved understanding of the parallel LPRs and the design of high-performance on-chip refractive index sensors. PMID:26133011

  1. Parallel algorithms and applications of configuration-interaction shell-model code BIGSTICK

    NASA Astrophysics Data System (ADS)

    Krastev, Plamen; Johnson, Calvin; Ormand, Erich

    2010-11-01

    The nuclear shell model, together with two- and three-body interactions, is a powerful tool for gaining insight into the properties of light nuclei. The aid of advanced computer resources is of major importance in such calculations. We report on the latest developments and applications of the configuration-interaction shell-model code BIGSTICK, an efficient parallel on-the-fly code which solves the nuclear many-body problem with both two- and three-body interactions. The US Department of Energy supported this investigation through Contract Nos. DE-FG02-96ER40985 and DE-FC02-09ER41587 and through Subcontract No. B576152 of the Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344.

  2. Nice Guys Finish Fast and Bad Guys Finish Last: Facilitatory vs. Inhibitory Interaction in Parallel Systems

    PubMed Central

    Eidels, Ami; Houpt, Joseph W.; Altieri, Nicholas; Pei, Lei; Townsend, James T.

    2011-01-01

    Systems Factorial Technology is a powerful framework for investigating the fundamental properties of human information processing such as architecture (i.e., serial or parallel processing) and capacity (how processing efficiency is affected by increased workload). The Survivor Interaction Contrast (SIC) and the Capacity Coefficient are effective measures for determining these underlying properties, based on response-time data. Each of the different architectures, under the assumption of independent processing, predicts a specific form of the SIC along with some range of capacity. In this study, we explored SIC predictions of discrete-state (Markov process) and continuous-state (Linear Dynamic) models that allow for certain types of cross-channel interaction. The interaction can be facilitatory or inhibitory: one channel can either facilitate, or slow down, processing in its counterpart. Despite the relative generality of these models, the combination of the architecture-oriented and capacity-oriented analyses provides for precise identification of the underlying system. PMID:21516183
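    As a worked illustration of the survivor interaction contrast, the sketch below computes SIC(t) from synthetic response times generated by a toy parallel self-terminating (OR) model with independent channels; the rates and the model are invented for the demonstration and are not the article's Markov or Linear Dynamic models.

        # Survivor interaction contrast (SIC) from response-time data (synthetic demo).
        # SIC(t) = [S_LL(t) - S_LH(t)] - [S_HL(t) - S_HH(t)], where S is the survivor
        # function and L/H denote low/high salience on the two processing channels.
        import numpy as np

        def survivor(rts, grid):
            """Empirical survivor function S(t) = P(RT > t) on a common time grid."""
            rts = np.sort(np.asarray(rts))
            return 1.0 - np.searchsorted(rts, grid, side="right") / rts.size

        rng = np.random.default_rng(42)
        # Toy parallel self-terminating (OR) model: RT = min(channel1, channel2) + base.
        rate = {"L": 1 / 400.0, "H": 1 / 250.0}      # low salience -> slower channel
        def simulate(c1, c2, n=5000):
            return 150 + np.minimum(rng.exponential(1 / rate[c1], n),
                                    rng.exponential(1 / rate[c2], n))

        grid = np.linspace(0, 2000, 400)
        S = {cond: survivor(simulate(*cond), grid) for cond in ("LL", "LH", "HL", "HH")}
        sic = (S["LL"] - S["LH"]) - (S["HL"] - S["HH"])
        print("min SIC = %.3f, max SIC = %.3f" % (sic.min(), sic.max()))
        # An independent-channels parallel OR model predicts SIC(t) >= 0 for all t.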

  3. Efficient Parallel Analysis of Shell-fluid Interaction Problem by Using Monolithic Method Based on Consistent Pressure Poisson Equation

    NASA Astrophysics Data System (ADS)

    Ishihara, Daisuke; Kanei, Shigeo; Yoshimura, Shinobu; Horie, Tomoyoshi

    In this paper, a parallel monolithic method for shell-fluid interaction based on the consistent Pressure Poisson Equation (PPE) is developed and its parallel computational efficiency is demonstrated. The Conjugate Gradient (CG) method without any preconditioner works well for solving the consistent PPE, even though the coefficient matrix of the original coupled equation system becomes ill-conditioned due to (a) the inhomogeneity of submatrix elements between the fluid and the structure and (b) the ill-conditioned submatrix of the shell structure. Thus our parallel monolithic method using the consistent PPE and the unpreconditioned CG method is efficient for parallel analyses of shell-fluid interaction problems. The present parallel solution procedure is based on mesh decomposition. To demonstrate the performance of the developed method, it is applied to simulate the vibration of an elastic plate situated in the wake of a rectangular prism and a flapping elastic wing in quiescent fluid.
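    The linear-solver ingredient is the textbook unpreconditioned conjugate gradient iteration; a minimal sketch on an arbitrary symmetric positive-definite test matrix (not the coupled shell-fluid system) is shown below.

        # Textbook conjugate gradient iteration without a preconditioner (generic sketch
        # on an arbitrary SPD test matrix, not the consistent PPE system).
        import numpy as np

        def cg(A, b, tol=1e-10, maxit=1000):
            x = np.zeros_like(b)
            r = b - A @ x
            p = r.copy()
            rs = r @ r
            for it in range(maxit):
                Ap = A @ p
                alpha = rs / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                rs_new = r @ r
                if np.sqrt(rs_new) < tol * np.linalg.norm(b):
                    return x, it + 1
                p = r + (rs_new / rs) * p
                rs = rs_new
            return x, maxit

        # Small symmetric positive-definite test system (1D Laplacian).
        n = 50
        A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
        b = np.ones(n)
        x, iters = cg(A, b)
        print("converged in", iters, "iterations, residual", np.linalg.norm(b - A @ x))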

  4. Determination of interaction forces between parallel dislocations by the evaluation of J integrals of plane elasticity

    NASA Astrophysics Data System (ADS)

    Lubarda, Vlado A.

    2015-05-01

    The Peach-Koehler expressions for the glide and climb components of the force exerted on a straight dislocation in an infinite isotropic medium by another straight dislocation are derived by evaluating the plane and antiplane strain versions of J integrals around the center of the dislocation. After expressing the elastic fields as the sums of elastic fields of each dislocation, the energy momentum tensor is decomposed into three parts. It is shown that only one part, involving mixed products from the two dislocation fields, makes a nonvanishing contribution to J integrals and the corresponding dislocation forces. Three examples are considered, with dislocations on parallel or intersecting slip planes. For two edge dislocations on orthogonal slip planes, there are two equilibrium configurations in which the glide and climb components of the dislocation force simultaneously vanish. The interactions between two different types of screw dislocations and a nearby circular void, as well as between parallel line forces in an infinite or semi-infinite medium, are then evaluated.
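    For orientation, the standard isotropic-elasticity Peach-Koehler result that such a J-integral evaluation reproduces for two parallel edge dislocations (Burgers vectors b1 and b2 along x, line direction along z, dislocation 2 at (x, y) relative to dislocation 1) can be quoted as follows; this is the textbook form, given here for reference rather than taken from the paper:

        % Glide and climb components of the force per unit length on dislocation 2,
        % located at (x, y) relative to dislocation 1 (standard isotropic result).
        \begin{align}
          F_x^{\mathrm{glide}} &= \frac{\mu b_1 b_2}{2\pi(1-\nu)}\,
              \frac{x\,(x^{2}-y^{2})}{(x^{2}+y^{2})^{2}}, \\
          F_y^{\mathrm{climb}} &= \frac{\mu b_1 b_2}{2\pi(1-\nu)}\,
              \frac{y\,(3x^{2}+y^{2})}{(x^{2}+y^{2})^{2}}.
        \end{align}

    Here \mu is the shear modulus and \nu is Poisson's ratio; for two parallel screw dislocations the corresponding force is radial, F_r = \mu b_1 b_2 / (2\pi r).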

  5. Numerical investigation of two interacting parallel thruster-plumes and comparison to experiment

    NASA Astrophysics Data System (ADS)

    Grabe, Martin; Holz, André; Ziegenhagen, Stefan; Hannemann, Klaus

    2014-12-01

    Clusters of orbital thrusters are an attractive option to achieve graduated thrust levels and increased redundancy with available hardware, but the heavily under-expanded plumes of chemical attitude control thrusters placed in close proximity will interact, leading to a local amplification of downstream fluxes and of back-flow onto the spacecraft. The interaction of two similar, parallel, axi-symmetric cold-gas model thrusters has recently been studied in the DLR High-Vacuum Plume Test Facility STG under space-like vacuum conditions, employing a Patterson-type impact pressure probe with slot orifice. We reproduce a selection of these experiments numerically, and emphasise that a comparison of numerical results to the measured data is not straightforward. The signal of the probe used in the experiments must be interpreted according to the degree of rarefaction and local flow Mach number, and both vary dramatically throughout the flow-field. We present a procedure to reconstruct the probe signal by post-processing the numerically obtained flow-field data and show that agreement with the experimental results is then improved. Features of the investigated cold-gas thruster plume interaction are discussed on the basis of the numerical results.

  6. Parallel kinetic Monte Carlo simulation framework incorporating accurate models of adsorbate lateral interactions

    NASA Astrophysics Data System (ADS)

    Nielsen, Jens; d'Avezac, Mayeul; Hetherington, James; Stamatakis, Michail

    2013-12-01

    Ab initio kinetic Monte Carlo (KMC) simulations have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These simulations necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for simulating catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce parallelization with OpenMP. We further benchmark our framework by simulating a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion.
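    To illustrate why truncating the cluster expansion at first nearest neighbors can bias the predicted energetics, the sketch below evaluates the adlayer energy of a random occupation on a periodic square lattice with and without a second-nearest-neighbor pair term; the interaction values are invented and are not the NO oxidation parameterization cited above.

        # Adlayer energy from a pairwise cluster expansion on a periodic square lattice
        # (toy numbers; not the NO-oxidation parameterization cited in the abstract).
        import random

        L = 20                      # lattice is L x L with periodic boundaries
        V1, V2 = 0.12, 0.04         # eV: 1st- and 2nd-nearest-neighbor pair interactions
        random.seed(3)
        occ = [[1 if random.random() < 0.4 else 0 for _ in range(L)] for _ in range(L)]

        def pair_energy(occ, offsets, V):
            e = 0.0
            for i in range(L):
                for j in range(L):
                    if not occ[i][j]:
                        continue
                    for di, dj in offsets:
                        e += V * occ[(i + di) % L][(j + dj) % L]
            return 0.5 * e            # each pair counted twice

        nn1 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
        nn2 = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
        e1 = pair_energy(occ, nn1, V1)
        e2 = pair_energy(occ, nn2, V2)
        print(f"1NN-only energy: {e1:.3f} eV; with 2NN term: {e1 + e2:.3f} eV")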

  7. Parallel kinetic Monte Carlo simulation framework incorporating accurate models of adsorbate lateral interactions.

    PubMed

    Nielsen, Jens; d'Avezac, Mayeul; Hetherington, James; Stamatakis, Michail

    2013-12-14

    Ab initio kinetic Monte Carlo (KMC) simulations have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These simulations necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for simulating catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce parallelization with OpenMP. We further benchmark our framework by simulating a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion. PMID:24329081

  8. Structure and drug interactions of parallel-stranded DNA studied by infrared spectroscopy and fluorescence.

    PubMed Central

    Fritzsche, H; Akhebat, A; Taillandier, E; Rippe, K; Jovin, T M

    1993-01-01

    The infrared spectra of three different 25-mer parallel-stranded DNAs (ps-DNA) have been studied. We have used ps-DNAs containing either exclusively dA x dT base pairs or a substitution with four dG x dC base pairs, and compared them with their antiparallel-stranded (aps) reference duplexes in a conventional B-DNA conformation. Significant differences have been found in the region of the thymine C = O stretching vibrations. The parallel-stranded duplexes showed characteristic marker bands for the C2 = O2 and C4 = O4 carbonyl stretching vibrations of thymine at 1685 cm-1 and 1668 cm-1, respectively, as compared to values of 1696 cm-1 and 1663 cm-1 for the antiparallel-stranded reference duplexes. The results confirm previous studies indicating that the secondary structure in parallel-stranded DNA is established by reversed Watson-Crick base pairing of dA x dT with hydrogen bonds between N6H...O2 and N1...HN3. The duplex structure of the ps-DNA is much more sensitive to dehydration than that of the aps-DNA. Interaction with three drugs known to bind in the minor groove of aps-DNA (netropsin, distamycin A and Hoechst 33258) induces shifts of the C = O stretching vibrations of ps-DNA even at a low ratio of drug per DNA base pair. These results suggest a conformational change of the ps-DNA to optimize the DNA-drug interaction. As demonstrated by excimer fluorescence of strands labeled with pyrene at the 5'-end, the drugs induce dissociation of the ps-DNA duplex with subsequent formation of imperfectly matched aps-DNA to allow the more favorable drug binding to aps-DNA. Similarly, attempts to form a triple helix of the type d(T)n.d(A)n.d(T)n with ps-DNA failed and resulted in the dissociation of the ps-DNA duplex and reformation of a triple helix based upon an aps-DNA duplex core d(T)10.d(A)10. PMID:7504812

  9. Parallel Programming with Interacting Processes Peiyi Tang 1 and Yoichi Muraoka 2

    E-print Network

    Tang, Peiyi

    We believe that IP (interacting processes) is a good candidate for a mainstream parallel programming model. Parallel programming is widely regarded as hard, and the productivity of parallel software development is still low.

  10. A Grid Service for the Interactive Use of a Parallel Non-Rigid Registration Algorithm of Medical Images

    E-print Network

    Ayache, Nicholas

    Using a cluster, we have built a registration grid service. The system exhibits good performance even if the user is connected to the grid remotely.

  11. Highly scalable parallel implementation of turbulent collision of aerodynamically interacting cloud droplets

    NASA Astrophysics Data System (ADS)

    Parishani, Hossein; Ayala, Orlando; Wang, Lian-Ping; Rosa, Bogdan; Grabowski, Wojciech

    2011-11-01

    Hybrid direct numerical simulation (HDNS) has advanced our understanding of turbulent collision-coalescence of cloud droplets. In this approach, the background fluid turbulence is simulated by a pseudospectral method and the disturbance flows of droplets are treated analytically. To better realize its potential on petascale computers with ~100,000 processors, here we implement and test a parallel implementation using two-dimensional domain decomposition. The purpose is to increase both the range of flow scales and the number of droplets realizable in the simulations, so the dependence of collision statistics on flow Reynolds number and droplet size can be explored. We expect that the 2D domain-decomposition HDNS code can be used to produce statistics of aerodynamically interacting droplets with Taylor-microscale flow Reynolds number Rλ up to ~1000 and a system of O(10⁷) polydisperse droplets. We will present the implementation details as well as results of turbulent collision statistics (e.g., collision kernel, radial distribution function, relative velocity statistics) of sedimenting cloud droplets from our latest high-resolution HDNS. Work supported by NSF and NCAR.
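    As a concrete picture of what two-dimensional ("pencil") domain decomposition means for a pseudospectral grid, the sketch below assigns each MPI-style rank a contiguous slab of y and z indices on a Py x Pz processor grid; the grid size and processor counts are arbitrary examples, not the HDNS production settings.

        # Pencil (2D) domain decomposition of an N^3 grid over a Py x Pz processor grid
        # (generic illustration of the layout, not the HDNS code itself).
        def pencil_layout(n, py, pz):
            assert n % py == 0 and n % pz == 0, "grid must divide evenly"
            ny, nz = n // py, n // pz
            layout = {}
            for rank in range(py * pz):
                iy, iz = divmod(rank, pz)
                layout[rank] = {
                    "x": (0, n),                      # each rank keeps all x planes
                    "y": (iy * ny, (iy + 1) * ny),    # slab of y indices
                    "z": (iz * nz, (iz + 1) * nz),    # slab of z indices
                }
            return layout

        for rank, sub in list(pencil_layout(n=1024, py=32, pz=32).items())[:3]:
            print(rank, sub)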

  12. LPIC++ a parallel one-dimensional relativistic electromagnetic Particle-In-Cell code for simulating laser-plasma-interaction

    NASA Astrophysics Data System (ADS)

    Pfund, R. E. W.; Lichters, R.; Meyer-ter-Vehn, J.

    1998-02-01

    We report on a recently developed electromagnetic relativistic 1D3V (one spatial, three velocity dimensions) Particle-In-Cell code for simulating laser-plasma interaction at normal and oblique incidence. The code is written in C++ and easy to extend. The data structure is characterized by the use of chained lists for the grid cells as well as particles belonging to one cell. The parallel version of the code is based on PVM. It splits the grid into several spatial domains each belonging to one processor. Since particles can cross boundaries of cells as well as domains, the processor loads will generally change in time. This is counteracted by adjusting the domain sizes dynamically, for which the use of chained lists has proven to be very convenient. Moreover, an option for restarting the simulation from intermediate stages of the time evolution has been implemented even in the parallel version. The code will be published and distributed freely.
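    The dynamic domain-size adjustment described above can be sketched in one dimension as choosing domain boundaries so that each processor holds roughly the same number of particles; the quantile-based rule below is a generic illustration, not the LPIC++ algorithm.

        # 1D dynamic domain decomposition: choose domain boundaries so that each
        # processor holds roughly the same number of particles (generic illustration).
        import random

        def balanced_boundaries(particle_x, n_domains, x_min, x_max):
            xs = sorted(particle_x)
            per_dom = len(xs) / n_domains
            bounds = [x_min]
            for d in range(1, n_domains):
                bounds.append(xs[int(d * per_dom)])   # boundary at the d-th quantile
            bounds.append(x_max)
            return bounds

        random.seed(7)
        # Particles bunched near x = 0.2 to mimic an uneven plasma density profile.
        x = [min(1.0, max(0.0, random.gauss(0.2, 0.1))) for _ in range(10000)]
        bounds = balanced_boundaries(x, n_domains=4, x_min=0.0, x_max=1.0)
        counts = [sum(b0 <= xi < b1 for xi in x) for b0, b1 in zip(bounds, bounds[1:])]
        print("boundaries:", [round(b, 3) for b in bounds])
        print("particles per domain:", counts)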

  13. Structural Studies on Porphyrin-PNA Conjugates in Parallel PNA:PNA Duplexes: Effect of Stacking Interactions on Helicity.

    PubMed

    Accetta, Alessandro; Petrovic, Ana G; Marchelli, Rosangela; Berova, Nina; Corradini, Roberto

    2015-12-01

    Parallel PNA:PNA duplexes were synthesized and conjugated with meso-tris(pyridyl)phenylporphyrin carboxylic acid at the N-terminus. The introduction of one porphyrin unit was shown to slightly affect the stability of the PNA:PNA parallel duplex, whereas the presence of two porphyrin units at the same end resulted in a dramatic increase of the melting temperature, accompanied by hysteresis between melting and cooling curves. The circular dichroism (CD) profile of the Soret band and fluorescence quenching strongly support the occurrence of a face-to-face interaction between the two porphyrin units. Introduction of an L-lysine residue at the C-terminus of one strand of the parallel duplex induced a left-handed helical structure in the PNA:PNA duplex if the latter contains only one or no porphyrin moiety. The left-handed helicity was revealed by the nucleobase CD profile at 240-280 nm and by the induced CD observed in the presence of the DiSC2(5) cyanine dye at ~500-550 nm. Surprisingly, the presence of two porphyrin units led to the disappearance of the nucleobase CD signal and the absence of CD exciton coupling within the Soret band region. In addition, a dramatic decrease of the induced CD of DiSC2(5) was observed. These results are in agreement with a model where the porphyrin-porphyrin interactions cause partial loss of chirality of the PNA:PNA parallel duplex, forcing it to adopt a ladder-like conformation. Chirality 27:864-874, 2015. © 2015 Wiley Periodicals, Inc. PMID:26412743

  14. Request queues for interactive clients in a shared file system of a parallel computing system

    DOEpatents

    Bent, John M.; Faibish, Sorin

    2015-08-18

    Interactive requests are processed from users of log-in nodes. A metadata server node is provided for use in a file system shared by one or more interactive nodes and one or more batch nodes. The interactive nodes comprise interactive clients to execute interactive tasks and the batch nodes execute batch jobs for one or more batch clients. The metadata server node comprises a virtual machine monitor; an interactive client proxy to store metadata requests from the interactive clients in an interactive client queue; a batch client proxy to store metadata requests from the batch clients in a batch client queue; and a metadata server to store the metadata requests from the interactive client queue and the batch client queue in a metadata queue based on an allocation of resources by the virtual machine monitor. The metadata requests can be prioritized, for example, based on one or more of a predefined policy and predefined rules.
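    In outline, the arrangement is a proxy that drains two client queues into one metadata queue according to a policy. A minimal sketch of that shape is given below; the 3:1 weighting in favor of interactive clients is an invented example policy, not one prescribed by the patent.

        # Minimal sketch of an interactive/batch request proxy feeding one metadata
        # queue under a weighted policy (the 3:1 weighting is an invented example).
        from collections import deque

        class MetadataProxy:
            def __init__(self, interactive_weight=3, batch_weight=1):
                self.interactive = deque()
                self.batch = deque()
                self.weights = (interactive_weight, batch_weight)

            def submit(self, request, interactive):
                (self.interactive if interactive else self.batch).append(request)

            def drain(self):
                """Yield requests in an order that favors interactive clients."""
                wi, wb = self.weights
                while self.interactive or self.batch:
                    for _ in range(wi):
                        if self.interactive:
                            yield self.interactive.popleft()
                    for _ in range(wb):
                        if self.batch:
                            yield self.batch.popleft()

        proxy = MetadataProxy()
        for i in range(4):
            proxy.submit(f"batch-{i}", interactive=False)
        for i in range(6):
            proxy.submit(f"login-{i}", interactive=True)
        print(list(proxy.drain()))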

  15. Wave-particle interaction in parallel transport of long mean-free-path plasmas along open field magnetic field lines

    NASA Astrophysics Data System (ADS)

    Guo, Zehua; Tang, Xianzhu

    2012-03-01

    A tokamak fusion reactor dumps a large amount of heat and particle flux to the divertor through the scrape-off-layer (SOL) plasma. Situations exist, either by necessity or through deliberate design, in which the SOL plasma attains a long mean free path along large segments of the open field lines. The rapid parallel streaming of electrons requires a large parallel electric field to maintain ambipolarity. The confining effect of the parallel electric field on electrons leads to a trap/passing boundary in velocity space for the electrons. In the normal situation where the upstream electron source populates both the trapped and passing regions, a mechanism must exist to produce a flux across the electron trap/passing boundary. In a short mean-free-path plasma, this is provided by collisions. For long mean-free-path plasmas, wave-particle interaction is the primary candidate for detrapping the electrons. Here we present simulation results and a theoretical analysis using a model distribution function of trapped electrons. The dominant electromagnetic plasma instability and the associated collisionless scattering, which produce both particle and energy fluxes across the electron trap/passing boundary in velocity space, are discussed.

  16. Interacting parallel pathways associate sounds with visual identity in auditory cortices.

    PubMed

    Ahveninen, Jyrki; Huang, Samantha; Ahlfors, Seppo P; Hämäläinen, Matti; Rossi, Stephanie; Sams, Mikko; Jääskeläinen, Iiro P

    2016-01-01

    Spatial and non-spatial information of sound events is presumably processed in parallel auditory cortex (AC) "what" and "where" streams, which are modulated by inputs from the respective visual-cortex subsystems. How these parallel processes are integrated into perceptual objects that remain stable across time and the source agent's movements is unknown. We recorded magneto- and electroencephalography (MEG/EEG) data while subjects viewed animated video clips featuring two audiovisual objects, a black cat and a gray cat. Adaptor-probe events were either linked to the same object (the black cat meowed twice in a row in the same location) or included a visually conveyed identity change (the black and then the gray cat meowed with identical voices in the same location). In addition to effects in visual (including fusiform, middle temporal or MT areas) and frontoparietal association areas, the visually conveyed object-identity change was associated with a release from adaptation of early (50-150 ms) activity in posterior ACs, spreading to left anterior ACs at 250-450 ms in our combined MEG/EEG source estimates. Repetition of events belonging to the same object resulted in increased theta-band (4-8 Hz) synchronization within the "what" and "where" pathways (e.g., between anterior AC and fusiform areas). In contrast, the visually conveyed identity changes resulted in distributed synchronization at higher frequencies (alpha and beta bands, 8-32 Hz) across different auditory, visual, and association areas. The results suggest that sound events become initially linked to perceptual objects in posterior AC, followed by modulations of representations in anterior AC. Hierarchical "what" and "where" pathways seem to operate in parallel after repeating audiovisual associations, whereas the resetting of such associations engages a distributed network across auditory, visual, and multisensory areas. PMID:26419388

  17. MPI parallelization of Vlasov codes for the simulation of nonlinear laser-plasma interactions

    NASA Astrophysics Data System (ADS)

    Savchenko, V.; Won, K.; Afeyan, B.; Decyk, V.; Albrecht-Marc, M.; Ghizzo, A.; Bertrand, P.

    2003-10-01

    The simulation of optical mixing driven KEEN waves [1] and electron plasma waves [1] in laser-produced plasmas requires nonlinear kinetic models and massive parallelization. We use Message Passing Interface (MPI) libraries and Appleseed [2] to solve the Vlasov-Poisson system of equations on an 8-node dual-processor Mac G4 cluster. We use the semi-Lagrangian time splitting method [3]. It requires only row-column exchanges in the global data redistribution, minimizing the total number of communications between processors. The recurrent communication pattern for 2D FFTs involves a global transposition. In the Vlasov-Maxwell case, we use splitting into two 1D spatial advections and a 2D momentum advection [4]. The discretized momentum advection equations have a double-loop structure, with the outer index being assigned to different processors. We adhere to a code structure with separate routines for calculations and data management for parallel computations. [1] B. Afeyan et al., IFSA 2003 Conference Proceedings, Monterey, CA [2] V. K. Decyk, Computers in Physics, 7, 418 (1993) [3] Sonnendrucker et al., JCP 149, 201 (1998) [4] Begue et al., JCP 151, 458 (1999)

  18. A Lightweight Remote Parallel Visualization Platform for Interactive Massive Time-varying Climate Data Analysis

    NASA Astrophysics Data System (ADS)

    Li, J.; Zhang, T.; Huang, Q.; Liu, Q.

    2014-12-01

    Today's climate datasets feature large volumes, a high degree of spatiotemporal complexity, and rapid evolution over time. Because visualizing large, distributed climate datasets is computationally intensive, traditional desktop-based visualization applications fail to handle the computational intensity. Recently, scientists have developed remote visualization techniques to address this computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients through the network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built on ParaView, which is one of the most popular open-source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support its deployment. In this platform, all climate datasets are regular grid data stored in NetCDF format. Three types of data access methods are supported: accessing remote datasets provided by OPeNDAP servers, accessing datasets hosted on the web visualization server, and accessing local datasets. Regardless of the data access method, all visualization tasks are completed on the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computational limitations of desktop-based visualization applications.

  19. Prediction of BVI noise patterns and correlation with wake interaction locations

    NASA Technical Reports Server (NTRS)

    Marcolini, Michael A.; Martin, Ruth M.; Lorber, Peter F.; Egolf, T. A.

    1992-01-01

    High resolution fluctuating airloads data were acquired during a test of a contemporary design United Technologies model rotor in the Duits-Nederlandse Windtunnel (DNW). The airloads are used as input to the noise prediction program WOPWOP, in order to predict the blade-vortex interaction (BVI) noise field on a large plane below the rotor. Trends of predicted advancing and retreating side BVI noise levels and directionality as functions of flight condition are presented. The measured airloads have been analyzed to determine the BVI locations on the blade surface, and are used to interpret the predicted BVI noise radiation patterns. Predicted BVI locations are obtained using the free wake model in CAMRAD/JA, the UTRC Generalized Forward Flight Distorted Wake Model, and the UTRC FREEWAKE analysis. These predicted BVI locations are compared with those obtained from the measured pressure data.

  20. Parallel adaptive fluid-structure interaction simulation of explosions impacting on building structures

    SciTech Connect

    Deiterding, Ralf; Wood, Stephen L

    2013-01-01

    We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms, considering recursively finer fluid time steps as well as overlapping solver updates, are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general-purpose solid dynamics code DYNA3D. Besides simulations verifying the coupled fluid-structure solver and assessing its parallel scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the simulation of a prototypical blast explosion in a realistic multistory building are presented.

  1. Simulation of the Quasi-Monoenergetic Protons Generation by Parallel Laser Pulses Interaction with Foils

    NASA Astrophysics Data System (ADS)

    Wang, Wei-Quan; Yin, Yan; Zou, De-Bin; Yu, Tong-Pu; Yang, Xiao-Hu; Xu, Han; Yu, Ming-Yang; Ma, Yan-Yun; Zhuo, Hong-Bin; Shao, Fu-Qiu

    2014-11-01

    A new scheme of radiation pressure acceleration for generating high-quality protons by using two overlapping parallel laser pulses is proposed. Particle-in-cell simulation shows that the overlapping of two pulses with identical Gaussian profiles in space and trapezoidal profiles in the time domain can result in a composite light pulse with a spatial profile suitable for stable acceleration of protons to high energies. At a combined-pulse intensity of ~2.46 × 10²¹ W/cm², a quasi-monoenergetic proton beam with peak energy ~200 MeV/nucleon, energy spread <15%, and divergence angle <4° is obtained, which is appropriate for tumor therapy. The proton beam quality can be controlled by adjusting the incidence points of the two laser pulses.

  2. RKKY Interaction and the Nature of the Ground State of Double Dots in Parallel

    SciTech Connect

    Kulkarni, M.; Konik, R.

    2011-06-23

    We argue through a combination of slave-boson mean-field theory and the Bethe ansatz that the ground state of closely spaced double quantum dots in parallel, coupled to a single effective channel, is a Fermi liquid. We do so by studying the dots' conductance, impurity entropy, and spin correlation. In particular, we find that the zero-temperature conductance is characterized by the Friedel sum rule, a hallmark of Fermi-liquid physics, and that the impurity entropy vanishes in the limit of zero temperature, indicating that the ground state is a singlet. This conclusion is in opposition to a number of numerical renormalization-group studies. We suggest a possible reason for the discrepancy.

  3. The displacement field characterization of two interacting parallel edge cracks in a finite body 

    E-print Network

    Keener, Todd Whitney

    1996-01-01

    collocation analysis procedure was used, along with geometric moiré experiments and finite element models, to both verify the validity of the model and investigate the interaction effects of the two cracks. Results were obtained for cases in which the length...

  4. Guided Analysis of Hurricane Trends Using Statistical Processes Integrated with Interactive Parallel Coordinates

    E-print Network

    Swan II, J. Edward

    The system's utility is demonstrated with an extensive hurricane climate study that was conducted by a hurricane expert. In the study, the expert used a new data set of environmental weather data, composed of 28

  5. The grid-based fast multipole method - a massively parallel numerical scheme for calculating two-electron interaction energies.

    PubMed

    Toivanen, Elias A; Losilla, Sergio A; Sundholm, Dage

    2015-11-25

    Algorithms and working expressions for a grid-based fast multipole method (GB-FMM) have been developed and implemented. The computational domain is divided into cubic subdomains, organized in a hierarchical tree. The contribution to the electrostatic interaction energies from pairs of neighboring subdomains is computed using numerical integration, whereas the contributions from further apart subdomains are obtained using multipole expansions. The multipole moments of the subdomains are obtained by numerical integration. Linear scaling is achieved by translating and summing the multipoles according to the tree structure, such that each subdomain interacts with a number of subdomains that are almost independent of the size of the system. To compute electrostatic interaction energies of neighboring subdomains, we employ an algorithm which performs efficiently on general purpose graphics processing units (GPGPU). Calculations using one CPU for the FMM part and 20 GPGPUs consisting of tens of thousands of execution threads for the numerical integration algorithm show the scalability and parallel performance of the scheme. For calculations on systems consisting of Gaussian functions (? = 1) distributed as fullerenes from C20 to C720, the total computation time and relative accuracy (ppb) are independent of the system size. PMID:26006111

  6. Parallel Three-Dimensional Computation of Fluid Dynamics and Fluid-Structure Interactions of Ram-Air Parachutes

    NASA Technical Reports Server (NTRS)

    Tezduyar, Tayfun E.

    1998-01-01

    This is a final report as far as our work at University of Minnesota is concerned. The report describes our research progress and accomplishments in development of high performance computing methods and tools for 3D finite element computation of aerodynamic characteristics and fluid-structure interactions (FSI) arising in airdrop systems, namely ram-air parachutes and round parachutes. This class of simulations involves complex geometries, flexible structural components, deforming fluid domains, and unsteady flow patterns. The key components of our simulation toolkit are a stabilized finite element flow solver, a nonlinear structural dynamics solver, an automatic mesh moving scheme, and an interface between the fluid and structural solvers; all of these have been developed within a parallel message-passing paradigm.

  7. DNS of hydrodynamically interacting droplets in turbulent clouds: Parallel implementation and scalability analysis using 2D domain decomposition

    NASA Astrophysics Data System (ADS)

    Ayala, Orlando; Parishani, Hossein; Chen, Liu; Rosa, Bogdan; Wang, Lian-Ping

    2014-12-01

    The study of turbulent collision of cloud droplets requires simultaneous consideration of the transport by background air turbulence (i.e., the geometric collision rate) and the influence of droplet disturbance flows (i.e., the collision efficiency). In recent years, this multiscale problem has been addressed through a hybrid direct numerical simulation (HDNS) approach (Ayala et al., 2007). This approach, while currently the only viable tool to quantify the effects of air turbulence on collision statistics, is computationally expensive. In order to extend the HDNS approach to higher flow Reynolds numbers, here we developed a highly scalable implementation of the approach using 2D domain decomposition. The scalability of the parallel implementation was studied using several parallel computers, at 512³ and 1024³ grid resolutions with O(10⁶)-O(10⁷) droplets. It was found that the execution time scaled almost linearly with the number of processors until it saturated and deteriorated due to communication latency issues. To better understand the scalability, we developed a complexity analysis by partitioning the execution tasks into computation, communication, and data copy. Using this complexity analysis, we were able to predict the scalability performance of our parallel code. Furthermore, the theory was used to estimate the maximum number of processors below which the approximately linear scalability is sustained. We theoretically showed that we could efficiently solve problems of up to 8192³ with O(100,000) processors. The complexity analysis revealed that the pseudo-spectral simulation of the background turbulent flow for a dilute droplet suspension typical of cloud conditions typically takes about 80% of the total execution time, except when the droplets are small (less than 5 μm in radius, in a flow with an energy dissipation rate of 400 cm²/s³ and a liquid water content of 1 g/m³), in which case the particle-particle hydrodynamic interactions become the bottleneck. The complexity analysis was also used to explore some alternative methods to handle FFT calculations within the flow simulation and to advance droplets of less than 5 μm radius, for better computational efficiency. Finally, preliminary results are reported to shed light on the Reynolds-number dependence of the collision kernel of non-interacting droplets.
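    The complexity analysis amounts to fitting a cost model with computation, communication, and data-copy terms and then extrapolating to larger processor counts. The sketch below shows the general idea with an assumed T(p) = a/p + b log2(p) + c form and made-up timings; it is not the paper's actual model or data.

        # Fit a simple execution-time model T(p) = a/p + b*log2(p) + c to measured
        # timings and extrapolate (made-up timings; the real analysis splits the cost
        # into computation, communication, and data-copy terms).
        import numpy as np

        procs = np.array([64, 128, 256, 512, 1024], dtype=float)
        times = np.array([410.0, 215.0, 118.0, 70.0, 48.0])       # seconds (invented)

        # Linear least squares in the coefficients a, b, c.
        A = np.column_stack([1.0 / procs, np.log2(procs), np.ones_like(procs)])
        (a, b, c), *_ = np.linalg.lstsq(A, times, rcond=None)

        model = lambda p: a / p + b * np.log2(p) + c
        for p in (2048, 8192, 32768):
            print(f"p = {p:6d}: predicted time {model(p):7.1f} s, "
                  f"speedup vs 64 procs {times[0] / model(p):5.1f}x")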

  8. Massively parallel full configuration interaction. Benchmark electronic structure calculations on the Intel Touchstone Delta

    SciTech Connect

    Harrison, R.J.; Stahlberg, E.A.

    1994-10-01

    We describe an implementation of the benchmark ab initio electronic structure full configuration interaction model on the Intel Touchstone Delta. Its performance is demonstrated with several calculations, the largest of which (95 million configurations, 418 million determinants) is the largest full-CI calculation yet completed. The feasibility of calculations with over one billion configurations is discussed. A sustained computation rate in excess of 4 GFLOP/s on 512 processors is achieved, with an average aggregate communication rate of 155 Mbytes/s. Data-compression techniques and a modified diagonalization method were required to minimize I/O. The object-oriented design has increased portability and provides the distinction between local and non-local data essential for use of a distributed-data model.

  9. Quantitative analysis of RNA-protein interactions on a massively parallel array for mapping biophysical and evolutionary landscapes

    PubMed Central

    Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.

    2015-01-01

    RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >10⁷ RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) can map biophysical and evolutionary relationships across molecular variants. PMID:24727714

  10. Electrostatic interactions control the parallel and antiparallel orientation of alpha-helical chains in two-stranded alpha-helical coiled-coils.

    PubMed

    Monera, O D; Kay, C M; Hodges, R S

    1994-04-01

    The role of interchain electrostatic interactions in orientating alpha-helical chains to form two-stranded parallel and antiparallel coiled-coils has been investigated. Four disulfide-bridged coiled-coils were designed: parallel coiled-coils with interchain electrostatic attractions (P/A) and repulsions (P/R) and antiparallel coiled-coils with interchain electrostatic attractions (AP/A) and repulsions (AP/R). These coiled-coils were made by air oxidation of two 35-residue peptides with the appropriate heptad repeat (LaEbAcLdEeGfKg or LaAbEcLdKeGfEg) to give the desired interchain electrostatic interactions, and the appropriate position of the cysteine residue (C2 or C33) to give the desired chain orientation. The coiled-coils were characterized by circular dichroism spectroscopy, and their stabilities were assessed by guanidine hydrochloride and urea denaturations. The results indicated that the favored chain orientation, that is, the major disulfide-bridged product formed under benign conditions, was the one that provides interchain electrostatic attractions between oppositely-charged amino acid residues in the e-g' and g-e' positions of the parallel coiled-coil and the g-g' and e-e' positions in the antiparallel coiled-coil. When the electrostatic interactions were similar, the antiparallel coiled-coils were more stable than the parallel coiled-coils. However, the overall stability of the coiled-coils was either increased by interchain electrostatic attractions or decreased by interchain electrostatic repulsions, as determined by urea denaturation. Thus, the order of overall stability of these coiled-coils was AP/A > P/A > AP/R > P/R. This study demonstrates the importance of interchain electrostatic interactions in determining the parallel or antiparallel orientation of alpha-helical chains in two-stranded coiled-coils. PMID:8142389

  11. Development of a Multi-Grids Approach into a Parallelized Hybrid Model to Describe Ganymede's Interaction with the Jovian Plasma

    NASA Astrophysics Data System (ADS)

    Leclercq, L.; Modolo, R.; Leblanc, F.; Hess, S. L.; Andre, N.

    2014-12-01

    Ganymede is the only satellite with its own magnetosphere, which is embedded in the Jovian magnetosphere (Kivelson et al. 1996). This peculiar interaction has been investigated by means of a 3D parallel multi-species hybrid model based on a CAM-CL algorithm (Mathews et al. 1994). In this formalism, ions have a kinetic description whereas electrons are treated as an inertialess fluid which ensures the neutrality of the plasma and contributes to the total current and the electronic pressure. Maxwell's equations are solved to compute the temporal evolution of the electromagnetic field. Hybrid simulations are performed on a uniform cartesian grid with a spatial resolution of about 240 km. Our results are globally consistent with other models and with Galileo measurements. Nevertheless, our description of the magnetopause and the ionosphere is not satisfactory because of the low spatial resolution: we want to describe scale heights of 125 km in the ionosphere, whereas the best spatial resolution that we can use is about 240 km. Therefore, in order to obtain more efficient and relevant results, it is necessary to refine the grid. To this end, we are introducing a multi-grid approach that refines the spatial resolution by a factor of 2 (~120 km) near Ganymede. The creation of a finer mesh in the simulation grid requires special computations at the interfaces between the two different grids, whether for the calculation of moments, such as charge density or current, or for the computation of the electromagnetic fields. Moreover, the parallelization of the code, based on domain decomposition methods, requires careful handling of the boundary conditions. In the hybrid model, macroparticles, which represent clouds of physical particles, have a volume equal to that of a grid cell. Macroparticles entering the higher-resolution region are therefore split into smaller macroparticles whose volume corresponds to that of a cell of the finer mesh. The improved spatial resolution of the hybrid model will also allow us to couple its results with those of our 3D multi-species exospheric model (Turc et al. 2014) in a test-particle model that describes the ionosphere of Ganymede. Basic tests and validation results of the multi-grid approach are presented.

  12. An experimental investigation of the chopping of helicopter main rotor tip vortices by the tail rotor

    NASA Technical Reports Server (NTRS)

    Ahmadi, A. R.

    1984-01-01

    The chopping of helicopter main rotor tip vortices by the tail rotor was experimentally investigated. This is a problem of blade-vortex interaction (BVI) at normal incidence, where the vortex is generally parallel to the rotor axis. The experiment used a model rotor and an isolated vortex and was designed to isolate BVI noise from other types of rotor noise. Tip Mach number, radial BVI station, and free-stream velocity were varied. Fluctuating blade pressures, far-field sound pressure level and directivity, the velocity field of the incident vortex, and blade-vortex interaction angles were measured. Blade-vortex interaction was found to produce impulsive noise which radiates primarily ahead of the blade. For interactions away from the blade tip, the results demonstrate the dipole character of BVI radiation. For BVI close to the tip, the three-dimensional relief effect reduces the intensity of the interaction, despite the larger BVI angle and higher local Mach number. Furthermore, in this case, the radiation pattern is more complex due to diffraction at, and pressure communication around, the tip.

  13. Comparison of density functional theory and field approaches to van der Waals interactions in plane-parallel geometry

    E-print Network

    Podgornik, Rudolf

    … the two methods explicitly in the case of a plane-parallel geometry with a continuously varying dielectric … ceramics [5] or interfaces and grain boundaries in perovskite-based electronic ceramics [6]. Understanding …

  14. Parallel Newton-Krylov-Schwarz algorithms for the three-dimensional Poisson-Boltzmann equation in numerical simulation of colloidal particle interactions

    NASA Astrophysics Data System (ADS)

    Hwang, Feng-Nan; Cai, Shang-Rong; Shao, Yun-Long; Wu, Jong-Shinn

    2010-09-01

    We investigate fully parallel Newton-Krylov-Schwarz (NKS) algorithms for solving the large sparse nonlinear systems of equations arising from the finite element discretization of the three-dimensional Poisson-Boltzmann equation (PBE), which is often used to describe the colloidal phenomena of an electric double layer around charged objects in colloidal and interfacial science. The NKS algorithm employs an inexact Newton method with backtracking (INB) as the nonlinear solver, in conjunction with a Krylov subspace method as the linear solver for the corresponding Jacobian system. An overlapping Schwarz method is used as a preconditioner to accelerate the convergence of the linear solver. Two test cases, two isolated charged particles and two colloidal particles in a cylindrical pore, are used as benchmark problems to validate the correctness of our parallel NKS-based PBE solver. In addition, a truly three-dimensional case, which models the interaction between two charged spherical particles within a rough charged micro-capillary, is simulated to demonstrate the applicability of our PBE solver to problems with complex geometry. Finally, based on results obtained on a PC cluster, we show numerically that NKS is quite suitable for the numerical simulation of interactions between colloidal particles, since NKS is robust in the sense that INB converges within a small number of iterations regardless of the geometry, the mesh size, and the number of processors. With the help of an additive preconditioned Krylov subspace method, NKS achieves a parallel efficiency of 71% or better on up to one hundred processors for a 3D problem with 5 million unknowns.
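
    For readers unfamiliar with the NKS building blocks, the toy sketch below strings together an inexact Newton iteration, a matrix-free (finite-difference) Jacobian, a GMRES linear solve, and a backtracking line search on a two-unknown system. It omits the finite element discretization, the overlapping Schwarz preconditioner, and all parallelism, and the test function is invented; it is only meant to show the control flow of an inexact Newton-Krylov solver, not the authors' PBE code.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# Toy nonlinear system F(u) = 0 standing in for the discretized PBE residual.
def F(u):
    return np.array([u[0]**2 + u[1] - 3.0,
                     u[0] + np.sinh(u[1]) - 1.0])

def newton_krylov(u, tol=1e-10, max_newton=20):
    """Inexact Newton with backtracking; the Jacobian is applied matrix-free
    by finite differences and each linear system is solved with GMRES."""
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        eps = 1e-7
        Jv = LinearOperator((2, 2),
                            matvec=lambda v: (F(u + eps * v) - r) / eps)
        du, _ = gmres(Jv, -r)            # inexact linear solve (Krylov)
        step = 1.0
        while np.linalg.norm(F(u + step * du)) > (1 - 1e-4 * step) * np.linalg.norm(r):
            step *= 0.5                   # backtracking line search
            if step < 1e-8:
                break
        u = u + step * du
    return u

print(newton_krylov(np.array([1.0, 1.0])))
```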

  15. Interaction of an oblique shock wave with a pair of parallel vortices: Shock dynamics and mechanism of sound generation

    E-print Network

    Zhang, Yong-Tao

    … and the mechanism of sound generation in the interaction between an oblique shock wave and a pair of vortices. … is related to the interaction of the reflected shock waves and sound waves. The first mechanism is dominating … affected by the interaction of the reflected shock waves and sound waves. © 2006 American Institute …

  16. Targeted Tuning of Interactive Forces by Engineering of Molecular Bonds in Series and Parallel Using Peptide-Based Adhesives.

    PubMed

    Utzig, Thomas; Stock, Philipp; Raman, Sangeetha; Valtiner, Markus

    2015-10-13

    Polymer-mediated adhesion plays a major role in both technical glues and biological processes such as self-assembly and biorecognition. In contrast to engineering systems, adhesive strength in biological systems is precisely tuned via a well-adjusted arrangement of individual bonds. How adhesion may be engineered by arranging individual bonds is, however, not yet well understood. Here we show how the number of bonds in series and in parallel can significantly influence adhesion forces, using specifically designed surface-bridging peptides. We directly measure how adhesion forces between -COOH and -NH2 functionalized surfaces across aqueous media vary as a function of the number of bonds in parallel. We also introduce surface-bridging peptide sequences that are similarly end-functionalized with amines and carboxylic acid. Compared to single molecular junctions, the adhesive strength mediated by these surface-bridging peptides decreases by a factor of 2 for adhesive junctions that consist of two acid/base bonds in series. Furthermore, adhesive strength varies with the density of bonds in parallel. For dense systems, we observe that the formation of a bridging peptide monolayer is sterically hindered, and adhesion is therefore further reduced significantly, by 20%. Our results unravel how the arrangement of individual bonds in an adhesive junction allows for wide tuning of adhesive strength on the basis of just one specific bond. As such, for peptide adhesives it is essential to consider bonds in parallel in the wide range of applications where both high adhesion and triggered release of adhesive bonds are required. PMID:26382013

  17. Aeroacoustic theory for noncompact wing-gust interaction

    NASA Technical Reports Server (NTRS)

    Martinez, R.; Widnall, S. E.

    1981-01-01

    Three aeroacoustic models for noncompact wing-gust interaction were developed for subsonic flow. The first is that of a two-dimensional (infinite-span) wing passing through an oblique gust. The unsteady pressure field was obtained by the Wiener-Hopf technique; the airfoil loading and the associated acoustic field were calculated, respectively, by bringing the field point down onto the airfoil surface or by letting it recede to infinity. The second model is a simple spanwise superposition of two-dimensional solutions to account for the three-dimensional acoustic effects of wing rotation (for a helicopter blade or some other rotating planform) and of the finiteness of the wing span. A three-dimensional theory for a single gust was applied to calculate, in closed form, the acoustic signature due to blade-vortex interaction in helicopters. The third model is that of a quarter-infinite plate with a side edge passing through a gust at high subsonic speed. An approximate closed-form solution for the three-dimensional loading and the associated three-dimensional acoustic field was obtained. The results reflect the acoustic effect of satisfying the correct loading condition at the side edge.

  18. Stochastic gyroresonant electron acceleration in a low-beta plasma. I - Interaction with parallel transverse cold plasma waves

    NASA Technical Reports Server (NTRS)

    Steinacker, Juergen; Miller, James A.

    1992-01-01

    The gyroresonance of electrons with parallel transverse cold plasma waves is considered, and the Fokker-Planck equation describing the evolution of the electron distribution function in the presence of a spectrum of turbulence is derived. A new resonance which produces a divergence in the Fokker-Planck coefficients is identified; it occurs when the electron is in gyroresonance with a wave whose group velocity equals the velocity of the electron along the magnetic field. Under the assumption of a power-law spectral density, the Fokker-Planck coefficients are calculated numerically, and their complicated momentum and pitch-angle dependence, as well as the influence of various approximations to the dispersion relation, gyroresonance condition, and spectral density, are discussed. It is found that there is no resonance gap at any pitch angle as long as the full gyroresonance condition is used and waves propagating in both directions are present.
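
    For orientation, the gyroresonance condition discussed above can be written in its standard relativistic form; the expressions below are the textbook relations (Gaussian units), not a reproduction of the paper's equations.

```latex
% Gyroresonance of an electron with a parallel-propagating transverse wave
% (harmonic number n = +/-1 for transverse modes, gamma = Lorentz factor):
\[
  \omega - k_{\parallel} v_{\parallel} \;=\; \frac{n\,\Omega_e}{\gamma},
  \qquad \Omega_e = \frac{eB}{m_e c}.
\]
% The divergence in the Fokker-Planck coefficients noted above arises when,
% in addition, the wave group velocity matches the electron's parallel velocity:
\[
  \frac{\partial \omega}{\partial k_{\parallel}} \;=\; v_{\parallel}.
\]
```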

  19. Interaction between parallel polymer fibers insonificated by ultrasound of low/mild intensity: an analytical theory and experiments.

    PubMed

    Wu, Junru; Chen, Di; Langevin, Helene M; Nyborg, Wesley L

    2012-03-01

    The purpose of this article is to develop a simple mathematical model to address some bioeffects which may be caused by a static attractive force between two long, neighboring, parallel thin fibers (for example, a pair of collagen bundles of connective tissue) when they are insonificated by a continuous-wave (CW) traveling plane ultrasound (US) field, under the condition that the fiber length (L) is much greater than the distance between them (h) and h is much smaller than the wavelength of the US (λ). The theory predicts that there is an attractive force between these fibers when they are exposed to CW US with an intensity of the order of 100 mW/cm². The relationship between the relative approach velocity of the fibers and the acoustic pressure amplitude can be calculated using the theory. An experiment was performed to verify the theoretical predictions. A plastic test chamber (diameter × height = 6 mm × 3.5 mm), with a cap made of a sound-absorbing material and filled with distilled water, was placed on a microscope stage. A polymer fiber pair of 100 μm diameter (d) and 4 mm length (L) was immersed in the water and aligned parallel in a plane normal to the US propagation direction. The fibers floated at the central area of the chamber with h ≈ 10d. A 25 mm diameter, 1 MHz quartz crystal was used as the ultrasound source as well as the bottom of the test chamber. The quartz crystal was gold-coated on both sides, but a 5 mm diameter center was left transparent (electrode free) to enable optical observation via a microscope. The maximum acoustic intensity, I_max, of the CW wave generated by the source was set at 300 mW/cm²; the corresponding acoustic pressure amplitude was 100 kPa. The magnitude of the average approach velocity of the fiber pair due to the attractive force was found to agree with that predicted by the theory. PMID:22099253

  20. Parallel rendering

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  1. Massively parallel multiple interacting continua formulation for modeling flow in fractured porous media using the subsurface reactive flow and transport code PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Kumar, J.; Mills, R. T.; Lichtner, P. C.; Hammond, G. E.

    2010-12-01

    Fracture-dominated flows occur in numerous subsurface geochemical processes and at many different scales, in rock pore structures, micro-fractures, fracture networks, and faults. Fractured porous media can be modeled as multiple interacting continua which are connected to each other through transfer terms that capture the flow of mass and energy in response to pressure, temperature, and concentration gradients. However, the analysis of large-scale transient problems using the multiple interacting continua approach presents an algorithmic and computational challenge for problems with very large numbers of degrees of freedom. A generalized dual-porosity model based on the Dual Continuum Disconnected Matrix approach has been implemented within PFLOTRAN, a massively parallel multiphysics, multicomponent, multiphase subsurface reactive flow and transport code. Developed as part of the Department of Energy's SciDAC-2 program, PFLOTRAN provides subsurface simulation capabilities that scale from laptops to ultrascale supercomputers and utilizes the PETSc framework to solve the large, sparse algebraic systems that arise in complex subsurface reactive flow and transport problems. It has been successfully applied to the solution of problems composed of more than two billion degrees of freedom, utilizing up to 131,072 processor cores on Jaguar, the Cray XT5 system at Oak Ridge National Laboratory that was then the world's fastest supercomputer. Building upon the capabilities and computational efficiency of PFLOTRAN, we will present an implementation of the multiple interacting continua formulation for fractured porous media, along with an application case study.
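
    The essence of the multiple-interacting-continua idea, with all of PFLOTRAN's generality stripped away, is a set of co-located continua exchanging mass through a transfer term proportional to the local pressure (or concentration) difference. The one-dimensional toy below is a sketch under that assumption; the transfer coefficient, diffusivities, and explicit time stepping are illustrative choices, not PFLOTRAN's formulation.

```python
import numpy as np

# One-dimensional fracture/matrix dual-continuum toy: each grid cell carries a
# fracture pressure and a matrix pressure, exchanging mass at a rate
# proportional to their difference (the inter-continuum transfer term).
n, dt, nsteps = 50, 0.01, 500
alpha = 0.5                  # illustrative transfer coefficient between continua
d_frac, d_mat = 1.0, 0.01    # fracture diffuses much faster than matrix

p_frac = np.zeros(n); p_frac[0] = 1.0   # pressure pulse enters the fractures
p_mat = np.zeros(n)

def laplacian(p):
    # Second difference with fixed ends (a crude boundary treatment for the toy).
    return np.concatenate(([0.0], p[2:] - 2 * p[1:-1] + p[:-2], [0.0]))

for _ in range(nsteps):
    transfer = alpha * (p_frac - p_mat)          # mass moves fracture -> matrix
    p_frac += dt * (d_frac * laplacian(p_frac) - transfer)
    p_mat  += dt * (d_mat  * laplacian(p_mat)  + transfer)

print("fracture front:", p_frac[:5].round(3), "matrix uptake:", p_mat[:5].round(3))
```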

  2. Spin accumulation in the parallel-coupled double quantum dots with Rashba spin-orbit interaction connected with ferromagnetic and superconducting electrodes

    NASA Astrophysics Data System (ADS)

    Ye, Cheng-Zhi; Lu, Wei-Tao; Xu, Chang-Tan

    2015-09-01

    Using standard nonequilibrium Green's function techniques, we investigate the effect of the Rashba spin-orbit interaction (RSOI) and of a ferromagnetic (FM) electrode on the spin accumulation in parallel-coupled double quantum dots (QDs) connected to a ferromagnetic and a superconducting electrode. It is demonstrated that the FM electrode cannot induce spin polarization of the Andreev reflection (AR) current, but it can induce spin accumulation in the QDs. RSOI, however, can lead to spin polarization of the AR current as well as spin accumulation in the QDs. In the presence of RSOI, a completely spin-polarized QD can be achieved with negative bias voltage V, which is the most significant advantage of our device. When the energy levels ε1 = ε2 = 0 and the interdot coupling strength tc = 0.01, the maximum value of the spin accumulation obtained in this paper is 0.7. These results may be useful in the design of spintronic devices.

  3. Suppression of electron magnetotunneling between parallel two-dimensional GaAs/InAs electron systems by the correlation interaction

    SciTech Connect

    Khanin, Yu. N.; Vdovin, E. E.; Makarovsky, O.; Henini, M.

    2013-09-15

    Magnetotunneling between two-dimensional GaAs/InAs electron systems in vertical resonant-tunneling GaAs/InAs/AlAs heterostructures is studied. A new type of singularity in the tunneling density of states, specifically a dip at the Fermi level, is found; this feature is drastically different from that observed previously for tunneling between two-dimensional GaAs electron systems, in terms of both the kind of functional dependence and the energy and temperature parameters. As before, this effect manifests itself in the suppression of resonant tunneling in a narrow range near zero bias voltage in a high magnetic field parallel to the current direction. The magnetic-field and temperature dependences of the effect's parameters are obtained and compared with available theoretical and experimental data. The observed effect can be caused by a high degree of disorder in the two-dimensional correlated electron systems resulting from the introduction of structurally imperfect strained InAs layers.

  4. Research investigation of helicopter main rotor/tail rotor interaction noise

    NASA Technical Reports Server (NTRS)

    Fitzgerald, J.; Kohlhepp, F.

    1988-01-01

    Acoustic measurements were obtained in a Langley 14 x 22 foot Subsonic Wind Tunnel to study the aeroacoustic interaction of 1/5th scale main rotor, tail rotor, and fuselage models. An extensive aeroacoustic data base was acquired for main rotor, tail rotor, fuselage aerodynamic interaction for moderate forward speed flight conditions. The details of the rotor models, experimental design and procedure, aerodynamic and acoustic data acquisition and reduction are presented. The model was initially operated in trim for selected fuselage angle of attack, main rotor tip-path-plane angle, and main rotor thrust combinations. The effects of repositioning the tail rotor in the main rotor wake and the corresponding tail rotor countertorque requirements were determined. Each rotor was subsequently tested in isolation at the thrust and angle of attack combinations for trim. The acoustic data indicated that the noise was primarily dominated by the main rotor, especially for moderate speed main rotor blade-vortex interaction conditions. The tail rotor noise increased when the main rotor was removed indicating that tail rotor inflow was improved with the main rotor present.

  5. Interaction of pyrrolobenzodiazepine (PBD) ligands with parallel intermolecular G-quadruplex complex using spectroscopy and ESI-MS.

    PubMed

    Raju, Gajjela; Srinivas, Ragampeta; Reddy, Vangala Santhosh; Idris, Mohammed M; Kamal, Ahmed; Nagesh, Narayana

    2012-01-01

    Studies of ligand interactions with quadruplex DNA, and of their role in stabilizing the complex at concentrations prevailing under physiological conditions, have attracted considerable interest. Electrospray ionization mass spectrometry (ESI-MS) and spectroscopic studies in solution were used to evaluate the interaction, stoichiometry, and selectivity of PBD and TMPyP4 ligands toward G-quadruplex DNA. Two synthetic ligands from the PBD family, namely a pyrene-linked pyrrolo[2,1-c][1,4]benzodiazepine hybrid (PBD1) and a mixed imine-amide pyrrolobenzodiazepine dimer (PBD2), together with 5,10,15,20-tetrakis(N-methyl-4-pyridyl)porphyrin (TMPyP4), were studied. The G-rich single-stranded oligonucleotide d(5'GGGGTTGGGG3'), designated d(T(2)G(8)), from the telomeric region of Tetrahymena Glaucoma, was considered for the interaction with the ligands. ESI-MS and spectroscopic methods, viz. circular dichroism (CD), UV-Visible, and fluorescence, were employed to investigate the G-quadruplex structures formed by the d(T(2)G(8)) sequence and their interaction with the PBD and TMPyP4 ligands. From the ESI-MS spectra, it is evident that the majority of quadruplexes exist as d(T(2)G(8))(2) and d(T(2)G(8))(4) forms possessing two to ten cations in the center, which stabilize the complex. The CD bands of PBD1 and PBD2 showed hypo- and hyperchromicity on interaction with quadruplex DNA, indicating unfolding and stabilization of the quadruplex DNA complex, respectively. UV-Visible and fluorescence experiments suggest that PBD1 binds externally, whereas PBD2 intercalates moderately and also binds externally to G-quadruplex DNA. Further, melting experiments using SYBR Green indicate that PBD1 unfolds and PBD2 stabilizes the G-quadruplex complex. ITC experiments on the d(T(2)G(8)) quadruplex with the PBD ligands reveal that PBD1 prefers external/loop binding and PBD2 prefers external/intercalative binding to quadruplex DNA. From the experimental results it is clear that interaction with PBD2 and TMPyP4 imparts higher stability to the quadruplex complex. PMID:22558271

  6. Massively parallel visualization: Parallel rendering

    SciTech Connect

    Hansen, C.D.; Krogh, M.; White, W.

    1995-12-01

    This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, sphere, and volumetric data. The polygon algorithm uses a data-parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  7. Parallel Monte Carlo reactor neutronics

    SciTech Connect

    Blomquist, R.N.; Brown, F.B.

    1994-03-01

    The issues affecting the implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD with distributed memory, and a workstation network with uneven interactive load. Speedups linear in the number of nodes were achieved.

  8. Parallel Dislocation Simulator

    Energy Science and Technology Software Center (ESTSC)

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  9. Non-equilibrium reaction and relaxation dynamics in a strongly interacting explicit solvent: F + CD3CN treated with a parallel multi-state EVB model

    NASA Astrophysics Data System (ADS)

    Glowacki, David R.; Orr-Ewing, Andrew J.; Harvey, Jeremy N.

    2015-07-01

    We describe a parallelized linear-scaling computational framework developed to implement arbitrarily large multi-state empirical valence bond (MS-EVB) calculations within CHARMM and TINKER. Forces are obtained using the Hellmann-Feynman relationship, giving continuous gradients, and good energy conservation. Utilizing multi-dimensional Gaussian coupling elements fit to explicitly correlated coupled cluster theory, we built a 64-state MS-EVB model designed to study the F + CD3CN ? DF + CD2CN reaction in CD3CN solvent (recently reported in Dunning et al. [Science 347(6221), 530 (2015)]). This approach allows us to build a reactive potential energy surface whose balanced accuracy and efficiency considerably surpass what we could achieve otherwise. We ran molecular dynamics simulations to examine a range of observables which follow in the wake of the reactive event: energy deposition in the nascent reaction products, vibrational relaxation rates of excited DF in CD3CN solvent, equilibrium power spectra of DF in CD3CN, and time dependent spectral shifts associated with relaxation of the nascent DF. Many of our results are in good agreement with time-resolved experimental observations, providing evidence for the accuracy of our MS-EVB framework in treating both the solute and solute/solvent interactions. The simulations provide additional insight into the dynamics at sub-picosecond time scales that are difficult to resolve experimentally. In particular, the simulations show that (immediately following deuterium abstraction) the nascent DF finds itself in a non-equilibrium regime in two different respects: (1) it is highly vibrationally excited, with ˜23 kcal mol-1 localized in the stretch and (2) its post-reaction solvation environment, in which it is not yet hydrogen-bonded to CD3CN solvent molecules, is intermediate between the non-interacting gas-phase limit and the solution-phase equilibrium limit. Vibrational relaxation of the nascent DF results in a spectral blue shift, while relaxation of the post-reaction solvation environment results in a red shift. These two competing effects mean that the post-reaction relaxation profile is distinct from what is observed when Franck-Condon vibrational excitation of DF occurs within a microsolvation environment initially at equilibrium. Our conclusions, along with the theoretical and parallel software framework presented in this paper, should be more broadly applicable to a range of complex reactive systems.
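
    The heart of any MS-EVB scheme, independent of the 64-state machinery and the CHARMM/TINKER coupling described above, is the diagonalization of a small diabatic Hamiltonian whose off-diagonal elements are Gaussian coupling functions, followed by a Hellmann-Feynman force evaluation. The two-state, one-coordinate toy below uses invented diabatic wells and coupling parameters purely to show where the adiabatic energy and force come from; it is not the authors' model.

```python
import numpy as np

def adiabatic_energy_and_force(r, dr=1e-6):
    """Two-state EVB toy along a single coordinate r (e.g. a proton-transfer
    coordinate). Diabatic states are harmonic wells; the coupling is Gaussian."""
    def hamiltonian(x):
        v1 = 0.5 * 1.0 * (x - 0.0) ** 2             # diabatic state 1 (invented)
        v2 = 0.5 * 1.2 * (x - 1.0) ** 2 + 0.1       # diabatic state 2 (invented)
        h12 = 0.15 * np.exp(-(x - 0.5) ** 2 / 0.2)  # Gaussian coupling element
        return np.array([[v1, h12], [h12, v2]])

    H = hamiltonian(r)
    evals, evecs = np.linalg.eigh(H)
    c = evecs[:, 0]                                  # ground adiabatic state
    # Hellmann-Feynman: F = -c^T (dH/dr) c, with dH/dr by finite difference here.
    dH = (hamiltonian(r + dr) - hamiltonian(r - dr)) / (2 * dr)
    force = -c @ dH @ c
    return evals[0], force

for r in (0.2, 0.5, 0.8):
    e, f = adiabatic_energy_and_force(r)
    print(f"r = {r:.1f}  E0 = {e:.4f}  F = {f:.4f}")
```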

  10. Parallel pipelining

    SciTech Connect

    Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.

    1995-09-01

    In this paper the authors introduce the idea of parallel pipelining for water lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system, and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as N = (R/r)^α, where r and R are the radii of the small and large pipes, respectively, and α = 4 or 19/7 when the lubricating water flow is laminar or turbulent.
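
    A quick numerical reading of the scaling quoted above; the radii below are invented for illustration only.

```python
# Number of small pipes needed to match the flux of one large pipe,
# N = (R/r)**alpha, with alpha = 4 (laminar lubricating water flow)
# or alpha = 19/7 (turbulent). Radii are arbitrary example values.
R, r = 0.30, 0.05          # metres: one 60 cm pipe vs. 10 cm pipes
for regime, alpha in (("laminar", 4.0), ("turbulent", 19.0 / 7.0)):
    n_pipes = (R / r) ** alpha
    print(f"{regime}: about {n_pipes:.0f} small pipes per large pipe")
```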

  11. International Conference on Machine Design and Production June 19 22, 2012, Pamukkale, Denizli, Turkey

    E-print Network

    Yaman, Yavuz

    … Cyclic pitch, blade-vortex interactions, shock waves, blade stall, and aeroelastic coupling of blades … problems such as undesired failure in flight and possible accidents should also be avoided. Therefore …

  12. High-speed noise of helicopter rotors

    NASA Technical Reports Server (NTRS)

    Shenoy, K. R.

    1982-01-01

    Various parameters of helicopter rotor noise were considered. Impulsive noise, flow regions of a helicopter rotor, noise prediction, aspect ratio, blade tip shape and speed, and blade-vortex interaction noise were among the topics addressed. Recommendations were also given.

  13. Wooded Area 

    E-print Network

    Unknown

    2011-09-05

    A theoretical and experimental investigation is undertaken to determine the effects of an actively deployable trailing edge flap on the disturbances created during blade-vortex interactions (BVI). The theoretical model consists of an unsteady panel...

  14. Monkey 

    E-print Network

    Unknown

    2011-08-17

    A theoretical and experimental investigation is undertaken to determine the effects of an actively deployable trailing edge flap on the disturbances created during blade-vortex interactions (BVI). The theoretical model consists of an unsteady panel...

  15. Getting a feel for parameters: using interactive parallel plots as a tool for parameter identification in the new rainfall-runoff model WALRUS

    NASA Astrophysics Data System (ADS)

    Brauer, Claudia; Torfs, Paul; Teuling, Ryan; Uijlenhoet, Remko

    2015-04-01

    Recently, we developed the Wageningen Lowland Runoff Simulator (WALRUS) to fill the gap between complex, spatially distributed models often used in lowland catchments and simple, parametric models which have mostly been developed for mountainous catchments (Brauer et al., 2014ab). This parametric rainfall-runoff model can be used all over the world in both freely draining lowland catchments and polders with controlled water levels. The open source model code is implemented in R and can be downloaded from www.github.com/ClaudiaBrauer/WALRUS. The structure and code of WALRUS are simple, which facilitates detailed investigation of the effect of parameters on all model variables. WALRUS contains only four parameters requiring calibration; they are intended to have a strong, qualitative relation with catchment characteristics. Parameter estimation remains a challenge, however. The model structure contains three main feedbacks: (1) between groundwater and surface water; (2) between saturated and unsaturated zone; (3) between catchment wetness and (quick/slow) flowroute division. These feedbacks represent essential rainfall-runoff processes in lowland catchments, but increase the risk of parameter dependence and equifinality. Therefore, model performance should not only be judged based on a comparison between modelled and observed discharges, but also based on the plausibility of the internal modelled variables. Here, we present a method to analyse the effect of parameter values on internal model states and fluxes in a qualitative and intuitive way using interactive parallel plotting. We applied WALRUS to ten Dutch catchments with different sizes, slopes and soil types and both freely draining and polder areas. The model was run with a large number of parameter sets, which were created using Latin Hypercube Sampling. The model output was characterised in terms of several signatures, both measures of goodness of fit and statistics of internal model variables (such as the percentage of rain water travelling through the quickflow reservoir). End users can then eliminate parameter combinations with unrealistic outcomes based on expert knowledge using interactive parallel plots. In these plots, for instance, ranges can be selected for each signature and only model runs which yield signature values in these ranges are highlighted. The resulting selection of realistic parameter sets can be used for ensemble simulations. C.C. Brauer, A.J. Teuling, P.J.J.F. Torfs, R. Uijlenhoet (2014a): The Wageningen Lowland Runoff Simulator (WALRUS): a lumped rainfall-runoff model for catchments with shallow groundwater, Geoscientific Model Development, 7, 2313-2332, www.geosci-model-dev.net/7/2313/2014/gmd-7-2313-2014.pdf C.C. Brauer, P.J.J.F. Torfs, A.J. Teuling, R. Uijlenhoet (2014b): The Wageningen Lowland Runoff Simulator (WALRUS): application to the Hupsel Brook catchment and Cabauw polder, Hydrology and Earth System Sciences, 18, 4007-4028, www.hydrol-earth-syst-sci.net/18/4007/2014/hess-18-4007-2014.pdf
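
    The workflow sketched above, Latin hypercube sampling of the four calibration parameters, computation of signatures per run, and elimination of implausible runs, can be condensed into a few lines. In the sketch below the parameter names, ranges, signature definitions, and thresholds are all placeholders, and the model run is replaced by an arbitrary analytic stand-in; the point is only the sample-evaluate-filter pattern that sits behind the interactive parallel plots.

```python
import numpy as np
from scipy.stats import qmc

# Latin Hypercube sample of four calibration parameters (names and ranges are
# placeholders, not WALRUS's actual parameters or bounds).
names  = ["p_wetness", "p_vadose", "p_groundwater", "p_quickflow"]
lower  = np.array([1e5, 1.0, 1e6, 1.0])
upper  = np.array([1e7, 100.0, 1e8, 100.0])
sample = qmc.LatinHypercube(d=4, seed=42).random(n=500)
params = qmc.scale(sample, lower, upper)

def run_model(p):
    """Stand-in for a model run: return (goodness of fit, % quickflow).
    An arbitrary analytic function replaces the real simulation here."""
    fit = 1.0 - abs(np.log10(p[2]) - 7.0) / 3.0 - abs(p[3] - 40.0) / 200.0
    quickflow_pct = 100.0 * p[3] / (p[1] + p[3])
    return fit, quickflow_pct

signatures = np.array([run_model(p) for p in params])

# The "interactive parallel plot" selection, expressed as a plain filter:
# keep runs with acceptable fit AND a plausible quickflow fraction.
keep = (signatures[:, 0] > 0.7) & (signatures[:, 1] > 20) & (signatures[:, 1] < 60)
print(f"{keep.sum()} of {len(params)} parameter sets retained for the ensemble")
```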

  16. Development of a prototype PET scanner with depth-of-interaction measurement using solid-state photomultiplier arrays and parallel readout electronics

    PubMed Central

    Shao, Yiping; Sun, Xishan; Lan, Kejian A.; Bircher, Chad; Lou, Kai; Deng, Zhi

    2014-01-01

    In this study, we developed a prototype animal PET by applying several novel technologies to use the solid-state photomultiplier (SSPM) arrays for measuring the depth-of-interaction (DOI) and improving imaging performance. Each PET detector has an 8×8 array of about 1.9×1.9×30.0 mm3 lutetium-yttrium-oxyorthosilicate (LYSO) scintillators, with each end optically connected to a SSPM array (16-channel in a 4×4 matrix) through a light guide to enable continuous DOI measurement. Each SSPM has an active area of about 3×3 mm2, and its output is read by a custom-developed application-specific-integrated-circuit (ASIC) to directly convert analog signals to digital timing pulses that encode the interaction information. These pulses are transferred to and be decoded by a field-programmable-gate-array (FPGA) based time-to-digital convertor for coincident event selection and data acquisition. The independent readout of each SSPM and the parallel signal process can significantly improve the signal-to-noise ratio and enable using flexible algorithms for different data processes. The prototype PET consists of two rotating detector panels on a portable gantry with four detectors in each panel to provide 16 mm axial and variable transaxial field-of-view (FOV) sizes. List-mode ordered-subset-expectation-maximization image reconstruction was implemented. The measured mean energy, coincidence timing, and DOI resolution for a crystal were about 17.6%, 2.8 ns, and 5.6 mm, respectively. The measured transaxial resolutions at the center of the FOV were 2.0 mm and 2.3 mm for images reconstructed with and without DOI, respectively. In addition, the resolutions across the FOV with DOI were substantially better than those without DOI. The quality of PET images of both a hot-rod phantom and mouse acquired with DOI was much higher than that of images obtained without DOI. This study demonstrates that SSPM arrays and advanced readout/processing electronics can be used to develop a practical DOI-measureable PET scanner. PMID:24556629

  17. Development of a prototype PET scanner with depth-of-interaction measurement using solid-state photomultiplier arrays and parallel readout electronics

    NASA Astrophysics Data System (ADS)

    Shao, Yiping; Sun, Xishan; Lan, Kejian A.; Bircher, Chad; Lou, Kai; Deng, Zhi

    2014-03-01

    In this study, we developed a prototype animal PET by applying several novel technologies to use solid-state photomultiplier (SSPM) arrays to measure the depth of interaction (DOI) and improve imaging performance. Each PET detector has an 8 × 8 array of about 1.9 × 1.9 × 30.0 mm3 lutetium-yttrium-oxyorthosilicate scintillators, with each end optically connected to an SSPM array (16 channels in a 4 × 4 matrix) through a light guide to enable continuous DOI measurement. Each SSPM has an active area of about 3 × 3 mm2, and its output is read by a custom-developed application-specific integrated circuit to directly convert analogue signals to digital timing pulses that encode the interaction information. These pulses are transferred to and are decoded by a field-programmable gate array-based time-to-digital convertor for coincident event selection and data acquisition. The independent readout of each SSPM and the parallel signal process can significantly improve the signal-to-noise ratio and enable the use of flexible algorithms for different data processes. The prototype PET consists of two rotating detector panels on a portable gantry with four detectors in each panel to provide 16 mm axial and variable transaxial field-of-view (FOV) sizes. List-mode ordered subset expectation maximization image reconstruction was implemented. The measured mean energy, coincidence timing and DOI resolution for a crystal were about 17.6%, 2.8 ns and 5.6 mm, respectively. The measured transaxial resolutions at the center of the FOV were 2.0 mm and 2.3 mm for images reconstructed with and without DOI, respectively. In addition, the resolutions across the FOV with DOI were substantially better than those without DOI. The quality of PET images of both a hot-rod phantom and mouse acquired with DOI was much higher than that of images obtained without DOI. This study demonstrates that SSPM arrays and advanced readout/processing electronics can be used to develop a practical DOI-measureable PET scanner.
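
    The abstracts above do not state the DOI estimator, but a common choice for dual-ended readout is the asymmetry of the light collected at the two crystal ends. The sketch below assumes a simple exponential light-sharing model and an approximately linear asymmetry-to-depth mapping; the attenuation length and photon counts are invented, so this is an illustration of the general idea rather than the authors' calibration.

```python
import numpy as np

CRYSTAL_LENGTH_MM = 30.0      # matches the 30 mm LYSO crystals described above
ATTENUATION_MM = 40.0         # assumed effective light-attenuation length

def simulate_signals(depth_mm, photons=2000, rng=np.random.default_rng(0)):
    """Light reaching the two SSPM arrays for a scintillation at a given depth,
    under a simple exponential light-sharing model with Poisson noise."""
    a = np.exp(-depth_mm / ATTENUATION_MM)
    b = np.exp(-(CRYSTAL_LENGTH_MM - depth_mm) / ATTENUATION_MM)
    frac_a = a / (a + b)
    return rng.poisson(photons * frac_a), rng.poisson(photons * (1 - frac_a))

def estimate_doi(sig_a, sig_b):
    """Map the end-to-end asymmetry (A-B)/(A+B) onto depth (nearly linear here)."""
    asym = (sig_a - sig_b) / (sig_a + sig_b)
    asym_max = np.tanh(CRYSTAL_LENGTH_MM / (2 * ATTENUATION_MM))
    return CRYSTAL_LENGTH_MM / 2 * (1 - asym / asym_max)

for true_depth in (5.0, 15.0, 25.0):
    a, b = simulate_signals(true_depth)
    print(f"true {true_depth:4.1f} mm -> estimated {estimate_doi(a, b):4.1f} mm")
```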

  18. The Parallel Principle

    E-print Network

    Richard A Mould

    2001-11-19

    Von Neumann's psycho-physical parallelism requires the existence of an interaction between subjective experiences and material systems. A hypothesis is proposed that amends physics in a way that connects subjective states with physical states, and a general model of the interaction is provided. A specific example shows how the theory applies to pain consciousness. The implications concerning quantum mechanical state creation and reduction are discussed, and some mechanisms are suggested to seed the process. An experiment that tests the hypothesis is described elsewhere. Key Words: von Neumann, psycho-physical, consciousness, state reduction, state collapse, macroscopic superpositions, conscious observer.

  19. Serial Order: A Parallel Distributed Processing Approach.

    ERIC Educational Resources Information Center

    Jordan, Michael I.

    Human behavior shows a variety of serially ordered action sequences. This paper presents a theory of serial order which describes how sequences of actions might be learned and performed. In this theory, parallel interactions across time (coarticulation) and parallel interactions across space (dual-task interference) are viewed as two aspects of a…

  20. A Fast Parallel Simulation Code for Interaction between Proto-Planetary Disk and Embedded Proto-Planets: Implementation for 3D Code

    SciTech Connect

    Li, Shengtai; Li, Hui

    2012-06-14

    We develop a 3D simulation code for the interaction between a proto-planetary disk and embedded proto-planets. The protoplanetary disk is treated as a three-dimensional (3D), self-gravitating gas whose motion is described by the locally isothermal Navier-Stokes equations in a spherical coordinate system centered on the star. The differential equations for the disk are similar to those given in Kley et al. (2009), with a different gravitational potential that is defined in Nelson et al. (2000). The equations are solved by a directionally split Godunov method for the inviscid Euler equations plus an operator-split method for the viscous source terms. We use a sub-cycling technique for the azimuthal sweep to alleviate the time step restriction. We also extend the FARGO scheme of Masset (2000), as modified in Li et al. (2001), to our 3D code to accelerate the transport in the azimuthal direction. Furthermore, we have implemented a reduced 2D (r, θ) and a fully 3D self-gravity solver on our uniform disk grid, which extends our 2D method (Li, Buoni, & Li 2008) to 3D. This solver uses a mode cut-off strategy and combines an FFT in the azimuthal direction with direct summation in the radial and meridional directions. An initial axisymmetric equilibrium disk is generated via iteration between the disk density profile and the 2D disk self-gravity. We do not need any softening in the disk self-gravity calculation because we use a shifted-grid method (Li et al. 2008) to calculate the potential. The motion of the planet is restricted to the mid-plane, and the equations are the same as those given in D'Angelo et al. (2005), which we adapted to polar coordinates with a fourth-order Runge-Kutta solver. The disk gravitational force on the planet is assumed to evolve linearly with time between two hydrodynamic time steps. The planetary potential acting on the disk is calculated accurately with a small softening given by a cubic-spline form (Kley et al. 2009). Since the torque is extremely sensitive to the position of the planet, we adopt a corotating frame that allows the planet to move only in the radial direction when only one planet is present. This code has been extensively tested on a number of problems. For an Earth-mass planet with constant aspect ratio h = 0.05, the torque calculated using our code matches quite well with the 3D linear theory results of Tanaka et al. (2002). The code is fully parallelized via the message-passing interface (MPI) and has very high parallel efficiency. Several numerical examples for both fixed and moving planets are provided to demonstrate the efficacy of the numerical method and code.
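
    As a minimal illustration of the planet integrator mentioned above, the sketch below advances a planet around the star with a classical fourth-order Runge-Kutta step. It is written in Cartesian rather than polar coordinates, omits the disk force and the corotating frame, and uses arbitrary code units, so it shows only the RK4 mechanics, not the authors' implementation.

```python
import numpy as np

GM_STAR = 1.0    # gravitational parameter in code units (illustrative)

def acceleration(pos):
    """Point-mass gravity from the central star; the disk's force on the
    planet (tabulated between hydro steps in the real code) is omitted."""
    r = np.linalg.norm(pos)
    return -GM_STAR * pos / r**3

def rk4_step(pos, vel, dt):
    """One classical fourth-order Runge-Kutta step for the planet's orbit."""
    k1v = acceleration(pos);                 k1x = vel
    k2v = acceleration(pos + 0.5*dt*k1x);    k2x = vel + 0.5*dt*k1v
    k3v = acceleration(pos + 0.5*dt*k2x);    k3x = vel + 0.5*dt*k2v
    k4v = acceleration(pos + dt*k3x);        k4x = vel + dt*k3v
    pos_new = pos + dt/6.0 * (k1x + 2*k2x + 2*k3x + k4x)
    vel_new = vel + dt/6.0 * (k1v + 2*k2v + 2*k3v + k4v)
    return pos_new, vel_new

# Circular orbit at r = 1: after one period the planet should return home.
pos, vel = np.array([1.0, 0.0]), np.array([0.0, 1.0])
dt, period = 1e-3, 2*np.pi
for _ in range(int(period/dt)):
    pos, vel = rk4_step(pos, vel, dt)
print("position after one orbit:", pos)   # close to (1, 0)
```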

  1. Detached-eddy simulation of flow non-linearity of fluid-structural interactions using high order schemes and parallel computation

    NASA Astrophysics Data System (ADS)

    Wang, Baoyuan

    The objective of this research is to develop an efficient and accurate methodology to resolve flow non-linearity of fluid-structural interaction. To achieve this purpose, a numerical strategy to apply the detached-eddy simulation (DES) with a fully coupled fluid-structural interaction model is established for the first time. The following novel numerical algorithms are also created: a general sub-domain boundary mapping procedure for parallel computation to reduce wall clock simulation time, an efficient and low diffusion E-CUSP (LDE) scheme used as a Riemann solver to resolve discontinuities with minimal numerical dissipation, and an implicit high order accuracy weighted essentially non-oscillatory (WENO) scheme to capture shock waves. The Detached-Eddy Simulation is based on the model proposed by Spalart in 1997. Near solid walls within wall boundary layers, the Reynolds averaged Navier-Stokes (RANS) equations are solved. Outside of the wall boundary layers, the 3D filtered compressible Navier-Stokes equations are solved based on large eddy simulation(LES). The Spalart-Allmaras one equation turbulence model is solved to provide the Reynolds stresses in the RANS region and the subgrid scale stresses in the LES region. An improved 5th order finite differencing weighted essentially non-oscillatory (WENO) scheme with an optimized epsilon value is employed for the inviscid fluxes. The new LDE scheme used with the WENO scheme is able to capture crisp shock profiles and exact contact surfaces. A set of fully conservative 4th order finite central differencing schemes are used for the viscous terms. The 3D Navier-Stokes equations are discretized based on a conservative finite differencing scheme. The unfactored line Gauss-Seidel relaxation iteration is employed for time marching. A general sub-domain boundary mapping procedure is developed for arbitrary topology multi-block structured grids with grid points matched on sub-domain boundaries. Extensive numerical experiments are conducted to test the performance of the numerical algorithms. The RANS simulation with the Spalart-Allmaras one equation turbulence model is the foundation for DES and is hence validated with other transonic flows. The predicted results agree very well with the experiments. The RANS code is then further used to study the slot size effect of a co-flow jet (CFJ) airfoil. The DES solver with fully coupled fluid-structural interaction methodology is validated with vortex induced vibration of a cylinder and a transonic forced pitching airfoil. For the cylinder, the laminar Navier-Stokes equations are solved due to the low Reynolds number. The 3D effects are observed in both stationary and oscillating cylinder simulation because of the flow separations behind the cylinder. For the transonic forced pitching airfoil DES computation, there is no flow separation in the flow field. The DES results agree well with the RANS results. These two cases indicate that the DES is more effective on predicting flow separation. The DES code is used to simulate the limited cycle oscillation of NLR7301 airfoil. For the cases computed in this research, the predicted LCO frequency, amplitudes, averaged lift and moment, all agree excellently with the experiment. The solutions appear to have bifurcation and are dependent on the initial perturbation. The developed methodology is able to capture the LCO with very small amplitudes measured in the experiment. 
This is attributed to the high order low diffusion schemes, fully coupled FSI model, and the turbulence model used. This research appears to be the first time that a numerical simulation of LCO matches the experiment. The DES code is also used to simulate the CFJ airfoil jet mixing at high angle of attack. In conclusion, the numerical strategy of the high order DES with fully coupled FSI model and parallel computing developed in this research is demonstrated to have high accuracy, robustness, and efficiency. Future work to further maturate the methodology is suggested. (Abstract shortened by UMI.)
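
    The WENO reconstruction mentioned above can be made concrete with the classical fifth-order Jiang-Shu scheme for a left-biased interface value. The sketch below is the textbook WENO5 formula with the conventional epsilon = 1e-6, not the improved scheme with the optimized epsilon described in the abstract.

```python
import numpy as np

def weno5_left(f):
    """Classical WENO5 (Jiang-Shu) reconstruction of the interface value
    f_{i+1/2} from the left, given the five cell averages
    f = [f_{i-2}, f_{i-1}, f_i, f_{i+1}, f_{i+2}]."""
    fm2, fm1, f0, fp1, fp2 = f
    # Candidate third-order reconstructions on the three sub-stencils.
    q0 = (2*fm2 - 7*fm1 + 11*f0) / 6.0
    q1 = ( -fm1 + 5*f0  +  2*fp1) / 6.0
    q2 = (2*f0  + 5*fp1 -   fp2) / 6.0
    # Smoothness indicators.
    b0 = 13/12*(fm2 - 2*fm1 + f0)**2 + 0.25*(fm2 - 4*fm1 + 3*f0)**2
    b1 = 13/12*(fm1 - 2*f0 + fp1)**2 + 0.25*(fm1 - fp1)**2
    b2 = 13/12*(f0 - 2*fp1 + fp2)**2 + 0.25*(3*f0 - 4*fp1 + fp2)**2
    # Nonlinear weights built from the ideal weights d = (0.1, 0.6, 0.3).
    eps = 1e-6
    a = np.array([0.1/(eps+b0)**2, 0.6/(eps+b1)**2, 0.3/(eps+b2)**2])
    w = a / a.sum()
    return w[0]*q0 + w[1]*q1 + w[2]*q2

smooth = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # linear data: exact result 3.5
shock  = np.array([1.0, 1.0, 1.0, 10.0, 10.0])  # weights shift away from the jump
print(weno5_left(smooth), weno5_left(shock))
```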

  2. Insights into the Hendra virus NTAIL-XD complex: Evidence for a parallel organization of the helical MoRE at the XD surface stabilized by a combination of hydrophobic and polar interactions.

    PubMed

    Erales, Jenny; Beltrandi, Matilde; Roche, Jennifer; Maté, Maria; Longhi, Sonia

    2015-08-01

    The Hendra virus is a member of the Henipavirus genus within the Paramyxoviridae family. The nucleoprotein, which consists of a structured core and a C-terminal intrinsically disordered domain (N(TAIL)), encapsidates the viral genome within a helical nucleocapsid. N(TAIL) partly protrudes from the surface of the nucleocapsid and is thus capable of interacting with the C-terminal X domain (XD) of the viral phosphoprotein. The interaction with XD involves a molecular recognition element (MoRE) that is located within N(TAIL) residues 470-490 and that undergoes α-helical folding. The MoRE has been proposed to be embedded in the hydrophobic groove delimited by helices α2 and α3 of XD, although experimental data could not discriminate between a parallel and an antiparallel orientation of the MoRE. Previous studies also showed that, while the binding interface is enriched in hydrophobic residues, charged residues located close to the interface might play a role in complex formation. Here, we targeted two acidic and two basic residues within XD and N(TAIL) for site-directed mutagenesis. ITC studies showed that electrostatics plays a crucial role in complex formation and pointed to a parallel orientation of the MoRE as more likely. Further support for a parallel orientation was afforded by SAXS studies that made use of two chimeric constructs in which XD and the MoRE were covalently linked to each other. Altogether, these studies unveiled the multiparametric nature of the interactions established within this complex and contribute to shedding light on the molecular features of protein interfaces involving intrinsically disordered regions. PMID:25960280

  3. Runtime volume visualization for parallel CFD

    NASA Technical Reports Server (NTRS)

    Ma, Kwan-Liu

    1995-01-01

    This paper discusses some aspects of design of a data distributed, massively parallel volume rendering library for runtime visualization of parallel computational fluid dynamics simulations in a message-passing environment. Unlike the traditional scheme in which visualization is a postprocessing step, the rendering is done in place on each node processor. Computational scientists who run large-scale simulations on a massively parallel computer can thus perform interactive monitoring of their simulations. The current library provides an interface to handle volume data on rectilinear grids. The same design principles can be generalized to handle other types of grids. For demonstration, we run a parallel Navier-Stokes solver making use of this rendering library on the Intel Paragon XP/S. The interactive visual response achieved is found to be very useful. Performance studies show that the parallel rendering process is scalable with the size of the simulation as well as with the parallel computer.

  4. Parallelization for reaction

    E-print Network

    Louvet, Violaine

    … Parallelization strategies for multi-scale reaction waves with complex chemistry (presentation slides; outline: context, application, background, numerical results, conclusions and perspectives; Paraguay, 2010) …

  5. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  6. Two Fundamental Concepts in Skeletal Parallel Programming

    E-print Network

    Cole, Murray

    … to be fundamental to the definition of any skeletal parallel programming system, but which have been addressed only … We define the concepts of nesting mode and interaction mode as they arise in the description of skeletal …

  7. Jupiter: A Toolkit for Interactive Large Model Visualization (to appear in the proceedings of the IEEE 2001 Symposium on Parallel and Large-Data Visualization and Graphics)

    E-print Network

    Bartz, Dirk

    … Jupiter, a toolkit for the interactive visualization of large models which exploits the above-mentioned techniques. Jupiter was originally developed by Hewlett-Packard and EAI, and it was recently equipped …

  8. Applications Parallel PIC plasma simulation through particle

    E-print Network

    Vlad, Gregorio

    … for the confinement degradation of the plasma (Parallel Computing 27 (2001) 295). … each of them representing a cloud of non-mutually interacting physical particles. The mutual …

  9. Parallel computations and control of adaptive structures

    NASA Technical Reports Server (NTRS)

    Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (editor); Liu, S. C. (editor); Li, J. C. (editor)

    1991-01-01

    The equations of motion for structures with adaptive elements for vibration control are formulated for parallel computation, to be used as a software package for real-time control of flexible space structures. A brief overview of state-of-the-art parallel computational capability is also presented. Time-marching strategies are developed for effective use of massively parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the presented approach on applications in disciplines other than the aerospace industry is assessed.

  10. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper describes recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. It presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, sphere, and volumetric data. The polygon algorithm uses a data-parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  11. Massively Parallel QCD

    SciTech Connect

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-04-11

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.

  12. Parallelization of a treecode

    E-print Network

    R. Valdarnini

    2003-03-18

    I describe here the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed-memory architecture using the MPI message-passing library. For a configuration with a constant number of particles per processor, the scalability of the code was tested up to P = 128 processors on an IBM SP4 machine. In the large-P limit, the average CPU time per processor necessary for solving the gravitational interactions is about 10% higher than that expected from the ideal scaling relation. The processor domains are determined every large timestep according to a recursive orthogonal bisection, using a weighting scheme which takes into account the total particle computational load within the timestep. The results of the numerical tests show that the load-balancing efficiency L of the code is high (≥ 90%) up to P = 32, and decreases to L ~ 80% when P = 128. In the latter case it is found that some aspects of the code performance are affected by machine hardware, while the proposed weighting scheme can achieve a load balance as high as L ~ 90% even in the large-P limit.
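
    The weighted recursive orthogonal bisection named above can be illustrated with a small standalone sketch: points are split along the longest axis so that each side carries roughly half of the remaining computational load. The 2-D points, exponential per-particle weights, and the recursion rule are simplifications for illustration, not the code's actual decomposition.

```python
import numpy as np

def orb_partition(points, weights, n_domains):
    """Recursively bisect a point set along its longest axis so that each
    side carries (approximately) half of the remaining computational load."""
    if n_domains == 1:
        return [np.arange(len(points))]
    axis = np.argmax(points.max(axis=0) - points.min(axis=0))
    order = np.argsort(points[:, axis])
    cum = np.cumsum(weights[order])
    split = int(np.searchsorted(cum, cum[-1] / 2.0))
    left, right = order[:split + 1], order[split + 1:]
    out = []
    for idx, n_sub in ((left, n_domains // 2), (right, n_domains - n_domains // 2)):
        for sub in orb_partition(points[idx], weights[idx], n_sub):
            out.append(idx[sub])          # map child indices back to this level
    return out

rng = np.random.default_rng(1)
pts = rng.random((10_000, 2))
load = rng.exponential(1.0, size=10_000)  # uneven per-particle cost
domains = orb_partition(pts, load, n_domains=8)
loads = [load[d].sum() for d in domains]
print("load imbalance (max/mean):", max(loads) / np.mean(loads))
```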

  13. Applied Parallel Metadata Indexing

    SciTech Connect

    Jacobi, Michael R

    2012-08-01

    The GPFS Archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and search the archive using standard and user-defined metadata, while maintaining security. Last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interface for the search tool. To meet security requirements, each database table is associated with a single user, stores only records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.
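
    A sketch of what the per-user metadata search could look like on the MongoDB side is given below. The connection details, collection layout, field names, and query are all invented; the only claim is that standard pymongo calls (create_index, find) support this kind of tag- and attribute-based search against an indexed metadata collection. A running MongoDB instance is assumed.

```python
from pymongo import ASCENDING, MongoClient

# Connection details and the per-user collection name are placeholders.
client = MongoClient("mongodb://search-backend.example:27017",
                     username="jdoe", password="********",
                     authSource="archive_metadata")
files = client["archive_metadata"]["user_jdoe"]   # one collection per user

# Index the attributes users are expected to search on.
files.create_index([("tags", ASCENDING)])
files.create_index([("size_bytes", ASCENDING), ("mtime", ASCENDING)])

# "Find my large simulation outputs from 2012, whatever directory they live in."
query = {
    "tags": "simulation-output",           # user-defined label
    "size_bytes": {"$gt": 10 * 2**30},     # larger than 10 GiB
    "mtime": {"$gte": "2012-01-01", "$lt": "2013-01-01"},
}
for doc in files.find(query, {"path": 1, "size_bytes": 1, "_id": 0}).limit(20):
    print(doc["path"], doc["size_bytes"])
```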

  14. The ParaScope parallel programming environment

    NASA Technical Reports Server (NTRS)

    Cooper, Keith D.; Hall, Mary W.; Hood, Robert T.; Kennedy, Ken; Mckinley, Kathryn S.; Mellor-Crummey, John M.; Torczon, Linda; Warren, Scott K.

    1993-01-01

    The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in FORTRAN D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.

  15. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate -> store -> visualize' post-processing approach.

  16. Wave-particle interactions with parallel whistler waves: Nonlinear and time-dependent effects revealed by particle-in-cell simulations

    NASA Astrophysics Data System (ADS)

    Camporeale, Enrico; Zimbardo, Gaetano

    2015-09-01

    We present a self-consistent Particle-in-Cell simulation of the resonant interactions between anisotropic energetic electrons and a population of whistler waves, with parameters relevant to the Earth's radiation belt. By tracking PIC particles and comparing with test-particle simulations, we emphasize the importance of including nonlinear effects and time evolution in the modeling of wave-particle interactions, which are excluded in the resonant limit of quasi-linear theory routinely used in radiation belt studies. In particular, we show that pitch angle diffusion is enhanced during the linear growth phase, and it rapidly saturates well before a single bounce period. This calls into question the widely used bounce average performed in most radiation belt diffusion calculations. Furthermore, we discuss how the saturation is related to the fact that the domain in which the particles' pitch angle diffuses is bounded, and to the well-known problem of the 90° diffusion barrier.

  17. Parallel flow diffusion battery

    DOEpatents

    Yeh, H.C.; Cheng, Y.S.

    1984-01-01

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  18. Parallel flow diffusion battery

    DOEpatents

    Yeh, Hsu-Chi (Albuquerque, NM); Cheng, Yung-Sung (Albuquerque, NM)

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  19. Parallel Discrete Event Simulation

    NASA Astrophysics Data System (ADS)

    Kunz, Georg

    Ever since discrete event simulation was adopted by a large research community, simulation developers have attempted to draw benefits from executing a simulation on multiple processing units in parallel. Hence, a wide range of research has been conducted on Parallel Discrete Event Simulation (PDES). In this chapter we give an overview of the challenges and approaches of parallel simulation. Furthermore, we present a survey of the parallelization capabilities of the network simulators OMNeT++, ns-2, DSIM and JiST.

  20. Linked-View Parallel Coordinate Plot Renderer

    Energy Science and Technology Software Center (ESTSC)

    2011-06-28

    This software allows multiple linked views for interactive querying via map-based data selection, bar chart analytic overlays, and high dynamic range (HDR) line renderings. The major component of the visualization package is a parallel coordinate renderer with binning, curved layouts, shader-based rendering, and other techniques to allow interactive visualization of multidimensional data.

  1. Research in parallel computing

    NASA Technical Reports Server (NTRS)

    Ortega, James M.; Henderson, Charles

    1994-01-01

    This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.

  2. Combinatorial Parallel and Scientific

    E-print Network

    Pinar, Ali

    Combinatorial algorithms have long played a pivotal enabling role in many applications of parallel computing ... computational techniques and rapidly changing computational platforms ... the relationship between discrete algorithms and parallel computing. In addition to their traditional role ...

  3. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  4. Verbal and Visual Parallelism

    ERIC Educational Resources Information Center

    Fahnestock, Jeanne

    2003-01-01

    This study investigates the practice of presenting multiple supporting examples in parallel form. The elements of parallelism and its use in argument were first illustrated by Aristotle. Although real texts may depart from the ideal form for presenting multiple examples, rhetorical theory offers a rationale for minimal, parallel presentation. The…

  5. Advanced parallel processing with supercomputer architectures

    SciTech Connect

    Hwang, K.

    1987-10-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers.

  6. Parallel algorithm development

    SciTech Connect

    Adams, T.F.

    1996-06-01

    Rapid changes in parallel computing technology are causing significant changes in the strategies being used for parallel algorithm development. One approach is simply to write computer code in a standard language like FORTRAN 77 with the expectation that the compiler will produce executable code that will run in parallel. The alternatives are: (1) to build explicit message passing directly into the source code; or (2) to write source code without explicit reference to message passing or parallelism, but use a general communications library to provide efficient parallel execution. Application of these strategies is illustrated with examples of codes currently under development.
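
    A small mpi4py sketch of strategy (1), explicit message passing written directly into the source: each process computes a partial sum on its own slice of the data and the results are gathered with hand-coded send/recv calls. The data, the cyclic decomposition, and the use of mpi4py are illustrative assumptions, not examples from the report; under strategy (2), the receive loop would typically be replaced by a single library call such as comm.reduce.

    ```python
    # run with e.g.: mpiexec -n 4 python partial_sum.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # each process works on its own slice of the data (cyclic decomposition)
    data = np.arange(1_000_000, dtype=np.float64)
    partial = data[rank::size].sum()

    if rank == 0:
        total = partial
        for src in range(1, size):        # explicit receives, one per worker
            total += comm.recv(source=src)
        print("sum =", total)             # matches data.sum()
    else:
        comm.send(partial, dest=0)        # explicit send built into the source
    ```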

  7. Collisions between quasi-parallel shocks

    NASA Technical Reports Server (NTRS)

    Cargill, Peter J.

    1991-01-01

    The collision between pairs of quasi-parallel shocks is examined using hybrid numerical simulations. In the interaction, the two shocks are transmitted through each other leaving behind a hot plasma with a population of particles with energies in excess of 40 E0, where E0 is the kinetic energy of particles in the shock frame prior to the collision. The energization is more efficient for quasi-parallel shocks than parallel shocks. Collisions between shocks of equal strengths are more efficient than those that are unequal. The results are of importance for phenomena during the impulsive phase of solar flares, in the distant solar wind and at planetary bow shocks.

  8. Synchronization Of Parallel Discrete Event Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

  9. Aerodynamic, aeroacoustic, and aeroelastic investigations of airfoil-vortex interaction using large-eddy simulation

    NASA Astrophysics Data System (ADS)

    Ilie, Marcel

    In helicopters, vortices (generated at the tip of the rotor blades) interact with the next advancing blades during certain flight and manoeuvring conditions, generating undesirable levels of acoustic noise and vibration. These Blade-Vortex Interactions (BVIs), which may cause the most disturbing acoustic noise, normally occur in descent or high-speed forward flight. Acoustic noise characterization (and potential reduction) is one of the areas generating intensive research interest in the rotorcraft industry. Since experimental investigations of BVI are extremely costly, some insights into the BVI or AVI (2-D Airfoil-Vortex Interaction) can be gained using Computational Fluid Dynamics (CFD) numerical simulations. Numerical simulation of BVI or AVI has been of interest to CFD for many years. There are still difficulties concerning an accurate numerical prediction of BVI. One of the main issues is the inherent dissipation of CFD turbulence models, which severely affects the preservation of the vortex characteristics. Moreover, this is not an issue only for aerodynamic and aeroacoustic analysis but also for aeroelastic investigations, especially when the strong (two-way) aeroelastic coupling is of interest. The present investigation concentrates mainly on AVI simulations. The simulations are performed for Mach number, Ma = 0.3, resulting in a Reynolds number, Re = 1.3 x 10^6, which is based on the chord, c, of the airfoil (NACA0012). Extensive literature search has indicated that the present work represents the first comprehensive investigation of AVI using the LES numerical approach, in the rotorcraft research community. The major factor affecting the aerodynamic coefficients and aeroacoustic field as a result of airfoil-vortex interaction is observed to be the unsteady pressure generated at the location of the interaction. The present numerical results show that the aerodynamic coefficients (lift, moment, and drag) and aeroacoustic field are strongly dependent on the airfoil-vortex vertical miss-distance, airfoil angle of attack, vortex characteristics, and aeroelastic response of the airfoil to airfoil-vortex interaction. A decay of airfoil-vortex interactions with the increase of vertical miss-distance and angle of attack was observed. Also, a decay of airfoil-vortex interactions is observed for the case of a flexible structure when compared with the case of a rigid structure. The decay of vortex core size produces a decrease in the aerodynamic coefficients.

  10. The STAPL Parallel Container Framework 

    E-print Network

    Tanase, Ilie Gabriel

    2012-02-14

    The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. STAPL provides a run-time system, a collection of distributed data structures (pContainers) and parallel...

  11. Genome-Wide Fitness and Genetic Interactions Determined by Tn-seq, a High-Throughput Massively Parallel Sequencing Method for Microorganisms

    PubMed Central

    van Opijnen, Tim; Lazinski, David W.; Camilli, Andrew

    2015-01-01

    The lagging annotation of bacterial genomes and the inherent genetic complexity of many phenotypes is hindering the discovery of new drug targets and the development of new antimicrobials and vaccines. Here we present the method Tn-seq, with which it has become possible to quantitatively determine fitness for most genes in a microorganism and to screen for quantitative genetic interactions on a genome-wide scale and in a high-throughput fashion. Tn-seq can thus direct studies in the annotation of genes and untangle complex phenotypes. The method is based on the construction of a saturated transposon insertion library. After library selection, changes in frequency of each insertion mutant are determined by sequencing of the flanking regions en masse. These changes are used to calculate each mutant's fitness. The method was originally developed for the Gram-positive bacterium Streptococcus pneumoniae, a causative agent of pneumonia and meningitis, but has now been applied to several different microbial species. PMID:24733243
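
    A sketch of the fitness calculation step: the change in each insertion mutant's relative frequency across the selection window, corrected for how much the whole population expanded. The log-ratio formula below follows a commonly cited Tn-seq fitness definition; the variable names, read counts, and expansion factor are illustrative assumptions and may differ from this method's exact protocol.

    ```python
    import numpy as np

    def tnseq_fitness(reads_t1, reads_t2, expansion):
        """Per-mutant fitness from insertion read counts before (t1) and after
        (t2) selection; a neutral mutant whose frequency is unchanged gets
        fitness 1 by construction."""
        reads_t1 = np.asarray(reads_t1, dtype=float)
        reads_t2 = np.asarray(reads_t2, dtype=float)
        x1 = reads_t1 / reads_t1.sum()      # mutant frequency before selection
        x2 = reads_t2 / reads_t2.sum()      # mutant frequency after selection
        return (np.log(expansion * x2 / x1)
                / np.log(expansion * (1.0 - x2) / (1.0 - x1)))

    # toy example: three insertion mutants, population expanded 100-fold
    print(tnseq_fitness([500, 500, 500], [800, 450, 250], expansion=100.0))
    ```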

  12. Genome-Wide Fitness and Genetic Interactions Determined by Tn-seq, a High-Throughput Massively Parallel Sequencing Method for Microorganisms

    PubMed Central

    van Opijnen, Tim; Lazinski, David W.; Camilli, Andrew

    2015-01-01

    The lagging annotation of bacterial genomes and the inherent genetic complexity of many phenotypes is hindering the discovery of new drug targets and the development of new antimicrobial agents and vaccines. This unit presents Tn-seq, a method that has made it possible to quantitatively determine fitness for most genes in a microorganism and to screen for quantitative genetic interactions on a genome-wide scale and in a high-throughput fashion. Tn-seq can thus direct studies on the annotation of genes and untangle complex phenotypes. The method is based on the construction of a saturated transposon insertion library. After library selection, changes in the frequency of each insertion mutant are determined by sequencing flanking regions en masse. These changes are used to calculate each mutant’s fitness. The method was originally developed for the Gram-positive bacterium Streptococcus pneumoniae, a causative agent of pneumonia and meningitis, but has now been applied to several different microbial species. PMID:25641100

  13. Hyper-Systolic Parallel Computing

    E-print Network

    Th. Lippert; A. Seyfried; A. Bode; K. Schilling

    1995-07-25

    A new class of parallel algorithms is introduced that can achieve a complexity of O(n^3/2) with respect to the interprocessor communication, in the exact computation of systems with pairwise mutual interactions of all elements. Hitherto, conventional methods exhibit a communicational complexity of O(n^2). The amount of computation operations is not altered for the new algorithm which can be formulated as a kind of h-range problem, known from the mathematical field of Additive Number Theory. We will demonstrate the reduction in communicational expense by comparing the standard-systolic algorithm and the new algorithm on the connection machine CM5 and the CRAY T3D. The parallel method can be useful in various scientific and engineering fields like exact n-body dynamics with long range forces, polymer chains, protein folding or signal processing.

  14. Mirror versus parallel bimanual reaching

    PubMed Central

    2013-01-01

    Background: In spite of their importance to everyday function, tasks that require both hands to work together such as lifting and carrying large objects have not been well studied and the full potential of how new technology might facilitate recovery remains unknown. Methods: To help identify the best modes for self-teleoperated bimanual training, we used an advanced haptic/graphic environment to compare several modes of practice. In a 2-by-2 study, we compared mirror vs. parallel reaching movements, and also compared veridical display to one that transforms the right hand’s cursor to the opposite side, reducing the area that the visual system has to monitor. Twenty healthy, right-handed subjects (5 in each group) practiced 200 movements. We hypothesized that parallel reaching movements would be the best performing, and attending to one visual area would reduce the task difficulty. Results: The two-way comparison revealed that mirror movement times took an average 1.24 s longer to complete than parallel. Surprisingly, subjects’ movement times moving to one target (attending to one visual area) also took an average of 1.66 s longer than subjects moving to two targets. For both hands, there was also a significant interaction effect, revealing the lowest errors for parallel movements moving to two targets, that is, parallel movements with a veridical display (moving to two separate targets). These results point to the expected levels of challenge for these bimanual training modes, which could be used to advise therapy choices in self-neurorehabilitation. PMID:23837908

  15. Parallel digital forensics infrastructure.

    SciTech Connect

    Liebrock, Lorie M.; Duggan, David Patrick

    2009-10-01

    This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the architecture and implementation of this parallel digital forensics (PDF) infrastructure.

  16. Selectivity of small molecule ligands for parallel and anti-parallel DNA G-quadruplex structures.

    PubMed

    Garner, Thomas P; Williams, Huw E L; Gluszyk, Katarzyna I; Roe, Stephen; Oldham, Neil J; Stevens, Malcolm F G; Moses, John E; Searle, Mark S

    2009-10-21

    We report CD, ESI-MS and molecular modelling studies of ligand binding interactions with DNA quadruplex structures derived from the human telomeric repeat sequence (h-Tel) and the proto-oncogenic c-kit promoter sequence. These sequences form anti-parallel (both 2 + 2 and 3 + 1) and parallel conformations, respectively, and demonstrate distinctively different degrees of structural plasticity in binding ligands. With h-Tel, we show that an extended heteroaromatic 1,4-triazole (TRZ), designed to exploit pi-stacking interactions and groove-specific contacts, shows some selectivity for parallel folds, however, the polycyclic fluorinated acridinium cation (RHPS4), which is a similarly potent telomerase inhibitor, shows selectivity for anti-parallel conformations implicating favourable interactions with lateral and diagonal loops. In contrast, the unique c-kit parallel-stranded quadruplex shows none of the structural plasticity of h-Tel with either ligand. We show by quantitative ESI-MS analysis that both sequences are able to bind a ligand on either end of the quadruplex. In the case of h-Tel the two sites have similar affinities, however, in the case of the c-kit quadruplex the affinities of the two sites are different and ligand-dependent. We demonstrate that two different small molecule architectures result in significant differences in selectivity for parallel and anti-parallel quadruplex structures that may guide quadruplex targeted drug-design. PMID:19795057

  17. Parallel MR Imaging

    PubMed Central

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A.; Seiberlich, Nicole

    2015-01-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the under-sampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. PMID:22696125
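
    A toy numpy sketch of the SENSE idea for a twofold (R = 2) acceleration: each pixel of the aliased, reduced-FOV image is a coil-sensitivity-weighted sum of two true pixels, so the pair is recovered by a small per-pixel least-squares solve. The synthetic image and random coil sensitivities are assumptions made purely for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    N, ncoils, R = 128, 4, 2                   # matrix size, coils, acceleration

    truth = rng.random((N, N))                 # synthetic image
    sens = rng.random((ncoils, N, N)) + 0.1    # synthetic coil sensitivities

    # R = 2 undersampling folds row y onto row y + N/2 in every coil image
    aliased = np.stack([sens[c, :N // R] * truth[:N // R] +
                        sens[c, N // R:] * truth[N // R:]
                        for c in range(ncoils)])        # (ncoils, N/2, N)

    # SENSE-type unfolding: one tiny least-squares problem per aliased pixel
    recon = np.zeros_like(truth)
    for y in range(N // R):
        for x in range(N):
            A = np.stack([sens[:, y, x], sens[:, y + N // R, x]], axis=1)
            sol, *_ = np.linalg.lstsq(A, aliased[:, y, x], rcond=None)
            recon[y, x], recon[y + N // R, x] = sol

    print(np.max(np.abs(recon - truth)))       # ~0 for this noiseless toy
    ```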

  18. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear ones in some cases, are possible.

  19. Eclipse Parallel Tools Platform

    Energy Science and Technology Software Center (ESTSC)

    2005-02-18

    Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to provide a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform-specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that provides only minimal functionality for parallel tool integration, support for a small number of parallel architectures, and basic Fortran integration. Future versions will extend the functionality substantially, provide a number of core parallel tools, and provide support across a wide range of parallel architectures and languages.

  20. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools, developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This demonstrates the great potential of using the toolkit to quickly parallelize serial programs, as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally, a set of tutorials is included for hands-on experiences with this toolkit.

  1. Totally parallel multilevel algorithms

    NASA Technical Reports Server (NTRS)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  2. Parallel computing works

    SciTech Connect

    Not Available

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  3. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
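
    A minimal segmented-sieve sketch of the fixed-problem-size-per-processor idea: composites in each block are marked independently using the base primes up to the square root of the overall limit, so each block can be assigned to its own processing element. The block decomposition shown here is a simplification of the scattered decomposition studied in the paper.

    ```python
    import numpy as np

    def base_primes(limit):
        """Primes up to `limit` via a plain serial Sieve of Eratosthenes."""
        is_p = np.ones(limit + 1, dtype=bool)
        is_p[:2] = False
        for p in range(2, int(limit ** 0.5) + 1):
            if is_p[p]:
                is_p[p * p::p] = False
        return np.flatnonzero(is_p)

    def sieve_block(lo, hi, primes):
        """Sieve the block [lo, hi) independently of all other blocks -- the
        unit of work one processing element would own."""
        is_p = np.ones(hi - lo, dtype=bool)
        for p in primes:
            start = max(p * p, ((lo + p - 1) // p) * p)
            if start < hi:
                is_p[start - lo::p] = False
        if lo == 0:
            is_p[:2] = False
        return lo + np.flatnonzero(is_p)

    # decompose [0, 10^6) into blocks; each block could go to its own processor
    n, nblocks = 10 ** 6, 8
    primes = base_primes(int(n ** 0.5))
    edges = np.linspace(0, n, nblocks + 1, dtype=int)
    counts = [len(sieve_block(lo, hi, primes))
              for lo, hi in zip(edges[:-1], edges[1:])]
    print(sum(counts), "primes below", n)      # 78498
    ```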

  4. Parallel Adaptive Mesh Refinement

    SciTech Connect

    Diachin, L; Hornung, R; Plassmann, P; WIssink, A

    2005-03-04

    As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of the fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of an impulsively-sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35]. Note the ability of both meshing methods to resolve simulation details by varying the local grid spacing.
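
    A toy 1-D sketch of the flag-and-refine step at the heart of AMR: cells whose local solution jump exceeds a threshold are flagged, grouped (with a small buffer) into patches, and covered by a finer grid, so resolution follows the sharp feature. The threshold, buffer width, and refinement ratio are illustrative assumptions, not the algorithm of any particular AMR package.

    ```python
    import numpy as np

    def flag_cells(u, dx, threshold):
        """Flag cells whose undivided gradient exceeds `threshold` -- a simple
        local error indicator for deciding where finer grids are needed."""
        return np.abs(np.gradient(u, dx)) * dx > threshold

    def refine_patches(flags, ratio=4, buffer=2):
        """Group flagged cells (plus a buffer) into contiguous fine patches,
        each refined by `ratio` in this 1-D toy setting."""
        idx = np.flatnonzero(flags)
        if idx.size == 0:
            return []
        patches, start, prev = [], idx[0], idx[0]
        for i in idx[1:]:
            if i > prev + 1 + buffer:          # gap large enough: close patch
                patches.append((max(start - buffer, 0),
                                min(prev + buffer + 1, flags.size)))
                start = i
            prev = i
        patches.append((max(start - buffer, 0), min(prev + buffer + 1, flags.size)))
        return [(lo, hi, ratio * (hi - lo)) for lo, hi in patches]

    # a smooth background with one sharp front: only the front gets refined
    x = np.linspace(0.0, 1.0, 256)
    u = np.tanh((x - 0.6) / 0.01)
    print(refine_patches(flag_cells(u, x[1] - x[0], threshold=0.05)))
    ```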

  5. Parallel nearest neighbor calculations

    NASA Astrophysics Data System (ADS)

    Trease, Harold

    We are just starting to parallelize the nearest neighbor portion of our free-Lagrange code. Our implementation of the nearest neighbor reconnection algorithm has not been parallelizable (i.e., we just flip one connection at a time). In this paper we consider what sort of nearest neighbor algorithms lend themselves to being parallelized. For example, the construction of the Voronoi mesh can be parallelized, but the construction of the Delaunay mesh (dual to the Voronoi mesh) cannot because of degenerate connections. We will show our most recent attempt to tessellate space with triangles or tetrahedrons with a new nearest neighbor construction algorithm called DAM (Dial-A-Mesh). This method has the characteristics of a parallel algorithm and produces a better tessellation of space than the Delaunay mesh. Parallel processing is becoming an everyday reality for us at Los Alamos. Our current production machines are Cray YMPs with 8 processors that can run independently or combined to work on one job. We are also exploring massive parallelism through the use of two 64K processor Connection Machines (CM2), where all the processors run in lock step mode. The effective application of 3-D computer models requires the use of parallel processing to achieve reasonable "turn around" times for our calculations.

  6. NAS parallel benchmark results

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.; Barszcz, E.; Dagum, L.; Simon, H. D.

    1992-01-01

    The NAS (Numerical Aerodynamic Simulation) parallel benchmarks have been developed at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a 'pencil and paper' fashion. The performance results of various systems using the NAS parallel benchmarks are presented. These results represent the best results that have been reported to the authors for the specific systems listed. They represent implementation efforts performed by personnel in both the NAS Applied Research Branch of NASA Ames Research Center and in other organizations.

  7. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (editor); Barton, John (editor); Lasinski, Thomas (editor); Simon, Horst (editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  8. Parallel Molecular Dynamics Program for Molecules

    Energy Science and Technology Software Center (ESTSC)

    1995-03-07

    ParBond is a parallel classical molecular dynamics code that models bonded molecular systems, typically of an organic nature. It uses classical force fields for both non-bonded Coulombic and Van der Waals interactions and for 2-, 3-, and 4-body bonded (bond, angle, dihedral, and improper) interactions. It integrates Newton's equations of motion for the molecular system and evaluates various thermodynamical properties of the system as it progresses.
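
    A minimal velocity-Verlet sketch of the "integrate Newton's equations of motion" step, applied to a single harmonic bond as a stand-in for one of the 2-body bonded terms such a force field evaluates. The force constant, masses, and time step are illustrative assumptions, not ParBond parameters.

    ```python
    import numpy as np

    def harmonic_bond_forces(x, k=450.0, r0=1.0):
        """Forces on two bonded atoms from V = 0.5*k*(r - r0)^2."""
        d = x[1] - x[0]
        r = np.linalg.norm(d)
        f1 = -k * (r - r0) * d / r          # restoring force on atom 1
        return np.array([-f1, f1])          # equal and opposite on atom 0

    def velocity_verlet(x, v, m, dt, nsteps, force_fn):
        """Integrate Newton's equations of motion with velocity Verlet."""
        f = force_fn(x)
        for _ in range(nsteps):
            v = v + 0.5 * dt * f / m[:, None]
            x = x + dt * v
            f = force_fn(x)
            v = v + 0.5 * dt * f / m[:, None]
        return x, v

    x = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])   # bond stretched past r0
    v = np.zeros((2, 3))
    m = np.array([12.0, 12.0])
    x, v = velocity_verlet(x, v, m, dt=0.001, nsteps=1000,
                           force_fn=harmonic_bond_forces)
    print(np.linalg.norm(x[1] - x[0]))   # oscillates about the bond length r0
    ```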

  9. Deoxyribo Nanonucleic Acid: Antiparallel, Parallel and Unparalleled

    SciTech Connect

    Egli, M.

    2010-03-05

    The crystal structure of a single-stranded DNA oligonucleotide has revealed formation of a unique three-dimensional array by continuous antiparallel and parallel pairing between monomers. The array is based on tertiary interactions and represents a second-generation nanotechnological system.

  10. The Parallel Axiom

    ERIC Educational Resources Information Center

    Rogers, Pat

    1972-01-01

    Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclid's parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)

  11. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  12. UCLA Parallel PIC Framework

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor K.; Norton, Charles D.

    2004-12-01

    The UCLA Parallel PIC Framework (UPIC) has been developed to provide trusted components for the rapid construction of new, parallel Particle-in-Cell (PIC) codes. The Framework uses object-based ideas in Fortran95, and is designed to provide support for various kinds of PIC codes on various kinds of hardware. The focus is on student programmers. The Framework supports multiple numerical methods, different physics approximations, different numerical optimizations and implementations for different hardware. It is designed with "defensive" programming in mind, meaning that it contains many error checks and debugging helps. Above all, it is designed to hide the complexity of parallel processing. It is currently being used in a number of new Parallel PIC codes.
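
    A serial, 1-D electrostatic sketch of the basic PIC cycle that frameworks like UPIC package up for parallel hardware: deposit charge with linear weighting, solve for the field with a periodic FFT Poisson solve, gather the field at the particles, and push them. This is a generic textbook loop with simplified normalization (mean density 1, plasma frequency of order 1), not UPIC code.

    ```python
    import numpy as np

    def pic_step(x, v, q, m, grid_n, length, dt):
        """One 1-D electrostatic PIC cycle (cloud-in-cell deposit, FFT field
        solve, linear gather, leapfrog push) with a neutralizing background."""
        dx = length / grid_n
        w = length / x.size                 # macro-particle weight -> mean density 1
        g = x / dx
        left = np.floor(g).astype(int) % grid_n
        frac = g - np.floor(g)
        # 1. charge deposition (cloud-in-cell)
        rho = np.zeros(grid_n)
        np.add.at(rho, left, q * w * (1.0 - frac))
        np.add.at(rho, (left + 1) % grid_n, q * w * frac)
        rho = rho / dx
        rho -= rho.mean()                   # uniform neutralizing background
        # 2. periodic field solve: d2(phi)/dx2 = -rho, E = -d(phi)/dx
        k = 2.0 * np.pi * np.fft.fftfreq(grid_n, d=dx)
        k[0] = 1.0                          # placeholder; the k=0 mode is zeroed below
        phi_k = np.fft.fft(rho) / k ** 2
        phi_k[0] = 0.0
        E = np.real(np.fft.ifft(-1j * k * phi_k))
        # 3. gather the field at the particles and push them
        Ep = E[left] * (1.0 - frac) + E[(left + 1) % grid_n] * frac
        v = v + (q / m) * Ep * dt
        x = (x + v * dt) % length
        return x, v

    # toy: electrons with a small sinusoidal density perturbation
    L, N = 2.0 * np.pi, 4000
    x0 = np.linspace(0.0, L, N, endpoint=False)
    x = (x0 + 0.01 * np.sin(x0)) % L
    v = np.zeros(N)
    for _ in range(200):
        x, v = pic_step(x, v, q=-1.0, m=1.0, grid_n=64, length=L, dt=0.05)
    print(f"velocity spread after 200 steps: {np.std(v):.4f}")
    ```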

  13. Introduction to Parallel Processing

    E-print Network

    Evans, Hal

    Slide-outline fragments from the lecture notes: interprocess communication; MPI as an industry standard in parallel and cluster computing; optimization strategy (understanding where the software is spending the most time); Message Passing Interface (MPI); Graphics Processing Units (GPU).

  14. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.

  15. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
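
    A small numpy sketch of the vector-quantization half of the strategy: each image block is replaced by the index of its nearest codebook vector, so only indices (and the codebook) need to be stored. The fixed random codebook is an illustrative assumption; the on-line learning of the codebook described above is not reproduced here.

    ```python
    import numpy as np

    def vq_encode(image, codebook, block=4):
        """Map every block x block patch to the index of its nearest codeword."""
        h, w = image.shape
        patches = (image[:h - h % block, :w - w % block]
                   .reshape(h // block, block, w // block, block)
                   .swapaxes(1, 2)
                   .reshape(-1, block * block))
        # squared distance from every patch to every codeword, then argmin
        d2 = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    def vq_decode(indices, codebook, shape, block=4):
        """Rebuild the (lossy) image from the stored indices."""
        h, w = shape
        patches = codebook[indices].reshape(h // block, w // block, block, block)
        return patches.swapaxes(1, 2).reshape(h - h % block, w - w % block)

    rng = np.random.default_rng(3)
    img = rng.random((64, 64))
    codebook = rng.random((256, 16))            # 256 codewords of 4x4 pixels
    idx = vq_encode(img, codebook)
    approx = vq_decode(idx, codebook, img.shape)
    print(idx.shape, float(np.mean((approx - img) ** 2)))   # indices + distortion
    ```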

  16. Artificial intelligence in parallel

    SciTech Connect

    Waldrop, M.M.

    1984-08-10

    The current rage in the Artificial Intelligence (AI) community is parallelism: the idea is to build machines with many independent processors doing many things at once. The upshot is that about a dozen parallel machines are now under development for AI alone. As might be expected, the approaches are diverse yet there are a number of fundamental issues in common: granularity, topology, control, and algorithms.

  17. Performance analysis of the Parallel Community Atmosphere Model (CAM) application 

    E-print Network

    Shawky Sharkawi, Sameh Sherif

    2009-06-02

    The impact of different architectures on this application is examined. In the analysis of CAM, kernel coupling, which quantifies the interaction between adjacent kernels and chains of kernels in an application, is used. All experiments are conducted on four parallel platforms: NERSC...

  18. Parallel time integration software

    Energy Science and Technology Software Center (ESTSC)

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non) linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.
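
    The package above implements MGRIT; a closely related, easier-to-sketch two-level time-parallel method is the parareal iteration, shown below for a scalar test ODE. The coarse and fine propagators and the test equation u' = lam*u are illustrative assumptions; in a real run, the fine solves inside each iteration are the part distributed across processors, one time slice each.

    ```python
    import numpy as np

    lam, T, N = -1.0, 5.0, 50              # test ODE u' = lam*u on [0, T], N slices
    dt = T / N

    def coarse(u, dt):                     # cheap propagator: one backward-Euler step
        return u / (1.0 - lam * dt)

    def fine(u, dt, substeps=20):          # expensive propagator: many RK2 sub-steps
        h = dt / substeps
        for _ in range(substeps):
            u = u + h * lam * (u + 0.5 * h * lam * u)
        return u

    U = np.empty(N + 1)
    U[0] = 1.0
    for n in range(N):                     # serial coarse sweep gives the initial guess
        U[n + 1] = coarse(U[n], dt)

    for k in range(5):                     # parareal iterations
        F = np.array([fine(U[n], dt) for n in range(N)])     # parallel across slices
        G_old = np.array([coarse(U[n], dt) for n in range(N)])
        Unew = np.empty_like(U)
        Unew[0] = U[0]
        for n in range(N):                 # cheap serial correction sweep
            Unew[n + 1] = coarse(Unew[n], dt) + F[n] - G_old[n]
        U = Unew

    print(abs(U[-1] - np.exp(lam * T)))    # approaches the exact solution
    ```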

  19. Progress in parallelizing XOOPIC

    SciTech Connect

    Mardahl, P.J.; Verboncoeur, J.P.

    1998-12-31

    XOOPIC (Object Oriented Particle in Cell code for X11-based Unix workstations) is presently a serial 2d 3v particle-in-cell plasma simulation. The present effort focuses on using parallel and distributed processing to optimize the simulation for large problems. The benefits include increased capacity for memory-intensive problems, and improved performance for processor-intensive problems. The MPI library enables the parallel version to be easily ported to massively parallel, SMP, and distributed computers. The philosophy employed here is to spatially decompose the system into computational regions separated by virtual boundaries, objects which contain the local data and algorithms to perform the local field solve and particle communication between regions. This implementation will reduce the changes required in the rest of the program by parallelization. Specific implementation details such as the hiding of communication latency behind local computation will also be discussed. The initial implementation includes manual partitioning in one spatial coordinate, electromagnetic models, diagnostics by computational region, and effective transmission of both fields and particles across virtual boundaries. This version was able to perform greater than 600,000 particle-pushes-per-second using eight 200 MHz UltraSPARC CPUs. In this work the authors extend parallel XOOPIC to have 2-d partitioning, automated partitioning, and global diagnostics.
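
    A minimal mpi4py sketch of the "virtual boundary" idea: each computational region owns a strip of the grid plus one guard cell on each side, refreshes the guard cells from its neighbours, and only then applies its local update. The 1-D field, the three-point smoother, and the open (non-periodic) ends are illustrative assumptions, not XOOPIC's field solve.

    ```python
    # run with e.g.: mpiexec -n 4 python regions.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    nlocal = 16                                  # interior cells owned by this region
    field = np.full(nlocal + 2, float(rank))     # +2 guard cells at the ends
    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # exchange guard cells across the virtual boundaries (no-ops at the outer ends)
    comm.Sendrecv(field[1:2], dest=left, recvbuf=field[-1:], source=right)
    comm.Sendrecv(field[-2:-1], dest=right, recvbuf=field[0:1], source=left)

    # the local update can now safely read neighbour data through the guard cells
    interior = 0.5 * field[1:-1] + 0.25 * (field[:-2] + field[2:])
    print(f"region {rank}: updated edges {interior[0]:.2f} ... {interior[-1]:.2f}")
    ```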

  20. Parallel time integration software

    SciTech Connect

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non) linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.

  1. Parallelizing Quantum Circuits

    E-print Network

    Anne Broadbent; Elham Kashefi

    2007-04-13

    We present a novel automated technique for parallelizing quantum circuits via forward and backward translation to measurement-based quantum computing patterns and analyze the trade-off in terms of depth and space complexity. As a result we distinguish a class of polynomial depth circuits that can be parallelized to logarithmic depth while adding only polynomially many auxiliary qubits. In particular, we provide for the first time a full characterization of patterns with flow of arbitrary depth, based on the notion of influencing paths and a simple rewriting system on the angles of the measurement. Our method leads to insightful knowledge for constructing parallel circuits and as applications, we demonstrate several constant and logarithmic depth circuits. Furthermore, we prove a logarithmic separation in terms of quantum depth between the quantum circuit model and the measurement-based model.

  2. Parallel Magnetic Resonance Imaging

    E-print Network

    Uecker, Martin

    2015-01-01

    The main disadvantages of Magnetic Resonance Imaging (MRI) are its long scan times and, in consequence, its sensitivity to motion. Exploiting the complementary information from multiple receive coils, parallel imaging is able to recover images from under-sampled k-space data and to accelerate the measurement. Because parallel magnetic resonance imaging can be used to accelerate basically any imaging sequence it has many important applications. Parallel imaging brought a fundamental shift in image reconstruction: Image reconstruction changed from a simple direct Fourier transform to the solution of an ill-conditioned inverse problem. This work gives an overview of image reconstruction from the perspective of inverse problems. After introducing basic concepts such as regularization, discretization, and iterative reconstruction, advanced topics are discussed including algorithms for auto-calibration, the connection to approximation theory, and the combination with compressed sensing.

  3. Parallelism in System Tools

    SciTech Connect

    Matney, Sr., Kenneth D; Shipman, Galen M

    2010-01-01

    The Cray XT, when employed in conjunction with the Lustre filesystem, has provided the ability to generate huge amounts of data in the form of many files. Typically, this is accommodated by satisfying the requests of large numbers of Lustre clients in parallel. In contrast, a single service node (Lustre client) cannot adequately service such datasets. This means that the use of traditional UNIX tools like cp, tar, et al. (which have no parallel capability) can result in substantial impact to user productivity. For example, to copy a 10 TB dataset from the service node using cp would take about 24 hours, under more or less ideal conditions. During production operation, this could easily extend to 36 hours. In this paper, we introduce the Lustre User Toolkit for Cray XT, developed at the Oak Ridge Leadership Computing Facility (OLCF). We will show that Linux commands, implementing highly parallel I/O algorithms, provide orders of magnitude greater performance, greatly reducing impact to productivity.

  4. Parallel optical sampler

    DOEpatents

    Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A

    2014-05-20

    An optical sampler includes first and second 1×n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.

  5. A massively asynchronous, parallel brain.

    PubMed

    Zeki, Semir

    2015-05-19

    Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously--with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871

  6. A massively asynchronous, parallel brain

    PubMed Central

    Zeki, Semir

    2015-01-01

    Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously—with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871

  7. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  8. The NAS Parallel Benchmarks

    SciTech Connect

    Bailey, David H.

    2009-11-15

    The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if so, which of the parallel offerings would be most useful in real-world scientific computation. In part to draw attention to some of the performance reporting abuses prevalent at the time, the present author wrote a humorous essay 'Twelve Ways to Fool the Masses,' which described in a light-hearted way a number of the questionable ways in which both vendor marketing people and scientists were inflating and distorting their performance results. All of this underscored the need for an objective and scientifically defensible measure to compare performance on these systems.

  9. "Parallel" transport - revisited

    E-print Network

    Stasheff, Jim

    2011-01-01

    Parallel transport in a fibre bundle with respect to smooth paths in the base space B has recently been extended to representations of the smooth singular simplicial set Sing_{smooth}(B). Inspired by these extensions, I revisit the development of a notion of 'parallel' transport in the topological setting of fibrations with the homotopy lifting property and then extend it to representations of Sing(B) on such fibrations. Closely related is the notion of (strong or ∞-) homotopy action, which has variants under a variety of names.

  10. A Parallel Tree Code

    E-print Network

    John Dubinski

    1996-03-18

    We describe a new implementation of a parallel N-body tree code. The code is load-balanced using the method of orthogonal recursive bisection to subdivide the N-body system into independent rectangular volumes, each of which is mapped to a processor on a parallel computer. On the Cray T3D, the load balance is in the range of 70-90%, depending on the problem size and number of processors. The code can handle simulations with more than 10 million particles, roughly a factor of 10 greater than allowed in vectorized tree codes.
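
    The orthogonal recursive bisection used for load balancing can be illustrated with a short serial sketch: split the particle set at the median coordinate of its widest axis and recurse until there is one volume per processor. This is only a sketch of the decomposition idea (assuming a power-of-two processor count), not the implementation described in the paper.

```python
import numpy as np

def orb_decompose(points, n_boxes):
    """Orthogonal recursive bisection: recursively split the particle set at
    the median coordinate of its widest axis until n_boxes groups remain.
    n_boxes is assumed to be a power of two, one group per processor."""
    if n_boxes == 1:
        return [points]
    extent = points.max(axis=0) - points.min(axis=0)
    axis = int(np.argmax(extent))            # widest axis
    order = np.argsort(points[:, axis])
    half = len(points) // 2                  # median split => equal particle counts
    left, right = points[order[:half]], points[order[half:]]
    return (orb_decompose(left, n_boxes // 2) +
            orb_decompose(right, n_boxes // 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((10_000, 3))
    boxes = orb_decompose(pts, 8)            # e.g. 8 processors
    print([len(b) for b in boxes])           # near-equal particle loads
```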

  11. Self-testing in parallel

    E-print Network

    Matthew McKague

    2015-11-13

    Self-testing allows us to determine, through classical interaction only, whether some players in a non-local game share particular quantum states. Most work on self-testing has concentrated on developing tests for small states like one pair of maximally entangled qubits, or on tests where there is a separate player for each qubit, as in a graph state. Here we consider the case of testing many maximally entangled pairs of qubits shared between two players. Previously such a test was shown where testing is sequential, i.e., one pair is tested at a time. Here we consider the parallel case where all pairs are tested simultaneously, giving considerably more power to dishonest players. We derive sufficient conditions for a self-test for many maximally entangled pairs of qubits shared between two players and also two constructions for self-tests where all pairs are tested simultaneously.

  12. Parallel software support for computational structural mechanics

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.

    1987-01-01

    The application of the parallel programming methodology known as the Force was conducted. Two application issues were addressed. The first involves the efficiency of the implementation and its completeness in terms of satisfying the needs of other researchers implementing parallel algorithms. Support for, and interaction with, other Computational Structural Mechanics (CSM) researchers using the Force was the main issue, but some independent investigation of the Barrier construct, which is extremely important to overall performance, was also undertaken. Another efficiency issue which was addressed was that of relaxing the strong synchronization condition imposed on the self-scheduled parallel DO loop. The Force was extended by the addition of logical conditions to the cases of a parallel case construct and by the inclusion of a self-scheduled version of this construct. The second issue involved applying the Force to the parallelization of finite element codes such as those found in the NICE/SPAR testbed system. One of the more difficult problems encountered is the determination of what information in COMMON blocks is actually used outside of a subroutine and when a subroutine uses a COMMON block merely as scratch storage for internal temporary results.

  13. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of the document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  14. Parallel fast gauss transform

    SciTech Connect

    Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan

    2010-01-01

    We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N{sup 2}) time. The parallel time complexity estimates for our algorithms are O(N/n{sub p}) for uniform point distributions and O( (N/n{sub p}) log (N/n{sub p}) + n{sub p}log n{sub p}) for non-uniform distributions using n{sub p} CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.
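
    For reference, the direct O(N^2) sum that these fast algorithms accelerate can be written in a few NumPy lines; the point sets, weights, and bandwidth h below are arbitrary, and none of the plane-wave or parallel-octree machinery from the paper is shown.

```python
import numpy as np

def direct_gauss_transform(sources, targets, weights, h):
    """Brute-force evaluation of G(y_j) = sum_i q_i exp(-|y_j - x_i|^2 / h^2).
    This costs O(N*M); the paper's algorithms compute the same quantity in
    (near-)linear time using plane-wave expansions and parallel octrees."""
    # pairwise squared distances, shape (M, N)
    d2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / h**2) @ weights

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.random((500, 3))      # source points x_i
    y = rng.random((400, 3))      # target points y_j
    q = rng.random(500)           # source weights q_i
    print(direct_gauss_transform(x, y, q, h=0.2).shape)   # (400,)
```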

  15. NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Subhash, Saini; Bailey, David H.; Lasinski, T. A. (Technical Monitor)

    1995-01-01

    The NAS Parallel Benchmarks (NPB) were developed in 1991 at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a pencil-and-paper fashion, i.e., the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. In this paper, we present new NPB performance results for the following systems: (a) Parallel-Vector Processors: Cray C90, Cray T90 and Fujitsu VPP500; (b) Highly Parallel Processors: Cray T3D, IBM SP2 and IBM SP2-TN2 (Thin Nodes 2); (c) Symmetric Multiprocessing Processors: Convex Exemplar SPP1000, Cray J90, DEC Alpha Server 8400 5/300, and SGI Power Challenge XL. We also present sustained performance per dollar for the Class B LU, SP and BT benchmarks, and outline NAS's future plans for the NPB.

  16. Parallel hierarchical global illumination

    SciTech Connect

    Snell, Q.O.

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  17. Parallel Total Energy

    Energy Science and Technology Software Center (ESTSC)

    2004-10-21

    This is a total energy electronic structure code using the Local Density Approximation (LDA) of density functional theory. It uses plane waves as the wave function basis set. It can use both norm-conserving pseudopotentials and ultrasoft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MPI.

  18. Massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Fung, L. W. (inventor)

    1983-01-01

    An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.

  19. Parallel hierarchical radiosity rendering

    SciTech Connect

    Carter, M.

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  20. High performance parallel architectures

    SciTech Connect

    Anderson, R.E.

    1989-09-01

    In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.

  1. Progress in parallelizing XOOPIC

    NASA Astrophysics Data System (ADS)

    Mardahl, Peter; Verboncoeur, J. P.

    1997-11-01

    XOOPIC (Object-Oriented Particle-in-Cell code for X11-based Unix workstations) is presently a serial 2-D 3v particle-in-cell plasma simulation (J.P. Verboncoeur, A.B. Langdon, and N.T. Gladd, "An object-oriented electromagnetic PIC code," Computer Physics Communications 87 (1995) 199-211). The present effort focuses on using parallel and distributed processing to optimize the simulation for large problems. The benefits include increased capacity for memory-intensive problems and improved performance for processor-intensive problems. The MPI library is used so that the parallel version can be easily ported to massively parallel, SMP, and distributed computers. The philosophy employed here is to spatially decompose the system into computational regions separated by 'virtual boundaries', objects which contain the local data and algorithms to perform the local field solve and particle communication between regions. This implementation minimizes the changes that parallelization requires in the rest of the program. Specific implementation details, such as the hiding of communication latency behind local computation, will also be discussed.

  2. Time sharing massively parallel machines. Draft

    SciTech Connect

    Gorda, B.; Wolski, R.

    1995-03-01

    As part of the Massively Parallel Computing Initiative (MPCI) at the Lawrence Livermore National Laboratory, the authors have developed a simple, effective and portable time sharing mechanism by scheduling gangs of processes on tightly coupled parallel machines. By time-sharing the resources, the system interleaves production and interactive jobs. Immediate priority is given to interactive use, maintaining good response time. Production jobs are scheduled during idle periods, making use of the otherwise unused resources. In this paper the authors discuss their experience with gang scheduling over the 3 year life-time of the project. In section 2, they motivate the project and discuss some of its details. Section 3.0 describes the general scheduling problem and how gang scheduling addresses it. In section 4.0, they describe the implementation. Section 8.0 presents results culled over the lifetime of the project. They conclude this paper with some observations and possible future directions.

  3. A parallel world in the dark

    SciTech Connect

    Higaki, Tetsutaro; Jeong, Kwang Sik; Takahashi, Fuminobu E-mail: ksjeong@tuhep.phys.tohoku.ac.jp

    2013-08-01

    The baryon-dark matter coincidence is a long-standing issue. Interestingly, the recent observations suggest the presence of dark radiation, which, if confirmed, would pose another coincidence problem of why the density of dark radiation is comparable to that of photons. These striking coincidences may be traced back to the dark sector with particle contents and interactions that are quite similar, if not identical, to the standard model: a dark parallel world. It naturally solves the coincidence problems of dark matter and dark radiation, and predicts a sterile neutrino(s) with mass of O(0.1-1) eV, as well as self-interacting dark matter made of the counterpart of ordinary baryons. We find a robust prediction for the relation between the abundance of dark radiation and the sterile neutrino, which can serve as the smoking-gun evidence of the dark parallel world.

  4. Resistor Combinations for Parallel Circuits.

    ERIC Educational Resources Information Center

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
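
    Tables of this kind are easy to regenerate; the short script below, with an arbitrary set of candidate values, searches for resistor pairs whose parallel combination works out to a whole number.

```python
from itertools import combinations

def parallel(*rs):
    """Equivalent resistance of resistors in parallel: 1/R = sum(1/R_i)."""
    return 1.0 / sum(1.0 / r for r in rs)

# Illustrative search: which pairs drawn from a small set of values give a
# whole-number total resistance?  (The candidate values are arbitrary.)
values = [10, 12, 15, 20, 30, 40, 60]
for r1, r2 in combinations(values, 2):
    req = parallel(r1, r2)
    if abs(req - round(req)) < 1e-9:
        print(f"{r1} || {r2} = {req:.0f} ohms")
```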

  5. Descriptive Simplicity in Parallel Computing 

    E-print Network

    Marr, Marcus

    The programming of parallel computers is recognised as being a difficult task and there exist a wide selection of parallel programming languages and environments. This thesis presents and examines the Hierarchical ...

  6. Tinkertoy Parallel Programming: Complicated Applications

    E-print Network

    Plimpton, Steve

    ... parallel algorithms for particle modeling, crash simulations, and transferring data between two independent ... of activity within the parallel computing community ...

  7. On parallel machine scheduling 1

    E-print Network

    Magdeburg, Universität

    ... parallel machines with setup times. The setup has to be performed by a single server. The objective is to minimize ... even for the case of two identical parallel machines. This paper presents a pseudopolynomial ...

  8. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.

  9. Ultrascalable petaflop parallel supercomputer

    DOEpatents

    Blumrich, Matthias A. (Ridgefield, CT); Chen, Dong (Croton On Hudson, NY); Chiu, George (Cross River, NY); Cipolla, Thomas M. (Katonah, NY); Coteus, Paul W. (Yorktown Heights, NY); Gara, Alan G. (Mount Kisco, NY); Giampapa, Mark E. (Irvington, NY); Hall, Shawn (Pleasantville, NY); Haring, Rudolf A. (Cortlandt Manor, NY); Heidelberger, Philip (Cortlandt Manor, NY); Kopcsay, Gerard V. (Yorktown Heights, NY); Ohmacht, Martin (Yorktown Heights, NY); Salapura, Valentina (Chappaqua, NY); Sugavanam, Krishnan (Mahopac, NY); Takken, Todd (Brewster, NY)

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  10. Homology, convergence and parallelism.

    PubMed

    Ghiselin, Michael T

    2016-01-01

    Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721

  11. Parallel Subconvolution Filtering Architectures

    NASA Technical Reports Server (NTRS)

    Gray, Andrew A.

    2003-01-01

    These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete-Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than by the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
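
    A minimal NumPy sketch of the DFT-IDFT overlap-and-save building block these architectures rest on; the block size and filter below are arbitrary, and the decomposition of a long filter into frequency-domain subfilters (the sub-convolution part) is not shown.

```python
import numpy as np

def overlap_save(x, h, nfft=256):
    """Overlap-save FFT convolution: equivalent to np.convolve(x, h)[:len(x)].
    Each length-nfft block reuses the last len(h)-1 samples of the previous
    block; the first len(h)-1 outputs of each block are circular-convolution
    artifacts and are discarded."""
    m = len(h)
    step = nfft - (m - 1)                      # new samples consumed per block
    H = np.fft.rfft(h, nfft)                   # frequency-domain filter
    xp = np.concatenate([np.zeros(m - 1), x])  # prime the overlap region
    y = np.empty(0)
    for start in range(0, len(x), step):
        block = xp[start:start + nfft]
        block = np.pad(block, (0, nfft - len(block)))   # last block may be short
        yblk = np.fft.irfft(np.fft.rfft(block) * H, nfft)
        y = np.concatenate([y, yblk[m - 1:]])  # drop the corrupted prefix
    return y[:len(x)]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x, h = rng.standard_normal(1000), rng.standard_normal(31)
    assert np.allclose(overlap_save(x, h), np.convolve(x, h)[:len(x)])
    print("overlap-save matches direct convolution")
```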

  12. Parallel multilevel preconditioners

    SciTech Connect

    Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.

    1989-01-01

    In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.

  13. Parallel grid population

    DOEpatents

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
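
    A serial sketch of the two-phase scheme described above; the 1-D grid, interval-shaped objects, and the loops over p (which stand in for the plurality of processors) are simplifying assumptions, not details of the patent.

```python
# Phase 1: the objects are split across processors, and each processor tags
# its objects with the grid portions that bound them.  Phase 2: each processor
# populates its own grid portion from those tags.
from collections import defaultdict

N_PROC = 4
GRID_MIN, GRID_MAX = 0.0, 100.0                      # 1-D grid for brevity
PORTION = (GRID_MAX - GRID_MIN) / N_PROC             # one portion per processor

def portions_bounding(obj):
    """Return indices of the grid portions an interval object overlaps."""
    lo, hi = obj
    first = int((lo - GRID_MIN) // PORTION)
    last = min(int((hi - GRID_MIN) // PORTION), N_PROC - 1)
    return range(first, last + 1)

objects = [(3.0, 7.0), (24.0, 51.0), (60.0, 61.0), (95.0, 99.5)]

# Phase 1
tagged = defaultdict(list)
for p in range(N_PROC):
    for obj in objects[p::N_PROC]:                   # processor p's share
        for portion in portions_bounding(obj):
            tagged[portion].append(obj)

# Phase 2
for p in range(N_PROC):
    print(f"portion {p}: {tagged[p]}")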

  14. Parallel Anisotropic Tetrahedral Adaptation

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Darmofal, David L.

    2008-01-01

    An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000:1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach, as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.

  15. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  16. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Painter, J.; Hansen, C.

    1996-10-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the M.
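
    The final compositing of per-processor partial images can be sketched with a per-pixel depth test, as below; the paper composites with an optimal parallel method, whereas this is the simplest serial NumPy merge over made-up image arrays.

```python
import numpy as np

def depth_composite(colors, depths):
    """Merge per-processor partial images: at each pixel keep the color whose
    depth is smallest (closest to the viewer).  colors has shape (P, H, W, 3)
    and depths has shape (P, H, W); background pixels carry depth = +inf."""
    winner = np.argmin(depths, axis=0)                      # (H, W)
    h_idx, w_idx = np.indices(winner.shape)
    return colors[winner, h_idx, w_idx]                     # (H, W, 3)

if __name__ == "__main__":
    P, H, W = 4, 64, 64
    rng = np.random.default_rng(3)
    colors = rng.random((P, H, W, 3))     # partial images from P processors
    depths = rng.random((P, H, W))        # matching per-pixel depths
    print(depth_composite(colors, depths).shape)            # (64, 64, 3)
```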

  17. Xyce(TM) Parallel Electronic Simulator

    Energy Science and Technology Software Center (ESTSC)

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, dynamic parallel load-balancing, and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
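
    The solution flow sketched in this abstract (nodal equations, Newton iteration, a linear solve per step) can be illustrated on a toy one-node circuit. This is not Xyce code or its netlist format, and the component values are arbitrary; it is only a self-contained sketch of the same idea.

```python
import math

# Toy illustration of the flow described above: write the nodal (KCL)
# equation for one node, then solve the nonlinear system with Newton's
# method.  Circuit: source VS through resistor R into a diode to ground.
VS, R = 5.0, 1e3                 # source voltage and series resistor (assumed)
IS, VT = 1e-12, 0.02585          # diode saturation current, thermal voltage

def f(v):       # KCL residual at the diode node
    return (v - VS) / R + IS * (math.exp(v / VT) - 1.0)

def dfdv(v):    # Jacobian (a 1x1 "matrix" for this single-node circuit)
    return 1.0 / R + (IS / VT) * math.exp(v / VT)

v = 0.6                          # initial guess
for it in range(50):
    dv = -f(v) / dfdv(v)         # the linear solve step of the Newton iteration
    v += dv
    if abs(dv) < 1e-12:
        break
print(f"node voltage ~ {v:.4f} V after {it + 1} Newton iterations")
```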

  18. Xyce(TM) Parallel Electronic Simulator

    SciTech Connect

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, dynamic parallel load-balancing, and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.

  19. Parallel Pascal - An extended Pascal for parallel computers

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1984-01-01

    Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.

  20. Using Motivational Interviewing Techniques to Address Parallel Process in Supervision

    ERIC Educational Resources Information Center

    Giordano, Amanda; Clarke, Philip; Borders, L. DiAnne

    2013-01-01

    Supervision offers a distinct opportunity to experience the interconnection of counselor-client and counselor-supervisor interactions. One product of this network of interactions is parallel process, a phenomenon by which counselors unconsciously identify with their clients and subsequently present to their supervisors in a similar fashion…

  1. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  2. Rotary wing aerodynamically generated noise

    NASA Technical Reports Server (NTRS)

    Schmitz, F. J.; Morse, H. A.

    1982-01-01

    The history and methodology of aerodynamic noise reduction in rotary wing aircraft are presented. Thickness noise during hover tests and blade vortex interaction noise are determined and predicted through the use of a variety of computer codes. The use of test facilities and scale models for data acquisition are discussed.

  3. Computer Assisted Parallel Program Generation

    E-print Network

    Kawata, Shigeo

    2015-01-01

    Parallel computation is widely employed in scientific research, engineering activities, and product development. Writing parallel programs is not always a simple task, depending on the problem being solved. Large-scale scientific computing, huge data analyses, and precise visualizations, for example, require parallel computation, which in turn requires parallelization techniques. In this chapter a parallel program generation support is discussed, and a computer-assisted parallel program generation system, P-NCAS, is introduced. Computer-assisted problem solving is one of the key methods to promote innovations in science and engineering, and contributes to enriching our society and our life toward a programming-free environment in computing science. Research activities on problem solving environments (PSEs) started in the 1970s to enhance programming power. P-NCAS is one of the PSEs; the PSE concept provides an integrated human-friendly computational software and hardware system to solve a target ...

  4. Extended Parallelism Models for Optimization on Massively Parallel Computers

    SciTech Connect

    Eldred, M.S.; Schimel, B.D.

    1999-05-24

    Single-level parallel optimization approaches, those in which either the simulation code executes in parallel or the optimization algorithm invokes multiple simultaneous single-processor analyses, have been investigated previously and been shown to be effective in reducing the time required to compute optimal solutions. However, these approaches have clear performance limitations that prevent effective scaling with the thousands of processors available in massively parallel supercomputers. In more recent work, a capability has been developed for multilevel parallelism in which multiple instances of multiprocessor simulations are coordinated simultaneously. This implementation employs a master-slave approach using the Message Passing Interface (MPI) within the DAKOTA software toolkit. Mathematical analysis on achieving peak efficiency in multilevel parallelism has shown that the most effective processor partitioning scheme is the one that limits the size of multiprocessor simulations in favor of concurrent execution of multiple simulations. That is, if both coarse-grained and fine-grained parallelism can be exploited, then preference should be given to the coarse-grained parallelism. This analysis was verified in multilevel parallel computational experiments on networks of workstations (NOWs) and on the Intel TeraFLOPS massively parallel supercomputer. In current work, methods for exploiting additional coarse-grained parallelism in optimization are being investigated so that fine-grained efficiency losses can be further minimized. These activities are focusing on both algorithmic coarse-grained parallelism (multiple independent function evaluations), through the development of speculative gradient methods and concurrent iterator strategies, and on function evaluation coarse-grained parallelism (multiple separable simulations within a function evaluation), through the development of general partitioning and nested synchronization facilities. The net result is a total of four separate levels of parallelism which can minimize efficiency losses and achieve near linear scaling on massively parallel computers.

  5. Programming parallel architectures: The BLAZE family of languages

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  6. Tile-based Level of Detail for the Parallel Age

    SciTech Connect

    Niski, K; Cohen, J D

    2007-08-15

    Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach based on hierarchical, screen-space tiles for parallelizing rendering with level of detail. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing, while our level-of-detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.

  7. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are best suited to them. The two classes of architectures, designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD), have both produced good results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

  8. Learning in Parallel

    E-print Network

    Vitter, Jeffrey Scott; Lin, Jyh-Han

    1992-01-01

    ... no restrictions on the representation of the hypothesis returned by the parallel learning algorithm other than that it is NC-evaluatable. Definition 2.4. A concept class C_n is NC-evaluatable if the problem of determining whether a given hypothesis c in C_n is consistent ... If C_n is NC-evaluatable, then C_n is NC-learnable. Proof. The proof is a straightforward adaptation of the proof for the sequential case given in [Blumer, Ehrenfeucht, Haussler, and Warmuth 1989]. Suppose ...

  9. Parallel sphere rendering

    SciTech Connect

    Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.

    1995-05-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.

  10. Parallel Eclipse Project Checkout

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.

    2011-01-01

    Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (xml file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to checkout the plug-ins, this program makes the plug-in checkouts parallel. After parsing the feature, a request to checkout for each plug-in in the feature has been inserted. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to checkout now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer's home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer's home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any Eclipse-based repository with a similar structure. It also can apply build parameters and preferences automatically at the end of the checkout.
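
    The thread-pool pattern described above looks roughly like the following sketch; the plug-in names and the checkout stub are placeholders rather than PEPC's actual API, with a sleep standing in for network-bound work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

# Placeholder plug-in list; PEPC reads the real list from the feature's XML file.
PLUGINS = ["gov.nasa.example.core", "gov.nasa.example.ui", "gov.nasa.example.io"]

def checkout(plugin):
    """Stand-in for checking out one plug-in from a repository.  A real
    implementation would invoke the version-control client here."""
    time.sleep(1.0)            # pretend this is network-bound work
    return f"{plugin}: ok"

def parallel_checkout(plugins, max_workers=8):
    """Issue all checkouts at once through a bounded thread pool, so the
    network pipe stays saturated instead of fetching plug-ins one at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(checkout, p): p for p in plugins}
        return [f.result() for f in as_completed(futures)]

if __name__ == "__main__":
    start = time.perf_counter()
    print(parallel_checkout(PLUGINS))
    print(f"elapsed ~{time.perf_counter() - start:.1f}s for {len(PLUGINS)} checkouts")
```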

  11. Task Parallel Skeletons for Irregularly Structured Problems

    E-print Network

    Hofstedt, Petra

    ... parallel skeleton into a functional programming language is presented. Task parallel skeletons, as other algorithmic skeletons, represent general parallelization patterns. They are introduced into otherwise ...

  12. Tolerant (parallel) Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

  13. Parallel ptychographic reconstruction

    PubMed Central

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris

    2014-01-01

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by the scattering strength of the object and the detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174

  14. Benchmarking massively parallel architectures

    SciTech Connect

    Lubeck, O.; Moore, J.; Simmons, M.; Wasserman, H.

    1993-07-01

    The purpose of this paper is to summarize some initial experiences related to measuring the performance of massively parallel processors (MPPs) at Los Alamos National Laboratory (LANL). Actually, the range of MPP architectures the authors have used is rather limited, being confined mostly to the Thinking Machines Corporation (TMC) Connection Machine CM-2 and CM-5. Some very preliminary work has been carried out on the Kendall Square KSR-1, and efforts related to other machines, such as the Intel Paragon and the soon-to-be-released CRAY T3D, are planned. This paper will concentrate more on methodology rather than discuss specific architectural strengths and weaknesses; the latter is expected to be the subject of future reports. MPP benchmarking is a field in critical need of structure and definition. As the authors have stated previously, such machines have enormous potential, and there is certainly a dire need for orders of magnitude more computational power than current supercomputers provide. However, performance reports for MPPs must emphasize actual sustainable performance from real applications in a careful, responsible manner. Such has not always been the case. A recent paper has described in some detail the problem of potentially misleading performance reporting in the parallel scientific computing field. Thus, in this paper, the authors briefly offer a few general ideas on MPP performance analysis.

  15. Adaptive, multiresolution visualization of large data sets using parallel octrees.

    SciTech Connect

    Freitag, L. A.; Loy, R. M.

    1999-06-10

    The interactive visualization and exploration of large scientific data sets is a challenging and difficult task; their size often far exceeds the performance and memory capacity of even the most powerful graphics workstations. To address this problem, we have created a technique that combines hierarchical data reduction methods with parallel computing to allow interactive exploration of large data sets while retaining full-resolution capability. The hierarchical representation is built in parallel by strategically inserting field data into an octree data structure. We provide functionality that allows the user to interactively adapt the resolution of the reduced data sets so that resolution is increased in regions of interest without sacrificing local graphics performance. We describe the creation of the reduced data sets using a parallel octree, the software architecture of the system, and the performance of this system on the data from a Rayleigh-Taylor instability simulation.

  16. Mass-loading and parallel magnetized shocks

    NASA Technical Reports Server (NTRS)

    Zank, G. P.; Oughton, S.; Neubauer, F. M.; Webb, G. M.

    1991-01-01

    Recent observations at comets Giacobini-Zinner and Halley suggest that simple nonreacting gas dynamics or MHD is an inappropriate description for the bow shock. The thickness of the observed (sub)shock implies that mass-loading is an important dynamical process within the shock itself, thereby requiring that the Rankine-Hugoniot conditions possess source terms. This leads to shocks with properties similar to those of combustion shocks. The paper considers parallel magnetized shocks subjected to mass-loading, describes some properties which distinguish them from classical MHD parallel shocks, and establishes the existence of a new kind of MHD compound shock. These results will be of importance both to observations and numerical simulations of the comet-solar wind interaction.

  17. Performance characteristics of a parallel treecode

    E-print Network

    R. Valdarnini

    2002-12-11

    I describe here the performance of a parallel treecode with individual particle timesteps. The code is based on the Barnes-Hut algorithm and runs cosmological N-body simulations on parallel machines with a distributed memory architecture using the MPI message passing library. For a configuration with a constant number of particles per processor, the scalability of the code has been tested up to P=32 processors. The average CPU time per processor necessary for solving the gravitational interactions is within ~10% of that expected from the ideal scaling relation. The load balancing efficiency is high (≳90%) if the processor domains are determined every large timestep according to a weighting scheme which takes into account the total particle computational load within the timestep.
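
    The weighting scheme mentioned at the end can be illustrated with the usual greedy balancing heuristic: hand each work item, heaviest first, to the currently least-loaded processor. This is a generic sketch with made-up weights, not the paper's actual domain decomposition.

```python
import heapq

def balance(weights, n_proc):
    """Greedy load balancing: assign each work item (heaviest first) to the
    currently least-loaded processor.  Returns (load, rank, items) triples."""
    heap = [(0.0, p, []) for p in range(n_proc)]     # (load, rank, items)
    heapq.heapify(heap)
    for i, w in sorted(enumerate(weights), key=lambda kv: -kv[1]):
        load, p, items = heapq.heappop(heap)         # least-loaded processor
        items.append(i)
        heapq.heappush(heap, (load + w, p, items))
    return sorted(heap, key=lambda t: t[1])          # order by processor rank

if __name__ == "__main__":
    # e.g. per-domain computational weights accumulated over a large timestep
    work = [120, 340, 80, 410, 95, 60, 220, 150]
    for load, rank, items in balance(work, 4):
        print(f"rank {rank}: domains {items}, load {load}")
```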

  18. Parallel Computing in SCALE

    SciTech Connect

    DeHart, Mark D; Williams, Mark L; Bowman, Stephen M

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement activities has been developed to provide an integrated framework for future methods development. Some of the major components of the SCALE parallel computing development plan are parallelization and multithreading of computationally intensive modules and redesign of the fundamental SCALE computational architecture.

  19. Parallel Grid Manipulations in Earth Science Calculations

    NASA Technical Reports Server (NTRS)

    Sawyer, W.; Lucchesi, R.; daSilva, A.; Takacs, L. L.

    1999-01-01

    The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center is moving its data assimilation system to massively parallel computing platforms. This parallel implementation of GEOS DAS will be used in the DAO's normal activities, which include reanalysis of data and operational support for flight missions. Key components of GEOS DAS, including the gridpoint-based general circulation model and a data analysis system, are currently being parallelized. The parallelization of GEOS DAS is also one of the HPCC Grand Challenge Projects. The GEOS-DAS software employs several distinct grids. Some examples are: an observation grid, an unstructured grid of points with which observed or measured physical quantities from instruments or satellites are associated; a highly structured latitude-longitude grid of points spanning the earth at given latitude-longitude coordinates, at which prognostic quantities are determined; and a computational lat-lon grid in which the pole has been moved to a different location to avoid computational instabilities. Each of these grids has a different structure and number of constituent points. In spite of that, there are numerous interactions between the grids, e.g., values on one grid must be interpolated to another, or, in other cases, grids need to be redistributed on the underlying parallel platform. The DAO has designed a parallel integrated library for grid manipulations (PILGRIM) to support the needed grid interactions with maximum efficiency. It offers a flexible interface to generate new grids, define transformations between grids, and apply them. Basic communication is currently MPI; however, the interfaces defined here could conceivably be implemented with other message-passing libraries, e.g., Cray SHMEM, or with shared-memory constructs. The library is written in Fortran 90. First performance results indicate that even difficult problems, such as the above-mentioned pole rotation (a sparse interpolation with little data locality between the physical lat-lon grid and a pole-rotated computational grid), can be solved efficiently and at the GFlop/s rates needed to solve tomorrow's high-resolution earth science models. In the subsequent presentation we will discuss the design and implementation of PILGRIM as well as a number of the problems it is required to solve. Some conclusions will be drawn about the potential performance of the overall earth science models on the supercomputer platforms foreseen for these problems.

  20. Toward Parallel Document Clustering

    SciTech Connect

    Mogill, Jace A.; Haglin, David J.

    2011-09-01

    A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion-dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a triangle-inequality-obeying distance metric; the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program, and initial performance results of an end-to-end document processing workflow are reported.
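
    The triangle-inequality pruning that the Anchors Hierarchy relies on can be stated in a few lines: if a document's distance to a pivot differs from the query's distance to that pivot by more than the search radius, the full distance computation can be skipped. The sketch below uses low-dimensional Euclidean vectors and an arbitrary pivot so that the pruning is visible; the real system works with a distance metric over a very high-dimensional document space.

```python
import numpy as np

def radius_search(query, docs, pivot, radius):
    """Find documents within `radius` of `query`, skipping full distance
    computations whenever the triangle inequality rules a pair out:
    |d(q, pivot) - d(x, pivot)| > radius  implies  d(q, x) > radius."""
    d_qp = np.linalg.norm(query - pivot)
    d_xp = np.linalg.norm(docs - pivot, axis=1)      # computed once per pivot
    hits, skipped = [], 0
    for i, x in enumerate(docs):
        if abs(d_qp - d_xp[i]) > radius:
            skipped += 1                             # pruned without d(q, x)
            continue
        if np.linalg.norm(query - x) <= radius:
            hits.append(i)
    return hits, skipped

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    docs = rng.random((5000, 2))                     # low-dimensional demo data
    query, pivot = rng.random(2), docs.mean(axis=0)
    hits, skipped = radius_search(query, docs, pivot, radius=0.05)
    print(len(hits), "hits,", skipped, "distance computations avoided")
```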

  1. Parallel processor engine model program

    NASA Technical Reports Server (NTRS)

    Mclaughlin, P.

    1984-01-01

    The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

  2. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  3. Parallel Imaging Microfluidic Cytometer

    PubMed Central

    Ehrlich, Daniel J.; McKenna, Brian K.; Evans, James G.; Belkina, Anna C.; Denis, Gerald V.; Sherr, David; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of flow cytometry (FACS) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1-D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in approximately 6–10 minutes, about 30 times the speed of most current FACS systems. In 1-D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of CCD-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. PMID:21704835

  4. Parallel TreeSPH

    NASA Astrophysics Data System (ADS)

    Davé, Romeel; Dubinski, John; Hernquist, Lars

    1997-08-01

    We describe PTreeSPH, a gravity treecode combined with an SPH hydrodynamics code designed for parallel supercomputers having distributed memory. Our computational algorithm is based on the popular TreeSPH code of Hernquist & Katz (1989) [ApJS, 70, 419]. PTreeSPH utilizes a domain decomposition procedure and a synchronous hypercube communication paradigm to build self-contained subvolumes of the simulation on each processor at every timestep. Computations then proceed in a manner analogous to a serial code. We use the Message Passing Interface (MPI) communications package, making our code easily portable to a variety of parallel systems. PTreeSPH uses individual smoothing lengths and timesteps, with a communication algorithm designed to minimize exchange of information while still providing all information required to accurately perform SPH computations. We have incorporated periodic boundary conditions with forces calculated using a quadrupole Ewald summation method, and comoving integration under a variety of cosmologies. Following algorithms presented in Katz et al. (1996) [ApJS, 105, 19], we have also included radiative cooling, heating from a parameterized ionizing background, and star formation. A cosmological simulation from z = 49 to z = 2 with 64^3 gas particles and 64^3 dark matter particles requires ~1800 node-hours on a Cray T3D, with a communications overhead of ~8%, load balanced to the ~95% level. When used on the new Cray T3E, this code will be capable of performing cosmological hydrodynamical simulations down to z = 0 with ~2 × 10^6 particles, or to z = 2 with ~10^7 particles, in a reasonable amount of time. Even larger simulations will be practical in situations where the matter is not highly clustered or when periodic boundaries are not required.
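
    The synchronous hypercube communication pattern mentioned above can be sketched independently of the SPH physics. The mpi4py fragment below (illustrative only, not PTreeSPH) performs a hypercube-style all-gather in log2(P) pairwise exchange rounds, assuming a power-of-two number of ranks; the "particle data" is a placeholder array.

```python
# Hypercube all-gather sketch (illustrative; not the PTreeSPH code).
# In round k, rank r exchanges its accumulated buffer with rank r XOR 2**k,
# so after log2(P) synchronous rounds every rank holds all contributions.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
assert size & (size - 1) == 0, "sketch assumes a power-of-two rank count"

# Stand-in for the particle data a rank would export to build subvolumes.
local = np.full(4, float(rank))
gathered = [local]

k = 1
while k < size:
    partner = rank ^ k
    # Pickle-based sendrecv of the list accumulated so far (synchronous round).
    received = comm.sendrecv(gathered, dest=partner, source=partner)
    gathered += received
    k <<= 1

print(f"rank {rank} now holds {len(gathered)} blocks")
```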

  5. The Parallel Java 2 Library Parallel Programming in 100% Java

    E-print Network

    Kaminsky, Alan

    ... written using Nvidia's CUDA. PJ2 also includes a lightweight map-reduce framework for big data parallel programming ... the chosen language and should support all the paradigms--multicore, cluster, GPU, big data, and so on ... Nvidia's CUDA for GPU parallel programming, Apache's Hadoop for map-reduce big data programming. None ...

  6. Parallel Smoothed Aggregation Multigrid: Aggregation Strategies on Massively Parallel Machines

    SciTech Connect

    Ray S. Tuminaro

    2000-11-09

    Algebraic multigrid methods offer the hope that multigrid convergence can be achieved (for at least some important applications) without a great deal of effort from engineers and scientists wishing to solve linear systems. In this paper the authors consider parallelization of the smoothed aggregation multi-grid method. Smoothed aggregation is one of the most promising algebraic multigrid methods. Therefore, developing parallel variants with both good convergence and efficiency properties is of great importance. However, parallelization is nontrivial due to the somewhat sequential aggregation (or grid coarsening) phase. In this paper, they discuss three different parallel aggregation algorithms and illustrate the advantages and disadvantages of each variant in terms of parallelism and convergence. Numerical results will be shown on the Intel Teraflop computer for some large problems coming from nontrivial codes: quasi-static electric potential simulation and a fluid flow calculation.
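
    As a rough picture of what "aggregation" means here, the toy serial sketch below groups each unaggregated node of a matrix graph with its unaggregated neighbors. It is not one of the three parallel aggregation algorithms compared in the paper; it only illustrates the coarsening step whose parallelization is at issue, and the test matrix is an arbitrary 1-D Laplacian.

```python
# Toy greedy aggregation for algebraic multigrid coarsening (serial sketch).
# Each aggregate is a seed node plus its not-yet-aggregated graph neighbours;
# the parallel variants discussed in the paper differ mainly in how seeds are
# chosen near processor boundaries.
import numpy as np
import scipy.sparse as sp

def greedy_aggregate(A):
    """Map each node of the matrix graph of A to an aggregate id."""
    A = sp.csr_matrix(A)
    n = A.shape[0]
    agg = -np.ones(n, dtype=int)
    next_id = 0
    for seed in range(n):
        if agg[seed] != -1:
            continue
        nbrs = A.indices[A.indptr[seed]:A.indptr[seed + 1]]
        group = [seed] + [j for j in nbrs if agg[j] == -1 and j != seed]
        agg[group] = next_id
        next_id += 1
    return agg

# 1-D Laplacian as a small test matrix.
n = 12
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
print(greedy_aggregate(A))
```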

  7. Instrumentation for parallel magnetic resonance imaging 

    E-print Network

    Brown, David Gerald

    2007-04-25

    field. This image is formed by the application of a series of RF and static magnetic field gradient pulses (a pulse sequence) which interact with the nuclear magnetic dipoles, or spins, contained within the sample. Pulse sequences are used to scan...) process of assembling the prototype 64-channel parallel receiver system. IGC, Inc. is also gratefully acknowledged for its generous donation of a 0.16 T, whole body, permanent magnet. The IGC magnet was used as a testbed for the development of RF coils...

  8. The science of computing - The evolution of parallel processing

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1985-01-01

    The present paper is concerned with the approaches to be employed to overcome the set of limitations in software technology which currently impedes effective use of parallel hardware technology. The process required to solve the arising problems is found to involve four different stages. At the present time, Stage One is nearly finished, while Stage Two is under way. Tentative explorations are beginning on Stage Three, and Stage Four is more distant. In Stage One, parallelism is introduced into the hardware of a single computer, which consists of one or more processors, a main storage system, a secondary storage system, and various peripheral devices. In Stage Two, parallel execution of cooperating programs on different machines becomes explicit, while in Stage Three, new languages will make parallelism implicit. In Stage Four, there will be very high level user interfaces capable of interacting with scientists at the same level of abstraction as scientists do with each other.

  9. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  10. Parallel contingency statistics with Titan.

    SciTech Connect

    Thompson, David C.; Pebay, Philippe Pierre

    2009-09-01

    This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09], which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engine is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevents this new engine from exhibiting optimal parallel speed-up as the aforementioned engines do. This report therefore discusses the design trade-offs we made and studies performance with up to 200 processors.

  11. Parallel node placement method by bubble simulation

    NASA Astrophysics Data System (ADS)

    Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang

    2014-03-01

    An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors, is introduced. In accordance with the desired nodal density and Newton’s Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, the positions and velocities of two distant nodes can be updated simultaneously and independently during the dynamic simulation. This inherent parallelism makes the method well suited to parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution, which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi-linear speedup in the number of processors and high efficiency are achieved.
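
    A minimal serial sketch of one bubble-simulation relaxation loop is shown below for orientation. The short-range force law, time step, and damping are assumptions, and the METIS-based domain decomposition and parallel update that the paper actually contributes are omitted; the point is only that the cutoff makes distant nodes independent.

```python
# Serial sketch of a bubble-simulation relaxation loop (illustrative only).
# Nodes repel each other through a short-range force and move by Newton's
# second law with damping; because the force has a cutoff, distant nodes can
# be updated independently, which is what the parallel method exploits.
import numpy as np

rng = np.random.default_rng(1)
pos = rng.random((200, 2))               # initial node positions in the unit square
vel = np.zeros_like(pos)
cutoff, dt, damping = 0.12, 0.02, 0.9    # assumed parameters

for step in range(200):
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)
    # Linear repulsive force inside the cutoff, zero outside (assumed law).
    mag = np.where(dist < cutoff, (cutoff - dist) / cutoff, 0.0)
    force = (mag / dist)[:, :, None] * diff
    accel = force.sum(axis=1)            # unit masses
    vel = damping * (vel + dt * accel)
    pos = np.clip(pos + dt * vel, 0.0, 1.0)
```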

  12. Parallel Strategies for Crash and Impact Simulations

    SciTech Connect

    Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.

    1998-12-07

    We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.

  13. Evaluation of the Interactions between Water Extractable Soil Organic Matter and Metal Cations (Cu(II), Eu(III)) Using Excitation-Emission Matrix Combined with Parallel Factor Analysis

    PubMed Central

    Wei, Jing; Han, Lu; Song, Jing; Chen, Mengfang

    2015-01-01

    The objectives of this study were to evaluate the binding behavior of Cu(II) and Eu(III) with water extractable organic matter (WEOM) in soil, and assess the competitive effect of the cations. Excitation-emission matrix (EEM) fluorescence spectrometry was used in combination with parallel factor analysis (PARAFAC) to obtain four WEOM components: fulvic-like, humic-like, microbial degraded humic-like, and protein-like substances. Fluorescence titration experiments were performed to obtain the binding parameters of PARAFAC-derived components with Cu(II) and Eu(III). The conditional complexation stability constants (logKM) of Cu(II) with the four components ranged from 5.49 to 5.94, and the Eu(III) logKM values were between 5.26 and 5.81. The component-specific binding parameters obtained from competitive binding experiments revealed that Cu(II) and Eu(III) competed for the same binding sites on the WEOM components. These results would help understand the molecular binding mechanisms of Cu(II) and Eu(III) with WEOM in the soil environment. PMID:26121300

  14. Rochester checkers player: Multi-model parallel programming for animate vision. Technical report

    SciTech Connect

    Marsh, B.D.; Brown, C.M.; LeBlanc, T.J.; Scott, M.L.; Becker, T.G.

    1991-06-01

    Animate vision systems couple computer vision and robotics to achieve robust and accurate vision, as well as other complex behavior. These systems combine low-level sensory processing and effector output with high-level cognitive planning - all computationally intensive tasks that can benefit from parallel processing. No single model of parallel programming is likely to serve for all tasks, however. Early vision algorithms are intensely data parallel, often utilizing fine-grain parallel computations that share an image, while cognition algorithms decompose naturally by function, often consisting of loosely-coupled, coarse-grain parallel units. A typical animate vision application will likely consist of many tasks, each of which may require a different parallel programming model, and all of which must cooperate to achieve the desired behavior. These multi-model programs require an underlying software system that not only supports several different models of parallel computation simultaneously, but which also allows tasks implemented in different models to interact.

  15. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  16. Is Monte Carlo embarrassingly parallel?

    SciTech Connect

    Hoogenboom, J. E.

    2012-07-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
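
    The cycle-end rendezvous can be pictured with a toy mpi4py loop (illustrative only; the transport physics is a placeholder and this is not the program analyzed in the paper). The blocking Allreduce at the end of every cycle is the synchronization point the paper identifies as a bottleneck.

```python
# Toy MPI Monte Carlo criticality loop (illustrative only; placeholder physics).
# The Allreduce at the end of each cycle is the rendez-vous point discussed
# in the paper: no rank can start the next cycle until all have contributed
# their fission-source tally and the new k_eff estimate.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rng = np.random.default_rng(comm.Get_rank())
histories_per_rank, cycles = 10_000, 20
k_eff = 1.0

for cycle in range(cycles):
    # Placeholder transport: each history produces a random number of
    # fission neutrons with mean 1.02 (purely synthetic).
    offspring = rng.poisson(1.02, size=histories_per_rank)
    local = np.array([offspring.sum(), float(histories_per_rank)])

    # Rendez-vous: combine tallies from all ranks before the next cycle.
    total = np.empty_like(local)
    comm.Allreduce(local, total, op=MPI.SUM)
    k_eff = total[0] / total[1]

if comm.Get_rank() == 0:
    print("final k_eff estimate:", k_eff)
```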

  17. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens (Rochester, MN); Inglett, Todd Alan (Rochester, MN)

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
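
    The block-and-checksum comparison underlying the claim can be sketched in a few lines of Python. This is an illustration of the rsync-like idea only, not the patented system; the block size, hash function, and data are assumptions.

```python
# Sketch of template-based checkpoint deltas (illustrative, not the patent).
# Only blocks whose checksums differ from the template checkpoint are saved,
# which is what shrinks the data each node must transmit and store.
import hashlib

BLOCK = 64 * 1024  # assumed block size

def block_checksums(data: bytes):
    return [hashlib.sha1(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def checkpoint_delta(current: bytes, template_sums):
    """Return {block_index: block_bytes} for blocks differing from the template."""
    delta = {}
    for idx, chk in enumerate(block_checksums(current)):
        if idx >= len(template_sums) or chk != template_sums[idx]:
            delta[idx] = current[idx * BLOCK:(idx + 1) * BLOCK]
    return delta

template = bytes(1_000_000)                  # previously stored template checkpoint
state = bytearray(template)
state[300_000:300_016] = b"x" * 16           # a small change since the template
delta = checkpoint_delta(bytes(state), block_checksums(template))
print(f"{len(delta)} of {len(block_checksums(template))} blocks need saving")
```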

  18. EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS

    SciTech Connect

    F. PETRINI; W. FENG

    1999-09-01

    We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.

  19. A parallel algorithm for implicit depletant simulations

    NASA Astrophysics Data System (ADS)

    Glaser, Jens; Karas, Andrew S.; Glotzer, Sharon C.

    2015-11-01

    We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded volume surrounding a single translated and/or rotated colloid. A configurational bias scheme is used to enhance the acceptance rate. The method is validated and benchmarked both on multi-core processors and graphics processing units for the case of hard spheres, hemispheres, and discoids. With depletants, we report novel cluster phases in which hemispheres first assemble into spheres, which then form ordered hcp/fcc lattices. The method is significantly faster than any method without cluster moves and that tracks depletants explicitly, for systems of colloid packing fraction φc < 0.50, and additionally enables simulation of the fluid-solid transition.

  20. Parallel Architecture For Robotics Computation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1990-01-01

    Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.

  1. Parallel inverse iteration with reorthogonalization

    SciTech Connect

    Fann, G.I.; Littlefield, R.J.

    1993-03-01

    A parallel method for finding orthogonal eigenvectors of real symmetric tridiagonal matrices is described. The method uses inverse iteration with repeated Modified Gram-Schmidt (MGS) reorthogonalization of the unconverged iterates for clustered eigenvalues. This approach is more parallelizable than reorthogonalizing against fully converged eigenvectors, as is done by LAPACK's current DSTEIN routine. The new method is found to provide accuracy and speed comparable to DSTEIN's and to have good parallel scalability even for matrices with large clusters of eigenvalues. We present results for residual and orthogonality tests, plus timings on IBM RS/6000 (sequential) and Intel Touchstone DELTA (parallel) computers.
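
    For orientation, a serial sketch of the kernel being parallelized (inverse iteration with MGS reorthogonalization against the other iterates of a cluster) follows. It is not the authors' code: a dense solve stands in for a specialized tridiagonal solver, and the shift, iteration count, and test matrix are assumptions.

```python
# Serial sketch of inverse iteration with MGS reorthogonalization for a
# group of close eigenvalues (illustrative only; a dense LU solve stands in
# for the specialized tridiagonal solver used in practice).
import numpy as np

def inverse_iteration_cluster(T, eigvals, iters=5):
    n = len(T)
    rng = np.random.default_rng(0)
    V = rng.standard_normal((n, len(eigvals)))
    V /= np.linalg.norm(V, axis=0)
    for _ in range(iters):
        for k, lam in enumerate(eigvals):
            # One inverse-iteration step with a slightly shifted eigenvalue.
            v = np.linalg.solve(T - (lam + 1e-10) * np.eye(n), V[:, k])
            # Modified Gram-Schmidt against the other (unconverged) iterates.
            for j in range(len(eigvals)):
                if j != k:
                    v -= (V[:, j] @ v) * V[:, j]
            V[:, k] = v / np.linalg.norm(v)
    return V

# Symmetric tridiagonal test matrix and a few of its smallest eigenvalues.
n = 50
T = (np.diag(np.full(n, 2.0))
     + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1))
w = np.linalg.eigvalsh(T)[:3]
V = inverse_iteration_cluster(T, w)
print("orthogonality error:", np.max(np.abs(V.T @ V - np.eye(3))))
```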

  2. Parallel inverse iteration with reorthogonalization

    SciTech Connect

    Fann, G.I.; Littlefield, R.J.

    1993-03-01

    A parallel method for finding orthogonal eigenvectors of real symmetric tridiagonal matrices is described. The method uses inverse iteration with repeated Modified Gram-Schmidt (MGS) reorthogonalization of the unconverged iterates for clustered eigenvalues. This approach is more parallelizable than reorthogonalizing against fully converged eigenvectors, as is done by LAPACK's current DSTEIN routine. The new method is found to provide accuracy and speed comparable to DSTEIN's and to have good parallel scalability even for matrices with large clusters of eigenvalues. We present results for residual and orthogonality tests, plus timings on IBM RS/6000 (sequential) and Intel Touchstone DELTA (parallel) computers.

  3. Parallelizing Sequential Programs With Statistical Accuracy Tests

    E-print Network

    Misailovic, Sasa

    2010-08-05

    We present QuickStep, a novel system for parallelizing sequential programs. QuickStep deploys a set of parallelization transformations that together induce a search space of candidate parallel programs. Given a sequential ...

  4. Parallel Marker Based Image Segmentation with Watershed

    E-print Network

    Parallel Marker Based Image Segmentation with Watershed Transformation Alina N. Moga Albert; Parallel Marker Based Watershed Transformation Abstract. The parallel watershed transformation used homogeneity with the watershed transformation. Boundary-based region merging is then effected to condense non

  5. Parallel stochastic systems biology in the cloud.

    PubMed

    Aldinucci, Marco; Torquati, Massimo; Spampinato, Concetto; Drocco, Maurizio; Misale, Claudia; Calcagno, Cristina; Coppo, Mario

    2014-09-01

    The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow may turn out to be computationally expensive, reducing the interactivity required for model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for the cloud. In particular, the Calculus of Wrapped Components (CWC) simulator for systems biology, which is designed according to the FastFlow pattern-based approach, is presented and discussed. Thanks to the FastFlow framework, the CWC simulator is designed as a high-level workflow that can simulate CWC models, merge simulation results and statistically analyse them in a single parallel workflow in the cloud. To improve interactivity, successive phases are pipelined in such a way that the workflow begins to output a stream of analysis results immediately after simulation is started. Performance and effectiveness of the CWC simulator are validated on the Amazon Elastic Compute Cloud. PMID:23780997

  6. STALK : an interactive virtual molecular docking system.

    SciTech Connect

    Levine, D.; Facello, M.; Hallstrom, P.; Reeder, G.; Walenz, B.; Stevens, F.; Univ. of Illinois

    1997-04-01

    Several recent technologies (genetic algorithms, parallel and distributed computing, virtual reality, and high-speed networking) underlie a new approach to the computational study of how biomolecules interact or 'dock' together. With the Stalk system, a user in a virtual reality environment can interact with a genetic algorithm running on a parallel computer to help in the search for likely geometric configurations.

  7. Demonstrating Forces between Parallel Wires.

    ERIC Educational Resources Information Center

    Baker, Blane

    2000-01-01

    Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)

  8. "Feeling" Series and Parallel Resistances.

    ERIC Educational Resources Information Center

    Morse, Robert A.

    1993-01-01

    Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)
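
    For reference, the combination rules the demonstration targets are R_series = R1 + R2 + ... and 1/R_parallel = 1/R1 + 1/R2 + ..., as in this small sketch with arbitrary resistor values.

```python
# Series and parallel resistance combination rules (resistor values arbitrary).
def r_series(resistances):
    return sum(resistances)

def r_parallel(resistances):
    return 1.0 / sum(1.0 / r for r in resistances)

print(r_series([100, 220, 330]))               # 650 ohms
print(round(r_parallel([100, 220, 330]), 1))   # about 56.9 ohms
```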

  9. Predicting performance of parallel computations

    NASA Technical Reports Server (NTRS)

    Mak, Victor W.; Lundstrom, Stephen F.

    1990-01-01

    An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described. A parallel computation is modeled as a task system with precedence relationships expressed as a series-parallel directed acyclic graph. Resources in a concurrent system are modeled as service centers in a queuing network model. Using these two models as inputs, the method outputs predictions of expected execution time of the parallel computation and the concurrent system utilization. The method is validated against both detailed simulation and actual execution on a commercial multiprocessor. Using 100 test cases, the average error of the prediction when compared to simulation statistics is 1.7 percent, with a standard deviation of 1.5 percent; the maximum error is about 10 percent.
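
    A drastically simplified sketch of the task-graph half of such a model appears below: for a series-parallel graph with fixed task times, series composition adds times and parallel composition takes the maximum. The paper's method additionally couples the graph to a queuing-network model of the hardware, which this toy omits; the example graph and times are arbitrary.

```python
# Minimal series-parallel task-graph evaluator (a simplification; the paper
# combines such a graph with a queuing-network model of the resources).
# A node is ("task", time), ("series", [children]) or ("parallel", [children]).
def exec_time(node):
    kind, payload = node
    if kind == "task":
        return payload
    child_times = [exec_time(c) for c in payload]
    return sum(child_times) if kind == "series" else max(child_times)

graph = ("series", [
    ("task", 2.0),                                # setup
    ("parallel", [("task", 5.0), ("task", 3.0),   # concurrent workers
                  ("series", [("task", 1.0), ("task", 2.5)])]),
    ("task", 0.5),                                # reduction
])
print("predicted execution time:", exec_time(graph))   # 7.5
```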

  10. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  11. Parallel Networks for Machine Vision

    E-print Network

    Horn, Berthold K.P.

    1988-12-01

    The amount of computation required to solve many early vision problems is prodigious, and so it has long been thought that systems that operate in a reasonable amount of time will only become feasible when parallel ...

  12. Turbomachinery CFD on parallel computers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.

    1992-01-01

    The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.

  13. Parallel execution for conflicting transactions

    E-print Network

    Narula, Neha

    2015-01-01

    Multicore main-memory databases only obtain parallel performance when transactions do not conflict. Conflicting transactions are executed one at a time in order to ensure that they have serializable effects. Sequential ...

  14. Web based parallel/distributed medical data mining using software agents

    SciTech Connect

    Kargupta, H.; Stafford, B.; Hamzaoglu, I.

    1997-12-31

    This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.

  15. Toward an automated parallel computing environment

    E-print Network

    Liu, Mian

    2007-01-01

    Physics of the Earth and Planetary Interiors 163 (2007) 2-22. Toward an automated parallel computing environment ... modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment ...

  16. Nuclear Fragmentation and Its Parallels

    E-print Network

    K. Chase; A. Mekjian

    1993-09-13

    A model for the fragmentation of a nucleus is developed. Parallels of the description of this process with other areas are shown which include Feynman's theory of the $\lambda$ transition in liquid Helium, Bose condensation, and Markov process models used in stochastic networks and polymer physics. These parallels are used to generalize and further develop a previous exactly solvable model of nuclear fragmentation. An analysis of some experimental data is given.

  17. Graphics applications utilizing parallel processing

    NASA Technical Reports Server (NTRS)

    Rice, John R.

    1990-01-01

    The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.
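
    For reference, a serial finite-difference update for the vibrating string (the computation that was parallelized) is sketched below; the grid size, time step, and initial pluck are assumptions, and the multiprocessor synchronization the report analyzes is omitted.

```python
# Serial finite-difference solution of the 1-D wave equation (vibrating string).
# The parallel versions in the report split the string among processors and
# synchronize at every time step; that synchronization is omitted here.
import numpy as np

nx, nt = 101, 500
c, dx, dt = 1.0, 1.0 / (nx - 1), 0.005
r2 = (c * dt / dx) ** 2                 # c*dt/dx <= 1 is required for stability

x = np.linspace(0.0, 1.0, nx)
u_prev = np.sin(np.pi * x)              # initial pluck shape
u = u_prev.copy()                       # zero initial velocity

for _ in range(nt):
    u_next = np.zeros_like(u)
    u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                    + r2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
    u_prev, u = u, u_next               # fixed ends stay at zero
print("max displacement after", nt, "steps:", float(np.abs(u).max()))
```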

  18. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN.

  19. Parallel processing of natural language

    SciTech Connect

    Chang, H.O.

    1986-01-01

    Two types of parallel natural language processing are studied in this work: (1) the parallelism between syntactic and nonsyntactic processing and (2) the parallelism within syntactic processing. It is recognized that a syntactic category can potentially be attached to more than one node in the syntactic tree of a sentence. Even if all the attachments are syntactically well-formed, nonsyntactic factors such as semantic and pragmatic considerations may require one particular attachment. Syntactic processing must synchronize and communicate with nonsyntactic processing. Two syntactic processing algorithms are proposed for use in a parallel environment: Earley's algorithm and the LR(k) algorithm. Conditions are identified to detect the syntactic ambiguity and the algorithms are augmented accordingly. It is shown that by using nonsyntactic information during syntactic processing, backtracking can be reduced, and the performance of the syntactic processor is improved. For the second type of parallelism, it is recognized that one portion of a grammar can be isolated from the rest of the grammar and be processed by a separate processor. A partial grammar of a larger grammar is defined. Parallel syntactic processing is achieved by using two processors concurrently: the main processor (mp) and the auxiliary processor (ap).

  20. Efficiency of parallel direct optimization.

    PubMed

    Janies, D A; Wheeler, W C

    2001-03-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. PMID:12240679

  1. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.

  2. Parallel architectures for problem solving

    SciTech Connect

    Kale, L.V.

    1985-01-01

    The problem of exploiting a large amount of hardware in parallel is one of the biggest challenges facing computer science today. The problem of designing parallel architectures and execution methods for solving large combinatorially explosive problems is studied here. Such problems typically do not have a regular structure that can be readily exploited for parallel execution. Prolog is chosen as a language to specify computation because it is seen as a language that is conceptually simple as well as amenable to parallel interpretation. A tree representation of Prolog computation called the REDUCE-OR tree is described as an alternative to the AND-OR tree representation. A process model based on this representation is developed; it captures more parallelism than most other proposed models. A class of bus architectures is proposed to implement the process model. A general model of parallel Prolog systems is developed and the proposed architectures examined in its framework. One of the important features of the proposed architectures is that they limit contracting of work to a close neighborhood. Various interconnection networks are analyzed, and a new one called the lattice-mesh is proposed. The lattice-mesh improves on the square grid of buses, while retaining its linear-area property. An extensive simulation framework was built. Results of some of the experiments conducted on the simulation system are given.

  3. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  4. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  5. A mechanism for efficient debugging of parallel programs

    SciTech Connect

    Miller, B.P.; Choi, J.D.

    1988-01-01

    This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. They extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions of the co-operating processes.

  6. Parallel processing for view-dependent polygonal virtual environments

    NASA Astrophysics Data System (ADS)

    El-Sana, Jihad; Varshney, Amitabh

    1999-03-01

    This paper presents a parallel algorithm for preprocessing as well as real-time navigation of view-dependent virtual environments on shared memory multiprocessors. The algorithm proceeds by hierarchical spatial subdivision of the input dataset by an octree. The parallel algorithm is robust and does not generate any artifacts such as degenerate triangles and mesh foldovers. The algorithm performance scales linearly with increase in the number of processors as well as increase in the input dataset complexity. The resulting visualization performance is fast enough to enable interleaved acquisition and modification with interactive visualization.

  7. Application of the hypercube parallel processor to a large-scale moment method code

    NASA Technical Reports Server (NTRS)

    Manshadi, Farzin; Liewer, Paulet C.; Patterson, Jean E.

    1988-01-01

    The applicability of a parallel computing architecture to the solution of a large-scale moment-method code is investigated. Specifically, the NEC (Numerical Electromagnetics Code) method-of-moments scattering program is implemented on a hypercube parallel processor. The accuracy and the increase in the speed of execution on this parallel architecture are demonstrated. The results show a very large reduction in execution time for large problems. The great potential of this parallel processor is shown for interactive solution of large NEC problems as well as other moment-method techniques such as the finite-element method.

  8. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    1999-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  9. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  10. Computing contingency statistics in parallel.

    SciTech Connect

    Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

    2010-09-01

    Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics, where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
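
    The communication pattern at issue can be seen in a small mpi4py sketch (illustrative only, not the VTK/Titan engine): each rank builds a local table of category-pair counts and the tables are merged across ranks, so the message size grows with the number of distinct pairs rather than staying fixed as it does for moment-based statistics. The data and category counts are synthetic.

```python
# Sketch of a parallel contingency table (illustrative; not the VTK/Titan code).
# Unlike moment-based statistics, the object being combined -- the table of
# (x, y) pair counts -- grows with the number of distinct categories, which is
# the communication cost discussed in the report.
from collections import Counter

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rng = np.random.default_rng(comm.Get_rank())

# Rank-local categorical data (synthetic).
x = rng.integers(0, 4, size=100_000)
y = rng.integers(0, 3, size=100_000)
local_table = Counter(zip(x.tolist(), y.tolist()))

# Gather and merge tables; payload size depends on table cardinality.
merged = Counter()
for table in comm.allgather(local_table):
    merged.update(table)

if comm.Get_rank() == 0:
    total = sum(merged.values())
    for (xi, yi), count in sorted(merged.items()):
        print(f"P(x={xi}, y={yi}) = {count / total:.4f}")
```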

  11. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  12. Parallel plasma fluid turbulence calculations

    NASA Astrophysics Data System (ADS)

    Leboeuf, J. N.; Carreras, B. A.; Charlton, L. A.; Drake, J. B.; Lynch, V. E.; Newman, D. E.; Sidikman, K. L.; Spong, D. A.

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated.

  13. File concepts for parallel I/O

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1989-01-01

    The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, and organizations using multiple storage devices are suggested. Problem areas are also identified and discussed.

  14. Use of networked workstations for parallel nonlinear structural dynamic simulations of rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien; Abel, J. F.

    1993-01-01

    The principal objective of this research is to investigate, develop and demonstrate coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of rotating bladed-disk assemblies. The parallel-processing strategies addressed include numerical algorithms for parallel nonlinear solutions and techniques to effect load balancing among processors. The parallel environment employed is a distributed-memory, coarse-grained one consisting of networked workstations. A parallel explicit time integration method has been implemented for transient nonlinear solutions of rotating bladed-disk assemblies. Automatic domain partitioning techniques have been investigated for load balancing among processors. Advanced computing environments, data structures and interactive computer graphics all contribute to an integrated parallel finite element analysis system to facilitate more efficient and powerful dynamic simulations.

  15. Parallel integrated frame synchronizer chip

    NASA Technical Reports Server (NTRS)

    Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)

    2000-01-01

    A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.

  16. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.

  17. Visualizing Parallel Computer System Performance

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.

    1988-01-01

    Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkably adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized. Large, complex parallel systems pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eliding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.

  18. Massively Parallel MRI Detector Arrays

    PubMed Central

    Keil, Boris; Wald, Lawrence L

    2013-01-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758
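
    The trade-off behind such accelerated acquisitions is commonly summarized by the standard parallel-imaging SNR relation, SNR_accel = SNR_full / (g * sqrt(R)), where R is the acceleration factor and g >= 1 is the coil-geometry factor. A minimal sketch (numbers invented):

        # Standard parallel-imaging SNR relation: accelerating by a factor R costs
        # sqrt(R) from reduced sampling plus the coil-geometry penalty g >= 1.
        import math

        def accelerated_snr(snr_full: float, R: int, g: float) -> float:
            return snr_full / (g * math.sqrt(R))

        print(accelerated_snr(snr_full=100.0, R=4, g=1.2))  # ~41.7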

  19. Fast data parallel polygon rendering

    SciTech Connect

    Ortega, F.A.; Hansen, C.D.

    1993-09-01

    This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted for applications which require very fast polygon rendering for extremely large sets of polygons such as is found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.

  20. Nerve-pulse interactions

    SciTech Connect

    Scott, A.C.

    1982-01-01

    Some recent experimental and theoretical results on mechanisms through which individual nerve pulses can interact are reviewed. Three modes of interactions are considered: (1) interaction of pulses as they travel along a single fiber which leads to velocity dispersion; (2) propagation of pairs of pulses through a branching region leading to quantum pulse code transformations; and (3) interaction of pulses on parallel fibers through which they may form a pulse assembly. This notion is analogous to Hebb's concept of a cell assembly, but on a lower level of the neural hierarchy.

  1. Using the Parallel Virtual Machine for Everyday Analysis

    NASA Astrophysics Data System (ADS)

    Noble, M. S.; Houck, J. C.; Davis, J. E.; Young, A.; Nowak, M.

    2006-07-01

    A review of the literature reveals that while parallel computing is sometimes employed by astronomers for custom, large-scale calculations, no package fosters the routine application of parallel methods to standard problems in astronomical data analysis. This paper describes our attempt to close that gap by wrapping the Parallel Virtual Machine (PVM) as a scriptable S-Lang module. Using PVM within ISIS, the Interactive Spectral Interpretation System, we've distributed a number of representative calculations over a network of 25+ CPUs to achieve dramatic reductions in execution times. We discuss how the approach applies to a wide class of modeling problems, outline our efforts to make it more transparent for common use, and note its growing importance in the context of the large, multi-wavelength datasets used in modern analysis.
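
    The pattern being described, farming many independent model evaluations out to a pool of CPUs, can be sketched in plain Python (this is an analogous master/worker illustration, not the S-Lang/PVM module itself; evaluate_model and the parameter grid are invented):

        from concurrent.futures import ProcessPoolExecutor

        def evaluate_model(params):
            # placeholder for an expensive spectral-model / fit-statistic evaluation
            a, b = params
            return sum((a * x + b) ** 2 for x in range(10_000))

        if __name__ == "__main__":
            grid = [(a, b) for a in range(10) for b in range(10)]   # 100 independent evaluations
            with ProcessPoolExecutor() as pool:                     # distributes over local CPUs
                results = list(pool.map(evaluate_model, grid))
            print(min(results))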

  2. A vestibular prosthesis with highly-isolated parallel multichannel stimulation.

    PubMed

    Jiang, Dai; Cirmirakis, Dominik; Demosthenous, Andreas

    2015-02-01

    This paper presents an implantable vestibular stimulation system capable of providing highly flexible, independent parallel stimulation to the semicircular canals in the inner ear for restoring three-dimensional sensation of head movements. To minimize channel interaction during parallel stimulation, the system is implemented with a power isolation method for crosstalk reduction. Experimental results demonstrate that, with this method, electrodes for different stimulation channels located in close proximity ( mm) can deliver current pulses simultaneously with minimum inter-channel crosstalk. The design features a memory-based scheme that manages stimulation to the three canals in parallel. A vestibular evoked potential (VEP) recording unit is included for closed-loop adaptive stimulation control. The main components of the prototype vestibular prosthesis are three ASICs, all implemented in a 0.6-µm high-voltage CMOS technology. The measured performance was verified using vestibular electrodes in vitro. PMID:25073175

  3. Parallel computation of seismic analysis of high arch dam

    NASA Astrophysics Data System (ADS)

    Chen, Houqun; Ma, Huaifa; Tu, Jin; Cheng, Guangqing; Tang, Juzhen

    2008-03-01

    Parallel computation programs are developed for three-dimensional meso-mechanics analysis of fully-graded dam concrete and seismic response analysis of high arch dams (ADs), based on the Parallel Finite Element Program Generator (PFEPG). The computational algorithms of the numerical simulation of the meso-structure of concrete specimens were studied. Taking into account damage evolution, static preload, strain rate effect, and the heterogeneity of the meso-structure of dam concrete, the fracture processes of damage evolution and configuration of the cracks can be directly simulated. In the seismic response analysis of ADs, all the following factors are involved, such as the nonlinear contact due to the opening and slipping of the contraction joints, energy dispersion of the far-field foundation, dynamic interactions of the dam-foundation-reservoir system, and the combining effects of seismic action with all static loads. The correctness, reliability and efficiency of the two parallel computational programs are verified with practical illustrations.

  4. Multitarget tracking algorithm parallelization for distributed-memory computing systems

    SciTech Connect

    Popp, R.L.; Pattipati, K.R.; Bar-Shalom, Y.

    1996-12-31

    In this paper we present a robust scalable parallelization of a multitarget tracking algorithm developed for air traffic surveillance. We couple the state estimation and data association problems by embedding an Interacting Multiple Model (IMM) state estimator into an optimization-based assignment framework. A SPMD distributed-memory parallelization is described, wherein the interface to the optimization problem, namely, computing the rather numerous gating and IMM state estimates, covariance calculations, and likelihood function evaluations (used as cost coefficients in the assignment problem), is parallelized. We describe several heuristic algorithms developed for the inherent task allocation problem, wherein the problem is one of assigning track tasks, having uncertain processing costs and negligible communication costs, across a set of homogeneous processors to minimize workload imbalances. Using a measurement database based on two FAA air traffic control radars, courtesy of Rome Laboratory, we show that near linear speedups are obtainable on a 32-node Intel Paragon supercomputer using simple task allocation algorithms.
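
    A minimal sketch of the kind of task-allocation heuristic described, assigning track tasks with estimated costs to homogeneous processors greedily (largest cost first, onto the currently least-loaded processor), is shown below; the costs and task names are invented, and this is not claimed to be the paper's exact algorithm:

        import heapq

        def allocate(task_costs, n_procs):
            """Greedy longest-task-first assignment to the least-loaded processor."""
            loads = [(0.0, p, []) for p in range(n_procs)]          # (load, proc id, tasks)
            heapq.heapify(loads)
            for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
                load, p, tasks = heapq.heappop(loads)
                tasks.append(task)
                heapq.heappush(loads, (load + cost, p, tasks))
            return loads

        tasks = {f"track{i}": c for i, c in enumerate([9.0, 7.5, 6.0, 4.2, 4.0, 3.3, 1.1, 0.9])}
        for load, p, assigned in sorted(allocate(tasks, n_procs=3), key=lambda t: t[1]):
            print(f"processor {p}: load={load:.1f} tasks={assigned}")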

  5. Using the Parallel Virtual Machine for Everyday Analysis

    E-print Network

    M. S. Noble; J. C. Houck; J. E. Davis; A. Young; M. Nowak

    2005-10-24

    A review of the literature reveals that while parallel computing is sometimes employed by astronomers for custom, large-scale calculations, no package fosters the routine application of parallel methods to standard problems in astronomical data analysis. This paper describes our attempt to close that gap by wrapping the Parallel Virtual Machine (PVM) as a scriptable S-Lang module. Using PVM within ISIS, the Interactive Spectral Interpretation System, we've distributed a number of representative calculations over a network of 25+ CPUs to achieve dramatic reductions in execution times. We discuss how the approach applies to a wide class of modeling problems, outline our efforts to make it more transparent for common use, and note its growing importance in the context of the large, multi-wavelength datasets used in modern analysis.

  6. Parallel-processing techniques for production systems

    SciTech Connect

    da Mota Tenorio, M.F.

    1987-01-01

    The static and dynamic characteristics of production systems are modeled with the use of graph grammar, in order to create means to increase processing efficiency and the use of parallel computation through compile-time analysis. The model is used to explicate rule interaction, so that proofs of equivalence between knowledge bases can be attempted. Relying solely on the program's static characteristics as shown by the model, a series of observations are made to determine the system's dynamic characteristics, and modifications to the original knowledge base are suggested as a means of increasing efficiency and decreasing overall search and computational effort. Dependencies between the rules are analyzed and different approaches for automatic detection are presented. From rule dependencies, tools for programming environments, logical evaluation of search spaces and Petri net models of production systems are shown. An algorithm for the allocation and partitioning of a production system onto a multiprocessor system is also shown, which addresses the problems of communication and execution of these systems in parallel. Finally, the results of a simulator constructed to test several strategies, networks, and algorithms are presented.

  7. Orientation-Enhanced Parallel Coordinate Plots.

    PubMed

    Raidou, Renata Georgia; Eisemann, Martin; Breeuwer, Marcel; Eisemann, Elmar; Vilanova, Anna

    2016-01-01

    Parallel Coordinate Plots (PCPs) is one of the most powerful techniques for the visualization of multivariate data. However, for large datasets, the representation suffers from clutter due to overplotting. In this case, discerning the underlying data information and selecting specific interesting patterns can become difficult. We propose a new and simple technique to improve the display of PCPs by emphasizing the underlying data structure. Our Orientation-enhanced Parallel Coordinate Plots (OPCPs) improve pattern and outlier discernibility by visually enhancing parts of each PCP polyline with respect to its slope. This enhancement also allows us to introduce a novel and efficient selection method, the Orientation-enhanced Brushing (O-Brushing). Our solution is particularly useful when multiple patterns are present or when the view on certain patterns is obstructed by noise. We present the results of our approach with several synthetic and real-world datasets. Finally, we conducted a user evaluation, which verifies the advantages of the OPCPs in terms of discernibility of information in complex data. It also confirms that O-Brushing eases the selection of data patterns in PCPs and reduces the amount of necessary user interactions compared to state-of-the-art brushing techniques. PMID:26529720
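
    The core idea, emphasizing each polyline segment of a PCP according to its slope, can be sketched with matplotlib as follows (an illustrative approximation with random data, not the authors' OPCP implementation):

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(0)
        data = rng.random((60, 4))                      # 60 samples over 4 normalized axes
        axes_x = np.arange(data.shape[1])

        fig, ax = plt.subplots()
        for row in data:
            for j in range(len(row) - 1):
                slope = row[j + 1] - row[j]             # orientation of this segment
                ax.plot(axes_x[j:j + 2], row[j:j + 2],
                        color=plt.cm.coolwarm(0.5 + 0.5 * slope), alpha=0.6)
        ax.set_xticks(axes_x)
        ax.set_xticklabels([f"dim {j}" for j in axes_x])
        plt.show()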

  8. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
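
    The flavor of these mapping problems can be illustrated with a standard contiguous-partition formulation: split a chain of m module weights into at most n contiguous groups so that the heaviest group is as light as possible. The binary-search sketch below (weights invented) solves that formulation; it is not the paper's O(nm log m) algorithm, only a simple baseline:

        def min_bottleneck_partition(weights, n_procs):
            """Smallest achievable bottleneck load for a contiguous chain partition."""
            def feasible(cap):
                groups, load = 1, 0.0
                for w in weights:
                    if w > cap:
                        return False
                    if load + w > cap:
                        groups, load = groups + 1, w
                    else:
                        load += w
                return groups <= n_procs

            lo, hi = max(weights), sum(weights)
            while hi - lo > 1e-9:
                mid = (lo + hi) / 2
                if feasible(mid):
                    hi = mid
                else:
                    lo = mid
            return hi

        module_costs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
        print(min_bottleneck_partition(module_costs, n_procs=3))   # ~14.0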

  9. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-03-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processors. User programs and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quantums are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.
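
    A toy simulation of the idea, gangs scheduled onto the machine as indivisible units and rotated round-robin every time quantum, might look like the following (job names, gang sizes, and quanta are invented):

        from collections import deque

        def gang_schedule(jobs, n_procs, quantum, total_time):
            """jobs: dict name -> gang size (processes that must run together)."""
            queue = deque(jobs.items())
            timeline, t = [], 0
            while t < total_time and queue:
                running, used = [], 0
                # pack whole gangs onto the machine until the next gang no longer fits
                for _ in range(len(queue)):
                    name, size = queue[0]
                    if used + size <= n_procs:
                        queue.popleft()
                        running.append((name, size))
                        used += size
                    else:
                        break
                timeline.append((t, [name for name, _ in running]))
                queue.extend(running)        # rotate the gangs to the back (round robin)
                t += quantum
            return timeline

        for t, gangs in gang_schedule({"A": 64, "B": 64, "C": 128}, n_procs=128,
                                      quantum=10, total_time=40):
            print(f"t={t:3d}: running {gangs}")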

  10. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-12-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processes. User programs and their gangs of processes are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory.

  11. Parallelization of the SIR code

    NASA Astrophysics Data System (ADS)

    Thonhofer, S.; Bellot Rubio, L. R.; Utz, D.; Jurčák, J.; Hanslmeier, A.; Piantschitsch, I.; Pauritsch, J.; Lemmerer, B.; Guttenbrunner, S.

    A high-resolution 3-dimensional model of the photospheric magnetic field is essential for the investigation of small-scale solar magnetic phenomena. The SIR code is an advanced Stokes-inversion code that deduces physical quantities, e.g. magnetic field vector, temperature, and LOS velocity, from spectropolarimetric data. We extended this code by the capability of directly using large data sets and inverting the pixels in parallel. Due to this parallelization it is now feasible to apply the code directly on extensive data sets. Besides, we included the possibility to use different initial model atmospheres for the inversion, which enhances the quality of the results.

  12. parallel_dp: The Parallel Dynamic Programming Design Pattern as an Intel(R)

    E-print Network

    Tang, Peiyi

    parallel_dp: The Parallel Dynamic Programming Design Pattern as an Intel(R) Threading Building Blocks ... and participant collaboration of this design pattern. We propose the parallel_dp algorithm template ... by parallel_dp. We analyze the performance of our solution by applying parallel_dp to create four TBB ...

  13. Debugging and analysis of large-scale parallel programs. Doctoral thesis

    SciTech Connect

    Mellor-Crummey, J.M.

    1989-09-01

    One of the most serious problems in the development cycle of large-scale parallel programs is the lack of tools for debugging and performance analysis. Parallel programs are more difficult to analyze than their sequential counterparts for several reasons. First, race conditions in parallel programs can cause non-deterministic behavior, which reduces the effectiveness of traditional cyclic debugging techniques. Second, invasive, interactive analysis can distort a parallel program's execution beyond recognition. Finally, comprehensive analysis of a parallel program's execution requires collection, management, and presentation of an enormous amount of information. This dissertation addresses the problem of debugging and analysis of large-scale parallel programs executing on shared-memory multiprocessors. It proposes a methodology for top-down analysis of parallel program executions that replaces previous ad-hoc approaches. To support this methodology, a formal model for shared-memory communication among processes in a parallel program is developed. It is shown how synchronization traces based on this abstract model can be used to create indistinguishable executions that form the basis for debugging. This result is used to develop a practical technique for tracing parallel program executions on shared-memory parallel processors so that their executions can be repeated deterministically on demand.

  14. A brief parallel I/O tutorial.

    SciTech Connect

    Ward, H. Lee

    2010-03-01

    This document provides common best practices for the efficient utilization of parallel file systems for analysts and application developers. A multi-program, parallel supercomputer is able to provide effective compute power by aggregating a host of lower-power processors using a network. The idea, in general, is that one either constructs the application to distribute parts to the different nodes and processors available and then collects the result (a parallel application), or one launches a large number of small jobs, each doing similar work on different subsets (a campaign). The I/O system on these machines is usually implemented as a tightly-coupled, parallel application itself. It is providing the concept of a 'file' to the host applications. The 'file' is an addressable store of bytes and that address space is global in nature. In essence, it is providing a global address space. Beyond the simple reality that the I/O system is normally composed of a small, less capable, collection of hardware, that concept of a global address space will cause problems if not very carefully utilized. How much of a problem and the ways in which those problems manifest will be different, but that it is problem prone has been well established. Worse, the file system is a shared resource on the machine - a system service. What an application does when it uses the file system impacts all users. It is not the case that some portion of the available resource is reserved. Instead, the I/O system responds to requests by scheduling and queuing based on instantaneous demand. Using the system well contributes to the overall throughput on the machine. From a solely self-centered perspective, using it well reduces the time that the application or campaign is subject to impact by others. The developer's goal should be to accomplish I/O in a way that minimizes interaction with the I/O system, maximizes the amount of data moved per call, and provides the I/O system the most information about the I/O transfer per request.
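
    For example, with MPI-IO (shown here via mpi4py, which is assumed to be available; the file name and sizes are invented), each rank can write one large, contiguous, non-overlapping slice with a single collective call rather than many small independent requests:

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        local = np.full(1_000_000, rank, dtype=np.float64)   # this rank's chunk of the data
        offset = rank * local.nbytes                         # byte offset in the shared file

        fh = MPI.File.Open(comm, "output.dat", MPI.MODE_CREATE | MPI.MODE_WRONLY)
        fh.Write_at_all(offset, local)    # collective: one big request per rank
        fh.Close()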

  15. Parallel processing of atmospheric chemistry calculations: Preliminary considerations

    SciTech Connect

    Elliott, S.; Jones, P.

    1995-01-01

    Global climate calculations are already saturating the class of modern vector supercomputers with only a few central processing units. Increased resolution and inclusion of routines to deal with biogeochemical portions of the terrestrial climate system will soon demand massively parallel approaches. The atmospheric photochemistry ensemble is intimately linked to climate through the trace greenhouse gases ozone and methane, and modules for representing it are being attached to global three dimensional transport and GCM frameworks. Atmospheric kinetics involve dozens of highly interactive tracers and so will accentuate the need for parallel processing of earth system simulations. In the present text we lay some of the groundwork for addition of atmospheric kinetics packages to GCM and global scale atmospheric models on multiply parallel computers. The discussion is tailored for consumption by the photochemical modelling community. After a review of numerical atmospheric chemistry methods, we examine how kinetics can be implemented on a parallel computer. We concentrate especially on data layout and flexibility and how these can be implemented in various programming models. We conclude that chemistry can be implemented rather easily within existing frameworks of several parallel atmospheric models. However, memory limitations may preclude high resolution studies of global chemistry.

  16. Standard Templates Adaptive Parallel Library 

    E-print Network

    Arzu, Francisco Jose

    2000-01-01

    of parallelism in areas such as geometric modeling or graph algorithms, which use dynamic linked data structures. STAPL is intended to replace STL in a user transparent manner and run on a small to medium scale shared memory multiprocessor machine, which supports...

  17. Parallel, Distributed Scripting with Python

    SciTech Connect

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.
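
    A sketch of that dictionary check parallelized with Python's multiprocessing module is shown below; for portability it uses a salted SHA-256 comparison rather than whatever encryption the original checker used, and the word list and password are invented:

        import hashlib
        from multiprocessing import Pool

        SALT = b"x7"
        TARGET = hashlib.sha256(SALT + b"hunter2").hexdigest()   # the "encrypted" password

        def matches(word: str) -> str | None:
            return word if hashlib.sha256(SALT + word.encode()).hexdigest() == TARGET else None

        if __name__ == "__main__":
            dictionary = ["password", "letmein", "hunter2", "qwerty"] * 6250   # ~25,000 words
            with Pool() as pool:                                  # one worker per core
                hits = [w for w in pool.map(matches, dictionary, chunksize=1000) if w]
            print("cracked:" if hits else "no match:", hits[:1])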

  18. Matpar: Parallel Extensions for MATLAB

    NASA Technical Reports Server (NTRS)

    Springer, P. L.

    1998-01-01

    Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.

  19. Towards Pervasive Parallelism Kunle Olukotun

    E-print Network

    John, Lizy Kurian

    [Presentation slides: single-thread performance growth (vs. VAX-11/780) of 25%/year, then 52%/year, now ??%/year; the Stanford Hydra Project (CMP + TLS); Afara Websystems and Sun Niagara 1; heterogeneous cores (CPUs + GPUs), application-specific accelerators, and deep memory hierarchies. Challenge: harnessing these resources given the requirements of emerging applications and the challenges of future parallel architectures.]

  20. Hybrid Parallel Part I. Preliminaries

    E-print Network

    Kaminsky, Alan

    The bitcoin mining program in Chapter 13 doesn't necessarily take full advantage of the cluster's parallel ... have to mine 40 or more bitcoins to take full advantage of the cluster. If I mine fewer than 40 bitcoins, some of the cores will be idle. That's not good. I want to put those idle cores to use. I can ...

  1. Parallels between wind and crowd loading of bridges.

    PubMed

    McRobie, Allan; Morgenthal, Guido; Abrams, Danny; Prendergast, John

    2013-06-28

    Parallels between the dynamic response of flexible bridges under the action of wind and under the forces induced by crowds allow each field to inform the other. Wind-induced behaviour has been traditionally classified into categories such as flutter, galloping, vortex-induced vibration and buffeting. However, computational advances such as the vortex particle method have led to a more general picture where effects may occur simultaneously and interact, such that the simple semantic demarcations break down. Similarly, the modelling of individual pedestrians has progressed the understanding of human-structure interaction, particularly for large-amplitude lateral oscillations under crowd loading. In this paper, guided by the interaction of flutter and vortex-induced vibration in wind engineering, a framework is presented, which allows various human-structure interaction effects to coexist and interact, thereby providing a possible synthesis of previously disparate experimental and theoretical results. PMID:23690640

  2. A CS1 pedagogical approach to parallel thinking

    NASA Astrophysics Data System (ADS)

    Rague, Brian William

    Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of any initial three-week CS1 level module when compared with student comprehension levels just prior to starting the course. Survey results measured during the ninth week of the course reveal that performance levels remained high compared to pre-course performance scores. A second result produced by this study reveals no statistically significant interaction effect between the intervention method and student performance as measured by the evaluation instrument over three separate testing periods. However, visual inspection of survey score trends and the low p-value generated by the interaction analysis (0.062) indicate that further studies may verify improved concept retention levels for the lecture w/PAT group.

  3. Parallel VLSI Circuit Analysis and Optimization 

    E-print Network

    Ye, Xiaoji

    2012-02-14

    circuit simulation techniques and achieves superlinear speedup in practice. The second part of the dissertation talks about parallel circuit optimization. A modified asynchronous parallel pattern search (APPS) based method which utilizes the efficient...

  4. Adaptively Parallel Processor Allocation for Cilk Jobs

    E-print Network

    Sen, Siddhartha

    The problem of allocating processor resources fairly and efficiently to parallel jobs has been studied extensively in the past. Most of this work, however, assumes that the instantaneous parallelism of the jobs is known ...

  5. Parallelizing alternating direction implicit solver on GPUs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
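
    For context, each ADI sweep reduces to many independent tridiagonal systems; the inherently serial baseline that PCR-style GPU solvers reorganize into O(log n) parallel steps is the Thomas algorithm, sketched here for a single system (coefficients invented):

        # Thomas algorithm for one tridiagonal system A x = d
        # (a = sub-diagonal, b = main diagonal, c = super-diagonal).
        import numpy as np

        def thomas(a, b, c, d):
            n = len(d)
            cp, dp = np.empty(n), np.empty(n)
            cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
            for i in range(1, n):                       # forward elimination
                m = b[i] - a[i] * cp[i - 1]
                cp[i] = c[i] / m
                dp[i] = (d[i] - a[i] * dp[i - 1]) / m
            x = np.empty(n)
            x[-1] = dp[-1]
            for i in range(n - 2, -1, -1):              # back substitution
                x[i] = dp[i] - cp[i] * x[i + 1]
            return x

        n = 8
        a = np.full(n, -1.0); a[0] = 0.0                # a[0] unused
        c = np.full(n, -1.0); c[-1] = 0.0               # c[-1] unused
        b = np.full(n, 2.0)
        d = np.ones(n)
        print(thomas(a, b, c, d))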

  6. Implementing clips on a parallel computer

    NASA Technical Reports Server (NTRS)

    Riley, Gary

    1987-01-01

    The C language integrated production system (CLIPS) is a forward chaining rule based language to provide training and delivery for expert systems. Conceptually, rule based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism also can be employed for use with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.

  7. Parallelizing Sequential Programs with Statistical Accuracy Tests

    E-print Network

    Misailovic, Sasa

    We present QuickStep, a novel system for parallelizing sequential programs. Unlike standard parallelizing compilers (which are designed to preserve the semantics of the original sequential computation), QuickStep is instead ...

  8. Speculative parallelism in Intel Cilk Plus

    E-print Network

    Perez, Ruben (Ruben M.)

    2012-01-01

    Certain algorithms can be effectively parallelized at the cost of performing some redundant work. One example is searching an unordered tree graph for a particular node. Each subtree can be searched in parallel by a separate ...

  9. Parallel Coupled Micro-Macro Actuators

    E-print Network

    Morrell, John Bryant

    1996-01-01

    This thesis presents a new actuator system consisting of a micro-actuator and a macro-actuator coupled in parallel via a compliant transmission. The system is called the Parallel Coupled Micro-Macro Actuator, or PaCMMA. ...

  10. Parallel supercomputing with commodity components

    NASA Technical Reports Server (NTRS)

    Warren, M. S.; Goda, M. P.; Becker, D. J.

    1997-01-01

    We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.

  11. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  12. Parallel supercomputing with commodity components

    SciTech Connect

    Warren, M.S.; Goda, M.P.; Becker, D.J.

    1997-09-01

    We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10^15 floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.

  13. Parallel processing spacecraft communication system

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)

    1998-01-01

    An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used, and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID. Each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data is further handled using special double-buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.

  14. Minisymposium MS19 Parallel Crypto

    E-print Network

    Kaminsky, Alan

    Rochester Institute of Technology, February 24, 2010. 2010 SIAM Conference on Parallel Processing. [Slide figure: "Crypto Attack Times on a Top Supercomputer", with logarithmic axes spanning 1 minute to 1 year over the years 2009-2019.]

  15. Parallel Processing in Combustion Analysis

    NASA Technical Reports Server (NTRS)

    Schunk, Richard Gregory; Chung, T. J.

    2000-01-01

    The objective of this research is to demonstrate the application of the Flow-field Dependent Variation (FDV) method to a problem of current interest in supersonic chemical combustion. Due in part to the stiffness of the chemical reactions, the solution of such problems on unstructured three dimensional grids often dictates the use of parallel computers. Preliminary results for the injection of a supersonic hydrogen stream into vitiated air are presented.

  16. Efficient, massively parallel eigenvalue computation

    NASA Technical Reports Server (NTRS)

    Huo, Yan; Schreiber, Robert

    1993-01-01

    In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the Maspar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
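
    The serial kernel being parallelized, diagonalizing a dense random real-symmetric Hamiltonian, can be written in a few lines of NumPy; the 1-D Anderson-style matrix below (nearest-neighbour hopping plus random on-site disorder) is just an illustrative choice, not the paper's exact model:

        import numpy as np

        rng = np.random.default_rng(42)
        n, W = 500, 2.0
        H = np.diag(W * (rng.random(n) - 0.5))                             # random on-site potential
        H += np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)    # nearest-neighbour hopping

        evals, evecs = np.linalg.eigh(H)        # dense real-symmetric diagonalization
        print(evals[:5], evecs.shape)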

  17. Force user's manual: A portable, parallel FORTRAN

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1990-01-01

    The use of Force, a parallel, portable FORTRAN, on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.

  18. Parallel machine architecture and compiler design facilities

    NASA Technical Reports Server (NTRS)

    Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

    1990-01-01

    The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility to allow rapid prototyping of parallelizing compilers that can target different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.

  19. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  20. Highly parallel sparse Cholesky factorization

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.

  1. A flight investigation of blade section aerodynamics for a helicopter main rotor having NLR-1T airfoil sections

    NASA Technical Reports Server (NTRS)

    Morris, C. E. K., Jr.; Stevens, D. D.; Tomaine, R. L.

    1980-01-01

    A flight investigation was conducted using a teetering-rotor AH-1G helicopter to obtain data on the aerodynamic behavior of main-rotor blades with the NLR-1T blade section. The data system recorded blade-section aerodynamic pressures at 90 percent rotor radius as well as vehicle flight state, performance, and loads. The test envelope included hover, forward flight, and collective-fixed maneuvers. Data were obtained on apparent blade-vortex interactions, negative lift on the advancing blade in high-speed flight and wake interactions in hover. In many cases, good agreement was achieved between chordwise pressure distributions predicted by airfoil theory and flight data with no apparent indications of blade-vortex interactions.

  2. MASSIVELY PARALLEL MICROFLUIDIC CELL-PAIRING PLATFORM FOR THE STATISTICAL STUDY OF IMMUNOLOGICAL CELL-CELL

    E-print Network

    Voldman, Joel

    Massively Parallel Microfluidic Cell-Pairing Platform for the Statistical Study of Immunological Cell-Cell Interactions. Melanie M. Hoehl, Stephanie K. Dougan, Hidde L. Ploegh, Joel Voldman (Electrical Engineering and Computer Science, MIT). Abstract: Variability in cell-cell interactions is ubiquitous ...

  3. Parallelizing Time With Polynomial Circuits Ryan Williams

    E-print Network

    Boneh, Dan

    Parallelizing Time With Polynomial Circuits. Ryan Williams, Institute for Advanced Study, Princeton. ... computations with circuits of polynomial size. We give an algorithmic size-depth tradeoff for parallelizing ... parallel simulation yields logspace-uniform t^O(1)-size, O(t/log t)-depth Boolean circuits having semi...

  4. Structured Hardware Compilation of Parallel Programs

    E-print Network

    Luk, Wayne

    ... implementation. A circuit module is developed for each control structure, such as sequential or parallel ... Structured Hardware Compilation of Parallel Programs. Wayne Luk, David Ferguson and Ian Page. The potential of this approach is evaluated. Introduction: Recent work has shown how parallel programs ...

  5. Scheduling Computationally Intensive Data Parallel Raghu Subramanian

    E-print Network

    Scherson, Isaac D.

    Scheduling Computationally Intensive Data Parallel Programs. Raghu Subramanian ... Systems Technology ... Information and Computer Science, University of California, Irvine. Abstract: We consider the problem of how to run a workload of multiple parallel jobs on a single parallel machine. Jobs are assumed to be data ...

  6. Parallel Computing Using Web Servers and "Servlets".

    ERIC Educational Resources Information Center

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  7. Parallel Programming in MATLAB Piotr Luszczek

    E-print Network

    Luszczek, Piotr

    ... value decomposition (SVD). This theorem can be represented with the following MATLAB code: N = 1000 ... Parallel Programming in MATLAB. Piotr Luszczek, July 20, 2009. Abstract: A visit to the neighborhood PC ... in the software infrastructure -- a shift to parallel computing. 1 Introduction: MATLAB® and Parallel Computing ...

  8. Parallel Communicating Grammar Systems Lila Santean

    E-print Network

    Kari, Lila

    Parallel Communicating Grammar Systems. Lila Santean, Academy of Finland and Mathematics Department ... Russian parallel [4] and Indian parallel [4] grammars are some examples of such models based on formal ... and introduce most embarrassing performance limitations. Cooperating/distributed grammar systems are an attempt ...

  9. The horrors of parallel programming Max Neunhffer

    E-print Network

    Neunhöffer, Max

    The horrors of parallel programming. Max Neunhöffer (University of St Andrews), HPCGAP workshop, 19-23 August 2013. [Presentation slides; one visible slide topic: changing mutable data.]

  10. Parallel Processing at the High School Level.

    ERIC Educational Resources Information Center

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  11. Data Parallel Computing on Graphics Hardware

    E-print Network

    Bejerano, Gill

    Data Parallel Computing on Graphics Hardware. Ian Buck. [Presentation slides, July 27th, 2003, on the GPU as a streaming processor: why graphics hardware; no read-modify-write textures; multiple "pixel pipes" (data parallelism); support for ALU-heavy ...]

  12. Refinement Transformation Using Abstract Parallel Machines

    E-print Network

    Goodman, Joy

    ... defined in an Abstract Parallel Machine. We illustrate the method with a simple case study: the refinement ... Refinement Transformation Using Abstract Parallel Machines. Joy Goodman (1), John O'Donnell (1) and Gudula Rünger (2); (1) University of Glasgow, (2) Universität Leipzig. Abstract: Abstract Parallel Machines ...

  13. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... there is a parallel proceeding described in paragraph (a)(1) of this section and shall return the... pursuant to this paragraph. (2) If notice of a parallel proceeding described in paragraph (a)(1) of this... parallel proceeding described in paragraph (a)(2) or (a)(3) of this section, and shall notify all...

  14. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... there is a parallel proceeding described in paragraph (a)(1) of this section and shall return the... pursuant to this paragraph. (2) If notice of a parallel proceeding described in paragraph (a)(1) of this... parallel proceeding described in paragraph (a)(2) or (a)(3) of this section, and shall notify all...

  15. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... there is a parallel proceeding described in paragraph (a)(1) of this section and shall return the... pursuant to this paragraph. (2) If notice of a parallel proceeding described in paragraph (a)(1) of this... parallel proceeding described in paragraph (a)(2) or (a)(3) of this section, and shall notify all...

  16. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... there is a parallel proceeding described in paragraph (a)(1) of this section and shall return the... pursuant to this paragraph. (2) If notice of a parallel proceeding described in paragraph (a)(1) of this... parallel proceeding described in paragraph (a)(2) or (a)(3) of this section, and shall notify all...

  17. Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

    ERIC Educational Resources Information Center

    Agarwal, Mayank

    2009-01-01

    The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…

  18. Parallel Approaches to Stochastic Global Optimization

    E-print Network

    Neumaier, Arnold

    Parallel Approaches to Stochastic Global Optimization. Günter Rudolph (rudolph@ls11.informatik...). In this paper we review parallel implementations of some stochastic global optimization methods on MIMD computers. Moreover, we present a new parallel version of an Evolutionary Algorithm for global optimization ...

  19. PARALLEL BLOCK VECTORS: COLLECTION, ANALYSIS, AND USES

    E-print Network

    Parallel block vectors (PBVs) link a multithreaded application's code to the varying degrees of parallelism ... can be teased apart for individual analysis, and users can track the changes in parallelism of fine-grained code regions. With careful engineering effort, PBVs are neither complex nor expensive ...

  20. NWChem: scalable parallel computational chemistry

    SciTech Connect

    van Dam, Hubertus JJ; De Jong, Wibe A.; Bylaska, Eric J.; Govind, Niranjan; Kowalski, Karol; Straatsma, TP; Valiev, Marat

    2011-11-01

    NWChem is a general purpose computational chemistry code specifically designed to run on distributed memory parallel computers. The core functionality of the code focuses on molecular dynamics, Hartree-Fock and density functional theory methods for both plane-wave basis sets as well as Gaussian basis sets, tensor contraction engine based coupled cluster capabilities and combined quantum mechanics/molecular mechanics descriptions. It was realized from the beginning that scalable implementations of these methods required a programming paradigm inherently different from what message passing approaches could offer. In response a global address space library, the Global Array Toolkit, was developed. The programming model it offers is based on using predominantly one-sided communication. This model underpins most of the functionality in NWChem and the power of it is exemplified by the fact that the code scales to tens of thousands of processors. In this paper the core capabilities of NWChem are described as well as their implementation to achieve an efficient computational chemistry code with high parallel scalability. NWChem is a modern, open source, computational chemistry code specifically designed for large scale parallel applications. To meet the challenges of developing efficient, scalable and portable programs of this nature a particular code design was adopted. This code design involved two main features. First of all, the code is built up in a modular fashion so that a large variety of functionality can be integrated easily. Secondly, to facilitate writing complex parallel algorithms the Global Array toolkit was developed. This toolkit allows one to write parallel applications in a shared memory like approach, but offers additional mechanisms to exploit data locality to lower communication overheads. This framework has proven to be very successful in computational chemistry but is applicable to any engineering domain. Within the context created by the features above NWChem has grown into a general purpose computational chemistry code that supports a wide variety of energy expressions and capabilities to calculate properties based thereupon. The main energy expressions are classical mechanics force fields, Hartree-Fock and DFT both for finite systems and condensed phase systems, coupled cluster, as well as QM/MM. For most energy expressions single point calculations, geometry optimizations, excited states, and other properties are available. Below we briefly discuss each of the main energy expressions and the critical points involved in scalable implementations thereof.

  1. Interacting Parallel Constructions of Knowledge in a CAS Context

    ERIC Educational Resources Information Center

    Kidron, Ivy; Dreyfus, Tommy

    2010-01-01

    We consider the influence of a CAS context on a learner's process of constructing a justification for the bifurcations in a logistic dynamical process. We describe how instrumentation led to cognitive constructions and how the roles of the learner and the CAS intertwine, especially close to the branching and combining of constructing actions. The…

  2. Collective Interaction of a Compressible Periodic Parallel Jet Flow

    NASA Technical Reports Server (NTRS)

    Miles, Jeffrey Hilton

    1997-01-01

    A linear instability model for multiple spatially periodic supersonic rectangular jets is solved using Floquet-Bloch theory. The disturbance environment is investigated using a two dimensional perturbation of a mean flow. For all cases large temporal growth rates are found. This work is motivated by an increase in mixing found in experimental measurements of spatially periodic supersonic rectangular jets with phase-locked screech. The results obtained in this paper suggest that phase-locked screech or edge tones may produce correlated spatially periodic jet flow downstream of the nozzles, which creates a large spanwise multi-nozzle region where a disturbance can propagate. The large temporal growth rates for eddies obtained by the model calculation herein are related to the increased mixing, since eddies are the primary mechanism that transfers energy from the mean flow to the large turbulent structures. Calculations of growth rates are presented for a range of Mach numbers and nozzle spacings corresponding to experimental test conditions where screech synchronized phase locking was observed. The model may be of significant scientific and engineering value in the quest to understand and construct supersonic mixer-ejector nozzles which provide increased mixing and reduced noise.

  3. Parallel multiscale simulations of a brain aneurysm

    SciTech Connect

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NekTar. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NekTar and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

  4. Large-Scale Parallel Programming: Experience with the BBN Butterfly Parallel Processor

    E-print Network

    Scott, Michael L.

    Thomas J. Le[...] of Rochester have used a collection of BBN Butterfly TM Parallel Processors to conduct research in parallel [...] with the Butterfly we have ported three compilers, developed five major and several minor library packages, built two [...]

  5. Ion Pair-π Interactions.

    PubMed

    Fujisawa, Kaori; Humbert-Droz, Marie; Letrun, Romain; Vauthey, Eric; Wesolowski, Tomasz A; Sakai, Naomi; Matile, Stefan

    2015-09-01

    We report that anion-π and cation-π interactions can occur on the same aromatic surface. Interactions of this type are referred to as ion pair-π interactions. Their existence, nature, and significance are elaborated in the context of spectral tuning, ion binding in solution, and activation of cell-penetrating peptides. The origin of spectral tuning by ion pair-π interactions is unraveled with energy-minimized excited-state structures: The solvent- and pH-independent red shift of absorption and emission of push-pull fluorophores originates from antiparallel ion pair-π attraction to their polarized excited state. In contrast, the complementary parallel ion pair-π repulsion is spectroscopically irrelevant, in part because of charge neutralization by intriguing proton and electron transfers on excited push-pull surfaces. With time-resolved fluorescence measurements, very important differences between antiparallel and parallel ion pair-π interactions are identified and quantitatively dissected from interference by aggregation and ion pair dissociation. Contributions from hydrogen bonding, proton transfer, π-π interactions, chromophore twisting, ion pairing, and self-assembly are systematically addressed and eliminated by concise structural modifications. Ion-exchange studies in solution, activation of cell-penetrating peptides in vesicles, and computational analysis all imply that the situation in the ground state is complementary to spectral tuning in the excited state; i.e., parallel rather than antiparallel ion pair-π interactions are preferred, despite repulsion from the push-pull dipole. The overall quite complete picture of ion pair-π interactions provided by these remarkably coherent yet complex results is expected to attract attention throughout the multiple disciplines of chemistry involved. PMID:26291550

  6. Parallel ecological networks in ecosystems

    PubMed Central

    Olff, Han; Alonso, David; Berg, Matty P.; Eriksson, B. Klemens; Loreau, Michel; Piersma, Theunis; Rooney, Neil

    2009-01-01

    In ecosystems, species interact with other species directly and through abiotic factors in multiple ways, often forming complex networks of various types of ecological interaction. Out of this suite of interactions, predator–prey interactions have received most attention. The resulting food webs, however, will always operate simultaneously with networks based on other types of ecological interaction, such as through the activities of ecosystem engineers or mutualistic interactions. Little is known about how to classify, organize and quantify these other ecological networks and their mutual interplay. The aim of this paper is to provide new and testable ideas on how to understand and model ecosystems in which many different types of ecological interaction operate simultaneously. We approach this problem by first identifying six main types of interaction that operate within ecosystems, of which food web interactions are one. Then, we propose that food webs are structured among two main axes of organization: a vertical (classic) axis representing trophic position and a new horizontal ‘ecological stoichiometry’ axis representing decreasing palatability of plant parts and detritus for herbivores and detritivores and slower turnover times. The usefulness of these new ideas is then explored with three very different ecosystems as test cases: temperate intertidal mudflats; temperate short grass prairie; and tropical savannah. PMID:19451126

  7. Parallel ecological networks in ecosystems.

    PubMed

    Olff, Han; Alonso, David; Berg, Matty P; Eriksson, B Klemens; Loreau, Michel; Piersma, Theunis; Rooney, Neil

    2009-06-27

    In ecosystems, species interact with other species directly and through abiotic factors in multiple ways, often forming complex networks of various types of ecological interaction. Out of this suite of interactions, predator-prey interactions have received most attention. The resulting food webs, however, will always operate simultaneously with networks based on other types of ecological interaction, such as through the activities of ecosystem engineers or mutualistic interactions. Little is known about how to classify, organize and quantify these other ecological networks and their mutual interplay. The aim of this paper is to provide new and testable ideas on how to understand and model ecosystems in which many different types of ecological interaction operate simultaneously. We approach this problem by first identifying six main types of interaction that operate within ecosystems, of which food web interactions are one. Then, we propose that food webs are structured among two main axes of organization: a vertical (classic) axis representing trophic position and a new horizontal 'ecological stoichiometry' axis representing decreasing palatability of plant parts and detritus for herbivores and detritivores and slower turnover times. The usefulness of these new ideas is then explored with three very different ecosystems as test cases: temperate intertidal mudflats; temperate short grass prairie; and tropical savannah. PMID:19451126

  8. Instant well-log inversion with a parallel computer

    SciTech Connect

    Kimminau, S.J.; Trivedi, H.

    1993-08-01

    Well-log analysis requires several vectors of input data to be inverted with a physical model that produces more vectors of output data. The problem is inherently suited to either vectorization or parallelization. PLATO (parallel log analysis, timely output) is a research prototype system that uses a parallel architecture computer with memory-mapped graphics to invert vector data and display the result rapidly. By combining this high-performance computing and display system with a graphical user interface, the analyst can interact with the system in "real time" and can visualize the result of changing parameters on up to 1,000 levels of computed volumes and reconstructed logs. It is expected that such "instant" inversion will remove the main disadvantages frequently cited for simultaneous analysis methods, namely difficulty in assessing sensitivity to different parameters and slow output response. Although the prototype system uses highly specific features of a parallel processor, a subsequent version has been implemented on a conventional (serial) workstation with less performance but adequate functionality to preserve the apparently instant response. PLATO demonstrates the feasibility of petroleum computing applications combining an intuitive graphical interface, high-performance computing of physical models, and real-time output graphics.

  9. High Performance Parallel Computational Nanotechnology

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. Inasmuch as it seems clear that the application of such methods in nanotechnology will require powerful, highly parallel systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to control mini robotic manipulators for positional control; scalable numerical algorithms for reliability, verifications and testability. There appears to be no fundamental obstacle to simulating molecular compilers and molecular computers on high performance parallel computers, just as the Boeing 777 was simulated on a computer before manufacturing it.

  10. A Semi-Empirical Noise Modeling Method for Helicopter Maneuvering Flight Operations

    NASA Technical Reports Server (NTRS)

    Greenwood, Eric; Schmitz, Fredric; Sickenberger, Richard D.

    2012-01-01

    A new model for Blade-Vortex Interaction noise generation during maneuvering flight is developed in this paper. Acoustic and performance data from both flight and wind tunnels are used to derive a non-dimensional and analytical performance/acoustic model that describes BVI noise in steady flight. The model is extended to transient maneuvering flight (pure pitch and roll transients) by using quasisteady assumptions throughout the prescribed maneuvers. Ground noise measurements, taken during maneuvering flight of a Bell 206B helicopter, show that many of the noise radiation details are captured. The result is a computationally efficient Blade-Vortex Interaction noise model with sufficient accuracy to account for transient maneuvering flight. The code can be run in real time to predict transient maneuver noise and is suitable for use in an acoustic mission-planning tool.

  11. The prediction of transonic loading on advancing helicopter rotors

    NASA Technical Reports Server (NTRS)

    Strawn, R.; Tung, C.

    1986-01-01

    Two different schemes are presented for including the effect of rotor wakes on the finite-difference prediction of rotor loads. The first formulation includes wake effects by means of a blade-surface inflow specification. This approach is sufficiently simple to permit coupling of a full-potential finite-difference rotor code to a comprehensive integral model for the rotor wake and blade motion. The coupling involves a transfer of appropriate loads and inflow data between the two computer codes. Results are compared with experimental data for two advancing rotor cases. The second rotor wake modeling scheme in this paper is a split potential formulation for computing unsteady blade-vortex interactions. Discrete vortex fields are introduced into a three-dimensional, conservative, full-potential rotor code. Computer predictions are compared with two experimental blade-vortex interaction cases.

  12. Time parallel gravitational collapse simulation

    E-print Network

    Andreas Kreienbuehl; Pietro Benedusi; Daniel Ruprecht; Rolf Krause

    2015-09-04

    This article demonstrates the applicability of the parallel-in-time method Parareal to the numerical solution of the Einstein gravity equations for the spherical collapse of a massless scalar field. To account for the shrinking of the spatial domain in time, a tailored load balancing scheme is proposed and compared to load balancing based on number of time steps alone. The performance of Parareal is studied for both the sub-critical and black hole case; our experiments show that Parareal generates substantial speedup and, in the super-critical regime, can also reproduce the black hole mass scaling law.
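
    To make the parallel-in-time idea concrete, the following is a minimal serial emulation of the Parareal iteration on the scalar test problem y' = lambda*y, using explicit-Euler coarse and fine propagators. The test equation, slice count, and step counts are illustrative assumptions and have nothing to do with the Einstein equations or the load balancing scheme of the record.

    ```c
    /* Serial emulation of the Parareal parallel-in-time iteration on the scalar
     * test problem y' = lambda*y (not the Einstein equations of the record).
     * F is a fine explicit-Euler propagator, G a coarse one; in a real run the
     * F evaluations across time slices are done in parallel. */
    #include <stdio.h>
    #include <math.h>

    static double euler(double y, double lambda, double dt, int nsteps) {
        double h = dt / nsteps;
        for (int i = 0; i < nsteps; i++)
            y += h * lambda * y;
        return y;
    }

    int main(void) {
        const double lambda = -1.0, T = 2.0, y0 = 1.0;
        const int NSLICE = 10, KMAX = 5;
        double dt = T / NSLICE;
        double U[NSLICE + 1], Fold[NSLICE + 1], Gold[NSLICE + 1];

        /* initial coarse pass over all time slices */
        U[0] = y0;
        for (int n = 0; n < NSLICE; n++) {
            Gold[n + 1] = euler(U[n], lambda, dt, 1);      /* coarse: 1 step   */
            U[n + 1] = Gold[n + 1];
        }

        for (int k = 0; k < KMAX; k++) {
            for (int n = 0; n < NSLICE; n++)               /* parallelizable   */
                Fold[n + 1] = euler(U[n], lambda, dt, 100);/* fine: 100 steps  */
            for (int n = 0; n < NSLICE; n++) {             /* serial correction */
                double Gnew = euler(U[n], lambda, dt, 1);
                U[n + 1] = Gnew + Fold[n + 1] - Gold[n + 1];
                Gold[n + 1] = Gnew;
            }
            printf("k=%d  error at final time = %.3e\n",
                   k + 1, fabs(U[NSLICE] - exp(lambda * T)));
        }
        return 0;
    }
    ```

    In an actual Parareal run, the fine-propagator loop marked "parallelizable" is distributed across processors, one time slice each, while the coarse correction sweep stays serial.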

  13. True Shear Parallel Plate Viscometer

    NASA Technical Reports Server (NTRS)

    Ethridge, Edwin; Kaukler, William

    2010-01-01

    This viscometer (which can also be used as a rheometer) is designed for use with liquids over a large temperature range. The device consists of horizontally disposed, similarly sized, parallel plates with a precisely known gap. The lower plate is driven laterally with a motor to apply shear to the liquid in the gap. The upper plate is freely suspended from a double-arm pendulum with a sufficiently long radius to reduce height variations during the swing to negligible levels. A sensitive load cell measures the shear force applied by the liquid to the upper plate. Viscosity is measured by taking the ratio of shear stress to shear rate.
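
    As a quick numerical illustration of the measurement principle (not of the actual instrument or its calibration), a Newtonian viscosity follows from the ratio of shear stress to shear rate; all values below are assumed placeholders.

    ```c
    /* Minimal sketch: Newtonian viscosity from a true-shear parallel-plate
     * measurement, eta = shear stress / shear rate = (F/A) / (v/h).
     * The numbers are placeholder example values, not data from the record. */
    #include <stdio.h>

    int main(void) {
        double F = 2.5e-3;   /* shear force on upper plate [N] (assumed)    */
        double A = 4.0e-4;   /* wetted plate area [m^2] (assumed)           */
        double v = 1.0e-3;   /* lower-plate drive velocity [m/s] (assumed)  */
        double h = 5.0e-4;   /* plate gap [m] (assumed)                     */

        double shear_stress = F / A;            /* [Pa]  */
        double shear_rate   = v / h;            /* [1/s] */
        double eta = shear_stress / shear_rate; /* dynamic viscosity [Pa s] */

        printf("stress = %.3g Pa, rate = %.3g 1/s, eta = %.3g Pa s\n",
               shear_stress, shear_rate, eta);
        return 0;
    }
    ```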

  14. Parallel Assembly of LIGA Components

    SciTech Connect

    Christenson, T.R.; Feddema, J.T.

    1999-03-04

    In this paper, a prototype robotic workcell for the parallel assembly of LIGA components is described. A Cartesian robot is used to press 386 and 485 micron diameter pins into a LIGA substrate and then place a 3-inch diameter wafer with LIGA gears onto the pins. Upward and downward looking microscopes are used to locate holes in the LIGA substrate, pins to be pressed in the holes, and gears to be placed on the pins. This vision system can locate parts within 3 microns, while the Cartesian manipulator can place the parts within 0.4 microns.

  15. Use Computer-Aided Tools to Parallelize Large CFD Applications

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Yan, J.

    2000-01-01

    Porting applications to high performance parallel computers is always a challenging task that is time consuming and costly. With rapid progress in hardware architectures and the increasing complexity of real applications in recent years, the problem has become even more severe. Today, scalability and high performance are mostly achieved with handwritten parallel programs using message-passing libraries (e.g., MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, shows good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance; in the worst cases they can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often fails on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome this deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization. CAPO is aimed at taking advantage of the detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on the NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving the Navier-Stokes equations with complicated boundary conditions and turbulence models in multiple zones. Each one comprises from 50K to 100K lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175 MHz R10K processor). A fair amount of effort was spent on correcting false dependences due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for the user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to the sequential algorithms involved, code sections in TLNS3D and INS3D needed to be restructured by hand to produce more efficient parallel code. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in a single zone. The MPI data points for the small test case were taken from a hand-coded MPI version. As we can see, CAPO's version achieved an 18-fold speedup on 32 nodes of the SGI O2K; for the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outermost parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks support for parallelization at the multi-zone level. Future work will emphasize the development of a methodology to work at the multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformations is also needed.
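
    For readers unfamiliar with directive-based parallelization, the sketch below shows the kind of loop-level OpenMP directive such a tool inserts once dependence analysis has shown the outer loop to be independent; the stencil update and array names are illustrative and are not taken from OVERFLOW, TLNS3D, or INS3D.

    ```c
    /* Minimal sketch of loop-level OpenMP parallelization of a stencil update,
     * the kind of directive a tool like CAPO inserts on an outer loop.
     * Array shapes and the update itself are illustrative only. */
    #include <stdio.h>
    #define NI 64
    #define NJ 64

    static double q[NI][NJ], qnew[NI][NJ];

    int main(void) {
        for (int i = 0; i < NI; i++)
            for (int j = 0; j < NJ; j++)
                q[i][j] = (double)(i + j);

        /* The outer loop carries no dependence across i, so it is safe to
         * parallelize; each thread gets a block of i values. */
        #pragma omp parallel for
        for (int i = 1; i < NI - 1; i++)
            for (int j = 1; j < NJ - 1; j++)
                qnew[i][j] = 0.25 * (q[i-1][j] + q[i+1][j] + q[i][j-1] + q[i][j+1]);

        printf("qnew[32][32] = %f\n", qnew[32][32]);
        return 0;
    }
    ```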

  16. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
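
    A minimal sketch of the combination being investigated, expressed here with OpenMP tasks (task parallelism) enclosing parallel loops (data parallelism). This is only an illustration of the concept in C, not the Legion-based object-oriented language described in the abstract, and the two "model components" are hypothetical.

    ```c
    /* Sketch of mixing task parallelism (independent model components run as
     * OpenMP tasks) with data parallelism (a parallel loop inside each task).
     * Illustrative only; requires an OpenMP-enabled compiler. */
    #include <stdio.h>
    #include <omp.h>

    static double work(int n) {
        double s = 0.0;
        /* data-parallel inner loop */
        #pragma omp parallel for reduction(+:s)
        for (int i = 0; i < n; i++)
            s += (double)i * 1e-6;
        return s;
    }

    int main(void) {
        double ocean = 0.0, atmosphere = 0.0;   /* hypothetical components */
        omp_set_max_active_levels(2);           /* allow nested parallelism */

        #pragma omp parallel
        #pragma omp single
        {
            /* task-parallel outer level: two independent model components */
            #pragma omp task shared(ocean)
            ocean = work(1000000);
            #pragma omp task shared(atmosphere)
            atmosphere = work(2000000);
            #pragma omp taskwait
        }
        printf("ocean=%g atmosphere=%g\n", ocean, atmosphere);
        return 0;
    }
    ```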

  17. Characterizations of parallel complexity classes

    SciTech Connect

    Venkateswaran, H.

    1986-01-01

    A new two-person pebble game that abstracts the control structure of many parallel algorithms is defined and studied. This game extends the two-person pebble game defined by Dymond and Tompa (JCSS, Vol. 30, no. 2, 1985, pp. 149-161) in two ways: (a) the game is played on a Boolean circuit rather than on an unlabeled graph, and takes into consideration the types of the gates in the circuit, and (b) the two players' roles are completely symmetric. The new game is used to study the relationship between two natural parallel complexity classes, namely LOGCFL and AC^1. LOGCFL is the class of languages log space reducible to context-free languages. AC^1 is the class of languages accepted by an alternating Turing machine in space O(log n) and alternation depth O(log n). LOGCFL is a subclass of AC^1, but it is not known whether the inclusion is proper. For many problems in LOGCFL the algorithms that show their membership in that class also show their membership in AC^1. However, these algorithms do not use the full power of AC^1 computations. The two-person game defined here provides a model of computation in which this perceived difference can be quantified. This is done by characterizing the two classes using the same measures of resources in the game model.

  18. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines, and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC; the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long-term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

  19. Parallel line analysis: multifunctional software for the biomedical sciences

    NASA Technical Reports Server (NTRS)

    Swank, P. R.; Lewis, M. L.; Damron, K. L.; Morrison, D. R.

    1990-01-01

    An easy to use, interactive FORTRAN program for analyzing the results of parallel line assays is described. The program is menu driven and consists of five major components: data entry, data editing, manual analysis, manual plotting, and automatic analysis and plotting. Data can be entered from the terminal or from previously created data files. The data editing portion of the program is used to inspect and modify data and to statistically identify outliers. The manual analysis component is used to test the assumptions necessary for parallel line assays using analysis of covariance techniques and to determine potency ratios with confidence limits. The manual plotting component provides a graphic display of the data on the terminal screen or on a standard line printer. The automatic portion runs through multiple analyses without operator input. Data may be saved in a special file to expedite input at a future time.

  20. Single-cell mechanics: the parallel plates technique.

    PubMed

    Bufi, Nathalie; Durand-Smet, Pauline; Asnacios, Atef

    2015-01-01

    We describe here the parallel plates technique which enables quantifying single-cell mechanics, either passive (cell deformability) or active (whole-cell traction forces). Based on the bending of glass microplates of calibrated stiffness, it is easy to implement on any microscope, and benefits from protocols and equipment already used in biology labs (coating of glass slides, pipette pullers, micromanipulators, etc.). We first present the principle of the technique, the design and calibration of the microplates, and various surface coatings corresponding to different cell-substrate interactions. Then we detail the specific cell preparation for the assays, and the different mechanical assays that can be carried out. Finally, we discuss the possible technical simplifications and the specificities of each mechanical protocol, as well as the possibility of extending the use of the parallel plates to investigate the mechanics of cell aggregates or tissues. PMID:25640430
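
    A small worked example of the readout principle: with a microplate of calibrated stiffness k, the measured tip deflection d gives the whole-cell traction force F = k*d. The numbers below are assumed, order-of-magnitude placeholders, not values from the protocol.

    ```c
    /* Minimal sketch of the flexible-microplate readout: force from calibrated
     * stiffness times measured deflection, and an average stress over the
     * cell-plate contact area. All values are assumed placeholders. */
    #include <stdio.h>

    int main(void) {
        double k = 2.0e-3;     /* microplate stiffness [N/m], ~2 nN/um (assumed) */
        double d = 5.0e-6;     /* measured plate deflection [m] (assumed)        */
        double area = 3.0e-10; /* cell-plate contact area [m^2] (assumed)        */

        double F = k * d;      /* whole-cell traction force [N] */
        printf("F = %.3g nN, mean stress = %.3g Pa\n", F * 1e9, F / area);
        return 0;
    }
    ```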

  1. Parallel discrete event simulation: A shared memory approach

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1987-01-01

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
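
    The core of the Chandy-Misra approach is the conservative rule deciding when a logical process may safely consume its next event. The sketch below illustrates just that rule; the channel clock values and the lookahead are made-up examples, and the full null-message protocol and queueing-network model of the paper are not reproduced.

    ```c
    /* Minimal sketch of the conservative (Chandy-Misra) synchronization rule:
     * a logical process may process its next pending event only if its
     * timestamp does not exceed the smallest clock among the input channels.
     * Channel clocks, lookahead, and the event time are illustrative values. */
    #include <stdio.h>

    #define NCHAN 3

    /* smallest timestamp guaranteed on any future message from the inputs */
    static double safe_time(const double chan_clock[], int n) {
        double t = chan_clock[0];
        for (int i = 1; i < n; i++)
            if (chan_clock[i] < t) t = chan_clock[i];
        return t;
    }

    int main(void) {
        double chan_clock[NCHAN] = {12.0, 9.5, 14.2}; /* per-input channel clocks */
        double lookahead  = 1.0;   /* e.g. minimum service time of this server    */
        double next_event = 10.1;  /* timestamp of this LP's next pending event   */

        if (next_event <= safe_time(chan_clock, NCHAN)) {
            printf("process event at t=%.1f\n", next_event);
        } else {
            /* Blocked. In the null-message variant, this LP would send null
             * messages on its output channels time-stamped safe_time+lookahead,
             * letting downstream LPs advance and avoiding deadlock. */
            printf("blocked: inputs only guarantee t=%.1f, null messages at +%.1f\n",
                   safe_time(chan_clock, NCHAN), lookahead);
        }
        return 0;
    }
    ```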

  2. Analysis of the numerical effects of parallelism on a parallel genetic algorithm

    SciTech Connect

    Hart, W.E.; Belew, R.K.; Kohn, S.; Baden, S.

    1995-09-18

    This paper examines the effects of relaxed synchronization on both the numerical and parallel efficiency of parallel genetic algorithms (GAs). We describe a coarse-grain geographically structured parallel genetic algorithm. Our experiments show that asynchronous versions of these algorithms have a lower run time than synchronous GAs. Furthermore, we demonstrate that this improvement in performance is partly due to the fact that the numerical efficiency of the asynchronous genetic algorithm is better than that of the synchronous genetic algorithm. Our analysis includes a critique of the utility of traditional parallel performance measures for parallel GAs, and we evaluate the claims made by several researchers that parallel GAs can have superlinear speedup.

  3. Towards Distributed Memory Parallel Program Analysis

    SciTech Connect

    Quinlan, D; Barany, G; Panas, T

    2008-06-17

    This paper presents a parallel attribute evaluation for distributed memory parallel computer architectures, where previously only shared memory parallel support for this technique had been developed. Attribute evaluation is a part of how attribute grammars are used for program analysis within modern compilers. Within this work, we have extended ROSE, an open compiler infrastructure, with a distributed memory parallel attribute evaluation mechanism to support user-defined global program analysis required for some forms of security analysis which cannot be addressed by a file-by-file view of large scale applications. As a result, user-defined security analyses may now run in parallel without the user having to specify the way data is communicated between processors. The automation of communication enables an extensible open-source parallel program analysis infrastructure.

  4. A generic fine-grained parallel C

    NASA Technical Reports Server (NTRS)

    Hamet, L.; Dorband, John E.

    1988-01-01

    With the present availability of parallel processors of vastly different architectures, there is a need for a common language interface to multiple types of machines. The parallel C compiler, currently under development, is intended to be such a language. This language is based on the belief that an algorithm designed around fine-grained parallelism can be mapped relatively easily to different parallel architectures, since a large percentage of the parallelism has been identified. The compiler generates a FORTH-like machine-independent intermediate code. A machine-dependent translator will reside on each machine to generate the appropriate executable code, taking advantage of the particular architectures. The goal of this project is to allow a user to run the same program on such machines as the Massively Parallel Processor, the CRAY, the Connection Machine, and the CYBER 205 as well as serial machines such as VAXes, Macintoshes and Sun workstations.

  5. Design considerations for parallel graphics libraries

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  6. Linearly exact parallel closures for slab geometry

    SciTech Connect

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-15

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients)

  7. Parallel automated adaptive procedures for unstructured meshes

    NASA Technical Reports Server (NTRS)

    Shephard, M. S.; Flaherty, J. E.; Decougny, H. L.; Ozturan, C.; Bottasso, C. L.; Beall, M. W.

    1995-01-01

    Consideration is given to the techniques required to support adaptive analysis of automatically generated unstructured meshes on distributed memory MIMD parallel computers. The key areas of new development are focused on the support of effective parallel computations when the structure of the numerical discretization, the mesh, is evolving, and in fact constructed, during the computation. All the procedures presented operate in parallel on already distributed mesh information. Starting from a mesh definition in terms of a topological hierarchy, techniques to support the distribution, redistribution and communication among the mesh entities over the processors is given, and algorithms to dynamically balance processor workload based on the migration of mesh entities are given. A procedure to automatically generate meshes in parallel, starting from CAD geometric models, is given. Parallel procedures to enrich the mesh through local mesh modifications are also given. Finally, the combination of these techniques to produce a parallel automated finite element analysis procedure for rotorcraft aerodynamics calculations is discussed and demonstrated.

  8. Multipactor saturation in parallel-plate waveguides

    SciTech Connect

    Sorolla, E.; Mattes, M.

    2012-07-15

    The saturation stage of a multipactor discharge is considered of interest, since it can guide towards a criterion to assess the multipactor onset. The electron cloud under multipactor regime within a parallel-plate waveguide is modeled by a thin continuous distribution of charge and the equations of motion are calculated taking into account the space charge effects. The saturation is identified by the interaction of the electron cloud with its image charge. The stability of the electron population growth is analyzed and two mechanisms of saturation to explain the steady-state multipactor for voltages near above the threshold onset are identified. The impact energy in the collision against the metal plates decreases during the electron population growth due to the attraction of the electron sheet on the image through the initial plate. When this growth remains stable till the impact energy reaches the first cross-over point, the electron surface density tends to a constant value. When the stability is broken before reaching the first cross-over point the surface charge density oscillates chaotically bounded within a certain range. In this case, an expression to calculate the maximum electron surface charge density is found whose predictions agree with the simulations when the voltage is not too high.

  9. Algorithmically Specialized Parallel Architecture For Robotics

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1991-01-01

    Computing system called Robot Mathematics Processor (RMP) contains large number of processor elements (PE's) connected in various parallel and serial combinations reconfigurable via software. Special-purpose architecture designed for solving diverse computational problems in robot control, simulation, trajectory generation, workspace analysis, and like. System an MIMD-SIMD parallel architecture capable of exploiting parallelism in different forms and at several computational levels. Major advantage lies in design of cells, which provides flexibility and reconfigurability superior to previous SIMD processors.

  10. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.
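
    A minimal sketch of what two-level OpenMP parallelism looks like in practice: an outer parallel loop (for example over zones) containing a nested inner parallel loop. The loop bodies, zone count, and thread counts are illustrative assumptions, and the NanosCompiler thread-group extensions mentioned in the abstract are not shown.

    ```c
    /* Sketch of two-level (multilevel) OpenMP parallelism: a coarse parallel
     * loop over zones with a nested fine-grained parallel loop in each zone.
     * Sizes and thread counts are illustrative; requires OpenMP. */
    #include <stdio.h>
    #include <omp.h>

    #define NZONES 4
    #define NPTS   100000

    int main(void) {
        static double zone_sum[NZONES];
        omp_set_max_active_levels(2);        /* enable nested parallel regions */

        #pragma omp parallel for num_threads(NZONES)     /* coarse level: zones */
        for (int z = 0; z < NZONES; z++) {
            double s = 0.0;
            #pragma omp parallel for reduction(+:s) num_threads(2) /* fine level */
            for (int i = 0; i < NPTS; i++)
                s += (double)(z + 1) / (i + 1.0);
            zone_sum[z] = s;
        }

        for (int z = 0; z < NZONES; z++)
            printf("zone %d: %f\n", z, zone_sum[z]);
        return 0;
    }
    ```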

  11. Virtual reality visualization of parallel molecular dynamics simulation

    SciTech Connect

    Disz, T.; Papka, M.; Stevens, R.; Pellegrino, M.; Taylor, V.

    1995-12-31

    When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom to atom interaction and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume, the mass, and velocity of each molecule. The computational information available for visualization is the atom to atom interaction within each time step, the atom to processor mapping, and the energy rescaling events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.

  12. Conservation of writhe helicity under anti-parallel reconnection

    PubMed Central

    Laing, Christian E.; Ricca, Renzo L.; Sumners, De Witt L.

    2015-01-01

    Reconnection is a fundamental event in many areas of science, from the interaction of vortices in classical and quantum fluids, and magnetic flux tubes in magnetohydrodynamics and plasma physics, to the recombination in polymer physics and DNA biology. By using fundamental results in topological fluid mechanics, the helicity of a flux tube can be calculated in terms of writhe and twist contributions. Here we show that the writhe is conserved under anti-parallel reconnection. Hence, for a pair of interacting flux tubes of equal flux, if the twist of the reconnected tube is the sum of the original twists of the interacting tubes, then helicity is conserved during reconnection. Thus, any deviation from helicity conservation is entirely due to the intrinsic twist inserted or deleted locally at the reconnection site. This result has important implications for helicity and energy considerations in various physical contexts. PMID:25820408
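
    For reference, the decomposition the abstract builds on is the standard Calugareanu-White form used in topological fluid mechanics for a thin flux tube of flux Phi (a reminder of a known result, not a new claim of this record):

    ```latex
    % Helicity of a thin flux tube of flux \Phi split into writhe and twist;
    % conservation of Wr under anti-parallel reconnection then confines any
    % change of H to the twist term.
    H \;=\; \left( Wr + Tw \right)\,\Phi^{2}
    ```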

  13. Device for balancing parallel strings

    DOEpatents

    Mashikian, Matthew S. (Storrs, CT)

    1985-01-01

    A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.

  14. Parallel Network Simulations with NEURON

    PubMed Central

    Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.

    2009-01-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488

  15. Hybrid Optimization Parallel Search PACKage

    SciTech Connect

    2009-11-10

    HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.

  16. Hybrid Optimization Parallel Search PACKage

    Energy Science and Technology Software Center (ESTSC)

    2009-11-10

    HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.

  17. Parallelization of Apriori algorithm using Charm++ library

    NASA Astrophysics Data System (ADS)

    Puścian, Marek; Grabski, Waldemar

    2015-09-01

    This paper deals with the problem of adapting a sequential frequent itemset mining algorithm to parallel processing. Bodon's original Apriori algorithm has been partitioned into loosely coupled tasks and prepared to be executed on several computation nodes using the Charm++ library. A variety of optimization methods have been proposed and successfully implemented in the parallel environment. The work provides enhancements to achieve good efficiency when parallelizing existing solutions, e.g., how to organize communication between tasks. The presented approach is illustrated with many experiments and measurements performed on the parallelized algorithm.
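
    A minimal sketch of the data-partitioned support-counting step that such task decompositions rely on: each task counts candidate supports over its own chunk of transactions, and the partial counts are then merged. This is a generic illustration in C with OpenMP, not Bodon's trie-based implementation and not the Charm++ chare structure of the paper; the items, transactions, and candidates are toy bitmask data.

    ```c
    /* Sketch of partitioned support counting for an Apriori-style miner:
     * each task (thread) scans its chunk of transactions into private counts,
     * which are then merged. Data are toy values; items are bits in a mask. */
    #include <stdio.h>

    #define NTRANS 8
    #define NCAND  3

    int main(void) {
        unsigned trans[NTRANS] = {0x03, 0x07, 0x0B, 0x06, 0x03, 0x1B, 0x0A, 0x13};
        unsigned cand[NCAND]   = {0x03, 0x06, 0x0A};   /* candidate 2-itemsets */
        int count[NCAND] = {0};

        #pragma omp parallel
        {
            int local[NCAND] = {0};          /* per-task (per-thread) counters */
            #pragma omp for nowait
            for (int t = 0; t < NTRANS; t++)           /* chunk of transactions */
                for (int c = 0; c < NCAND; c++)
                    if ((trans[t] & cand[c]) == cand[c])
                        local[c]++;
            #pragma omp critical                       /* merge partial counts  */
            for (int c = 0; c < NCAND; c++)
                count[c] += local[c];
        }

        for (int c = 0; c < NCAND; c++)
            printf("candidate 0x%02X: support %d of %d\n", cand[c], count[c], NTRANS);
        return 0;
    }
    ```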

  18. Instrumentation for the development of parallel programs

    SciTech Connect

    Guarna, V.; Malony, A.

    1987-01-01

    Consider a parallel program composed of multiple tasks. These tasks execute independently except when they must synchronize to satisfy dependencies that exist in the program. The problem of instrumenting a multiple processor system for developing parallel programs is the need to identify, monitor and measure the activities of every program task, and to correlate this data with all program tasks. This need is present in all stages of the parallel program development cycle from debugging to performance analysis to optimization. This paper discusses the critical instrumentation issues for the development of parallel programs in the areas of performance analysis and debugging.

  19. A Parallel Multigrid Method for Neutronics Applications

    SciTech Connect

    Alcouffe, Raymond E.

    2001-01-01

    The multigrid method has been shown to be the most effective general method for solving the multi-dimensional diffusion equation encountered in neutronics. This being the method of choice, we develop a strategy for implementing the multigrid method on computers of massively parallel architecture. This leads us to strategies for parallelizing the relaxation, contraction (interpolation), and prolongation operators involved in the method. We then compare the efficiency of our parallel multigrid with other parallel methods for solving the diffusion equation on selected problems encountered in reactor physics.
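
    Of the operators named above, relaxation is usually the easiest to parallelize, because points of one color in a red-black ordering are mutually independent. The sketch below shows that structure for a 1-D model diffusion problem; the actual solver works on the multi-dimensional neutron diffusion equation, and the grid size and sweep count here are arbitrary.

    ```c
    /* Sketch of a parallel relaxation (smoothing) step of the kind a parallel
     * multigrid solver uses: red-black Gauss-Seidel for the 1-D model problem
     * -u'' = f with u(0)=u(1)=0. Points of one color have no mutual
     * dependence, so each half-sweep is a parallel loop. Illustrative only. */
    #include <stdio.h>

    #define N 129              /* grid points; u[0] and u[N-1] held at 0 */

    int main(void) {
        static double u[N], f[N];
        double h = 1.0 / (N - 1);

        for (int i = 0; i < N; i++) { u[i] = 0.0; f[i] = 1.0; }

        for (int sweep = 0; sweep < 200; sweep++) {
            for (int color = 0; color < 2; color++) {
                #pragma omp parallel for
                for (int i = 1 + color; i < N - 1; i += 2)
                    u[i] = 0.5 * (u[i-1] + u[i+1] + h * h * f[i]);
            }
        }
        /* Exact solution is u(x) = x(1-x)/2, i.e. 0.125 at the midpoint;
         * relaxation alone converges slowly toward it, which is exactly why
         * multigrid combines such sweeps with coarse-grid corrections. */
        printf("u at midpoint after 200 sweeps: %f\n", u[N / 2]);
        return 0;
    }
    ```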

  20. Applications of Parallel Processing to Astrodynamics

    NASA Astrophysics Data System (ADS)

    Coffey, S.; Healy, L.; Neal, H.

    1996-03-01

    Parallel processing is being used to improve the catalog of earth orbiting satellites and for problems associated with the catalog. Initial efforts centered around using SIMD parallel processors to perform debris conjunction analysis and satellite dynamics studies. More recently, the availability of cheap supercomputing processors and parallel processing software such as PVM has enabled the reutilization of existing astrodynamics software in distributed parallel processing environments. Computations once taking many days with traditional mainframes are now being performed in only a few hours. Efforts underway for the US Naval Space Command include conjunction prediction, uncorrelated target processing and a new space object catalog based on orbit determination and prediction with special perturbations methods.

  1. PARALLEL ALGORITHM DESIGN FOR BRANCH AND BOUND

    E-print Network

    Bader, David A.

    [...] familiar with serial optimization. We present techniques to assist the porting of serial optimization [...] parallel computers to problems such as weather and climate modeling, bioinformatics analysis, logistics [...]

  2. VLSI complexity of parallel Fourier transform algorithms

    SciTech Connect

    Baradaran Seyed, T.

    1989-01-01

    Scope and method of study. The purpose of this study is to present a set of new parallel algorithms for discrete Fourier transform and compare the VLSI time and area complexity of the associated designs with the existing designs. The proposed parallel algorithms may be implemented easily in pipeline and mesh-connected parallel processing systems. Findings and conclusions. Several parallel algorithms have been proposed and associated cell layout for VLSI implementation have been presented. Comparative analysis shows that two of the designs presented by this study have better area-time performance than the existing designs in their architectural category.

  3. Parallel debugging using graphical views. Technical report

    SciTech Connect

    Bailey, M.; Socha, D.; Notkin, D.

    1988-03-01

    Graphical views are essential for debugging parallel programs because of the large quantity of state information contained in parallel programs. Voyeur, a prototype system for creating graphical views of parallel programs, provides a cost-effective way to construct such views for any parallel-programming system. We illustrate Voyeur by discussing four views created for debugging Poker programs. One is a general trace facility for any Poker program. The other three are tailored to display a specific type of algorithmic information. Each of these views has been instrumental in detecting bugs that would have been difficult to detect otherwise, yet were obvious with the views.

  4. Parallel auto-correlative statistics with VTK.

    SciTech Connect

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.

  5. A parallel algorithm for global routing

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall J.; Banerjee, Prithviraj

    1990-01-01

    A Parallel Hierarchical algorithm for Global Routing (PHIGURE) is presented. The router is based on the work of Burstein and Pelavin, but has many extensions for general global routing and parallel execution. Main features of the algorithm include structured hierarchical decomposition into separate independent tasks which are suitable for parallel execution and adaptive simplex solution for adding feedthroughs and adjusting channel heights for row-based layout. Alternative decomposition methods and the various levels of parallelism available in the algorithm are examined closely. The algorithm is described and results are presented for a shared-memory multiprocessor implementation.

  6. Parallel computing in enterprise modeling.

    SciTech Connect

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  7. The Uses and Abuses of the Acoustic Analogy in Helicopter Rotor Noise Prediction

    NASA Technical Reports Server (NTRS)

    Farassat, F.; Brentner, Kenneth S.

    1987-01-01

    This paper is theoretical in nature and addresses applications of the acoustic analogy in helicopter rotor noise prediction. It is argued that in many instances the acoustic analogy has not been used with care in rotor noise studies. By this it is meant that approximate or inappropriate formulations have been used. By considering various mechanisms of noise generation, such abuses are identified and the remedy is suggested. The mechanisms discussed are thickness, loading, quadrupole, and blade-vortex interaction noise. The quadrupole term of the Ffowcs Williams-Hawkings equation is written in a new form which separates the contributions of regions of high gradients such as shock surfaces. It is shown by order of magnitude studies that such regions are capable of producing noise with the same directivity as the thickness noise. The inclusion of this part of quadrupole sources in current acoustic codes is quite practical. Some of the difficulties with the use of loading noise formulations of the first author in predictions of blade-vortex interaction noise are discussed. It appears that there is a need for development of new theoretical results based on the acoustic analogy in this area. Because of the impulsive character of the blade surface pressure, a time scale of integration different from that used in loading and thickness computations must be used in a computer code for prediction of blade-vortex interaction noise.

  8. Tunable Josephson effect in hybrid parallel coupled double quantum dot-superconductor tunnel junction

    NASA Astrophysics Data System (ADS)

    Rajput, Gagan; Kumar, Rajendra; Ajay

    2014-09-01

    Using non-equilibrium Green's function approach, we study electronic transport through a parallel double quantum dot (DQD) system symmetrically coupled to conventional superconducting leads. Andreev bound states (ABS) and corresponding resonant Cooper pair electron transmission through such a DQD-superconductor tunnel junction around the Fermi energy, a manifestation of Josephson effect, occur due to proximity effect as a result of superconducting order parameter. Interdot tunnel coupling in parallel coupled DQD system and Coulomb interactions regulate the Josephson effect in a very significant manner. Further, it is also found that interdot tunnel coupling has reverse effect on ABS and Cooper pair tunneling in the presence and absence of Coulomb interactions.

  9. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  10. Parallel methods for the flight simulation model

    SciTech Connect

    Xiong, Wei Zhong; Swietlik, C.

    1994-06-01

    The Advanced Computer Applications Center (ACAC) has been involved in evaluating advanced parallel architecture computers and the applicability of these machines to computer simulation models. The advanced systems investigated include parallel machines with shared-memory and distributed architectures consisting of an eight processor Alliant FX/8, a twenty-four processor Sequent Symmetry, Cray XMP, IBM RISC 6000 model 550, and the Intel Touchstone eight processor Gamma and 512 processor Delta machines. Since parallelizing a truly efficient application program for the parallel machine is a difficult task, the implementation for these machines in a realistic setting has been largely overlooked. The ACAC has developed considerable expertise in optimizing and parallelizing application models on a collection of advanced multiprocessor systems. One aspect of such an application model is the Flight Simulation Model, which uses a set of differential equations to describe the flight characteristics of a launched missile by means of a trajectory. The Flight Simulation Model was written in the FORTRAN language with approximately 29,000 lines of source code. Depending on the number of trajectories, the computation can require several hours to a full day of CPU time on a DEC/VAX 8650 system. There is an impetus to reduce the execution time and utilize the advanced parallel architecture computing environment available. ACAC researchers developed a parallel method that allows the Flight Simulation Model to run in parallel on the multiprocessor system. For the benchmark data tested, the parallel Flight Simulation Model implemented on the Alliant FX/8 has achieved nearly linear speedup. In this paper, we describe a parallel method for the Flight Simulation Model. We believe the method presented in this paper provides a general concept for the design of parallel applications. This concept, in most cases, can be adapted to many other sequential application programs.

  11. Computing Degree of Parallelism for BPMN Processes

    E-print Network

    California at Santa Barbara, University of

    Slide excerpt (Dec. 6, 2011): computing the degree of parallelism for BPMN processes; example document-archive process with task durations (1 day, 1 day, 1 day, 1 day, 3 days, 1 day); goal: find peak workload demands.

  12. An Efficient Parallel Spatial Subdivision Algorithm for Object-Based Parallel Ray Tracing

    E-print Network

    Aykanat, Cevdet

    Parallel ray tracing of complex scenes on multicomputers requires … years, research on ray tracing has been mostly concentrated on speeding up the algorithm by parallel…

  13. The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Mark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete; Saini, Subhash (Technical Monitor)

    1998-01-01

    Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

  14. Parallel Data Mining Team 2 Flash Coders

    E-print Network

    Kaminsky, Alan

    Slide excerpt: data mining using parallel machine learning algorithms; application dimensions (figure from BIG CPU, BIG DATA); overview of big data analytics: data, not the algorithm used, is what … (figure from Banko and Brill, 2001, p. 2).

  15. Massively Parallel Computing: A Sandia Perspective

    E-print Network

    Plimpton, Steve

    … The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, … colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address … the parallel computing industry.

  16. Parallel Programming for High-Performance Applications

    E-print Network

    Bal, Henri E.

    Slide excerpt: … for many parallel applications. History: 1950s, first ideas (see Gill's quote); 1967, first parallel … Example: processing for the LOFAR telescope, where searching for pulsars is extremely difficult (one needs to guess the direction, distance, and rotation frequency); prepares for the SKA telescope (2018); 10-100x the global internet …

  17. IBM Watson, Nov. 2008 1 Parallel Scheduling

    E-print Network

    Guestrin, Carlos

    Slide excerpt (IBM Watson, Nov. 2008): Parallel Scheduling: Theory and Practice, Guy Blelloch, Carnegie Mellon University. Parallel languages: user-scheduled (MPI, Pthreads in typical usage) versus system-scheduled … Example: Quicksort, procedure QUICKSORT(S): if S contains at most one …

  18. Parallel Education and Defining the Fourth Sector.

    ERIC Educational Resources Information Center

    Chessell, Diana

    1996-01-01

    Parallel to the primary, secondary, postsecondary, and adult/community education sectors is education not associated with formal programs--learning in arts and cultural sites. The emergence of cultural and educational tourism is an opportunity for adult/community education to define itself by extending lifelong learning opportunities into parallel…

  19. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  20. Communication Characteristics in the NAS Parallel Benchmarks

    E-print Network

    In this paper, we investigate the communication characteristics of the Message Passing Interface (MPI) implementation of the NAS parallel benchmarks and study the effectiveness of compiled communication for MPI …

  1. EPIC: E-field Parallel Imaging Correlator

    NASA Astrophysics Data System (ADS)

    Thyagarajan, Nithyanandan; Beardsley, Adam P.; Bowman, Judd D.; Morales, Miguel F.

    2015-11-01

    E-field Parallel Imaging Correlator (EPIC), a highly parallelized Object Oriented Python package, implements the Modular Optimal Frequency Fourier (MOFF) imaging technique. It also includes visibility-based imaging using the software holography technique and a simulator for generating electric fields from a sky model. EPIC can accept dual-polarization inputs and produce images of all four instrumental cross-polarizations.

  2. Divergence and Convergence in Enzyme Evolution: Parallel

    E-print Network

    Tawfik, Dan S.

    We discuss the basic features of divergent versus convergent evolution and of … common protein ancestors (9, 10). Divergent, convergent, or perhaps parallel evolution? Divergent evolution can …

  3. IBM Parallel Environment for Linux Introduction

    E-print Network

    Hickman, Mark

    IBM Parallel Environment for Linux: Introduction, Version 4 Release 2 (SA23-2218-00). Note: before using … applies to IBM Parallel Environment for Linux (product number 5724-N05) and to all subsequent releases and modifications until otherwise …

  4. Asynchronous Parallel Prefix Computation Rajit Manohar \\Lambda

    E-print Network

    Manohar, Rajit

    … solutions to the parallel prefix problem that exploit this advantage of asynchronous circuits over their synchronous … a circuit to solve this problem with O(log n) latency and O(n log n) circuit size, with O(…

  5. Parallel algorithms for finding trigonometric sums

    SciTech Connect

    Stpiczynski, P.; Paprzycki, M.

    1995-12-01

    Parallel versions of the Goertzel and Reinsch algorithms for finding trigonometric sums are introduced as a special case of efficient parallel algorithms for solving linear recurrence systems. The results of the experiments performed on a 20-processor Sequent Symmetry are presented and discussed.
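    For reference, the serial Goertzel recurrence that these parallel variants reorganize is itself a short backward linear recurrence; the parallel algorithms in the paper recast it as a block linear recurrence system solved across processors, which is not shown here. A minimal serial sketch in Python:

      import math
      import numpy as np

      def goertzel_sums(a, theta):
          """Goertzel recurrence for C = sum_k a[k]*cos(k*theta) and S = sum_k a[k]*sin(k*theta)."""
          c = math.cos(theta)
          b1 = b2 = 0.0
          for ak in reversed(a):            # backward recurrence b_k = a_k + 2c*b_{k+1} - b_{k+2}
              b1, b2 = ak + 2.0 * c * b1 - b2, b1
          # after the loop, b1 holds b_0 and b2 holds b_1
          C = b1 - c * b2
          S = b2 * math.sin(theta)
          return C, S

      a = np.random.rand(64)
      theta = 0.7
      k = np.arange(len(a))
      assert np.allclose(goertzel_sums(a, theta),
                         (np.sum(a * np.cos(k * theta)), np.sum(a * np.sin(k * theta))))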

  6. Parallel Activation in Bilingual Phonological Processing

    ERIC Educational Resources Information Center

    Lee, Su-Yeon

    2011-01-01

    In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…

  7. PARALLEL EVOLUTIONARY ALGORITHMS FOR UAV PATH PLANNING

    E-print Network

    … unmanned aerial vehicles (UAVs). Premature convergence prevents evolutionary-based algorithms from reaching the global optimum. To overcome this problem, this paper presents a framework of parallel evolutionary algorithms for UAV path …

  8. Parallel Image Component Labeling with Watershed Transformation

    E-print Network

    The parallel watershed transformation used in grayscale image segmentation is reconsidered … of the watershed transformation and to correctly delimit the extent of all connected components locally, on each …

  9. AIAA 20031347 Parallel Computation of Wing Flutter

    E-print Network

    Liu, Feng

    AIAA 2003-1347: Parallel Computation of Wing Flutter with a Coupled Navier-Stokes/CSD Method. M. Sadeghi, S. Yang, F. Liu, Department of Mechanical and Aerospace Engineering, University of California.

  10. Cluster Parallel Loops Part I. Preliminaries

    E-print Network

    Kaminsky, Alan

    … The massively parallel bitcoin mining program in Chapter 14 still doesn't take full advantage of the cluster's parallel processing capabilities. Each bitcoin mining task uses all the cores on just one node. So on the 10-node tardis cluster, I have to mine 10 or more bitcoins to fully utilize all the cores …

  11. 17 CFR 12.24 - Parallel proceedings.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    17 CFR 12.24 (2011-04-01): Parallel proceedings. Commodity and Securities Exchanges; COMMODITY FUTURES TRADING COMMISSION; RULES RELATING TO REPARATIONS; General Information and Preliminary Consideration of Pleadings; § 12.24 Parallel proceedings. (a) Definition. For purposes of...

  12. Where parallel ...a geometric love story

    E-print Network

    Cavalieri, Renzo

    Slide excerpt: "Where parallel lines meet..." (Renzo Cavalieri). A deus ex machina solution: let us beef up the plane by adding a line (m-line) and a point (Mr. V…), a bizarro object pulled out of my hat.

  13. MULTIOBJECTIVE PARALLEL GENETIC ALGORITHM FOR WASTE MINIMIZATION

    EPA Science Inventory

    In this research we have developed an efficient multiobjective parallel genetic algorithm (MOPGA) for waste minimization problems. This MOPGA integrates PGAPack (Levine, 1996) and NSGA-II (Deb, 2000) with novel modifications. PGAPack is a master-slave parallel implementation of a...

  14. Semi-automatic process partitioning for parallel computation

    NASA Technical Reports Server (NTRS)

    Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John

    1988-01-01

    On current multiprocessor architectures one must carefully distribute data in memory in order to achieve high performance. Process partitioning is the operation of rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. A semi-automatic approach to process partitioning is considered in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting task system. This approach is illustrated with a picture processing example written in BLAZE, which is transformed into a task system maximizing locality of memory reference.
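    As a rough illustration of the idea of process partitioning (each task operating primarily on its own portion of the data), the sketch below splits a simple picture-processing loop into row-block tasks; it is hypothetical and does not reproduce BLAZE or the compiler's actual output:

      # Illustrative only: the kind of task system a partitioning step might produce for a
      # picture-processing loop. Each task owns a contiguous block of rows and works
      # only on its own portion of the data, so the tasks are fully independent.
      from multiprocessing import Pool
      import numpy as np

      def smooth_rows(block):
          """Task body: 3-point horizontal smoothing on the rows this task owns."""
          out = block.copy()
          out[:, 1:-1] = (block[:, :-2] + block[:, 1:-1] + block[:, 2:]) / 3.0
          return out

      if __name__ == "__main__":
          image = np.random.rand(512, 512)
          blocks = np.array_split(image, 8, axis=0)     # partition the data by rows: one block per task
          with Pool(processes=8) as pool:
              result = np.vstack(pool.map(smooth_rows, blocks))   # tasks run in parallel
          print(result.shape)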

  15. penORNL: a parallel monte carlo photon and electron transport package using PENELOPE

    SciTech Connect

    Bekar, Kursat B.; Miller, Thomas Martin; Patton, Bruce W.; Weber, Charles F.

    2015-01-01

    The parallel Monte Carlo photon and electron transport code package penORNL was developed at Oak Ridge National Laboratory to enable advanced scanning electron microscope (SEM) simulations on high performance computing systems. This paper discusses the implementations, capabilities and parallel performance of the new code package. penORNL uses PENELOPE for its physics calculations and provides all available PENELOPE features to the users, as well as some new features including source definitions specifically developed for SEM simulations, a pulse-height tally capability for detailed simulations of gamma and x-ray detectors, and a modified interaction forcing mechanism to enable accurate energy deposition calculations. The parallel performance of penORNL was extensively tested with several model problems, and very good linear parallel scaling was observed with up to 512 processors. penORNL, along with its new features, will be available for SEM simulations upon completion of the new pulse-height tally implementation.

  16. The hypercluster: A parallel processing test-bed architecture for computational mechanics applications

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1987-01-01

    The development of numerical methods and software tools for parallel processors can be aided through the use of a hardware test-bed. The test-bed architecture must be flexible enough to support investigations into architecture-algorithm interactions. One way to implement a test-bed is to use a commercial parallel processor. Unfortunately, most commercial parallel processors are fixed in their interconnection and/or processor architecture. In this paper, we describe a modified n cube architecture, called the hypercluster, which is a superset of many other processor and interconnection architectures. The hypercluster is intended to support research into parallel processing of computational fluid and structural mechanics problems which may require a number of different architectural configurations. An example of how a typical partial differential equation solution algorithm maps on to the hypercluster is given.

  17. Parallel numerical reservoir simulation: A feasibility study

    SciTech Connect

    Michielse, P.H.

    1994-12-31

    This paper discusses a feasibility study to implement a parallel reservoir simulator on parallel computers. The basis of this study is a reservoir simulator that models an injection-production mechanism. The simulator implements a multigrid solver for the elliptic part of the equations and uses adaptive local grid refinement to track moving fronts in the reservoir. The parallelization method is based on a domain decomposition method, which assigns the subdomains to the processors. In order to obtain a correct solution, communication across the internal boundaries between the subdomains is required. The implementation of the multigrid method imposes restrictions on the domain decomposition. Furthermore, the adaptive local grid refinement may cause the work load distribution over the processors to be out of balance. Hence, some load balancing technique is required to ensure parallel efficiency. This parallel efficiency is illustrated by experiments on a Convex MetaSeries system.

  18. National Combustion Code: Parallel Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

    2000-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.

  19. Differences Between Distributed and Parallel Systems

    SciTech Connect

    Brightwell, R.; Maccabe, A.B.; Riesen, R.

    1998-10-01

    Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.

  20. A parallel variable metric optimization algorithm

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.

    1973-01-01

    An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
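    A simplified serial sketch of one such cycle on a quadratic test function is given below; the p gradient evaluations and p rank-one corrections follow the cycle described above, but the specific symmetric-rank-one update formula is an assumption and may differ in detail from the algorithm of the paper:

      import numpy as np

      # One cycle of a parallel variable-metric scheme, sketched serially on the quadratic
      # f(x) = 0.5 x^T A x - b^T x. The p gradient evaluations and the p rank-one
      # corrections mirror the cycle in the abstract; the SR1-style update below is an
      # assumption, not necessarily Straeter's exact formula.
      rng = np.random.default_rng(0)
      n, p = 6, 6
      M = rng.standard_normal((n, n))
      A = M @ M.T + n * np.eye(n)              # symmetric positive-definite Hessian
      b = rng.standard_normal(n)
      grad = lambda x: A @ x - b

      x = np.zeros(n)
      H = np.eye(n)                            # current inverse-metric estimate
      g0 = grad(x)

      # Step 1: gradient evaluations at p displaced points (these are the parallel evaluations).
      steps = [1e-2 * e for e in np.eye(n)[:p]]
      grads = [grad(x + s) for s in steps]

      # Step 2: p rank-one (SR1-style) corrections to H from the sampled curvature pairs.
      for s, g in zip(steps, grads):
          y = g - g0
          r = s - H @ y
          denom = r @ y
          if abs(denom) > 1e-12:
              H += np.outer(r, r) / denom

      # Step 3: a single univariate minimization along the Newton-like direction d = -H g0.
      d = -H @ g0
      alpha = -(g0 @ d) / (d @ A @ d)          # exact line search for the quadratic
      x = x + alpha * d
      print(np.linalg.norm(grad(x)))           # should be near zero: one cycle solves the quadratic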

  1. Parallel hypergraph partitioning for scientific computing.

    SciTech Connect

    Heaphy, Robert; Devine, Karen Dragon; Catalyurek, Umit; Bisseling, Robert; Hendrickson, Bruce Alan; Boman, Erik Gunnar

    2005-07-01

    Graph partitioning is often used for load balancing in parallel computing, but it is known that hypergraph partitioning has several advantages. First, hypergraphs more accurately model communication volume, and second, they are more expressive and can better represent nonsymmetric problems. Hypergraph partitioning is particularly suited to parallel sparse matrix-vector multiplication, a common kernel in scientific computing. We present a parallel software package for hypergraph (and sparse matrix) partitioning developed at Sandia National Labs. The algorithm is a variation on multilevel partitioning. Our parallel implementation is novel in that it uses a two-dimensional data distribution among processors. We present empirical results that show our parallel implementation achieves good speedup on several large problems (up to 33 million nonzeros) with up to 64 processors on a Linux cluster.

  2. Implementation and performance of parallel Prolog interpreter

    SciTech Connect

    Wei, S.; Kale, L.V.; Balkrishna, R. . Dept. of Computer Science)

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.

  3. Broadcasting a message in a parallel computer

    DOEpatents

    Berg, Jeremy E. (Rochester, MN); Faraj, Ahmad A. (Rochester, MN)

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
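    A logical sketch of the idea, not the patented implementation, is to lay a serpentine Hamiltonian path through one plane of the mesh and forward the logical root's message hop by hop along it:

      # Logical sketch (not the patented implementation): build a serpentine Hamiltonian
      # path through one plane of a 2-D mesh of compute nodes and forward the logical
      # root's message hop by hop along that path.
      def hamiltonian_path(rows, cols):
          """Boustrophedon walk visiting every node of a rows x cols plane exactly once."""
          path = []
          for r in range(rows):
              cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
              path.extend((r, c) for c in cs)
          return path

      def broadcast(rows, cols, root, message):
          path = hamiltonian_path(rows, cols)
          i = path.index(root)                        # locate the logical root on the path
          received = {root: message}
          # forward in both directions along the path; each hop is a point-to-point send
          for a, b in zip(path[i:], path[i + 1:]):
              received[b] = received[a]
          for a, b in zip(path[:i + 1][::-1], path[:i][::-1]):
              received[b] = received[a]
          return received

      nodes = broadcast(4, 5, root=(1, 2), message="hello")
      assert len(nodes) == 4 * 5 and set(nodes.values()) == {"hello"}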

  4. Parallel Universe, Dark Matter and Invisible Higgs Decays

    E-print Network

    Shreyashi Chakdar; Kirtiman Ghosh; S. Nandi

    2013-11-11

    The existence of dark matter, with an abundance about five times that of ordinary matter, is now well established experimentally. There are many candidates for this dark matter. However, dark matter could be just like ordinary matter in a parallel universe. If both universes are described by non-abelian gauge symmetries, then there will be no kinetic mixing between the ordinary photon and the dark photon, and the dark proton, dark electron, and the corresponding dark nuclei belonging to the parallel universe will be stable. If the strong coupling constant $(\alpha_s)_{dark}$ in the parallel universe is five times that of $\alpha_s$, then the dark proton will be about five times heavier, explaining why the dark matter is five times the ordinary matter. However, the two sectors will still interact via the Higgs bosons of the two sectors. This leads to the existence of a second light Higgs boson, just like the Standard Model Higgs boson, and gives rise to invisible decay modes of the Higgs boson which can be tested at the LHC and at the proposed ILC.

  5. Airbreathing Propulsion System Analysis Using Multithreaded Parallel Processing

    NASA Technical Reports Server (NTRS)

    Schunk, Richard Gregory; Chung, T. J.; Rodriguez, Pete (Technical Monitor)

    2000-01-01

    In this paper, parallel processing is used to analyze the mixing and combustion behavior of hypersonic flow. Preliminary work for a sonic transverse hydrogen jet injected from a slot into a Mach 4 airstream in a two-dimensional duct combustor has been completed [Moon and Chung, 1996]. Our aim is to extend this work to a three-dimensional domain using multithreaded domain decomposition parallel processing based on the flowfield-dependent variation theory. Numerical simulations of chemically reacting flows are difficult because of the strong interactions between the turbulent hydrodynamic and chemical processes. The algorithm must provide an accurate representation of the flowfield, since unphysical flowfield calculations will lead to the faulty loss or creation of species mass fraction, or even premature ignition, which in turn alters the flowfield information. Another difficulty arises from the disparity in time scales between the flowfield and chemical reactions, which may require the use of finite rate chemistry. The situation is more complex when there is also a disparity in the length scales involved in turbulence. In order to cope with these complicated physical phenomena, it is our plan to utilize the flowfield-dependent variation theory mentioned above, facilitated by large eddy simulation. Undoubtedly, the proposed computation requires the most sophisticated computational strategies. The multithreaded domain decomposition parallel processing will be necessary in order to reduce both computational time and storage. Without special treatments involved in computer engineering, our attempt to analyze the airbreathing combustion appears to be difficult, if not impossible.

  6. Runtime system library for parallel finite difference models with nesting

    SciTech Connect

    Michalakes, J.

    1997-03-01

    RSL is a parallel run-time system library for implementing regular-grid models with nesting on distributed memory parallel computers. RSL provides support for automatically decomposing multiple model domains and for redistributing work between processors at run time for dynamic load balancing. A unique feature of RSL is that processor subdomains need not be rectangular patches; rather, grid points are independently allocated to processors, allowing more precisely balanced allocation of work to processors. Communication mechanisms are tailored to the application: RSL provides an efficient high-level stencil exchange operation for updating subdomain ghost areas and interdomain communication to support two-way interaction between nest levels. RSL also provides run-time support for local iteration over subdomains, global-local index translation, and distributed I/O from ordinary Fortran record-blocked data sets. The interface to RSL supports Fortran77 and Fortran90. RSL has been used to parallelize the NCAR/Penn State Mesoscale Model (MM5).
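    The following toy sketch shows what a stencil (ghost-area) exchange accomplishes on a 1-D decomposed grid; it mimics the effect of RSL's high-level exchange operation but is not RSL's actual interface, and the zero boundary values are an arbitrary assumption:

      import numpy as np

      # Illustration of what a "stencil exchange" accomplishes: before applying a
      # 3-point stencil, each subdomain's ghost cells are refreshed with the boundary
      # values of its neighbors. This mimics the effect of RSL's exchange operation;
      # it is not RSL's actual interface.
      def exchange_ghosts(subdomains):
          for i, sub in enumerate(subdomains):
              sub[0] = subdomains[i - 1][-2] if i > 0 else 0.0                       # left ghost
              sub[-1] = subdomains[i + 1][1] if i < len(subdomains) - 1 else 0.0     # right ghost

      def smooth(sub):
          sub[1:-1] = 0.5 * sub[1:-1] + 0.25 * (sub[:-2] + sub[2:])                  # 3-point stencil

      # global 1-D field of 40 points split into 4 subdomains, each padded with one ghost cell per side
      field = np.linspace(0.0, 1.0, 40)
      subs = [np.concatenate(([0.0], chunk, [0.0])) for chunk in np.array_split(field, 4)]
      exchange_ghosts(subs)     # communication step (messages between processors on a parallel machine)
      for s in subs:
          smooth(s)             # local computation on each processor's subdomain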

  7. BWR parallel flow recirculation system

    SciTech Connect

    Fennern, L.E.

    1992-01-21

    This patent describes a recirculation system for a boiling water reactor having a cylindrical shroud surrounding a reactor core and spaced radially inwardly from a pressure vessel to define an annular downcomer for channeling downwardly a recirculation reactor coolant into an inlet of the core disposed at a lower plenum of the vessel. It comprises an annular pump deck disposed in the downcomer and fixedly joined to the pressure vessel and the core shroud circumferentially spaced impeller-driven reactor internal pumps (RIPs) disposed in the downcomer and joined to the pump deck for pumping a first portion of the coolant in the downcomer downwardly through the pump deck and into the lower plenum as RIP discharge flow to the core inlet; and circumferentially spaced fluid-driven jet pumps (JPs) disposed in the downcomer and joined to the pump deck for pumping a second portion of the coolant in the downcomer downwardly through the pump deck and into the lower plenum as JP discharge flow to the core inlet in parallel flow with the RIP discharge flow.

  8. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1995-01-01

    The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.

  9. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  10. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

    PubMed Central

    2014-01-01

    Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926

  11. Non-Cartesian Parallel Imaging Reconstruction

    PubMed Central

    Wright, Katherine L.; Hamilton, Jesse I.; Griswold, Mark A.; Gulani, Vikas; Seiberlich, Nicole

    2014-01-01

    Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be employed to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the non-homogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian GRAPPA, and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. PMID:24408499

  12. Xyce parallel electronic simulator : users' guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

  13. Contrast coding in the electrosensory system: parallels with visual computation.

    PubMed

    Clarke, Stephen E; Longtin, André; Maler, Leonard

    2015-12-01

    To identify and interact with moving objects, including other members of the same species, an animal's nervous system must correctly interpret patterns of contrast in the physical signals (such as light or sound) that it receives from the environment. In weakly electric fish, the motion of objects in the environment and social interactions with other fish create complex patterns of contrast in the electric fields that they produce and detect. These contrast patterns can extend widely over space and time and represent a multitude of relevant features, as is also true for other sensory systems. Mounting evidence suggests that the computational principles underlying contrast coding in electrosensory neural networks are conserved elements of spatiotemporal processing that show strong parallels with the vertebrate visual system. PMID:26558527

  14. Knowledge representation into Ada parallel processing

    NASA Technical Reports Server (NTRS)

    Masotto, Tom; Babikyan, Carol; Harper, Richard

    1990-01-01

    The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.

  15. Distributed parallel messaging for multiprocessor systems

    DOEpatents

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.

  16. Parallelization of the Implicit RPLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Orkwis, Paul D.

    1994-01-01

    The multiblock reacting Navier-Stokes flow-solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.

  17. Parallelization of the Implicit RPLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Orkwis, Paul D.

    1997-01-01

    The multiblock reacting Navier-Stokes flow solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.

  18. On the parallel solution of parabolic equations

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
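    The partial-fraction idea can be seen already with the lowest-order (1,1) Pade approximant, R(z) = (1 + z/2)/(1 - z/2) = -1 + 2/(1 - z/2): each partial-fraction term is an independent linear solve, and higher-order Pade or Chebyshev approximants yield several such solves that can run in parallel. A small sketch for the 1-D heat equation (grid size and step chosen arbitrarily):

      import numpy as np

      # Sketch of the partial-fraction idea behind the Pade-based schemes: a rational
      # approximation R(hA) to the matrix exponential splits into terms that are
      # independent linear solves. Shown for the (1,1) Pade approximant
      #   R(z) = (1 + z/2)/(1 - z/2) = -1 + 2/(1 - z/2),
      # applied to the 1-D heat equation u_t = A u; higher-order approximants give
      # several poles, hence several solves that can run in parallel.
      n, h = 50, 1e-3
      A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1)) * (n + 1) ** 2         # discrete Laplacian
      u = np.sin(np.pi * np.linspace(0.0, 1.0, n + 2)[1:-1])      # initial condition

      for _ in range(100):
          # single partial-fraction term: one linear solve (one per pole in general)
          u = -u + 2.0 * np.linalg.solve(np.eye(n) - 0.5 * h * A, u)

      print(u.max())   # decays roughly like exp(-pi**2 * t), as expected for the heat equation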

  19. Language constructs for modular parallel programs

    SciTech Connect

    Foster, I.

    1996-03-01

    We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrence, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.

  20. Parallel Climate Analysis Toolkit (ParCAT)

    Energy Science and Technology Software Center (ESTSC)

    2013-06-30

    The parallel analysis toolkit (ParCAT) provides parallel statistical processing of large climate model simulation datasets. ParCAT provides parallel point-wise average calculations, frequency distributions, sums/differences of two datasets, and difference-of-average and average-of-difference for two datasets for arbitrary subsets of simulation time. ParCAT is a command-line utility that can be easily integrated into scripts or embedded in other applications. ParCAT supports CMIP5 post-processed datasets as well as non-CMIP5 post-processed datasets. ParCAT reads and writes standard netCDF files.

  1. Massively parallel neurocomputing for aerospace applications

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Barhen, Jacob; Toomarian, Nikzad

    1993-01-01

    An innovative hybrid, analog-digital charge-domain technology, for the massively parallel VLSI implementation of certain large scale matrix-vector operations, has recently been introduced. It employs arrays of Charge Coupled/Charge Injection Device cells holding an analog matrix of charge, which process digital vectors in parallel by means of binary, non-destructive charge transfer operations. The impact of this technology on massively parallel processing is discussed. Fundamentally new classes of algorithms, specifically designed for this emerging technology, as applied to signal processing, are derived.

  2. An Efficient Mapping Heuristic for Mesh-Connected Parallel Architectures Based …

    E-print Network

    Aykanat, Cevdet

    … N), which represents the atomic tasks of the parallel program. Vertex weight w_i denotes the task's … The graph, referred to here as the Task Interaction Graph (TIG), … Vertices of this graph represent the atomic tasks, with weights which denote the relative computational costs of the respective tasks. Each edge denotes the need …

  3. An Evaluation of Parallel Grid Construction for Ray Tracing Dynamic Scenes Thiago Ize Ingo Wald Chelsea Robertson Steven G. Parker

    E-print Network

    Wald, Ingo

    … used for interactive ray tracing of dynamic scenes. While doing so, we also analyze memory system … interactive ray tracing of large multi-million triangle scenes. (CR Categories: I.3.7 [Computer Graphics] …)

  4. Parallel problem generation for structured problems in mathematical programming 

    E-print Network

    Qiang, Feng

    2015-11-26

    The aim of this research is to investigate parallel problem generation for structured optimization problems. The result of this research has produced a novel parallel model generator tool, namely the Parallel Structured ...

  5. A join algorithm for combining AND parallel solutions in AND/OR parallel systems

    SciTech Connect

    Ramkumar, B. ); Kale, L.V. )

    1992-02-01

    When two or more literals in the body of a Prolog clause are solved in (AND) parallel, their solutions need to be joined to compute solutions for the clause. This is often a difficult problem in parallel Prolog systems that exploit OR and independent AND parallelism in Prolog programs. In several AND/OR parallel systems proposed recently, this problem is side-stepped at the cost of unexploited OR parallelism in the program, in part due to the complexity of the backtracking algorithm beneath AND parallel branches. In some cases, the data dependency graphs used by these systems cannot represent all the exploitable independent AND parallelism known at compile time. In this paper, we describe the compile-time analysis for an optimized join algorithm for supporting independent AND parallelism in logic programs efficiently without leaving any OR parallelism unexploited. We then discuss how this analysis can be used to yield very efficient runtime behavior. We also discuss problems associated with a tree representation of the search space when arbitrarily complex data dependency graphs are permitted. We describe how these problems can be resolved by mapping the search space onto the data dependency graphs themselves. The algorithm has been implemented in a compiler for parallel Prolog based on the reduce-OR process model. The algorithm is suitable for the implementation of AND/OR systems on both shared and nonshared memory machines. Performance results on benchmark programs are presented.
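    The essential data operation behind such a join, shown here as a naive sketch rather than the optimized compile-time algorithm of the paper, is to combine binding sets from the AND branches while keeping only combinations that agree on shared variables:

      # Naive sketch of joining AND-parallel solutions: each AND branch returns a set
      # of variable bindings, and the join keeps only combinations that agree on the
      # variables the branches share. The optimized algorithm of the paper avoids
      # this brute-force cross product.
      def join(solutions_a, solutions_b):
          joined = []
          for sa in solutions_a:
              for sb in solutions_b:
                  shared = set(sa) & set(sb)
                  if all(sa[v] == sb[v] for v in shared):     # bindings must be consistent
                      merged = dict(sa)
                      merged.update(sb)
                      joined.append(merged)
          return joined

      # p(X, Y) solved in parallel with q(Y, Z); Y is the shared variable.
      p_solutions = [{"X": 1, "Y": "a"}, {"X": 2, "Y": "b"}]
      q_solutions = [{"Y": "a", "Z": 10}, {"Y": "c", "Z": 30}]
      print(join(p_solutions, q_solutions))    # -> [{'X': 1, 'Y': 'a', 'Z': 10}]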

  6. Circuit Optimization Using Efficient Parallel Pattern Search 

    E-print Network

    Narasimhan, Srinath S.

    2011-08-08

    … evaluations and difficulty in getting explicit sensitivity information make these problems intractable to standard optimization methods. We propose to explore the recently developed asynchronous parallel pattern search (APPS) method for efficient driver size...

  7. Provably Efficient Adaptive Scheduling for Parallel Jobs

    E-print Network

    He, Yuxiong

    Scheduling competing jobs on multiprocessors has always been an important issue for parallel and distributed systems. The challenge is to ensure global, system-wide efficiency while offering a level of fairness to user ...

  8. Graphite : a parallel distributed simulator for multicores

    E-print Network

    Kasture, Harshad

    2010-01-01

    This thesis describes Graphite, a parallel, distributed simulator for simulating large-scale multicore architectures, and focuses particularly on the functional aspects of simulating a single, unmodified multi-threaded ...

  9. Graphite: A Distributed Parallel Simulator for Multicores

    E-print Network

    Beckmann, Nathan

    2009-11-09

    This paper introduces the open-source Graphite distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multicore processors containing dozens, hundreds, ...

  10. PARALLEL ADAPTIVE NUMERICAL SCHEMES FOR HYPERBOLIC ...

    E-print Network


    … Assume, moreover, that Algorithm M is modified so that if an interval is smaller than … processes executing identical copies of the above code in parallel. … Self-adjusting grid methods for one-dimensional hyperbolic conservation laws, …

  11. A new approach to parallel SAT solvers

    E-print Network

    Nelson, Max (Max M.)

    2013-01-01

    We present a novel approach to solving SAT problems in parallel by partitioning the entire set of problem clauses into smaller pieces that can be solved by individual threads. We examine the complications that arise with ...

  12. Improved chopper circuit uses parallel transistors

    NASA Technical Reports Server (NTRS)

    1966-01-01

    Parallel transistor chopper circuit operates with one transistor in the forward mode and the other in the inverse mode. By using this method, it acts as a single, symmetrical, bidirectional transistor, and reduces and stabilizes the offset voltage.

  13. Feature Clustering for Accelerating Parallel Coordinate Descent

    SciTech Connect

    Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.

    2012-12-06

    We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.

  14. Massively Parallel Computing: A Sandia Perspective

    SciTech Connect

    Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

    1999-05-06

    The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.

  15. Parallel processor programs in the Federal Government

    NASA Technical Reports Server (NTRS)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  16. Parallel programming with PCN. Revision 1

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).

  17. Parallel line scanning ophthalmoscope for retinal imaging.

    PubMed

    Vienola, Kari V; Damodaran, Mathi; Braaf, Boy; Vermeer, Koenraad A; de Boer, Johannes F

    2015-11-15

    A parallel line scanning ophthalmoscope (PLSO) is presented using a digital micromirror device (DMD) for parallel confocal line imaging of the retina. The posterior part of the eye is illuminated using up to seven parallel lines, which were projected at 100 Hz. The DMD offers a high degree of parallelism in illuminating the retina compared to traditional scanning laser ophthalmoscope systems utilizing scanning mirrors. The system operated at the shot-noise limit with a signal-to-noise ratio of 28 for an optical power measured at the cornea of 100 μW. To demonstrate the imaging capabilities of the system, the macula and the optic nerve head of a healthy volunteer were imaged. Confocal images show good contrast and lateral resolution with a 10°×10° field of view. PMID:26565868

  18. NAS Parallel Benchmarks, Multi-Zone Versions

    NASA Technical Reports Server (NTRS)

    vanderWijngaart, Rob F.; Jin, Haoqiang

    2003-01-01

    We describe an extension of the NAS Parallel Benchmarks (NPB) suite that involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy, which is common among structured-mesh production flow solver codes in use at NASA Ames and elsewhere, provides relatively easily exploitable coarse-grain parallelism between meshes. Since the individual application benchmarks also allow fine-grain parallelism themselves, this NPB extension, named NPB Multi-Zone (NPB-MZ), is a good candidate for testing hybrid and multi-level parallelization tools and strategies.

  19. FFT for the APE Parallel Computer

    E-print Network

    Thomas Lippert; Klaus Schilling; Federico Toschi; Sven Trentmann; Raffaele Tripiccione

    1997-10-16

    We present a parallel FFT algorithm for SIMD systems following the `Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular for systems with next-neighbour connectivity our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to 4-dimensional FFT is presented, having in mind QCD applications.
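    The "Transpose Algorithm" idea can be illustrated on a single node: a 2-D FFT becomes 1-D FFTs along the locally contiguous dimension, a global transpose (which is the interprocessor communication step on a parallel machine), and a second round of 1-D FFTs. The sketch below uses NumPy and is not the APE100/Quadrics implementation.

```python
# Single-node sketch of the transpose approach to a parallel 2-D FFT.
# On a SIMD/systolic machine the transpose is the communication step; here
# everything is local, so this only shows the data-movement pattern.
import numpy as np

def fft2_via_transpose(a):
    a = np.fft.fft(a, axis=1)   # FFT along the locally contiguous dimension
    a = a.T.copy()              # transpose = interprocessor communication step
    a = np.fft.fft(a, axis=1)   # FFT along the (now local) other dimension
    return a.T                  # transpose back to the original layout

x = np.random.rand(8, 8)
assert np.allclose(fft2_via_transpose(x), np.fft.fft2(x))
```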

  20. Competitive Parallel Disk Prefetching and Buffer Management

    E-print Network

    Barve, Rakesh; Kallahalla, Mahesh; Varman, Peter J.; Vitter, Jeffrey Scott

    2000-01-01

    We provide a competitive analysis framework for online prefetching and buffer management algorithms in parallel I/O systems, using a read-once model of block references. This has widespread applicability to key I/O-bound applications...

  1. Nonlinear parameter estimation in parallel computing environments 

    E-print Network

    Li, Jie

    1996-01-01

    Paragon supercomputer. We use a two-dimensional permeability estimation problem as the example to test and demonstrate the usage of the parallel PEST. An existing simulator program called US3D, which solves the three-dimensional groundwater flow...

  2. A Communication Backend for Parallel Language Compilers

    E-print Network

    Shewchuk, Jonathan

    A Communication Backend for Parallel Language Compilers James M. Stichnoth and Thomas Gross Carnegie Mellon University Abstract. Generating good communication code is an important issue for all system usually implement the communication generation routines (e.g., message buffer packing

  3. Social Problems and Deviance: Some Parallel Issues

    ERIC Educational Resources Information Center

    Kitsuse, John I.; Spector, Malcolm

    1975-01-01

    Explores parallel developments in labeling theory and in the value conflict approach to social problems. Similarities in their critiques of functionalism and etiological theory as well as their emphasis on the definitional process are noted. (Author)

  4. Parallel Grobner Basis Computation using GASNet

    E-print Network

    California at Berkeley, University of

    Parallel Gröbner Basis Computation using GASNet. Chris Wilkens (cwilkens@cs.berkeley.edu).

  5. Universal codes for parallel Gaussian channels

    E-print Network

    Modir Shanechi, Maryam

    2006-01-01

    In this thesis we study the design of universal codes for parallel Gaussian channels with 2 sub-channels present. We study the universality both in terms of the uncertainty in the relative quality of the two sub-channels ...

  6. Massive parallelism in the future of science

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    Massive parallelism appears in three domains of action of concern to scientists, where it produces collective action that is not possible from any individual agent's behavior. In the domain of data parallelism, computers comprising very large numbers of processing agents, one for each data item in the result, will be designed. These agents collectively can solve problems thousands of times faster than current supercomputers. In the domain of distributed parallelism, computations comprising large numbers of resources attached to the world network will be designed. The network will support computations far beyond the power of any one machine. In the domain of people parallelism, collaborations among large groups of scientists around the world, who participate in projects that endure well past the sojourns of individuals within them, will be designed. Computing and telecommunications technology will support the large, long projects that will characterize big science by the turn of the century. Scientists must become masters in these three domains during the coming decade.

  7. Parallel processing in finite element structural analysis

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.

    1987-01-01

    A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).

  8. Parallel and Distributed Systems Speaker: Dick Epema

    E-print Network

    Kuzmanov, Georgi

    Parallel and Distributed Systems. Speaker: Dick Epema; group: Alexandru Iosup, Johan Pouwelse, Henk Sips. Topics include grids/clouds, P2P systems, big data, online gaming, and e-Science.

  9. The PISCES 2 parallel programming environment

    NASA Technical Reports Server (NTRS)

    Pratt, Terrence W.

    1987-01-01

    PISCES 2 is a programming environment for scientific and engineering computations on MIMD parallel computers. It is currently implemented on a flexible FLEX/32 at NASA Langley, a 20 processor machine with both shared and local memories. The environment provides an extended Fortran for applications programming, a configuration environment for setting up a run on the parallel machine, and a run-time environment for monitoring and controlling program execution. This paper describes the overall design of the system and its implementation on the FLEX/32. Emphasis is placed on several novel aspects of the design: the use of a carefully defined virtual machine, programmer control of the mapping of virtual machine to actual hardware, forces for medium-granularity parallelism, and windows for parallel distribution of data. Some preliminary measurements of storage use are included.

  10. Parallel magnetic resonance imaging: characterization and comparison 

    E-print Network

    Rane, Swati Dnyandeo

    2005-11-01

    of the optimal method for parallel imaging depending on a particular imaging environment and scanning parameters. Simulations on real MR phased-array data show that SENSE and GRAPPA provide better image reconstructions when compared to the remaining techniques....

  11. Parallel algorithms for dynamically partitioning unstructured grids

    SciTech Connect

    Diniz, P.; Plimpton, S.; Hendrickson, B.; Leland, R.

    1994-10-01

    Grid partitioning is the method of choice for decomposing a wide variety of computational problems into naturally parallel pieces. In problems where the computational load on the grid, or the grid itself, changes as the simulation progresses, the ability to repartition dynamically and in parallel is attractive for achieving higher performance. We describe three algorithms suitable for parallel dynamic load-balancing which attempt to partition unstructured grids so that computational load is balanced and communication is minimized. The execution time of the algorithms and the quality of the partitions they generate are compared to results from serial partitioners for two large grids. The integration of the algorithms into a parallel particle simulation is also briefly discussed.
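    As a toy illustration of the load-balancing half of the problem (ignoring communication cost entirely), the sketch below splits a sequence of element weights into contiguous parts of roughly equal total load using weighted prefix sums; the algorithms in the report operate on unstructured grids and are considerably more sophisticated.

```python
# Toy illustration of the load-balancing objective: split a sequence of
# element weights into contiguous parts with roughly equal total load.
# Real unstructured-grid partitioners (including those in this report) also
# minimize communication between parts; this sketch ignores that entirely.
import numpy as np

def balanced_cuts(weights, nparts):
    """Return a part index for each element using weighted prefix sums."""
    csum = np.cumsum(weights)
    targets = csum[-1] * (np.arange(1, nparts + 1) / nparts)
    return np.searchsorted(targets, csum, side="left")

loads = np.random.rand(20)               # per-element computational load
print(balanced_cuts(loads, nparts=4))    # element -> partition assignment
```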

  12. Asynchronous parallel pattern search for nonlinear optimization

    SciTech Connect

    P. D. Hough; T. G. Kolda; V. J. Torczon

    2000-01-01

    Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10--50) and by expensive objective function evaluations, such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machines, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault-tolerant parallel pattern search (APPS) method and demonstrate its effectiveness on both simple test problems and some engineering optimization problems.
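    A minimal, synchronous pattern search is sketched below to show where the parallelism comes from: the 2n trial points of each iteration can be evaluated independently. The asynchronous, fault-tolerant method introduced in the paper goes further by not waiting for all evaluations; that aspect is not shown here, and the objective and parameters are illustrative.

```python
# Minimal synchronous coordinate pattern search. The 2n trial evaluations per
# iteration are independent, which is where parallelism enters; the objective
# and all parameters below are illustrative assumptions.
import numpy as np

def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    n = x.size
    for _ in range(max_iter):
        if step <= tol:
            break
        # Trial points: +/- step along each coordinate (embarrassingly parallel).
        trials = [x + step * s * e for e in np.eye(n) for s in (1.0, -1.0)]
        values = [f(t) for t in trials]          # evaluate (ideally in parallel)
        best = int(np.argmin(values))
        if values[best] < fx:                    # success: accept the best trial
            x, fx = trials[best], values[best]
        else:                                    # failure: contract the pattern
            step *= 0.5
    return x, fx

x_opt, f_opt = pattern_search(lambda v: float(np.sum((v - 3.0) ** 2)),
                              x0=[0.0, 0.0])
print(x_opt, f_opt)
```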

  13. Parallel Implementation of the Discontinuous Galerkin Method

    NASA Technical Reports Server (NTRS)

    Baggag, Abdalkader; Atkins, Harold; Keyes, David

    1999-01-01

    This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time-dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.

  14. Translating Parallel Circumscription into Disjunctive Logic Programming

    E-print Network

    circumscription. There have been several attempts to embed parallel circumscription into disjunctive logic and test method. In this paper we discuss the approach presented in [7]. Contrarily to earlier approaches

  15. Parallel solution of triangular systems of equations

    NASA Technical Reports Server (NTRS)

    Romine, Charles H.; Ortega, James M.

    1988-01-01

    The solution on parallel computers of systems of equations Lx = b, where L is lower triangular, is considered. Some authors have suggested that it is difficult to solve such systems in parallel on message-passing, local-memory machines when L is stored by columns. It is shown here that this is not necessarily the case if the machine can accomplish fan-in communication with reasonable efficiency.
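    The column-oriented ordering at issue can be sketched as a serial column-sweep forward substitution: once x_j is known, column j of L updates the remaining right-hand side. In the message-passing setting, the partial updates contributed by columns owned by different processors are combined with a fan-in (tree) reduction; that communication step is not shown in this sketch.

```python
# Column-sweep (column-storage) forward substitution for L x = b.
# In the parallel, message-passing setting discussed above, the updates from
# columns owned by different processors are combined with a fan-in reduction;
# this serial sketch shows only the column-oriented ordering of the arithmetic.
import numpy as np

def column_sweep_solve(L, b):
    x = b.astype(float)
    n = len(x)
    for j in range(n):
        x[j] /= L[j, j]                    # x_j is now known
        x[j + 1:] -= L[j + 1:, j] * x[j]   # column j updates the remaining RHS
    return x

L = np.tril(np.random.rand(5, 5)) + 5 * np.eye(5)
b = np.random.rand(5)
assert np.allclose(column_sweep_solve(L, b), np.linalg.solve(L, b))
```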

  16. Fast Parallel Computation Of Multibody Dynamics

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Kwan, Gregory L.; Bagherzadeh, Nader

    1996-01-01

    Constraint-force algorithm fast, efficient, parallel-computation algorithm for solving forward dynamics problem of multibody system like robot arm or vehicle. Solves problem in minimum time proportional to log(N) by use of optimal number of processors proportional to N, where N is number of dynamical degrees of freedom: in this sense, constraint-force algorithm both time-optimal and processor-optimal parallel-processing algorithm.

  17. Rotating the Plane of Parallel Light Beams

    NASA Technical Reports Server (NTRS)

    Orloff, K. L.; Yanagita, H.

    1982-01-01

    Rhomboid prism laterally displaces beam of light. Pairs of rhomboid prisms can rotate plane of two parallel beams of light and change spacing of beams. If each element of pair is mounted on independent motor-driven disk, angle of rotation of plane of beams can be varied over wide range. Among other uses, prism configurations can rotate plane of parallel laser beams used in laser velocimeter.

  18. On mesh rezoning algorithms for parallel platforms

    SciTech Connect

    Plaskacz, E.J.

    1995-07-01

    A mesh rezoning algorithm for finite element simulations in a parallel-distributed environment is described. The cornerstones of the algorithm are: the parallel computation of distortion norms on the element and subdomain level, the exchange of the individual subdomain norms to form a subdomain distortion vector, the classification of subdomains and the rezoning behavior prescribed within each subdomain as a response to its own classification and the classification of neighboring subdomains.
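    A toy version of the classification step might look like the following: per-element distortion measures are reduced to one norm per subdomain, the norms are exchanged to form a distortion vector, and each subdomain is classified. The distortion measure, threshold, and labels are illustrative assumptions, not those of the paper.

```python
# Toy sketch of subdomain classification from element distortion norms.
# The distortion values, reduction (max), threshold, and labels are
# illustrative assumptions only.
import numpy as np

def classify_subdomains(element_distortion_per_subdomain, threshold=0.5):
    # One norm per subdomain (here: the maximum element distortion).
    norms = np.array([d.max() for d in element_distortion_per_subdomain])
    # In a distributed run, this vector would be assembled by exchanging the
    # individual subdomain norms among processors.
    return ["rezone" if n > threshold else "keep" for n in norms]

distortions = [np.random.rand(100) for _ in range(4)]   # four subdomains
print(classify_subdomains(distortions))
```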

  19. EUROPA Parallel C++ Version 2.1

    E-print Network

    Caromel, Denis

    EUROPA Parallel C++, Version 2.1. The EUROPA Working Group on Parallel C++, Architecture SIG. Prepared by Denis Caromel and the Europa group (including Jean-Marc Challier, Peter Dzwig, and Russel Winder). Abstract: This paper presents the definition of EUROPA, a framework ...

  20. Computational electromagnetics and parallel dense matrix computations

    SciTech Connect

    Forsman, K.; Kettunen, L.; Gropp, W.

    1995-12-01

    We present computational results using CORAL, a parallel, three-dimensional, nonlinear magnetostatic code based on a volume integral equation formulation. A key feature of CORAL is the ability to solve, in parallel, the large, dense systems of linear equations that are inherent in the use of integral equation methods. Using the Chameleon and PSLES libraries ensures portability and access to the latest linear algebra solution technology.