Note: This page contains sample records for the topic parallel blade-vortex interaction from Science.gov.
While these samples are representative of the content of Science.gov,
they are not comprehensive nor are they the most current set.
We encourage you to perform a real-time search of Science.gov
to obtain the most current and comprehensive results. Last update: August 15, 2014.

An experimental and computational study was carried out to investigate the parallel head-on blade-vortex interaction (BVI) and its noise generation mechanism. A shock tube with an enlarged test section was used to generate a compressible starting vortex, which interacted with a target airfoil. The dual-pulsed holographic interferometry (DPHI) technique and airfoil surface pressure measurements were employed to obtain quantitative flow data during the BVI. A thin-layer Navier-Stokes code (BV12D), with a high-order upwind-biased scheme and a multizonal grid, was also used to numerically simulate the phenomena occurring in the head-on BVI. The detailed structure of a convecting vortex was studied through independent measurements of the density and pressure distributions across the vortex center. Results indicate that, in a strong head-on BVI, pressure peaks of opposite sign are generated on the two sides of the leading edge as the vortex approaches. Then, as soon as the vortex passes the leading edge, the high-pressure peak suddenly moves toward the low-pressure peak, reducing in magnitude as it moves and simultaneously giving rise to the initial sound wave. Both experiment and computation show that viscous effects play a significant role in head-on BVIs.

In this study, parallel blade-vortex interaction for a Schmidt-propeller configuration has been examined using particle image velocimetry (PIV). This tandem configuration consists of a leading airfoil (forefoil), used to generate a vortical wake of leading-edge vortices (LEVs) and trailing-edge vortices (TEVs) through a pitching or plunging motion, and a trailing airfoil (hindfoil), held fixed at a specified angle of attack

Rotor impulsive noise is, of all the known sources of helicopter far-field noise radiation, the one that tends to dominate the acoustic spectrum generated by most helicopters. The helicopter generates a highly directional and rather unique form of impulsive noise that is thought to arise from two source mechanisms: (1) high-speed impulsive noise due to formation of a shock on the advancing blade tip, and (2) blade-vortex interaction (BVI) noise in low-powered descending flight or maneuvers, especially during an approach to landing. As an experimental approach, a shock tube was built to generate a compressible viscous vortex convected at constant velocity in a quasi-uniform subsonic or transonic stream. The vortex then interacted with a target airfoil (NACA 0012 shape). To measure the flow field quantitatively, a dual-pulsed holographic interferometry (DPHI) technique was utilized for both flow visualization and density-field measurements, and fast-response Kulite transducers were used to obtain pressure histories at the surfaces of both the test section and the airfoil. As a numerical approach, the thin-layer Navier-Stokes equations were solved not only to simulate the experimental measurements but also to develop a fundamental understanding of the flow field and sound-generating mechanisms and, furthermore, to understand the effect of several important parameters on the sound generation due to blade-vortex interaction. The results indicate that the main noise-generating mechanism in strong blade-vortex interactions is the severe pressure fluctuation near the leading edge, caused by rapid oscillation of the stagnation point and formation of a suction peak. In strong parallel BVI, a secondary vortex may be generated by separation on the lower surface if the original vortex is clockwise. In that case, the induced vortex has the opposite rotation.
The generated noise level is strongly dependent on Mach number, miss distance, angle of attack, and vortex structure (circulation and core size). The near-field acoustic behavior also depends on the leading-edge shape and thickness of the airfoil rather than on the chord length. Results also indicate that transpiration at the leading edge can suppress pressure fluctuations near the leading edge and reduce the amplitude of the propagating noise.

Blade-vortex interaction noise, sometimes referred to as 'blade slap', is avoided by increasing the absolute value of the inflow to the rotor system of a rotorcraft. This is accomplished by creating a drag force that causes the angle of the tip-path plane of the rotor system to become more negative or more positive.

Blade-vortex interaction noise generated by helicopter main rotor blades is one of the most severe noise problems and is very important both in military applications and for community acceptance of rotorcraft. Research over the decades has substantially improved physical understanding of the noise-generating mechanisms, and various design concepts have been investigated to control noise radiation using advanced blade planform shapes and active blade

The impulsive noise associated with helicopter flight due to blade-vortex interaction, sometimes called blade slap, is analyzed, especially for the case of a close encounter of the blade-tip vortex with a following blade. Three parts of the phenomenon are considered: the tip-vortex structure generated by the rotating blade, the unsteady pressure produced on the following blade during the interaction, and the acoustic radiation due to the unsteady pressure field. To simplify the problem, the analysis was confined to the situation where the vortex is aligned parallel to the blade span, in which case the maximum acoustic pressure results. Acoustic radiation due to the interaction is analyzed in space-fixed coordinates and in the time domain, with the unsteady pressure on the blade surface as the source of chordwise-compact but spanwise-noncompact radiation. The maximum acoustic pressure is related to the vortex core size and Reynolds number, which are in turn functions of the blade-tip aerodynamic parameters. Finally, noise reduction and performance are considered.

Predictions of blade-vortex interaction (BVI) noise, using blade airloads obtained from a coupled aerodynamic, structural, and acoustic methodology, are presented. This methodology uses an iterative, loosely coupled trim strategy for exchanging information between the computational fluid dynamics (CFD) code OVERFLOW-2 and the computational structural dynamics (CSD) and rotorcraft comprehensive code CAMRAD-II. Results are compared to the HART-II rotor baseline conditions. It

Joon Lim; Mark Potsdam; Roger Strawn; Ben Sim; Tor Nygaard


The distribution of the circumferential velocity of the vortex responsible for blade-vortex interaction noise was measured using a rotating hot-wire rake synchronously meshed with a model helicopter rotor at the blade passage frequency. Simultaneous far-field acoustic data and blade differential-pressure measurements were obtained. Results show that the shape of the measured far-field acoustic blade-vortex interaction signature depends on the blade-vortex interaction geometry. The experimental results are compared with the Widnall-Wolf model for blade-vortex interaction noise.

This paper numerically examines possible alleviation of parallel blade-vortex interaction (BVI), experienced by helicopters in low-speed descent, through the introduction of blade-to-blade dissimilarity. A four-bladed rotor with two sets of opposite blades is considered, and the radius of one set is reduced to 80% of the baseline radius. A free-wake analysis is developed for calculating the distorted wake geometry for

Helium-bubble flow visualizations have been performed to study the perpendicular interaction of a turbulent trailing vortex and a rectangular wing in the Virginia Tech Stability Tunnel. Many combinations of vortex strength, vortex-blade separation (Z(sub s)), and blade angle of attack were studied. Photographs of representative cases are presented. A range of phenomena was observed. For Z(sub s) greater than a few percent chord, the vortex is deflected as it passes the blade under the influence of the local streamline curvature and its image in the blade. Initially the interaction appears to have no influence on the core. Downstream, however, the vortex core begins to diffuse and grow, presumably as a consequence of its interaction with the blade wake. The magnitude of these effects increases with reduction in Z(sub s). For Z(sub s) near zero, the form of the interaction changes and becomes dependent on the vortex strength. For lower strengths, the vortex appears to split into two filaments on the leading edge of the blade, one passing on the pressure side and one on the suction side. At higher strengths, the vortex bursts in the vicinity of the leading edge. In either case, the core or its remnants then rapidly diffuse with distance downstream. Increasing the Reynolds number did not qualitatively affect the flow apart from decreasing the amplitude of the small low-frequency wandering motions of the vortex. Changes in wing-tip geometry and boundary-layer trip had very little effect.


Several parameters of transonic blade-vortex interactions (BVI) are studied, and some ideas for noise reduction are introduced and tested using numerical simulation. The model used is the two-dimensional high-frequency transonic small-disturbance equation with regions of distributed vorticity (VTRAN2 code). The far-field noise signals are obtained by using the Kirchhoff method, which extends the numerical 2-D near-field aerodynamic results to the linear acoustic 3-D far field. The BVI noise mechanisms are explained, and the effects of vortex type and strength and angle of attack are studied. In particular, airfoil shape modifications which lead to noise reduction are investigated. The results presented are expected to be helpful for a better understanding of the nature of BVI noise and for better blade design.

An experimental and computational study is carried out to investigate the dominant physical factors of 2-D parallel blade-vortex interaction (BVI) and its noise generation. A shock tube was used to generate a starting vortex which interacted with a target airfoil. Double-exposed holographic interferometry and airfoil surface pressure measurements were employed to obtain quantitative data during the BVI. As a numerical approach, a thin-layer Navier-Stokes code with a multizonal grid was also used to resolve the phenomena occurring in the BVI, especially in the head-on collision case.

Transonic blade-vortex interactions (BVI) are simulated numerically and the noise mechanisms are investigated. The 2-D high-frequency transonic small-disturbance equation is solved numerically (VTRAN2 code). An Alternating Direction Implicit (ADI) scheme with monotone switches is used; viscous effects are included on the boundary, and the vortex is simulated by the cloud-in-cell method. The Kirchhoff method is used for the extension of the numerical 2-D near-field aerodynamic results to the linear acoustic 3-D far field. The viscous effect (shock/boundary-layer interaction) on BVI is investigated. The different types of shock motion are identified and compared. Two important disturbances with different directivity exist in the pressure signal and are believed to be related to the fluctuating lift and drag forces. Noise directivity for different cases is shown. The maximum radiation occurs at an angle between 60 and 90 deg below the horizontal in an airfoil-fixed coordinate system and depends on the details of the airfoil shape. Different airfoil shapes are studied and classified according to the BVI noise produced.

During the HART-I data analysis, the need for comprehensive wake data was identified, including data on vortex creation and aging and on the vortex's redevelopment after blade-vortex interaction. In October 2001, the US Army AFDD, NASA Langley, the German DLR, the French ONERA, and the Dutch DNW performed the HART-II test as an international joint effort. The main objective was to focus on rotor wake measurement using a PIV technique, along with comprehensive data on blade deflections, airloads, and acoustics. Three prediction teams made preliminary correlation efforts with the HART-II data: a joint US team of the US Army AFDD and NASA Langley, the German DLR, and the French ONERA. The predicted results showed significant improvements over the HART-I predictions, computed several years earlier, indicating an improved understanding of complicated wake modeling in comprehensive rotorcraft analysis. All three teams demonstrated satisfactory prediction capabilities in general, though there were slight differences in prediction accuracy across the various disciplines.

A potential cause of helicopter impulsive noise, commonly called blade slap, is the unsteady lift fluctuation on a rotor blade due to interaction with the vortex trailed from another blade. The relationship between vortex structure and the intensity of the acoustic signal is investigated. The analysis is based on a theoretical model for blade/vortex interaction. Unsteady lift on the blades due to blade/vortex interaction is calculated using linear unsteady aerodynamic theory, and expressions are derived for the directivity, frequency spectrum, and transient signal of the radiated noise. An inviscid rollup model is used to calculate the velocity profile in the trailing vortex from the spanwise distribution of blade-tip loading. A few cases of tip loading are investigated, and numerical results are presented for the unsteady lift and acoustic signal due to blade/vortex interaction. The intensity of the acoustic signal is shown to be quite sensitive to changes in tip-vortex structure.
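As a rough illustration of the kind of tip-vortex velocity profile such analyses depend on, a minimal sketch using the Lamb-Oseen model is shown below. This is a standard viscous-core model, chosen here only for illustration; it is not the inviscid rollup model of the abstract above, and the parameter values are assumptions.

```python
import numpy as np

def lamb_oseen_vtheta(r, gamma, r_core):
    """Tangential velocity of a Lamb-Oseen vortex (valid for r > 0).

    gamma:  vortex circulation
    r_core: radius of peak tangential velocity
    """
    # alpha = 1.25643 places the velocity maximum at r = r_core
    alpha = 1.25643
    r = np.asarray(r, dtype=float)
    return gamma / (2.0 * np.pi * r) * (1.0 - np.exp(-alpha * (r / r_core) ** 2))

# Far from the core the profile recovers the potential vortex gamma/(2*pi*r);
# inside the core the velocity falls smoothly to zero at the center.
profile = lamb_oseen_vtheta([0.05, 0.1, 0.2], gamma=1.0, r_core=0.1)
```

A smaller core radius with the same circulation gives a sharper, stronger velocity peak, which is the kind of sensitivity the abstract reports for the acoustic signal.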

Helicopter blade-vortex interaction noise is one of the most severe noise sources and is very important both for community annoyance and for military detection. Research over the decades has substantially improved basic physical understanding of the mechanisms generating rotor blade-vortex interaction noise and also of controlling techniques, particularly using active rotor control technology. This paper reviews active rotor control techniques currently available for rotor blade-vortex interaction noise reduction, including higher harmonic pitch control, individual blade control, and on-blade control technologies. Basic physical mechanisms of each active control technique are reviewed in terms of the noise reduction mechanism and the controlling aerodynamic or structural parameters of a blade. Active rotor control techniques using smart structures/materials are discussed, including distributed smart actuators to induce local torsional or flapping deformations. Published by Elsevier Science Ltd.

Yu, Yung H.; Gmelin, Bernd; Splettstoesser, Wolf; Brooks, Thomas F.; Philippe, Jean J.; Prieur, Jean

An acoustic source localization scheme applicable to noncompact moving sources is developed and applied to the blade-vortex interaction (BVI) noise data of a 40-percent-scale BO-105 model rotor. A generalized rotor wake code is employed to predict possible BVI locations on the rotor disk and is found quite useful in interpreting the acoustic localization results. The highly varying directivity patterns

W. R. Splettstoesser; K. J. Schultz; Ruth M. Martin

A comparison of experimental acoustic data and computational predictions was performed for a helicopter rotor blade interacting with a parallel vortex. The experiment was designed to examine the aerodynamics and acoustics of parallel blade-vortex interaction (BVI) and was performed in the Ames Research Center (ARC) 80- by 120-Foot Subsonic Wind Tunnel. An independently generated vortex interacted with a small-scale, nonlifting helicopter rotor at the 180-deg azimuth angle to create the interaction in a controlled environment. Computational fluid dynamics (CFD) was used to calculate near-field pressure time histories. The CFD code, called Transonic Unsteady Rotor Navier-Stokes (TURNS), was used to make comparisons with the acoustic pressure measurements at two microphone locations and several test conditions. The test conditions examined included hover tip Mach numbers of 0.6 and 0.7, an advance ratio of 0.2, positive and negative vortex rotation, and the vortex passing above and below the rotor blade by 0.25 rotor chords. The results show that the CFD predicts the acoustic characteristics qualitatively very well but quantitatively overpredicts the peak-to-peak sound pressure level by 15 percent in most cases. There is also a discrepancy in the phasing (about 4 deg) of the BVI event in some cases. Additional calculations were performed to examine the effects of vortex strength, thickness, time accuracy, and directionality. This study validates the TURNS code for the prediction of near-field acoustic pressures in controlled parallel BVI.
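For scale, a 15 percent overprediction of peak-to-peak pressure corresponds to only about 1.2 dB, since level differences go as 20 log10 of the amplitude ratio. A back-of-the-envelope helper (an illustrative sketch, not part of the TURNS study):

```python
import math

def amplitude_error_db(ratio: float) -> float:
    """dB error implied by over- or under-predicting a pressure amplitude
    by the given ratio (e.g. 1.15 for a 15% overprediction)."""
    # Level difference in decibels for a pressure-amplitude ratio
    return 20.0 * math.log10(ratio)

error = amplitude_error_db(1.15)  # 15% overprediction
```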

Transonic perpendicular rotor blade-vortex interaction (BVI) tests at Mach numbers ranging from 0.68 to 0.9 and Reynolds numbers (based on the airfoil chord) of 3.8-5.5 million were conducted in the UTA high-Reynolds-number transonic Ludwieg-tube wind tunnel. The scheme involved positioning a lifting wing (vortex generator) upstream of an instrumented NACA 0012 airfoil so that the trailing vortex interacted

Iraj M. Kalkhoran; Donald R. Wilson; Donald D. Seath


A study of the full-potential modeling of a blade-vortex interaction was made. A primary goal of this study was to investigate the effectiveness of various methods of modeling the vortex. The model problem restricts the interaction to that of an infinite wing with an infinite line vortex moving parallel to its leading edge. This problem provides a convenient testing ground for the various methods of modeling the vortex while retaining the essential physics of the full three-dimensional interaction. A full-potential algorithm specifically tailored to solve the blade-vortex interaction (BVI) was developed to solve this problem. The basic algorithm was modified to include the effect of a vortex passing near the airfoil. Four different methods of modeling the vortex were used: (1) the angle-of-attack method, (2) the lifting-surface method, (3) the branch-cut method, and (4) the split-potential method. A side-by-side comparison of the four models was conducted, including comparisons of the generated velocity fields, a subcritical interaction, and a critical interaction. The subcritical and critical interactions are compared with experimentally generated results. The split-potential model was used to survey some of the more critical parameters affecting the BVI.

The interaction of a vortical unsteady flow with structures is often encountered in engineering applications. Such flow-structure interactions (FSI) can be responsible for generating significant loads and can have many detrimental structural and acoustic side effects, such as structural fatigue, radiated noise, and even catastrophic results. Amongst the different types of FSI, the parallel blade-vortex interaction (BVI) is the most common, often encountered in helicopters and propulsors. In this work, we report on the implementation of leading-edge blowing (LEB) active flow control for successfully minimizing parallel BVI. Our results show a reduction of the airfoil vibrations of up to 38% based on the root-mean-square of the vibration velocity amplitude. This technique is based on displacing an incident vortex using a jet issued from the leading edge of a sharp airfoil, effectively increasing the stand-off distance of the vortex from the body. The effectiveness of the method was experimentally analyzed using time-resolved digital particle image velocimetry (TRDPIV) recorded at an 800 Hz rate, which is sufficient to resolve the spatio-temporal dynamics of the flow field; this was combined with simultaneous accelerometer measurements of the airfoil, which was free to oscillate in a direction perpendicular to the freestream. Analysis of the flow-field spectra and a proper orthogonal decomposition (POD) of the temporally resolved planar TRDPIV flow fields indicate that the LEB effectively modified the flow field surrounding the airfoil and increased the convecting vortices' stand-off distance to over half of the airfoil chord length. It is shown that LEB also causes a redistribution of the flow-field spectral energy over a larger range of frequencies.

Models of both the advanced main rotor system and the standard or "baseline" UH-1 main rotor system were tested at one-quarter scale in the Langley 4- by 7-Meter (V/STOL) Tunnel using the general rotor model system. Tests were conducted over a range of descent angles which bracketed the blade-vortex interaction phenomenon for a range of simulated forward speeds. The tunnel was operated in the open-throat configuration with acoustic treatment to improve the semi-anechoic characteristics of the test chamber. Acoustical data obtained for these two rotor systems operating at similar flight conditions are presented without analysis or discussion.

The impulsive nature of the noise due to the interaction of a rotor blade with a tip vortex is studied. The time signature of this noise is calculated theoretically, based on the measured blade-surface pressure fluctuations of an operational load survey rotor in slow descending flight, and is compared with simultaneous microphone measurements. In particular, the characteristic features of the waveform are studied extensively in order to understand the generating mechanism and to identify the important parameters. The interaction trajectory of a tip vortex on an acoustic planform is shown to be a very important parameter for the impulsive shape of the noise. The unsteady nature of the pressure distribution at the very leading edge is also important to the pulse shape. The theoretical model, using noncompact linear acoustics, predicts the general shape of the interaction impulse reasonably well, except for the peak amplitude, which requires more continuous pressure information along the span at the leading edge.

A method for simulating incompressible flows past airfoils and their wakes is described. Vorticity panels are used to represent the body, and vortex blobs (vortex points with their singularities removed) are used to represent the wake. The procedure can be applied to the simulation of completely attached flow past an oscillating airfoil. The rate at which vorticity is shed from the trailing edge of the airfoil into the wake is determined by simultaneously requiring the pressure along the upper and lower surface streamlines to approach the same value at the trailing edge and the circulation around both the airfoil and its wake to remain constant. The motion of the airfoils is discretized, and a vortex is shed from the trailing edge at each time step. The vortices are convected at the local velocity of fluid particles, a procedure that renders the pressure continuous in an inviscid fluid. When the vortices in the wake begin to separate they are split into more vortices, and when they begin to collect they are combined. The numerical simulation reveals that the wake, which is originally smooth, eventually coils, or wraps, around itself, primarily under the influence of the velocity it induces on itself, and forms regions of relatively concentrated vorticity. Although discrete vortices are used to represent the wake, the spatial density of the vortices is so high that the computed velocity profiles across a typical region of concentrated vorticity are quite smooth. Although the computed wake evolves in an entirely inviscid model of the flowfield, these profiles appear to have a viscous core. As an application, a simulation of the interaction between vorticity in the oncoming stream and a stationary airfoil is also discussed.
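A minimal sketch of the induced-velocity evaluation at the heart of such vortex-blob methods is given below, using a Krasny-style smoothing parameter to remove the point-vortex singularity. The kernel and the names used are illustrative assumptions; the paper's exact desingularization is not specified here.

```python
import numpy as np

def blob_velocity(targets, blobs, strengths, delta):
    """2-D velocity induced at `targets` by a set of vortex blobs.

    targets:   (N, 2) array of evaluation points
    blobs:     (M, 2) array of blob positions
    strengths: length-M circulations
    delta:     smoothing radius removing the point-vortex singularity
    """
    u = np.zeros_like(targets, dtype=float)
    for xb, g in zip(blobs, strengths):
        dx = targets - xb                       # displacement to each target
        r2 = np.sum(dx**2, axis=1) + delta**2   # smoothed squared distance
        # point vortex: (u, v) = Gamma/(2*pi*r^2) * (-dy, dx)
        u[:, 0] += -g * dx[:, 1] / (2.0 * np.pi * r2)
        u[:, 1] += g * dx[:, 0] / (2.0 * np.pi * r2)
    return u
```

Convecting every blob at the velocity this routine evaluates at its own position (excluding its self-term) is the basic time step of the wake simulation described above; splitting and merging blobs, as the paper does, controls the spatial resolution.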

Dong, B. (Virginia Polytechnic Inst. and State Univ., Blacksburg, VA (United States). Dept. of Engineering Science and Mechanics); Mook, D.T.

The interaction of a helicopter's main rotor tip vortex with the tail rotor is an important source of noise and vibration, yet it is still poorly understood. An important limiting case is the orthogonal blade-vortex interaction (OBVI), in which a three-dimensional vortex structure is cut by the tail rotor blade. It has been discovered that the blade unsteady surface

The directionality and strength of blade-vortex interactions (BVI) are explained through the radiation cone concept. BVI acoustic radiation is primarily the result of two sound mechanisms: the tip effect and the radiation cone effect. The radiation cone effect is a highly directional mechanism which results when a lift distribution moves supersonically with respect to the fluid. After a physical explanation of the BVI mechanisms, sample cases using translating and rotating blades interacting with a straight-line vortex are shown. The radiation cone concept is then applied to specific rotorcraft cases, where it helps to explain zones of intense sound pressure level found in experimental results for the XV-15 tiltrotor and for a BO-105 helicopter scale model.
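The supersonic-trace condition behind the radiation cone can be illustrated with simple geometry: the blade/vortex intersection point sweeps along the span faster as the interaction approaches parallel. A hedged sketch (the angle convention and function name are assumptions for illustration, not taken from the paper):

```python
import math

def trace_mach(m_flow: float, angle_deg: float) -> float:
    """Mach number of the blade/vortex intersection point along the span.

    m_flow:    section Mach number of the blade relative to the fluid
    angle_deg: angle between the vortex axis and the blade span
               (90 deg = perpendicular BVI; -> 0 deg = parallel BVI)
    """
    # For a straight vortex inclined at angle_deg to the span, the
    # intersection point moves spanwise at V / tan(angle); it diverges
    # as the interaction becomes parallel, easily exceeding Mach 1.
    return m_flow / math.tan(math.radians(angle_deg))

# At M = 0.6, a 10-deg inclination already gives a supersonic trace speed.
m_trace = trace_mach(0.6, 10.0)
```

This is the sense in which a lift distribution "moves supersonically with respect to the fluid" in near-parallel interactions, producing the highly directional radiation cone.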

Ringler, Todd D.; George, Albert R.; Steele, James B.

During the Higher Harmonic Control Aeroacoustic Rotor Test, extensive measurements of the rotor aerodynamics, the far-field acoustics, the wake geometry, and the blade motion for powered descent flight conditions were made. These measurements have been used to validate and improve the prediction of blade-vortex interaction (BVI) noise. The improvements made to the BVI modeling after the evaluation of the test data are discussed. The effects of these improvements on the acoustic-pressure predictions are shown. These improvements include restructuring the wake, modifying the core size, incorporating the measured blade motion into the calculations, and attempting to improve the dynamic blade response. A comparison of four different implementations of the Ffowcs Williams and Hawkings equation is presented. A common set of aerodynamic input has been used for this comparison.

Gallman, Judith M.; Tung, Chee; Schultz, Klaus J.; Splettstoesser, Wolf; Buchholz, Heino

In the study reported here, blade-vortex interaction noise was predicted using a simplified model of blade pressures measured on a one-seventh-scale model AH-1/OLS main rotor. The methods used for the acoustic prediction are based on the acoustic analogy and have been developed by Nakamura (1981) and by Brentner, Nystrom, and Farassat (referred to as the WOPWOP method). The waveforms predicted by the two methods are in good agreement with each other and with the measurements in terms of the number of pulses, the pulse widths, and the separation times between the pulses. The peak amplitude of the dominant pulse may, however, be underpredicted by up to 40 percent, depending on flight conditions. Ways of improving the accuracy of the prediction methods are suggested.

Joshi, Mahendra C.; Liu, Sandy R.; Boxwell, Donald A.


Predictions of blade-vortex interaction (BVI) noise, using blade airloads obtained from a coupled aerodynamic and structural methodology, are presented. This methodology uses an iterative, loosely coupled trim strategy to cycle information between the OVERFLOW-2 (CFD) and CAMRAD-II (CSD) codes. Results are compared to the HART-II baseline, minimum-noise, and minimum-vibration conditions. It is shown that this state-of-the-art CFD/CSD approach is able to capture the blade airload and noise radiation characteristics associated with BVI. With the exception of the HART-II minimum-noise condition, predicted advancing- and retreating-side BVI for the baseline and minimum-vibration conditions agrees favorably with measured data. Although the BVI airloads and noise amplitudes are generally underpredicted, this CFD/CSD methodology provides a noteworthy overall improvement over the lifting-line aerodynamics and free-wake models typically used in CSD comprehensive analysis codes.

The generation of helicopter noise by blade-vortex interactions during descent under impulsive conditions is investigated analytically. A noise-prediction technique is developed on the basis of the dipole source term of the Ffowcs Williams-Hawkings equation and applied to data from simultaneous blade-pressure and acoustic measurements obtained by Cowan et al. (1986) on a 10-ft-diameter four-blade rotor model in a wind tunnel. Preliminary results show that an input blade-airload azimuth resolution of 1 deg or better and a computational azimuth step size of 2 deg or less are required to achieve good agreement between predicted and recorded acoustic time histories. The need for more sophisticated methods to model chordwise input data and for a more extensive experimental database is indicated.
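For reference, the dipole (loading) term of the Ffowcs Williams-Hawkings solution referred to above can be written in one standard form (sign and notation conventions vary between references; here ℓ_i is the surface loading per unit area, r the source-observer distance, M_r the source Mach number component toward the observer, f = 0 the blade surface, and the bracket is evaluated at retarded time):

```latex
p'_L(\mathbf{x},t) \;=\; -\,\frac{1}{4\pi}\,\frac{\partial}{\partial x_i}
\int_{f=0}\left[\frac{\ell_i}{r\,\lvert 1 - M_r \rvert}\right]_{\mathrm{ret}}\mathrm{d}S .
```

The sensitivity to azimuth resolution reported in the abstract follows from this form: the blade loading ℓ_i during a BVI event varies impulsively with azimuth, so coarse input sampling smears the predicted acoustic pulse.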

A parametric study of model helicopter rotor blade slap due to blade-vortex interaction (BVI) was conducted in a 5- by 7.5-foot anechoic wind tunnel using model helicopter rotors with two, three, and four blades. The results were compared with a previously developed Mach number scaling theory. Three- and four-bladed rotor configurations were found to show very good agreement with the Mach-number-to-the-sixth-power law for all conditions tested. A reduction in the range of conditions for which BVI blade slap is detected was observed for three-bladed rotors when compared to the two-bladed baseline. The advance-ratio boundaries of the four-bladed rotor exhibited an angular dependence not present for the two-bladed configuration. The upper limits of the advance-ratio boundaries of the four-bladed rotors increased with increasing rotational speed.
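The sixth-power law above has a simple decibel consequence: if radiated intensity scales as M^6, the level change between two tip Mach numbers is 60 log10 of their ratio. A minimal worked example (the Mach numbers chosen are illustrative, not from the study):

```python
import math

def level_change_db(m1: float, m2: float, exponent: float = 6.0) -> float:
    """Change in radiated sound level (dB) when tip Mach number goes from
    m1 to m2, assuming intensity scales as M**exponent."""
    # I2/I1 = (m2/m1)**exponent, so dB change = 10*log10(I2/I1)
    return 10.0 * exponent * math.log10(m2 / m1)

# Example: raising tip Mach number from 0.55 to 0.65 adds roughly 4.4 dB
delta = level_change_db(0.55, 0.65)
```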

The effect of a porous leading edge on the blade-vortex interaction noise of an airfoil, which dominates the far-field acoustic spectrum of the helicopter, is investigated. The thin-layer Navier-Stokes equations are solved with a high-order upwind-biased scheme and a multizonal grid system. The Baldwin-Lomax turbulence model is modified to account for transpiration at the surface. The amplitudes of the propagating acoustic wave in the near field are calculated directly from the computation, and the porosity effect on the surface is modeled. Results show that leading-edge transpiration can suppress pressure fluctuations at the leading edge during BVI and consequently reduce the amplitude of the propagating noise by up to 30 percent in the near field. The effect of the porosity factor on the noise level is also investigated.

A wind tunnel experiment simulating a steady three-dimensional helicopter rotor blade/vortex interaction is reported. The experimental configuration consisted of a vertical semispan vortex-generating wing, mounted upstream of a horizontal semispan rotor blade airfoil. A three-dimensional laser velocimeter was used to measure the velocity field in the region of the blade. Sectional lift coefficients were calculated by integrating the velocity field to obtain the bound vorticity. Total lift values, obtained using an internal strain-gauge balance, verified the laser velocimeter data. Parametric variations of vortex strength, rotor blade angle of attack, and vortex position relative to the rotor blade were explored. These data are reported (with attention to experimental limitations) to provide a dataset for the validation of analytical work.

This study focuses on detection and analysis methods for helicopter blade-vortex interactions (BVI) and applies these methods to two BVI noise alleviation schemes: an adaptive-passive scheme and an active scheme. A standard free-wake analysis based on relaxation methods is extended to compute high-resolution blade loading, to account for blade-to-blade dissimilarities, and to model dual vortices when there is negative loading at the blade tips. The free-wake geometry is still calculated on a coarse azimuthal grid and then interpolated to a high-resolution grid to calculate the BVI-induced impulsive loading. Blade-to-blade dissimilarities are accounted for by allowing each blade to release its own vortices. A number of BVI detection criteria, including the spherical method (a geometric criterion developed in this thesis), are critically examined. High-resolution azimuthal discretization was found to be required by virtually all detection methods except the spherical method, which detected the occurrence of parallel BVI even on a low-resolution azimuthal mesh. Detection methods based on inflow and blade loads were, in addition, found to be sensitive to vortex core size. While most BVI studies use the high-resolution airloads to compute BVI noise, the total noise can often be due to multiple dominant interactions on the advancing and retreating sides. A methodology is developed to evaluate the contribution of an individual interaction to the total BVI noise, based on using the loading due to an individual vortex as an input to the acoustic code WOPWOP. The adaptive-passive BVI alleviation method considered in this study consists of reducing the length of one pair of opposite blades (of a 4-bladed rotor) in low-speed descent. Results showed that the differential coning resulting from the blade dissimilarity increases the blade-vortex miss distances and reduces the BVI noise by 4 dB.
The Higher Harmonic Control Aeroacoustic Rotor Test (HART) has been studied as an active method for BVI noise alleviation. Good validation of a baseline case without Higher Harmonic Control (HHC) is obtained. However, the present analysis is unable to capture all the features of the two specific HHC pitch input schedules examined. Partial insight into the mechanisms at work is provided.

Acoustic data are presented from a 40 percent scale model of the 4-bladed BO-105 helicopter main rotor, measured in the large European aeroacoustic wind tunnel, the DNW. Rotor blade-vortex interaction (BVI) noise data in the low-speed flight range were acquired using a traversing in-flow microphone array. The experimental apparatus, testing procedures, calibration results, and experimental objectives are fully described. A large representative set of averaged acoustic signals is presented.

Martin, Ruth M.; Splettstoesser, W. R.; Elliott, J. W.; Schultz, K.-J.

Transonic blade-vortex interactions (BVI) are simulated numerically and the noise mechanisms are investigated. The two-dimensional high-frequency transonic small-disturbance equation is solved numerically (VTRAN2 code). An ADI scheme with monotone switches is used; viscous effects are included on the boundary, and the vortex is simulated by the cloud-in-cell method. The Kirchhoff method is used to extend the numerical two-dimensional near-field aerodynamic results to the linear-acoustic three-dimensional far field. The viscous effects (shock/boundary-layer interactions) on BVI are investigated. The different types of shock motion are identified and compared. Two important disturbances with different directivity exist in the pressure signal and are believed to be related to the fluctuating lift and drag forces. Noise directivity for different cases is shown. The maximum radiation occurs at an angle between 60 and 90 degrees below the horizontal in an airfoil-fixed coordinate system and depends on the details of the airfoil shape. Different airfoil shapes are studied and classified according to the BVI noise produced.

An acoustics test using an aeroelastically scaled rotor was conducted to examine the effectiveness of higher harmonic blade pitch control for the reduction of impulsive blade-vortex interaction (BVI) noise. A four-bladed, 110 in. diameter, articulated rotor model was tested in a heavy gas (Freon-12) medium in Langley's Transonic Dynamics Tunnel. Noise and vibration measurements were made for a range of matched flight conditions, where prescribed (open-loop) higher harmonic pitch was superimposed on the normal (baseline) collective and cyclic trim pitch. For the in-flow microphone noise measurements, advantage was taken of the reverberance of the hard-walled tunnel by using a sound power determination approach. Initial findings from on-line data processing for three of the test microphones are reported for 4/rev (4P) collective pitch control over a range of input amplitudes and phases. By comparing these results to corresponding baseline (no control) conditions, significant noise reductions (4 to 5 dB) were found for low-speed descent conditions, where helicopter BVI noise is most intense. For other rotor flight conditions, the overall noise was found to increase. All cases show increased vibration levels.

Brooks, Thomas F.; Booth, Earl R., Jr.; Jolly, J. Ralph, Jr.; Yeager, William T., Jr.; Wilbur, Matthew L.

The use of higher harmonic control (HHC) of blade pitch to reduce blade-vortex interaction (BVI) noise is examined by means of a rotor acoustic test. A dynamically scaled, four-bladed, articulated rotor model was tested in a heavy gas (Freon-12) medium. Acoustic and vibration measurements were made for a large range of matched flight conditions where prescribed (open-loop) HHC pitch schedules were superimposed on the normal (baseline) collective and cyclic trim pitch. A novel sound power measurement technique was developed to take advantage of the reverberance of the hard-walled tunnel. Quantitative sound power results are presented for a 4/rev (4P) collective-pitch HHC. By comparing the results using 4P HHC to corresponding baseline (no HHC) conditions, significant midfrequency noise reductions of 5-6 dB are found for low-speed descent conditions where BVI is most intense. For other flight conditions, noise is found to increase with the use of HHC. Low-frequency loading noise increases, as do fixed- and rotating-frame vibration levels.

Blade-vortex interaction (BVI) noise has been recognized as the primary determinant of a helicopter's far-field acoustic signature. Given the limitations of design in eliminating this dynamic phenomenon, there exists a need for control. In this paper, we present the application of feedback control strategies, and then of adaptive cancellation, to Leishman and Hariharan's linear aerodynamic model of a trailing-edge flap. Lift fluctuations caused by vortices are taken as an output disturbance. The contribution of the vortices to lift is obtained from Leishman's indicial model for gusts. The use of an active structure for actuation is assumed, and the actuator is approximated as a lag element. To design an adaptive cancellation scheme that is applicable not only to BVI but also to general problems with periodic disturbances, we start with the sensitivity method but arrive at the same scheme derived by Sacks, Bodson, and Khosla, who introduced a phase advance into a pseudo-gradient scheme. We discuss the stability of the scheme via averaging.
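The adaptive-cancellation idea for a periodic disturbance can be sketched with a plain gradient (LMS-style) update on a sin/cos basis at the known disturbance frequency. This is a minimal textbook version, not the phase-advanced Sacks-Bodson-Khosla scheme the paper arrives at, and all names here are illustrative.

```python
import math

def adaptive_cancel(disturbance, omega, dt, mu=0.05):
    """Minimal adaptive cancellation of a periodic disturbance at a
    known frequency omega (rad/s): two weights on a sin/cos basis are
    adapted by a pseudo-gradient rule so the control output converges
    to the negative of the disturbance (textbook sketch only)."""
    a, b = 0.0, 0.0               # adaptive weights
    residuals = []
    for k, d in enumerate(disturbance):
        s = math.sin(omega * k * dt)
        c = math.cos(omega * k * dt)
        u = a * s + b * c         # control output
        e = d + u                 # residual lift fluctuation, driven to zero
        a -= mu * e * s           # gradient step on e**2 / 2
        b -= mu * e * c
        residuals.append(e)
    return residuals

# A pure-tone disturbance is largely cancelled after adaptation.
dt, omega = 0.01, 2 * math.pi
dist = [math.sin(omega * k * dt + 0.3) for k in range(4000)]
res = adaptive_cancel(dist, omega, dt)
```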

Acoustic data taken in the anechoic Deutsch-Niederlaendischer Windkanal (DNW) have documented the blade-vortex interaction (BVI) impulsive noise radiated from a 1/7-scale model main rotor of the AH-1 series helicopter. Averaged model-scale data were compared with averaged full-scale, in-flight acoustic data under similar nondimensional test conditions. At low advance ratios (mu = 0.164 to 0.194), the data scale remarkably well in level and waveform shape, and also duplicate the directivity pattern of BVI impulsive noise. At moderate advance ratios (mu = 0.224 to 0.270), the scaling deteriorates, suggesting that the model-scale rotor is not adequately simulating the full-scale BVI noise; presently, no proven explanation of this discrepancy exists. Carefully performed parametric variations over a complete matrix of testing conditions have shown that BVI noise radiation is highly sensitive to all four governing nondimensional parameters: hover tip Mach number, advance ratio, local inflow ratio, and thrust coefficient.

Splettstoesser, W. R.; Schultz, K. J.; Boxwell, D. A.; Schmitz, F. H.

An investigation of the flowfield characteristics around a rotor blade during a blade-vortex interaction (BVI) was conducted at the NASA Langley Research Center by the Army's Aeroperformance Division and the Boeing Defense and Space Group, Helicopter Division, during a wind-tunnel test in the 14- by 22-foot Subsonic Tunnel. A two-component laser velocimeter was used to measure the velocity field around the blade during a BVI. This paper presents velocity measurements that indicate the presence of a vortex. The following conclusions can be made from this investigation: (1) The streamlines and vectors of the induced velocity, when studied in conjunction with the blade surface pressures, indicate how the flowfield behaves during a BVI: the blade approaches and intersects a vortex, and the vortex slides beneath the blade. (2) The data provide detailed flowfield information for validating computational predictions of BVI and for evaluating and improving current wake models. Among the options investigated, only the free-wake calculation by TECH-01 indicated any BVI activity in the first quadrant.

Gorton, Susan Althoff; Poling, David R.; Dadone, Leo

Acoustic data are presented from a 40 percent scale model of the four-bladed BO-105 helicopter main rotor, tested in a large aerodynamic wind tunnel. Rotor blade-vortex interaction (BVI) noise data in the low-speed flight range were acquired using a traversing in-flow microphone array. The acoustic results presented are used to assess the acoustic far field of BVI noise, to map the directivity and temporal characteristics of BVI impulsive noise, and to show the existence of retreating-side BVI signals. The characteristics of the acoustic radiation patterns, which can often be strongly focused, are found to be very dependent on rotor operating condition. The acoustic signals exhibit multiple blade-vortex interactions per blade with broad impulsive content at lower speeds, while at higher speeds they exhibit fewer interactions per blade, with much sharper, higher-amplitude acoustic signals. Moderate-amplitude BVI acoustic signals measured under the aft retreating quadrant of the rotor are shown to originate from the retreating side of the rotor.

Martin, R. M.; Splettstoesser, W. R.; Elliott, J. W.; Schultz, K.-J.

The analysis of rotorcraft aerodynamics and acoustics is a challenging problem, primarily because a rotorcraft continually flies through its own wake. The generation mechanism for a rotorcraft wake, which is dominated by strong, concentrated blade-tip trailing vortices, is similar to that in fixed-wing aerodynamics. However, following blades encounter shed vortices from previous blades before they are swept downstream, resulting in sharp, impulsive loading on the blades. The blade/wake encounter, known as blade-vortex interaction (BVI), is responsible for a significant amount of vibratory loading and the characteristic rotorcraft acoustic signature in certain flight regimes. The present work addressed three different aspects of this interaction at a fundamental level. First, an analytical model for the prediction of trailing vortex structure is discussed. The model as presented is the culmination of a lengthy research effort to isolate the key physical mechanisms which govern vortex-sheet rollup. Based on the Betz model, properties of the flow such as mass flux, axial momentum flux, and axial flux of angular momentum are conserved on either a differential or integral basis during the rollup process. The formation of a viscous central core was facilitated by the assumption of a turbulent mixing process, with final vortex velocity profiles chosen to be consistent with a rotational-flow mixing model and experimental observation. A general derivation of the method is outlined, followed by a comparison of model predictions with experimental vortex measurements, and finally a viscous blade drag model to account for additional effects of aerodynamic drag on vortex structure. The second phase of this program involved the development of a new formulation of lifting surface theory, with the ultimate goal of an accurate, reduced-order hybrid analytical/numerical model for fast rotorcraft load calculations.
Currently, accurate rotorcraft airload analyses are limited by the massive computational power required to capture the small-time-scale events associated with BVI. This problem has two primary facets: accurate knowledge of the wake geometry, and accurate resolution of the impulsive loading imposed by a tip vortex on a blade. The present work addressed the second facet, providing a mathematical framework for solving the impulsive loading problem analytically, then asymptotically matching this solution to a low-resolution numerical calculation. A method was developed which uses continuous sheets of integrated boundary elements to model the lifting surface and wake. Special elements were developed to capture local behavior in high-gradient regions of the flow, thereby reducing the burden placed on the surrounding numerical method. Unsteady calculations for several classical cases were made in both the frequency and time domains to demonstrate the performance of the method. Finally, a new unsteady, compressible boundary element method was applied to the problem of BVI acoustic radiation prediction. This numerical method, combined with the viscous-core trailing vortex model, was used to duplicate the geometry and flight configuration of a detailed experimental BVI study carried out at NASA Ames Research Center. Blade surface pressure and near- and far-field acoustic radiation calculations were made. All calculations were shown to compare favorably with experimentally measured values. The linear boundary element method with non-linear corrections proved sufficient over most of the rotor azimuth, and particularly in the region of the blade-vortex interaction, suggesting that full non-linear CFD schemes are not necessary for rotorcraft noise prediction.

Blade-vortex interaction (BVI) produces annoying, high-intensity impulsive noise. NASA Ames collected several sets of BVI noise data during in-flight and wind tunnel tests. The goal of this work is to extract the essential features of BVI signals from the in-flight data and examine the feasibility of extracting those features from BVI noise recorded inside a large wind tunnel. BVI noise-generating mechanisms and BVI radiation patterns are considered, and a simple mathematical-physical model is presented. It allows the construction of simple synthetic BVI events that are comparable to free-flight data. The boundary effects of the wind tunnel floor and ceiling are identified, and more complex synthetic BVI events are constructed to account for features observed in the wind tunnel data. It is demonstrated that improved recording of BVI events can be attained by changing the geometry of the rotor hub, floor, ceiling, and microphone. The Euclidean distance measure is used to align BVI events from each blade, and improved BVI signals are obtained by time-domain averaging of the aligned data. The differences between BVI events for individual blades are then apparent. Removal of wind tunnel background noise by optimal Wiener filtering is shown to be effective provided representative noise-only data have been recorded. Elimination of wind tunnel reflections by cepstral and optimal-filtering deconvolution is examined. It is seen that the cepstral method is not applicable but that a pragmatic optimal-filtering approach gives encouraging results. Recommendations for further work include: altering the measurement geometry, real-time data observation and evaluation, examining reflection signals (particularly those from the ceiling), and performing further analysis of expected BVI signals for flight conditions of interest so that microphone placement can be optimized for each condition.
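The alignment-and-average step can be sketched as follows; this is a minimal illustration of Euclidean-distance alignment followed by time-domain averaging, with made-up helper names, not the authors' actual processing chain.

```python
def best_shift(template, signal, max_shift):
    """Shift (in samples) that minimizes the Euclidean distance
    between the template and a window of the signal."""
    n = len(template)
    best, best_d = 0, float("inf")
    for s in range(max_shift + 1):
        d = sum((template[i] - signal[s + i]) ** 2 for i in range(n))
        if d < best_d:
            best, best_d = s, d
    return best

def aligned_average(template, events, max_shift):
    """Time-domain average of per-blade BVI events after
    Euclidean-distance alignment (sketch of the noise-reduction
    step described in the abstract)."""
    n = len(template)
    shifts = [best_shift(template, e, max_shift) for e in events]
    return [sum(e[s + i] for e, s in zip(events, shifts)) / len(events)
            for i in range(n)]
```

Uncorrelated background noise in the individual events is attenuated by the averaging, while the coherent BVI pulse, once aligned, is preserved.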

A blade-vortex interaction noise prediction scheme based on CAMRAD/JA, FPR, and RAPP is used to quantify the effects of errors and assumptions in the modeling of the helicopter's shed vortex on the acoustic predictions. CAMRAD/JA computes the wake geometry and inflow angles, which are used in FPR to solve for the aerodynamic surface pressures. RAPP uses these surface pressures to predict the acoustic pressure. Both CAMRAD/JA and FPR utilize the Biot-Savart law to determine the influence of the vortical velocities on the blade loading, and both codes use an algebraic vortex model for the solid-body rotation of the vortex core. Large changes in the specification of the vortex core size do not change the in-plane wake geometry calculated by CAMRAD/JA and only slightly affect the out-of-plane wake geometry. However, the aerodynamic surface pressure calculated by FPR changes in both magnitude and character with small changes to the core size used in the FPR calculations. This in turn affects the acoustic predictions. Shifting the CAMRAD/JA wake geometry away from the rotor plane by 1/4 chord produces drastic changes in the acoustic predictions, indicating that the predicted acoustic pressure is extremely sensitive to the miss distance between the vortex and the blade and that this distance must be calculated as accurately as possible for acceptable noise predictions. The inclusion or exclusion of a vortex in the FPR-RAPP calculation allows the relative importance of that vortex as a BVI noise source to be determined.
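The sensitivity to core size follows directly from the algebraic core model. Assuming a Scully-type profile (the abstract does not name the exact form, so this is an assumption), the swirl velocity a blade sees near the vortex axis scales inversely with the assumed core radius:

```python
import math

def scully_vtheta(r, gamma, r_core):
    """Swirl velocity of an algebraic (Scully-type) vortex:
    v_theta = (gamma / (2*pi)) * r / (r**2 + r_core**2).
    Far from the core this recovers the potential-vortex
    Biot-Savart result gamma / (2*pi*r); near the axis the
    velocity is regularized, with peak swirl ~ 1/r_core."""
    return gamma / (2.0 * math.pi) * r / (r * r + r_core * r_core)
```

Halving the assumed core radius doubles the peak swirl velocity (which occurs at r = r_core), which illustrates why small changes to the core size used by FPR change the computed surface pressures so strongly.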

The use of a trailing-edge flap on a helicopter rotor has been numerically simulated to determine whether such a device can mitigate blade-vortex interaction (BVI) noise. The numerical procedure employs CAMRAD/JA, a lifting-line helicopter rotor trim code, in conjunction with RFS2, an unsteady transonic full-potential flow solver, and WOPWOP, an acoustic model based on Farassat's formulation 1A. The codes were modified to simulate trailing-edge flap effects. The CAMRAD/JA code was used to compute the far-wake inflow effects and the vortex wake trajectories and strengths, which are utilized by RFS2 to predict the blade surface pressure variations. These pressures were then analyzed using WOPWOP to determine the high-frequency acoustic response at several fixed observer locations below the rotor disk. Comparisons were made with different flap deflection amplitudes and rates to assess flap effects on BVI. Numerical experiments were carried out using a one-seventh-scale AH-1G rotor system for flight conditions simulating BVI encountered during low-speed descending flight, with and without flaps. Predicted blade surface pressures and sound pressure levels show good agreement with the baseline no-flap test data obtained in the DNW wind tunnel. Numerical results indicate that the use of flaps is beneficial in reducing BVI noise.

The impulsive nature of the noise due to the interaction of a rotor blade with a tip vortex is studied. The time signature of this noise is calculated theoretically from the measured blade surface pressure fluctuations of an operational load survey rotor in slow descending flight and is compared with simultaneous microphone measurements. In particular, the characteristic features of the waveform are studied extensively in order to understand the generating mechanism and to identify the important parameters. The interaction trajectory of a tip vortex on an acoustic planform is shown to be a very important parameter for the impulsive shape of the noise. The unsteady nature of the pressure distribution at the very leading edge is also important to the pulse shape. The theoretical model, using noncompact linear acoustics, predicts the general shape of the interaction impulse reasonably well, except for the peak amplitude, which requires more continuous information along the span at the leading edge.

The BVI noise prediction method developed at ONERA is a combination of three computer programs. The first program (MESIR) calculates the geometry and the intensity of the main rotor wake using a free wake analysis. The second program (ARHIS) provides the blade pressure fluctuations induced by the rotor wake even for close interactions. The third code (PARIS), based on the

Perpendicular blade-vortex interactions are a common occurrence in helicopter rotor flows. Under certain conditions they produce a substantial proportion of the acoustic noise. However, the mechanism of noise generation is not well understood. Specifically, turbulence associated with the trailing vortices shed from the blade tips appears insufficient to account for the noise generated. The hypothesis that the first perpendicular interaction experienced by a trailing vortex alters its turbulence structure in such a way as to increase the acoustic noise generated by subsequent interactions is examined. To investigate this hypothesis, a two-part investigation was carried out. In the first part, experiments were performed to examine the behavior of a streamwise vortex as it passed over and downstream of a spanwise blade in incompressible flow. Blade-vortex separations between +/- one-eighth chord were studied at a chord Reynolds number of 200,000. Three-component velocity and turbulence measurements were made in the flow from 4 chord lengths upstream to 15 chord lengths downstream of the blade using miniature 4-sensor hot-wire probes. These measurements show that the interaction of the vortex with the blade and its wake causes the vortex core to lose circulation and diffuse much more rapidly than it otherwise would. The core radius increases and the peak tangential velocity decreases with distance downstream of the blade. True turbulence levels within the core are much larger downstream than upstream of the blade. The net result is a much larger and more intense region of turbulent flow than that presented by the original vortex and thus, by implication, a greater potential for generating acoustic noise. In the second part, the turbulence measurements described above were used to derive the necessary inputs to a Blade Wake Interaction (BWI) noise prediction scheme.
This resulted in significantly improved agreement between measurements and calculations of the BWI noise spectrum especially for the spectral peak at low frequencies, which previously was poorly predicted.
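The measured trend above (core radius growing and peak tangential velocity falling downstream of the blade) is qualitatively the behavior of a diffusing viscous vortex. A Lamb-Oseen model, used here purely as an illustrative assumption rather than as the authors' fit, captures that trend:

```python
import math

def lamb_oseen_vtheta(r, gamma, nu, t, r0):
    """Swirl velocity of a Lamb-Oseen vortex whose core has diffused
    for time t, with effective core parameter rc**2 = r0**2 + 4*nu*t.
    As t grows the core widens and the peak swirl velocity decays."""
    rc2 = r0 * r0 + 4.0 * nu * t
    return gamma / (2.0 * math.pi * r) * (1.0 - math.exp(-r * r / rc2))
```

Interaction-enhanced turbulence would act like a much larger effective viscosity nu, accelerating exactly this core growth and swirl decay.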

This paper presents the status of theoretical tools at AFDD, DLR, NASA, and ONERA for predicting the effect of HHC on helicopter main-rotor BVI noise. Aeroacoustic predictions from the four research centers, concerning a wind tunnel simulation of a typical descent flight case without and with HHC, are presented and compared. The results include blade deformation, geometry of the interacting vortices, sectional loads, and noise. Acoustic predictions are compared to experimental data. An analysis of the results provides a first insight into the mechanisms by which HHC may affect BVI noise.

Beaumier, P.; Prieur, J.; Rahier, G.; Spiegel, P.; Demargne, A.; Tung, C.; Gallman, J. M.; Yu, Y. H.; Kube, R.; van der Wall, B. G.

Rotor impulsive noise is, of all the known sources of helicopter far-field noise radiation, the one which tends to dominate the acoustic spectrum generated by most helicopters. The helicopter generates a highly directional and rather unique form of impulsive noise which is thought to be generated by two source mechanisms: (1) high-speed impulsive noise due to formation of a shock

A numerical study of the aerodynamic noise generated when an airfoil/blade in a uniform flow is excited by an oncoming vortical flow is reported. The vortical flow is modeled by a series of flow-convected discrete vortices representative of a Karman vortex street. Such noise generation problems due to fluid-blade interaction occur in helicopter rotors and turbomachinery blades. Interactions with both rigid and elastic airfoils/blades are considered. Under vortical excitation, aerodynamic resonance of the airfoil/blade at certain excitation frequencies is found to occur, and loading noise is generated by the fluctuations of the aerodynamic loading on the airfoil/blade. For an elastic blade, a stronger loading noise is generated due to the occurrence of structural resonance incited by the flow-induced vibration of the airfoil/blade. The associated thickness effect due to the airfoil/blade vibration is extremely weak. The magnitude of the noise was found to depend on the frequency of the oncoming vortical flow and on the geometry and rigidity of the blade.

Unsteady vortex-airfoil interaction experiments at transonic Mach numbers ranging from 0.7 to 0.85 and airfoil chord Reynolds numbers of 3.5 x 10^6 to 5.5 x 10^6 were conducted in the University of Texas at Arlington high Reynolds number transonic wind-tunnel facility. The experiments were designed to simulate a two-dimensional blade-vortex interaction problem frequently encountered in rotorcraft applications. The

A combined Eulerian/Lagrangian approach to calculating helicopter rotor flows with concentrated vortices is described. The method computes a general evolving vorticity distribution without any significant numerical diffusion. Concentrated vortices can be accurately propagated over long distances on relatively coarse grids with cores only several grid cells wide. The method is demonstrated for a blade/vortex impingement case in 2D and 3D where a vortex is cut by a rotor blade, and the results are compared to previous 2D calculations involving a fifth-order Navier-Stokes solver on a finer grid.

The plane strain problem of a multi-layered composite with parallel cracks is considered. The main objective is to study the interaction between parallel and collinear cracks. The problem is formulated in terms of a set of simultaneous singular integral equations which are solved numerically. The effect of material properties on the interaction between cracks is also demonstrated.

Helicopter rotor noise calculations to validate two numerical codes currently being developed are presented. BENP code results for high-speed impulsive noise are illustrated for helicopters in forward flight up to very high advancing-blade-tip Mach numbers. The HERNOP and WOPWOP codes are compared for high-speed flight.

Ianniello, S.; di Francescantonio, P.; Tarica, D.; de Bernardis, E.

A rotor system (4) having odd and even blade assemblies (O.sub.b, E.sub.b) mounted to and rotating with a rotor hub assembly (6), wherein the odd blade assemblies (O.sub.b) define a radial length R.sub.O, the even blade assemblies (E.sub.b) define a radial length R.sub.E, and the radial length R.sub.E is from about 70% to about 95% of the radial length R.sub.O. Other embodiments of the invention are directed to a Variable Diameter Rotor system (4) which may be configured to operate in various modes for optimizing aerodynamic and acoustic performance. The Variable Diameter Rotor system (4) includes odd and even blade assemblies (O.sub.b, E.sub.b) having inboard and outboard blade sections (10, 12), wherein the outboard blade sections (12) telescopically mount to the inboard blade sections (10). The outboard blade sections (12) are positioned with respect to the inboard blade sections (10) such that the radial length R.sub.E of the even blade assemblies (E.sub.b) is equal to the radial length R.sub.O of the odd blade assemblies (O.sub.b) in a first operating mode, and such that the radial length R.sub.E is from about 70% to about 95% of the length R.sub.O in a second operating mode.

Moffitt, Robert C. (Inventor); Visintainer, Joseph A. (Inventor)

This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms are collected in a database. A selection process dynamically determines the software and hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.

An analysis of the Tip Aerodynamic/Aeroacoustic Test (TAAT) data was performed to identify possible aerodynamic sources of blade/vortex interaction (BVI) impulsive noise. The identification is based on correlation of measured blade pressure time histories with predicted blade/vortex intersections for the flight condition(s) where impulsive noise was detected. Due to the location of the recording microphones, only noise signatures associated with the advancing blade were available, and the analysis was accordingly restricted to the first and second azimuthal quadrants. The results show that the blade tip region is operating transonically in the azimuthal range where previous BVI experiments indicated the impulsive noise to be. No individual blade/vortex encounter is identifiable in the pressure data; however, there is indication of multiple intersections in the roll-up region which could be the origin of the noise. Discrete blade/vortex encounters are indicated in the second quadrant; however, if impulsive noise were produced here, the directivity pattern would be such that it was not recorded by the microphones. It is demonstrated that the TAAT data base is a valuable resource in the investigation of rotor aerodynamic/aeroacoustic behavior.

High resolution fluctuating airloads data were acquired during a test of a contemporary design United Technologies model rotor in the Duits-Nederlandse Windtunnel (DNW). The airloads are used as input to the noise prediction program WOPWOP, in order to predict the blade-vortex interaction (BVI) noise field on a large plane below the rotor. Trends of predicted advancing and retreating side BVI

Michael A. Marcolini; Ruth M. Martin; Peter F. Lorber; T. A. Egolf

Identifying physical interactions between proteins and other molecules is a critical aspect of biological analysis. Here we describe PLATO, an in vitro method for mapping such interactions by affinity enrichment of a library of full-length open reading frames displayed on ribosomes, followed by massively parallel analysis using DNA sequencing. We demonstrate the broad utility of the method for human proteins by identifying known and previously unidentified interacting partners of LYN kinase, patient autoantibodies, and the small molecules gefitinib and dasatinib. PMID:23503679

Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested the scalability on four different networks: Infiniband, GigaBit Ethernet, Fast Ethernet, and a nearly uniform memory architecture in which communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

Because the variable ability of the antibody constant (Fc) domain to recruit innate immune effector cells and complement is a major factor in antibody activity in vivo, convenient means of assessing these binding interactions is of high relevance to the development of enhanced antibody therapeutics, and to understanding the protective or pathogenic antibody response to infection, vaccination, and self. Here, we describe a highly parallel microsphere assay to rapidly assess the ability of antibodies to bind to a suite of antibody receptors. Fc and glycan binding proteins such as FcγR and lectins were conjugated to coded microspheres and the ability of antibodies to interact with these receptors was quantified. We demonstrate qualitative and quantitative assessment of binding preferences and affinities across IgG subclasses, Fc domain point mutants, and antibodies with variant glycosylation. This method can serve as a rapid proxy for biophysical methods that require substantial sample quantities, high-end instrumentation, and serial analysis across multiple binding interactions, thereby offering a useful means to characterize monoclonal antibodies, clinical antibody samples, and antibody mimics, or alternatively, to investigate the binding preferences of candidate Fc receptors. PMID:24927273

Boesch, Austin W; Brown, Eric P; Cheng, Hao D; Ofori, Maame Ofua; Normandin, Erica; Nigrovic, Peter A; Alter, Galit; Ackerman, Margaret E

The impulsive noise due to blade-vortex interaction is analyzed in the time domain for the extreme case when the blade cuts through the center of the vortex core, with the assumptions of no distortion of the vortex path or of the vortex core. An analytical turbulent vortex core model, described in terms of the tip aerodynamic parameters, is used and its effects on the unsteady loading and maximum acoustic pressure during the interaction are determined.

Hydrodynamic interaction between model ships moving on parallel courses was studied for three types of passing encounter: (i) where ships passed on parallel courses - the overtaking encounter; (ii) where ships encountered head-on, and (iii) where a statio...

Students attending a graduate course on the Theory of Vortex Sound given recently at Boston University were required to investigate the low Mach number unsteady flow and the accompanying acoustic radiation for a selection of idealized flow-structure interactions. These included linear and non-linear parallel blade-vortex interactions for two-dimensional airfoils, and for finite span airfoils of variable chord; interactions between line vortices and surface projections from a plane wall; bluff-body interactions involving line and ring vortices impinging on circular cylindrical and spherical bodies; and vortex motion in the neighborhood of a wall aperture. In all cases, the effective source region was localized in either two or three dimensions, and could be regarded as acoustically compact, and the sound was calculated by routine numerical methods using the theory of compact Green's functions. The results are collected together in this paper as a compendium of canonical solutions that provide qualitative and quantitative insight into the mechanisms responsible for sound production, and a database that can be used to validate predictions of more generally applicable numerical schemes.

ABOU-HUSSEIN, H.; DEBENEDICTIS, A.; HARRISON, N.; KIM, M.; RODRIGUES, M. A.; ZAGADOU, F.; HOWE, M. S.

Three in-class lecture demonstration questions to test and build understanding of DC circuits are presented. These questions cover simple series and parallel circuits, and a more complicated circuit that is fundamental for understanding this topic.

An application-independent visualization interaction system is proposed with potential for application-binding at any stage of the modeling process. Advantages of this approach include ease of use, flexibility, code reuse, and modularity. Our design ideas...

A study of crystal structures from the Cambridge Structural Database (CSD) and DFT calculations reveals that parallel pyridine-pyridine and benzene-pyridine interactions at large horizontal displacements (offsets) can be important, similar to parallel benzene-benzene interactions. In the crystal structures from the CSD, preferred parallel pyridine-pyridine interactions were observed at a large horizontal displacement (4.0-6.0 Å) and not at the offset of 1.5 Å with the lowest calculated energy. The calculated interaction energies for pyridine-pyridine and benzene-pyridine dimers at a large offset (4.5 Å) are about 2.2 and 2.1 kcal mol⁻¹, respectively. Substantial attraction at large offset values is a consequence of the balance between repulsion and dispersion; that is, dispersion at large offsets is reduced, but repulsion is also reduced, resulting in net attractive interactions. PMID:23090910

Ninković, Dragan B; Andrić, Jelena M; Zarić, Snežana D

The vortex method in the simulation of 2D incompressible flows with complex interacting circulations is very attractive if compared with other nowadays widespread methods. However, the vortex method is computationally very expensive and suggests the adoption of suitably powerful computing units. Our group analysed some parallelisation techniques of the algorithm, in order to obtain the best performances on a dedicated

G. Braschi; Giovanni Danese; Ivo De Lotto; D. Dotti; M. Gallati; Francesco Leporati; M. Mazzoleni

The interaction of travelling interplanetary shock waves with the bow shock-magnetosphere system is considered. We consider the general case when the interplanetary magnetic field is oblique to the Sun-planetary axis, thus, the interplanetary shock is neither parallel nor perpendicular. We find that an ensemble of shocks is produced after the interaction for a representative range of shock Mach numbers. First,

The effect of the interaction of parallel slip bands of edge-type, especially with narrow spacing, on the stress concentration was studied based on the concept of continuous distribution of infinitesimal dislocations. Three cases in which the number of sl...

The Uintah computational framework is a component-based infrastructure, designed for highly parallel simulations of complex fluid–structure interaction problems. Uintah utilizes an abstract representation of parallel computation and communication to express data dependencies between multiple physics components. These features allow parallelism to be integrated between multiple components while maintaining overall scalability. Uintah provides mechanisms for load-balancing, data communication, data I/O, and

The velocity field estimated by first arrival traveltime tomography is commonly used as a starting point for further seismological, mineralogical, tectonic or similar analysis. In order to interpret the results quantitatively, the tomography uncertainty values as well as their spatial distribution are required. The estimated velocity model is obtained through inverse modeling by minimizing an objective function that compares observed and computed traveltimes. This step is often performed by gradient-based optimization algorithms. The major drawback of such local optimization schemes, beyond the possibility of being trapped in a local minimum, is that they do not account for the multiple possible solutions of the inverse problem. They are therefore unable to assess the uncertainties linked to the solution. Within a Bayesian (probabilistic) framework, solving the tomography inverse problem aims at estimating the posterior probability density function of the velocity model using a global sampling algorithm. Markov chain Monte-Carlo (MCMC) methods are known to produce samples of virtually any distribution. In such a Bayesian inversion, the total number of simulations we can afford is strongly tied to the computational cost of the forward model. Although fast algorithms have been recently developed for computing first arrival traveltimes of seismic waves, fully exploring the posterior distribution of the velocity model is rarely feasible, especially when it is high dimensional and/or multimodal. In the latter case, the chain may even stay stuck in one of the modes. In order to improve the mixing properties of a classical single MCMC chain, we propose to let several Markov chains at different temperatures interact. This method can make efficient use of large CPU clusters, without increasing the global computational cost with respect to classical MCMC, and is therefore particularly suited for Bayesian inversion.
The exchanges between the chains allow precise sampling of the high-probability zones of the model space while preventing the chains from getting stuck in a probability maximum. This approach thus supplies a robust way to analyze the tomography imaging uncertainties. The interacting MCMC approach is illustrated on two synthetic examples of tomography of calibration shots such as encountered in induced microseismic studies. In the second application, a wavelet-based model parameterization is presented that significantly reduces the dimension of the problem, thus making the algorithm efficient even for a complex velocity model.

Gesret, Alexandrine; Bottero, Alexis; Romary, Thomas; Noble, Mark; Desassis, Nicolas
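The interacting-chain sampler described above is a form of parallel tempering. A minimal one-dimensional sketch is given below; the function and parameter names are illustrative assumptions, not the authors' code, and the target here is a toy bimodal posterior rather than a tomography model.

```python
import math
import random

def parallel_tempering(log_post, init, temps, n_iter=2000, step=0.5, seed=0):
    """Toy interacting-MCMC (parallel tempering) sketch for a 1-D posterior.

    log_post : unnormalized log posterior density
    temps    : temperature ladder; temps[0] == 1 is the target chain
    Returns the samples collected from the T = 1 chain.
    """
    rng = random.Random(seed)
    x = [float(init)] * len(temps)   # one current state per tempered chain
    samples = []
    for _ in range(n_iter):
        # Metropolis update within each tempered chain (target: post**(1/T))
        for k, T in enumerate(temps):
            prop = x[k] + step * T ** 0.5 * rng.gauss(0, 1)
            if math.log(rng.random()) < (log_post(prop) - log_post(x[k])) / T:
                x[k] = prop
        # propose a state swap between a random pair of neighbouring temperatures
        k = rng.randrange(len(temps) - 1)
        a = (1 / temps[k] - 1 / temps[k + 1]) * (log_post(x[k + 1]) - log_post(x[k]))
        if math.log(rng.random()) < a:
            x[k], x[k + 1] = x[k + 1], x[k]
        samples.append(x[0])
    return samples
```

On a well-separated bimodal target, the hot chains cross between modes and the swaps carry those jumps down to the T = 1 chain, which is exactly the mixing improvement the abstract describes.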

A highly interactive visual analysis system is presented that is based on an enhanced variant of parallel coordinates — a multivariate information visualization technique. The system combines many variations of previously described visual interaction techniques such as dynamic axis scaling, conjunctive visual queries, statistical indicators, and aerial perspective shading. The system capabilities are demonstrated on a hurricane climate data set. This climate study corroborates the notion that enhanced visual analysis with parallel coordinates provides a deeper understanding when used in conjunction with traditional multiple regression analysis.

Steed, Chad A.; Fitzpatrick, Patrick J.; Jankun-Kelly, T. J.; Yancey, Amber N.; Swan, J. Edward, II

Specific absorption rate management and excitation fidelity are key aspects of radio frequency pulse design for parallel transmission at ultra high magnetic field strength. The design of radio frequency pulses for multiple channels is often based on the solution of regularized least squares optimization problems for which a regularization term is typically selected to control the integrated or peak pulse waveform amplitude. Unlike single channel transmission, the specific absorption rate of parallel transmission is significantly influenced by interferences between the electric fields associated with the individual transmission elements, which a conventional regularization term does not take into account. This work explores the effects upon specific absorption rate of incorporating experimentally measurable electric field interactions into parallel transmission pulse design. Results of numerical simulations and phantom experiments show that the global specific absorption rate during parallel transmission decreases when electric field interactions are incorporated into pulse design optimization. The results also show that knowledge of electric field interactions enables robust prediction of the net power delivered to the sample or subject by parallel radio frequency pulses before they are played out on a scanner.

Deniz, Cem Murat; Alon, Leeor; Brown, Ryan; Sodickson, Daniel K.; Zhu, Yudong
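The regularized least-squares design described above can be sketched as a standard Tikhonov solve. This is a generic illustration, not the authors' implementation: the system matrix `A`, the target pattern, and the penalty passed as `reg` are hypothetical stand-ins for a small-tip-angle model and a measured electric-field interaction matrix.

```python
import numpy as np

def design_pulse(A, target, reg):
    """Regularized least-squares pulse design sketch.

    A      : (n_space, n_coef) complex matrix mapping RF coefficients to
             the excitation pattern (hypothetical small-tip-angle model)
    target : desired excitation pattern, length n_space
    reg    : a scalar (conventional amplitude penalty lam * I) or an
             (n_coef, n_coef) positive-definite matrix encoding measured
             electric-field interactions (SAR-aware penalty)
    Minimizes ||A b - target||^2 + b^H R b via the normal equations.
    """
    n = A.shape[1]
    R = reg * np.eye(n) if np.isscalar(reg) else reg
    # normal equations: (A^H A + R) b = A^H target
    return np.linalg.solve(A.conj().T @ A + R, A.conj().T @ target)
```

Passing an electric-field interaction matrix as `reg` penalizes the SAR-relevant quadratic form b^H R b directly, rather than only the integrated pulse amplitude ||b||^2, which is the distinction the abstract draws between conventional and interaction-aware regularization.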

SUMMARY Massively parallel finite element computations of 3D, unsteady incompressible flows, including those involving fluid-structure interactions, are presented. The computations with time-varying spatial domains are based on the deforming spatial domain/stabilized space-time (DSD/SST) finite element formulation. The capability to solve 3D problems involving fluid-structure interactions is demonstrated by investigating the dynamics of a flexible cantilevered pipe conveying fluid. Computations of

We present a model for the study of the hysteretic behavior of a disordered ferromagnetic system with an array of parallel Bloch domain walls. We write the equations of motion of the walls under external magnetic field driving, considering long-range dipolar interactions and disorder. We calculate analytically an expression for the magnetic susceptibility χ, and find a logarithmic dependence

In many engineering fields, dynamic response in fluid-structure interaction (FSI) is important, and some FSI phenomena are treated as acoustic FSI (AFSI) problems. Dynamic interactions between fluids and structures may change the dynamic characteristics of the structure and its response to external excitation parameters such as seismic loading. This paper describes a parallel coupling analysis system for large-scale AFSI problems using iterative partitioned coupling techniques. We employ an open source parallel finite element analysis system called ADVENTURE, which adopts an efficient preconditioned iterative linear algebraic solver. In addition, we have recently developed a parallel coupling tool called ADVENTURE_Coupler to efficiently handle interface variables in various parallel computing environments. We also employ the Broyden method for updating interface variables to attain robust and fast convergence of the fixed-point iterations. This paper describes key features of the coupling analysis system developed, and we perform tests to validate its performance for several AFSI problems. The system runs efficiently in a parallel environment, and it is capable of analyzing three-dimensional, complex-shaped structures with more than 20 million degrees of freedom (DOFs). Its numerical results also show good agreement with experimental results.
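The partitioned fixed-point iteration with a Broyden update on the interface variables can be sketched as follows. This is a toy serial quasi-Newton loop, not the ADVENTURE implementation: `fluid` and `structure` are hypothetical single-field solvers whose composition maps one interface state to the next.

```python
import numpy as np

def coupled_solve(fluid, structure, x0, tol=1e-10, max_iter=50):
    """Partitioned coupling sketch: solve g(x) = structure(fluid(x)) - x = 0
    with Broyden's (good) rank-one quasi-Newton method on the interface
    residual only, instead of plain fixed-point iteration."""
    x = np.asarray(x0, float)
    g = structure(fluid(x)) - x
    J = -np.eye(len(x))            # initial Jacobian guess (plain relaxation)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        dx = -np.linalg.solve(J, g)
        x_new = x + dx
        g_new = structure(fluid(x_new)) - x_new
        dg = g_new - g
        # Broyden rank-one update of the approximate interface Jacobian
        J += np.outer(dg - J @ dx, dx) / (dx @ dx)
        x, g = x_new, g_new
    return x
```

For a linear solver pair the secant update recovers the exact interface Jacobian, so the iteration converges much faster than simple under-relaxed fixed-point sweeps; that robustness-and-speed gain is what the abstract attributes to the Broyden update.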

The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as Matlab, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive Python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain-specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism, all underpinned by the well-respected Python/SciPy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focused on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores providing local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system.
Outputs can be summarised and visualised using the full power of Python's many scientific tools, including Scipy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser; maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.

Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

The interactions between flames spreading over parallel solid sheets of paper are being studied in normal gravity and in microgravity. This geometry provides interesting opportunities to study the interaction of radiative and diffusive transport mechanisms on the spread process. These transport mechanisms are changed when the flame interacts with other flames. Most practical heterogeneous combustion processes involve interacting discrete burning fuel elements; consequently, the study of these interactions is of practical significance. Owing largely to this practical importance, flame interactions have been an area of active research; however, microgravity research has been largely limited to droplets. Consideration of flame spread over parallel solid surfaces has been limited to 1-g studies. To study the conductive transport in these flames, an interferometer system has been developed for use in the drop tower. The system takes advantage of a single-beam interferometer, Point Diffraction Interferometry (PDI), which uses a portion of the light through the test section to provide the reference beam. Like other interferometric and Schlieren systems, it is a line-of-sight measurement and is subject to the usual edge and concentration effects. The advantage over Schlieren and shearing interferometry systems is that the fringes are lines of constant index of refraction rather than of its gradient, so the images are more readily interpreted. The disadvantage is that it is less able to accommodate a range of temperature gradients.

For particles interacting via two- and three-body potentials, a domain-decomposition algorithm is used to implement molecular dynamics (MD) on distributed memory MIMD (multiple-instruction multiple-data) computers. The algorithm employs the linked-cell-list method and separable three-body force calculation. The force calculation is accelerated by the multiple-time-step (MTS) method. For a 1.54 million particle SiO₂ system, the MD program runs at a speed of 660 time steps per hour (1100 steps/h without the three-body interaction) on a 64-node Intel iPSC/860. The parallel algorithm is highly efficient (parallel efficiency = 0.973), as it involves only 3% communication overhead. Utilizing the second derivatives of the potential energy, the conjugate-gradient search for a local minimum underlying an MD configuration is accelerated by a factor of 13.

Nakano, Aiichiro; Vashishta, Priya; Kalia, Rajiv K.

Parallel processing based on the free-running model test was adopted to predict the interaction force coefficients (flow straightening coefficient and wake fraction) of ship maneuvering. A multi-population genetic algorithm (MPGA) based on real coding, which can simultaneously process free-running model data and ship maneuvering simulations, was applied to solve the problem. Accordingly the

We present a parallel finite element computational method for 3D simulation of fluid–structure interactions (FSI) in parachute systems. The flow solver is based on a stabilized finite element formulation applicable to problems involving moving boundaries and governed by the Navier–Stokes equations of incompressible flows. The structural dynamics (SD) solver is based on the total Lagrangian description of motion, with cable

The objective of this article is to report the parallel implementation of the 3D molecular dynamic simulation code for laser-cluster interactions. The benchmarking of the code has been done by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time are established by varying the number of processor cores and number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of the laser-cluster interactions, the executable version of the code is available from the author.

Holkundkar, Amol R. [Department of Physics, Birla Institute of Technology and Science, Pilani-333 031 (India)] [Department of Physics, Birla Institute of Technology and Science, Pilani-333 031 (India)

This study has conducted parallel simulations of interacting inertial particles in statistically-steady isotropic turbulence using a newly-developed efficient parallel simulation code. Flow is computed with a fourth-order finite-difference method and particles are tracked with the Lagrangian method. A binary-based superposition method has been developed and implemented in the code in order to investigate the hydrodynamic interaction among many particles. The code adopts an MPI library for a distributed-memory parallelization and is designed to minimize the MPI communication, which leads to a high parallel performance. The code has been run to obtain collision statistics of a monodisperse system with St = 0.4 particles, where St is the Stokes number representing the particle relaxation time relative to the Kolmogorov time. The attained Taylor-microscale based Reynolds number Rλ ranges from 54.9 to 527. The largest simulation computed the flow on 2000³ grid points with 1000³ (one billion) particles. Numerical results have shown that the collision kernel increases for Rλ < 100 and then decreases as Rλ increases. This Reynolds dependency is attributed to that of the radial distribution function at contact, which measures the contribution of particle clustering to the collision kernel. The results have also shown that the hydrodynamic interaction for St = 0.4 particles decreases both the radial relative velocity and the radial distribution function at contact, making the collision efficiency less than unity. The collision efficiency increases from 0.65 to 0.75 as Rλ increases for Rλ < 200 and then saturates.
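The quantities named above combine, in the commonly used spherical formulation, into the geometric collision kernel Γ = 2πR²⟨|w_r|⟩g(R). The helpers below are an illustrative sketch of that relation and of the efficiency ratio, not the paper's code; the argument names are assumptions.

```python
import math

def collision_kernel(R, mean_abs_wr, g_at_contact):
    """Spherical-formulation collision kernel sketch:
    Gamma = 2 * pi * R^2 * <|w_r|> * g(R), where R is the collision radius,
    <|w_r|> the mean magnitude of the radial relative velocity at contact,
    and g(R) the radial distribution function at contact (clustering)."""
    return 2.0 * math.pi * R ** 2 * mean_abs_wr * g_at_contact

def collision_efficiency(kernel_with_hi, kernel_without_hi):
    """Ratio of the kernel with hydrodynamic interaction to the kernel
    without it; a value below one means interactions suppress collisions."""
    return kernel_with_hi / kernel_without_hi
```

The decomposition makes the abstract's argument concrete: hydrodynamic interaction reduces both ⟨|w_r|⟩ and g(R), so the kernel ratio (the collision efficiency) falls below unity.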

Dispersion and electrostatics are known to stabilize π-π interactions, but the preference for parallel-displaced (PD) and/or twisted (TW) over sandwiched (S) conformations is not well understood. Orbital interactions are generally believed to play little to no role in π-stacking. However, orbital analysis of the dimers of benzene, pyridine, cytosine and several polyaromatic hydrocarbons demonstrates that PD and/or TW structures convert one or more π-type dimer MOs with out-of-phase or antibonding inter-ring character at the S stack to in-phase or bonding in the PD/TW stack. This change in dimer MO character can be described in terms of a qualitative stack bond order (SBO) defined as the difference between the number of occupied in-phase/bonding and out-of-phase/antibonding inter-ring π-type MOs. The concept of an SBO is introduced here in analogy to the bond order in molecular orbital theory. Thus, whereas the SBO of the S structure is zero, parallel displacement or twisting of the stack results in a non-zero SBO and overall bonding character. The shift in bonding/antibonding character found at optimal PD/TW structures maximizes the inter-ring density, as measured by intermolecular Wiberg bond indices (WBIs). Values of WBIs calculated as a function of the parallel displacement are found to correlate with the dispersion and other contributions to the π-π interaction energy determined by the highly accurate density-fitting DFT symmetry-adapted perturbation theory (DF-DFT-SAPT) method. These DF-DFT-SAPT calculations also suggest that the dispersion and other contributions are maximized at the PD conformation rather than the S when conducted on a potential energy curve where the inter-ring distance is optimized at fixed slip distances.
From the results of this study, we conclude that descriptions of the qualitative manner in which orbitals interact within π-stacking interactions can supplement high-level calculations of the interaction energy and provide an intuitive tool for applications to crystal design, molecular recognition and other fields where non-covalent interactions are important. PMID:23665910

Massively Parallel Processor algorithms were developed for the interactive manipulation of flat shaded digital terrain models defined over grids. The emphasis is on real time manipulation of stereo images. Standard graphics transformations are applied to ...

In this paper, with the aid of superimposing technique and the Pseudo Traction Method (PTM), the interaction problem between an interface macrocrack and parallel microcracks in the process zone in bimaterial anisotropic solids is reduced to a system of integral equations. After the integral equations are solved numerically, a conservation law among three kinds of J-integrals is obtained which are induced from the interface macrocrack tip, the microcrack and the remote field, respectively. This conservation law reveals that the microcrack shielding effect in such materials could be considered as the redistribution of the remote J-integral.

Aeroelasticity, which involves strong coupling of fluids, structures and controls, is an important element in designing an aircraft. Computational aeroelasticity using low-fidelity methods, such as the linear aerodynamic flow equations coupled with the modal structural equations, is well advanced. Though these low-fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as the High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST), which can experience complex flow/structure interactions. The HSCT can experience vortex-induced aeroelastic oscillations, whereas the AST can experience transonic-buffet-associated structural oscillations. Both aircraft may experience a dip in the flutter speed in the transonic regime. For accurate aeroelastic computations in these complex fluid/structure interaction situations, high-fidelity equations such as the Navier-Stokes for fluids and the finite elements for structures are needed. Computations using these high-fidelity equations require large computational resources, both in memory and speed. Conventional supercomputers have reached their limitations in both memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper addresses the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers, including the special techniques needed to take advantage of the architecture of new parallel computers. Results are illustrated from computations made on the iPSC/860 and IBM SP2 computers using the ENSAERO code, which directly couples the Euler/Navier-Stokes flow equations with high-resolution finite-element structural equations.

Guruswamy, Guru; VanDalsem, William (Technical Monitor)

Analysis of transcription factor binding to DNA sequences is of utmost importance to understand the intricate regulatory mechanisms that underlie gene expression. Several techniques exist that quantify DNA-protein affinity, but they are either very time-consuming or, like many high-throughput techniques, suffer from possible misinterpretation due to complicated algorithms or approximations. We present a more direct method to quantify DNA-protein interaction in a force-based assay. In contrast to single-molecule force spectroscopy, our technique, the Molecular Force Assay (MFA), parallelizes force measurements so that it can test one or multiple proteins against several DNA sequences in a single experiment. The interaction strength is quantified by comparison to the well-defined rupture stability of different DNA duplexes. As a proof of principle, we measured the interaction of the zinc finger construct Zif268/NRE against six different DNA constructs. We could show the specificity of our approach and quantify the strength of the protein-DNA interaction. PMID:24586920

Limmer, Katja; Pippig, Diana A; Aschenbrenner, Daniela; Gaub, Hermann E

We have studied the thermodynamics of a fluid of parallel cylinders which interact both through volume exclusion and through a longer-range quadrupolar interaction between point quadrupoles placed at the centres of the cylinders. The volume exclusion is treated using the Onsager approximation, and the long-range part of the potential is treated in mean field theory. At sufficiently high densities there

Parallel analysis of translated open reading frames (ORFs) (PLATO) can be used for the unbiased discovery of interactions between full-length proteins encoded by a library of 'prey' ORFs and surface-immobilized 'bait' antibodies, polypeptides or small-molecular-weight compounds. PLATO uses ribosome display (RD) to link ORF-derived mRNA molecules to the proteins they encode, and recovered mRNA from affinity enrichment is subjected to analysis using massively parallel DNA sequencing. Compared with alternative in vitro methods, PLATO provides several advantages including library size and cost. A unique advantage of PLATO is that an alternative reverse transcription-quantitative PCR (RT-qPCR) protocol can be used to test binding of specific, individual proteins. To illustrate a typical experimental workflow, we demonstrate PLATO for the identification of the immune target of serum antibodies from patients with inclusion body myositis (IBM). Beginning with an ORFeome library in an RD vector, the protocol can produce samples for deep sequencing or RT-qPCR within 4 d. PMID:24336473

Larman, H Benjamin; Liang, Anthony C; Elledge, Stephen J; Zhu, Jian

New powerful parallel computational tools are developed for 3D simulation of unsteady wake flows with complex geometries and fluid-structure interactions. The base method for flow simulation is a finite element formulation for the Navier-Stokes equations. The finite element formulation is based on the streamline-upwind/Petrov-Galerkin (SUPG) and pressure-stabilizing/Petrov-Galerkin (PSPG) techniques. These stabilization techniques facilitate simulation of flows with high Reynolds numbers, and allow us to use equal-order interpolation functions for velocity and pressure without generating numerical oscillations. A multi-domain computational method is developed to simulate wake flow in both the near and far downstream regions. The formulations lead to coupled nonlinear equation systems which are solved, at every time step, with the Newton-Raphson method. The overall formulation and solution techniques are implemented on parallel platforms such as the CRAY T3E and SGI PowerChallenge. Two phases of vortex shedding for flow past a cylinder are simulated to verify the accuracy of this method. The Enhanced-Discretization Interface Capturing Technique (EDICT) is utilized to simulate wake flow accurately. A fluid-structure coupling solution method based on the Deforming-Spatial-Domain/Stabilized Space-Time (DSD/SST) formulation is applied to simulate parachute behavior in the unsteady wake.
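
The per-time-step Newton-Raphson solve mentioned above can be sketched in miniature. This is a generic, hypothetical 1D illustration of the iteration, not the actual coupled finite element system:

```python
def newton_raphson(F, J, x0, tol=1e-12, max_iter=50):
    """Solve F(x) = 0 by the Newton-Raphson method: x <- x - F(x)/J(x)."""
    x = x0
    for _ in range(max_iter):
        dx = -F(x) / J(x)   # Newton update from the local linearization
        x += dx
        if abs(dx) < tol:   # converged when the update is negligible
            return x
    raise RuntimeError("Newton iteration did not converge")

# Toy residual: sqrt(2) as the root of x^2 - 2 = 0, starting from x = 1.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```

In the full simulation the scalar residual and Jacobian are replaced by the assembled nonlinear system and its tangent matrix, solved once per time step.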

Ab initio kinetic Monte Carlo (KMC) simulations have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These simulations necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for simulating catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce parallelization with OpenMP. We further benchmark our framework by simulating a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion. PMID:24329081
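
As a toy illustration of how pairwise lateral interactions enter the adlayer energetics, consider a periodic square lattice with only first-nearest-neighbour terms; the formation energy and interaction strength below are invented values, and real cluster expansions (as in Zacros) also include long-range and many-body terms:

```python
import numpy as np

def adlayer_energy(occ, e_form=-1.0, e_nn=0.3):
    """Adlayer energy on a periodic square lattice with a pairwise
    first-nearest-neighbour cluster expansion.
    occ: 2D 0/1 occupation array; e_form: per-adsorbate formation energy;
    e_nn: repulsive first-NN pair interaction (illustrative values)."""
    n_ads = occ.sum()
    # Count each NN pair exactly once via right and down neighbours (periodic).
    pairs = (occ * np.roll(occ, -1, axis=0)).sum() + (occ * np.roll(occ, -1, axis=1)).sum()
    return e_form * n_ads + e_nn * pairs

# An isolated adsorbate vs. two adjacent adsorbates on a 4x4 lattice.
lone = np.zeros((4, 4), int); lone[0, 0] = 1
pair = np.zeros((4, 4), int); pair[0, 0] = pair[0, 1] = 1
```

Here the adjacent pair costs e_nn relative to two isolated adsorbates, which is exactly the kind of lateral-interaction contribution that shifts the activation energies, and hence the predicted rates, in the KMC simulation.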

Critical adsorption of a lattice self-avoiding bond fluctuation polymer chain confined between two parallel impenetrable surfaces is studied using the Monte Carlo method. The dependence of the mean contact number on the temperature T and on the chain length N is simulated for a polymer-surface interaction E = -1. Critical adsorption of the polymer is found at Tc = 1.65 for large surface separation distance D > N^ν b, whereas no critical adsorption is observed for small distance D

A theoretical and experimental study was conducted to develop a validated first-principles analysis for predicting noise generated by helicopter main-rotor shed vortices interacting with the tail rotor. The generalized prediction procedure requires a knowledge of the incident vortex velocity field, rotor geometry, and rotor operating conditions. The analysis includes compressibility effects, chordwise and spanwise noncompactness, and treats oblique intersections with the blade planform. Assessment of the theory involved conducting a model rotor experiment which isolated the blade-vortex interaction noise from other rotor noise mechanisms. An isolated tip vortex, generated by an upstream semispan airfoil, was convected into the model tail rotor. Acoustic spectra, pressure signatures, and directivity were measured. Since assessment of the acoustic prediction required a knowledge of the vortex properties, the blade-vortex intersection angle, intersection station, vortex strength, and vortex core radius were documented. Ingestion of the vortex by the rotor was experimentally observed to generate harmonic noise and impulsive waveforms.

The advent of affordable parallel computers such as Beowulf PC clusters and, more recently, multicore PCs has been highly beneficial for a large number of scientists and smaller institutions that might not otherwise have access to substantial computing facilities. However, there has not been analogous progress in the development and dissemination of parallel software: scientists need the expertise to develop parallel

Enrico Vesperini; David M. Goldberg; Stephen L. W. McMillan; James Dura; Douglas Jones

It has long been recognized that flow in the melt can have a profound influence on the dynamics of a solidifying interface and hence the quality of the solid material. In particular, flow affects the heat and mass transfer, and causes spatial and temporal variations in the flow and melt composition. This results in a crystal with nonuniform physical properties. Flow can be generated by buoyancy, expansion or contraction upon phase change, and thermo-soluto capillary effects. In general, these flows cannot be avoided and can have an adverse effect on the stability of the crystal structures. This motivates crystal growth experiments in a microgravity environment, where buoyancy-driven convection is significantly suppressed. However, transient accelerations (g-jitter) caused by the acceleration of the spacecraft can affect the melt, while convection generated by effects other than buoyancy remains important. Rather than bemoan the presence of convection as a source of interfacial instability, Hurle in the 1960s suggested that flow in the melt, either forced or natural convection, might be used to stabilize the interface. Delves considered the imposition of both a parabolic velocity profile and a Blasius boundary layer flow over the interface. He concluded that fast stirring could stabilize the interface to perturbations whose wave vector is in the direction of the fluid velocity. Forth and Wheeler considered the effect of the asymptotic suction boundary layer profile. They showed that the effect of the shear flow was to generate travelling waves parallel to the flow with a speed proportional to the Reynolds number. There have been few quantitative experimental works reporting on the coupling effect of fluid flow and morphological instabilities. Huang studied plane Couette flow over cells and dendrites. It was found that this flow could greatly enhance the planar stability and even induce the cell-planar transition.
A rotating impeller was buried inside the sample cell, driven by an outside rotating magnet, in order to generate the flow. However, it appears that this was not a well-controlled flow and may also have been unsteady. In the present experimental study, we examine how a forced parallel shear flow in a Hele-Shaw cell interacts with the directionally solidifying crystal interface. Comparison of the experimental data shows that the parallel shear flow in a Hele-Shaw cell has a strong stabilizing effect on the planar interface by damping the existing initial perturbations. The flow also shows a stabilizing effect on the cellular interface by slightly reducing the exponential growth rate of cells. The left-right symmetry of cells is broken by the flow, with cells tilting toward the incoming flow direction. The tilting angle increases with the velocity ratio. The experimental results are explained through the parallel flow effect on lateral solute transport. The phenomenon of cells tilting against the flow is consistent with the numerical result of Dantzig and Chao.

In our previous study, we introduced a new hybrid approach to effectively approximate the total force on each ion during a trajectory calculation in mass spectrometry device simulations, and the algorithm worked successfully with SIMION. We have taken this one step further and applied the method to massively parallel general-purpose computing on GPUs (GPGPU) to test its performance in simulations with thousands to over a million ions. We took extra care to minimize the barrier synchronization and data transfer between the host (CPU) and the device (GPU) memory, and took full advantage of latency hiding. Parallel codes were written in CUDA C++ and implemented in SIMION via the user-defined Lua program. In this study, we tested the parallel hybrid algorithm with a couple of basic models and analyzed the performance by comparing it to that of the original, fully explicit method written in serial code. The Coulomb explosion simulation with 128,000 ions was completed in 309 s, over 700 times faster than the 63 h taken by the original explicit method, in which we evaluated two-body Coulomb interactions explicitly on one ion with each of all the other ions. The simulation of 1,024,000 ions was completed in 2650 s. In another example, we applied the hybrid method to a simulation of ions in a simple quadrupole ion storage model with 100,000 ions, and it took less than 10 d. Based on our estimate, the same simulation would be expected to take 5-7 y with the explicit method in serial code.
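
The fully explicit two-body Coulomb evaluation that the hybrid method accelerates can be sketched as follows. Units and the Coulomb constant are simplified here, and this vectorized form still scales as O(N^2), which is exactly the cost the paper's hybrid/GPU approach addresses:

```python
import numpy as np

def coulomb_forces(pos, q, k=1.0):
    """Explicit O(N^2) pairwise Coulomb forces on N ions.
    pos: (N, 3) positions; q: (N,) charges; k set to 1 (units simplified)."""
    diff = pos[:, None, :] - pos[None, :, :]      # (N, N, 3): r_i - r_j
    r2 = np.einsum('ijk,ijk->ij', diff, diff)     # squared separations
    np.fill_diagonal(r2, np.inf)                  # exclude self-interaction
    inv_r3 = r2 ** -1.5
    # F_i = k * q_i * sum_j q_j (r_i - r_j) / |r_i - r_j|^3
    return k * np.einsum('i,j,ij,ijk->ik', q, q, inv_r3, diff)

# Two equal charges repel along the line joining them.
f = coulomb_forces(np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
                   np.array([1.0, 1.0]))
```

Every ion interacts with every other ion, so doubling N quadruples the work; the abstract's 63 h vs. 309 s comparison reflects replacing and parallelizing exactly this kind of loop.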

A parallelized version of the Flowfield Dependent Variation (FDV) Method is developed to analyze a problem of current research interest, the flowfield resulting from a triple shock/boundary layer interaction. Such flowfields are often encountered in the inlets of high-speed air-breathing vehicles, including the NASA Hyper-X research vehicle. In order to resolve the complex shock structure and to provide adequate resolution for boundary layer computations of the convective heat transfer from surfaces inside the inlet, models containing over 500,000 nodes are needed. Efficient parallelization of the computation is essential to achieving results in a timely manner. Results from a parallelization scheme based upon multi-threading, as implemented on multiple-processor supercomputers and workstations, are presented.

A parallel architecture is presented which is based on the Intel i860 RISC-processor and split into two subgroups of processors: the first designed to execute Monte Carlo loops, and the second to assess the statistical parameters. This approach enables efficient parallelization because data-transfer recurrence among processor groups is relatively small. With regard to the Monte Carlo-Metropolis algorithm (1953) the authors

E. Anzaldi; G. Danese; I. De Lotto; D. Dotti; F. Leporati; R. Lombardi; F. Prata; S. Romano

The use of a three-dimensional PIC (Particle-in-Cell) simulation is indispensable in studies of nonlinear plasma physics, such as ultra-intense laser interactions with plasmas. A three-dimensional simulation requires a large number of particles, more than 10^7. It is therefore very important to develop a parallelization and vectorization scheme for the PIC code and a visualization method of huge simulation

This paper presents a three-phase line-interactive uninterruptible power supply (UPS) system with series-parallel active power-line conditioning capabilities, using a synchronous-reference-frame (SRF)-based controller, which allows an effective power-factor correction, load harmonic current suppression, and output voltage regulation. The three-phase UPS system is composed of two active power filter topologies. The first one is a series active power filter, which works as

Sérgio Augusto Oliveira da Silva; P. F. Donoso-Garcia; P. C. Cortizo; P. F. Seixas

Sounding rockets launched by Mike Kelley and his group at Cornell demonstrated the existence of transient (1 ms) electric fields associated with lightning strikes at high altitudes above active thunderstorms. These electric fields had a component parallel to the Earth's magnetic field, and were unipolar and large in amplitude. They were thought to be strong enough to energize electrons and generate strong turbulence as the beams thermalized. The parallel electric fields were observed on multiple flights, but high time resolution measurements were not made within 100 km horizontal distance of lightning strokes, where the electric fields are largest. In 2000 the "Lightning Bolt" sounding rocket (NASA 27.143) was launched directly over an active thunderstorm to an apogee near 300 km. The sounding rocket was equipped with sensitive electric and magnetic field instruments as well as a photometer and electrostatic analyser for measuring accelerated electrons. The electric and magnetic fields were sampled at 10 million samples per second, letting us fully resolve the structure of the parallel electric field pulse up to and beyond the plasma frequency. We will present results from the Lightning Bolt mission, concentrating on the parallel electric field pulses that arrive before the lower-frequency whistler wave modes. We observe pulses with peak electric fields of a few mV/m lasting for a substantial fraction of a millisecond. Superimposed on this is high-frequency turbulence, comparable in amplitude to the pulse itself. This is the first direct observation of this structure in the parallel electric field within 100 km horizontal distance of the lightning stroke. We will present evidence for the method of generation of these parallel fields, and discuss their probable effect on ionospheric electrons.

We present a "model" of hippocampal information processing based on a review of recent data regarding the local circuitry of Ammon's horn and the dentate gyrus. We have been struck by the parallels in cell type and connectivity in Ammon's horn and the dentate gyrus, and have focused on similarities between CA3 pyramidal cells and mossy cells. Important conclusions of our analysis include the following: (1) The idea of serial processing of afferent information, from one hippocampal subregion to the next, is inadequate and based on an over-simplification of circuitry; information processing undoubtedly occurs over parallel, as well as serial, pathways. (2) Local circuitry within a given hippocampal subregion gives rise predominantly to feedforward inhibition; recurrent inhibition is present, but less potent. (3) There are multiple populations of local circuit neurons, each of which has a specific function, characteristic interconnections, and special cell properties. It is misleading to categorize these cells into a single category of inhibitory interneuron. PMID:1975454

We present a highly parallel, linearly scalable technique of kd-tree construction for ray tracing of dynamic geometry. We use a conventional kd-tree compatible with high-performing algorithms such as MLRTA or frustum tracing. The proposed technique offers exceptional construction speed while maintaining reasonable kd-tree quality for the rendering stage. The algorithm builds a kd-tree from scratch each frame, thus prior knowledge
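
The per-frame kd-tree rebuild can be illustrated with a minimal median-split builder. This is only a structural sketch: production builders such as the one described here use more sophisticated split heuristics (e.g. surface-area-based costs) and are heavily parallelized:

```python
def build_kdtree(points, depth=0):
    """Build a kd-tree over 3D points by median split, cycling the axis
    each level. Returns a nested dict (None for an empty subtree)."""
    if not points:
        return None
    axis = depth % 3                                  # x, y, z, x, ...
    points = sorted(points, key=lambda p: p[axis])    # sort along split axis
    mid = len(points) // 2                            # median as split point
    return {
        'point': points[mid],
        'axis': axis,
        'left': build_kdtree(points[:mid], depth + 1),
        'right': build_kdtree(points[mid + 1:], depth + 1),
    }

tree = build_kdtree([(2, 3, 1), (5, 4, 2), (9, 6, 7), (4, 7, 9), (8, 1, 5)])
```

Rebuilding from scratch each frame, as in the abstract, trades per-frame construction cost for the freedom to handle fully dynamic geometry with no precomputed structure.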

Maxim Shevtsov; Alexei Soupikov; Alexander Kapustin

A tokamak fusion reactor dumps a large amount of heat and particle flux onto the divertor through the scrape-off-layer (SOL) plasma. Situations exist, either by necessity or through deliberate design, in which the SOL plasma attains a long mean free path along large segments of the open field lines. The rapid parallel streaming of electrons requires a large parallel electric field to maintain ambipolarity. The confining effect of the parallel electric field on electrons leads to a trap/passing boundary in velocity space for electrons. In the normal situation, where the upstream electron source populates both the trapped and passing regions, a mechanism must exist to produce a flux across the electron trap/passing boundary. In a short mean-free-path plasma, this is provided by collisions. For long mean-free-path plasmas, wave-particle interaction is the primary candidate for detrapping the electrons. Here we present simulation results and a theoretical analysis using a model distribution function of trapped electrons. The dominant electromagnetic plasma instability and the associated collisionless scattering, which produce both particle and energy fluxes across the electron trap/passing boundary in velocity space, are discussed.

The Parallel-Plate Bounded-Wave EMP Simulator is typically used to test the vulnerability of electronic systems to the electromagnetic pulse (EMP) produced by a high-altitude nuclear burst by subjecting the systems to a simulated EMP environment. However, when large test objects are placed within the simulator for investigation, the desired EMP environment may be affected by the interaction between the simulator and the test object. This simulator/obstacle interaction can be attributed to the following phenomena: (1) mutual coupling between the test object and the simulator, (2) fringing effects due to the finite width of the conducting plates of the simulator, and (3) multiple reflections between the object and the simulator's tapered end-sections. When the interaction is significant, the measurement of currents coupled into the system may not accurately represent those induced by an actual EMP. To better understand the problem of simulator/obstacle interaction, a dynamic analysis of the fields within the parallel-plate simulator is presented. The fields are computed using a moment-method solution based on a wire-mesh approximation of the conducting surfaces of the simulator. The fields within an empty simulator are found to be predominantly transverse electromagnetic (TEM) for frequencies within the simulator's bandwidth, properly simulating the properties of the EMP propagating in free space. However, when a large test object is placed within the simulator, it is found that the currents induced on the object can be quite different from those on an object situated in free space. A comprehensive study of the mechanisms contributing to this deviation is presented.

We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms considering recursively finer fluid time steps as well as overlapping solver updates are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general-purpose solid dynamics code DYNA3D. Besides simulations verifying the coupled fluid-structure solver and assessing its parallel scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the simulation of a prototypical blast explosion in a realistic multistory building are presented.

Deiterding, Ralf [ORNL]; Wood, Stephen L. [University of Tennessee, Knoxville (UTK)]

We argue through a combination of slave-boson mean-field theory and the Bethe ansatz that the ground state of closely spaced double quantum dots coupled in parallel to a single effective channel is a Fermi liquid. We do so by studying the dots' conductance, impurity entropy, and spin correlation. In particular, we find that the zero-temperature conductance is characterized by the Friedel sum rule, a hallmark of Fermi-liquid physics, and that the impurity entropy vanishes in the limit of zero temperature, indicating that the ground state is a singlet. This conclusion is in opposition to a number of numerical renormalization-group studies. We suggest a possible reason for the discrepancy.

In this interactive learning activity, students will learn about parallel circuits. They will measure and calculate the resistance of parallel circuits and answer several questions about the example circuit shown.
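
The parallel-circuit rule behind such exercises, 1/R = Σ 1/R_i, can be checked numerically; the resistor values below are arbitrary examples:

```python
def parallel_resistance(resistors):
    """Equivalent resistance of resistors in parallel: 1/R = sum(1/R_i)."""
    return 1.0 / sum(1.0 / r for r in resistors)

# Two equal resistors in parallel halve the resistance.
print(parallel_resistance([10.0, 10.0]))   # two 10-ohm resistors -> 5 ohms
print(parallel_resistance([6.0, 3.0]))     # 6 ohms || 3 ohms -> 2 ohms
```

The equivalent resistance is always smaller than the smallest branch resistance, which is the key observation students are asked to verify.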

Consider the problem of a comet on a collision trajectory with a magnetized neutron star. The question addressed in this paper is whether the comet interacts strongly enough with the magnetic field to be captured at a large radius, or whether in general the comet will escape a magnetized neutron star. 6 refs., 4 figs.

Communication effectiveness and reconstruction validation are two important goals faced by archaeologists. This paper shows how these targets can be reached more easily by means of a mobile and user-centric fruition system designed with both the visitor's and the archaeologist's needs in mind. This system, called MUSE, consists of interactive multimedia tablets connected to a site control centre by a

Multi-dimensional, correlated particle tracking is a key technology to reveal dynamic processes in living and synthetic soft matter systems. In this paper we present a new method for tracking micron-sized beads in parallel and in all three dimensions, faster and more precisely than existing techniques. Using an acousto-optic deflector and two quadrant photodiodes, we can track numerous optically trapped beads at up to tens of kHz with a precision of a few nanometers by back-focal-plane interferometry. By time-multiplexing the laser focus, we can individually calibrate all traps and all tracking signals in a few seconds and in 3D. We show 3D histograms and calibration constants for nine beads in a square arrangement, although trapping and tracking are easily possible for more beads, also in arbitrary 2D arrangements. As an application, we investigate the hydrodynamic coupling and diffusion anomalies of spheres trapped in a 3 × 3 arrangement. PMID:22109012

Ruh, Dominic; Tränkle, Benjamin; Rohrbach, Alexander

The main focus of the present article is the development of a general solution framework for coupled multi-physics interaction problems based upon re-using existing codes. In particular, we discuss how to build this software tool for the fluid-structure interaction problem from the finite element code FEAP for structural mechanics and the finite volume code OpenFOAM for fluid mechanics. This is achieved by using the Component Template Library (CTL) to couple the existing codes into a single software product. The present CTL code-coupling procedure accepts not only different discretization schemes but also different languages, with the solid component written in Fortran and the fluid component written in C++. Moreover, the resulting CTL-based code also supports nested parallelization. The proposed coupling strategy is detailed for the explicit and implicit fixed-point iteration solvers presented in Part I of this paper, referred to as Direct Force-Motion Transfer/Block-Gauss-Seidel. However, the proposed code-coupling framework can easily accommodate other solution schemes. The selected application examples are chosen to confirm the capability of the code-coupling strategy to provide quick development of advanced computational tools for demanding practical problems, such as 3D fluid models with free-surface flows interacting with structures.
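
The Block-Gauss-Seidel fixed-point coupling can be sketched with scalar stand-ins for the two field solvers. The functions below are hypothetical contractive maps, not the actual FEAP/OpenFOAM components; only the iteration structure matches the scheme described in the text:

```python
def gauss_seidel_fsi(solve_fluid, solve_solid, u0, tol=1e-10, max_iter=100):
    """Block Gauss-Seidel fixed-point coupling: each iteration, the fluid
    solver maps the current interface motion to a load, then the solid
    solver maps that load back to an updated motion, until convergence."""
    u = u0
    for _ in range(max_iter):
        load = solve_fluid(u)       # fluid step: interface motion -> traction
        u_new = solve_solid(load)   # solid step: traction -> interface motion
        if abs(u_new - u) < tol:    # converged interface state
            return u_new
        u = u_new
    raise RuntimeError("coupling iteration did not converge")

# Toy contractive pair of maps; the composite has fixed point u = 1/6.
u = gauss_seidel_fsi(lambda u: 1.0 - u, lambda f: 0.5 * f - 0.25, 0.0)
```

The same loop structure applies when `solve_fluid` and `solve_solid` are full PDE solvers exchanging interface tractions and displacements; convergence then depends on the physical coupling strength rather than a simple contraction factor.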

Kassiotis, Christophe; Ibrahimbegovic, Adnan; Niekamp, Rainer; Matthies, Hermann G.

This is a final report as far as our work at University of Minnesota is concerned. The report describes our research progress and accomplishments in development of high performance computing methods and tools for 3D finite element computation of aerodynamic characteristics and fluid-structure interactions (FSI) arising in airdrop systems, namely ram-air parachutes and round parachutes. This class of simulations involves complex geometries, flexible structural components, deforming fluid domains, and unsteady flow patterns. The key components of our simulation toolkit are a stabilized finite element flow solver, a nonlinear structural dynamics solver, an automatic mesh moving scheme, and an interface between the fluid and structural solvers; all of these have been developed within a parallel message-passing paradigm.

By using the linearized quantum hydrodynamic (QHD) theory, electronic excitations induced by a charged particle moving between or over two parallel two-dimensional quantum electron gases (2DQEG) are investigated. The calculation shows that the influence of the quantum effects on the interaction process should be taken into account. Including the quantum statistical and quantum diffraction effects, the general expressions of the

Bone continuously adapts its internal structure to accommodate the functional demands of its mechanical environment and strain-induced flow of interstitial fluid is believed to be the primary mediator of mechanical stimuli to bone cells in vivo. In vitro investigations have shown that bone cells produce important biochemical signals in response to fluid flow applied using parallel-plate flow chamber (PPFC) systems. However, the exact mechanical stimulus experienced by the cells within these systems remains unclear. To fully understand this behaviour represents a most challenging multi-physics problem involving the interaction between deformable cellular structures and adjacent fluid flows. In this study, we use a fluid–structure interaction computational approach to investigate the nature of the mechanical stimulus being applied to a single osteoblast cell under fluid flow within a PPFC system. The analysis decouples the contribution of pressure and shear stress on cellular deformation and for the first time highlights that cell strain under flow is dominated by the pressure in the PPFC system rather than the applied shear stress. Furthermore, it was found that strains imparted on the cell membrane were relatively low whereas significant strain amplification occurred at the cell–substrate interface. These results suggest that strain transfer through focal attachments at the base of the cell are the primary mediators of mechanical signals to the cell under flow in a PPFC system. Such information is vital in order to correctly interpret biological responses of bone cells under in vitro stimulation and elucidate the mechanisms associated with mechanotransduction in vivo.

High resolution fluctuating airloads data were acquired during a test of a contemporary design United Technologies model rotor in the Duits-Nederlandse Windtunnel (DNW). The airloads are used as input to the noise prediction program WOPWOP, in order to predict the blade-vortex interaction (BVI) noise field on a large plane below the rotor. Trends of predicted advancing and retreating side BVI noise levels and directivity as functions of flight condition are presented. The measured airloads have been analyzed to determine the BVI locations on the blade surface, and are used to interpret the predicted BVI noise radiation patterns. Predicted BVI locations are obtained using the free wake model in CAMRAD/JA, the UTRC Generalized Forward Flight Distorted Wake Model, and the UTRC FREEWAKE analysis. These predicted BVI locations are compared with those obtained from the measured pressure data.

Marcolini, Michael A.; Martin, Ruth M.; Lorber, Peter F.; Egolf, T. A.

Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods, such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination, are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.
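The three molecular dynamics decompositions above differ in what each worker owns. A minimal sketch of the replicated-data variant, in which every worker holds the full position array but computes forces only for its assigned slice of particles (the function names and the 1D toy pair force are hypothetical; a production code would use MPI rather than a thread pool):

```python
# Illustrative replicated-data decomposition for a pairwise-force MD step.
from concurrent.futures import ThreadPoolExecutor

def pair_force(xi, xj):
    """Toy 1D Lennard-Jones-like force on particle at xi from one at xj."""
    r = xj - xi
    inv6 = 1.0 / r**6          # even power: sign-independent
    return 24.0 * (2.0 * inv6 * inv6 - inv6) / r  # direction from r's sign

def forces_for_slice(positions, lo, hi):
    """Total force on each particle in [lo, hi) from all other particles."""
    out = []
    for i in range(lo, hi):
        f = 0.0
        for j, xj in enumerate(positions):
            if j != i:
                f += pair_force(positions[i], xj)
        out.append(f)
    return out

def replicated_data_forces(positions, nworkers=4):
    """Every worker sees all positions; work is split by particle index."""
    n = len(positions)
    bounds = [(k * n // nworkers, (k + 1) * n // nworkers)
              for k in range(nworkers)]
    with ThreadPoolExecutor(max_workers=nworkers) as ex:
        parts = ex.map(lambda b: forces_for_slice(positions, *b), bounds)
    return [f for part in parts for f in part]
```

Note the trade-off the survey discusses: replicated data needs no communication of positions during the force loop, but its memory cost grows with the full system size on every worker.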

The coupling of a finite-length, field-aligned, ion beam with a uniform background plasma is investigated using one-dimensional hybrid computer simulations. The finite-length beam is used to study the interaction between the incident solar wind and ions reflected from the Earth's quasi-parallel bow shock, where the reflection process may vary with time. The coupling between the reflected ions and the solar

We describe an efficient parallel and vector algorithm for solving huge eigenvector problems in quantum chemistry. An automatically adaptive, single-vector, iterative diagonalization method was also developed to reduce the memory requirement and avoid an I/O bottleneck. Our initial full-configuration interaction calculation solved for an eigenvector with 65 billion coefficients and was performed on 432 MSPs of the Oak Ridge National

This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems...

This report examines the current techniques of parallel processing, transputers, vector and vector supercomputers and covers such areas as transputer applications, programming models and language design for parallel processing.

The usual approach to predicting particle loss in storage rings in the presence of nonlinearities consists in the determination of the dynamic aperture of the machine. This method, however, will not directly predict the lifetimes of beams. We have developed a code which can, by parallelization and careful speed optimization, predict lifetimes in the presence of 100 parasitic beam-beam crossings by tracking > 10^10 particle-turns. An application of this code to the antiproton lifetime in the Tevatron at injection is discussed.

Kabel, A.C.; Cai, Y.; Erdelyi, B.; Sen, T.; Xiao, M.; /SLAC /Fermilab

This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

The chopping of helicopter main rotor tip vortices by the tail rotor was experimentally investigated. This is a problem of blade-vortex interaction (BVI) at normal incidence, where the vortex is generally parallel to the rotor axis. The experiment used a model rotor and an isolated vortex and was designed to isolate BVI noise from other types of rotor noise. Tip Mach number, radial BVI station, and free stream velocity were varied. Fluctuating blade pressures, farfield sound pressure level and directivity, the velocity field of the incident vortex, and blade-vortex interaction angles were measured. Blade-vortex interaction was found to produce impulsive noise which radiates primarily ahead of the blade. For interactions away from the blade tip, the results demonstrate the dipole character of BVI radiation. For BVI close to the tip, the three-dimensional relief effect reduces the intensity of the interaction, despite a larger BVI angle and higher local Mach number. Furthermore, in this case, the radiation pattern is more complex due to diffraction at and pressure communication around the tip.

Content prepared for the Supercomputing 2002 session on "Using Clustering Technologies in the Classroom". Contains a series of exercises for teaching parallel computing concepts through kinesthetic activities.

An introduction to optimisation techniques that may improve parallel performance and scaling on HECToR. It assumes that the reader has some experience of parallel programming, including basic MPI and OpenMP. Scaling is a measure of the ability of a parallel code to use increasing numbers of cores efficiently. A scalable application is one that, when the number of processors is increased, performs better by a factor which justifies the additional resource employed. Making a parallel application scale to many thousands of processes requires careful attention not only to the communication, data, and work distribution but also to the choice of algorithms. Since the choice of algorithm is too broad a subject, and too particular to the application domain, to include in this brief guide, we concentrate on general good practices towards parallel optimisation on HECToR.
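The scaling terms used above can be made concrete. A small sketch (function names are illustrative) of speedup, parallel efficiency, and the Amdahl's-law bound that a residual serial fraction places on achievable speedup:

```python
def speedup(t1, tp):
    """Speedup S(p) = T1 / Tp: serial runtime over parallel runtime."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Parallel efficiency E(p) = S(p) / p; 1.0 means ideal scaling."""
    return speedup(t1, tp) / p

def amdahl_speedup(serial_fraction, p):
    """Amdahl's law: upper bound on speedup when a fraction f of the
    work stays serial, however many processors p are used."""
    f = serial_fraction
    return 1.0 / (f + (1.0 - f) / p)
```

For example, a code that keeps 10% of its runtime serial can never exceed a 10x speedup, no matter how many of HECToR's cores it is given.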

Magnetotunneling between two-dimensional GaAs/InAs electron systems in vertical resonant tunneling GaAs/InAs/AlAs heterostructures is studied. A new type of singularity in the tunneling density of states, specifically a dip at the Fermi level, is found; this feature is drastically different from that observed previously for tunneling between two-dimensional GaAs systems in terms of both the kind of functional dependence and the energy and temperature parameters. As before, this effect manifests itself in the suppression of resonant tunneling in a narrow range near zero bias voltage in a high magnetic field parallel to the current direction. Magnetic-field and temperature dependences of the effect's parameters are obtained; these dependences are compared with available theoretical and experimental data. The observed effect can be caused by a high degree of disorder in two-dimensional correlated electron systems as a result of the introduction of structurally imperfect strained InAs layers.

Khanin, Yu. N.; Vdovin, E. E., E-mail: vdov62@yandex.ru [Russian Academy of Sciences, Institute of Microelectronics Technology and High Purity Materials (Russian Federation)]; Makarovsky, O. [University of Nottingham, School of Physics and Astronomy (United Kingdom)]; Henini, M. [University of Nottingham, School of Physics and Astronomy, Nottingham Nanotechnology and Nanoscience Center (United Kingdom)]

For the first time, a 2D electromagnetic and relativistic semi-Lagrangian Vlasov model for a multi-computer environment was developed to study the laser–plasma interaction in an open system. Numerical simulations are presented for situations relevant to the penetration of an ultra-intense laser pulse inside a moderately overdense plasma and the relativistic filamentation instability in the case of an underdense plasma. The

A method is presented which can predict the hydroelastic response of a finite compliant panel to an unsteady potential flow. A specially adapted boundary-integral technique is used to determine the unsteady hydrodynamic forces. This is coupled to a finite-difference formulation of the wall mechanics. The fully interactive wall/flow numerical model serves only to mimic the physics of the system; in

Background With the advent of high throughput genomics and high-resolution imaging techniques, there is a growing necessity in biology and medicine for parallel computing, and with the low cost of computing, it is now cost-effective for even small labs or individuals to build their own personal computation cluster. Methods Here we briefly describe how to use commodity hardware to build a low-cost, high-performance compute cluster, and provide an in-depth example and sample code for parallel execution of R jobs using MOSIX, a mature extension of the Linux kernel for parallel computing. A similar process can be used with other cluster platform software. Results As a statistical genetics example, we use our cluster to run a simulated eQTL experiment. Because eQTL is computationally intensive, and is conceptually easy to parallelize, like many statistics/genetics applications, parallel execution with MOSIX gives a linear speedup in analysis time with little additional effort. Conclusions We have used MOSIX to run a wide variety of software programs in parallel with good results. The limitations and benefits of using MOSIX are discussed and compared to other platforms.

This book presents a framework for understanding the tradeoffs between the conventional view and the dataflow view, with the objective of discovering the critical hardware structures which must be present in any scalable, general-purpose parallel computer to effectively tolerate latency and synchronization costs. The author presents an approach to scalable general-purpose parallel computation. Linguistic concerns, compiling issues, intermediate language issues, and hardware/technological constraints are presented as a combined approach to architectural development. The book also presents the notion of a parallel machine language.

As geoscientists are confronted with increasingly massive datasets from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data and communicate the understanding to stakeholders. Recent developments in web technologies make it easy to manage, visualize and share large data sets with general public. Novel visualization techniques and dynamic user interfaces allow users to interact with data, and modify the parameters to create custom views of the data to gain insight from simulations and environmental observations. This requires developing new data models and intelligent knowledge discovery techniques to explore and extract information from complex computational simulations or large data repositories. Scientific visualization will be an increasingly important component to build comprehensive environmental information platforms. This presentation provides an overview of the trends and challenges in the field of scientific visualization, and demonstrates information visualization and communication tools developed within the light of these challenges.

The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved.

Assuming that the multicore revolution plays out the way the microprocessor industry expects, it seems that within a decade most programming will involve parallelism at some level. One needs to ask how this affects the way we teach computer science, or even how we have people think about computation. With regards to teaching there seem to be three basic

In this interactive learning activity, students will learn more about series-parallel circuits. They will measure and calculate the resistance of series-parallel circuits and answer several questions about the example circuit shown.
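The resistance rules exercised in the activity can be written down directly: series resistances add, while parallel resistances combine by reciprocals. A minimal sketch (helper names are hypothetical):

```python
def series(*rs):
    """Equivalent resistance of resistors in series: R = R1 + R2 + ..."""
    return sum(rs)

def parallel(*rs):
    """Equivalent resistance of resistors in parallel:
    1/R = 1/R1 + 1/R2 + ..."""
    return 1.0 / sum(1.0 / r for r in rs)
```

For instance, two 6-ohm resistors in parallel give 3 ohms, and adding a 5-ohm resistor in series with that pair gives 8 ohms total.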

Acoustic measurements were obtained in a Langley 14 x 22 foot Subsonic Wind Tunnel to study the aeroacoustic interaction of 1/5th scale main rotor, tail rotor, and fuselage models. An extensive aeroacoustic data base was acquired for main rotor, tail rotor, fuselage aerodynamic interaction for moderate forward speed flight conditions. The details of the rotor models, experimental design and procedure, aerodynamic and acoustic data acquisition and reduction are presented. The model was initially operated in trim for selected fuselage angle of attack, main rotor tip-path-plane angle, and main rotor thrust combinations. The effects of repositioning the tail rotor in the main rotor wake and the corresponding tail rotor countertorque requirements were determined. Each rotor was subsequently tested in isolation at the thrust and angle of attack combinations for trim. The acoustic data indicated that the noise was primarily dominated by the main rotor, especially for moderate speed main rotor blade-vortex interaction conditions. The tail rotor noise increased when the main rotor was removed indicating that tail rotor inflow was improved with the main rotor present.

The wheat tan spot fungus (Pyrenophora tritici-repentis) produces a well-characterized host-selective toxin (HST) known as Ptr ToxA, which induces necrosis in genotypes that harbor the Tsn1 gene on chromosome 5B. In previous work, we showed that the Stagonospora nodorum isolate Sn2000 produces at least 2 HSTs (SnTox1 and SnToxA). Sensitivity to SnTox1 is governed by the Snn1 gene on chromosome 1B in wheat. SnToxA is encoded by a gene with a high degree of similarity to the Ptr ToxA gene. Here, we evaluate toxin sensitivity and resistance to S. nodorum blotch (SNB) caused by Sn2000 in a recombinant inbred population that does not segregate for Snn1. Sensitivity to the Sn2000 toxin preparation cosegregated with sensitivity to Ptr ToxA at the Tsn1 locus. Tsn1-disrupted mutants were insensitive to both Ptr ToxA and SnToxA, suggesting that the 2 toxins are functionally similar, because they recognize the same locus in the host to induce necrosis. The locus harboring the tsn1 allele underlies a major quantitative trait locus (QTL) for resistance to SNB caused by Sn2000, and explains 62% of the phenotypic variation, indicating that the toxin is an important virulence factor for this fungus. The Tsn1 locus and several minor QTLs together explained 77% of the phenotypic variation. Therefore, the Tsn1-ToxA interaction in the wheat-S. nodorum pathosystem parallels that of the wheat-tan spot system, and the wheat Tsn1 gene serves as a major determinant for susceptibility to both SNB and tan spot. PMID:17213908

Liu, Zhaohui; Friesen, Timothy L; Ling, Hua; Meinhardt, Steven W; Oliver, Richard P; Rasmussen, Jack B; Faris, Justin D

The three major research areas have been parallel structuring of computations, basic software for support of parallel computations and parallel architectures and supporting hardware. The work on parallel structuring of computations falls into three catego...

Several tutorials on parallel computing. Overview of parallel computing. Porting and code parallelization. Scalar, cache, and parallel code tuning. Timing, profiling and performance analysis. Overview of IBM Regatta P690.

We develop a 3D simulation code for interaction between the proto-planetary disk and embedded proto-planets. The protoplanetary disk is treated as a three-dimensional (3D), self-gravitating gas whose motion is described by the locally isothermal Navier-Stokes equations in spherical coordinates centered on the star. The differential equations for the disk are similar to those given in Kley et al. (2009) with a different gravitational potential that is defined in Nelson et al. (2000). The equations are solved by a directional split Godunov method for the inviscid Euler equations plus an operator-split method for the viscous source terms. We use a sub-cycling technique for the azimuthal sweep to alleviate the time step restriction. We also extend the FARGO scheme of Masset (2000), as modified in Li et al. (2001), to our 3D code to accelerate the transport in the azimuthal direction. Furthermore, we have implemented a reduced 2D (r, θ) and a fully 3D self-gravity solver on our uniform disk grid, which extends our 2D method (Li, Buoni, & Li 2008) to 3D. This solver uses a mode cut-off strategy and combines FFT in the azimuthal direction and direct summation in the radial and meridional directions. An initial axis-symmetric equilibrium disk is generated via iteration between the disk density profile and the 2D disk self-gravity. We do not need any softening in the disk self-gravity calculation as we have used a shifted grid method (Li et al. 2008) to calculate the potential. The motion of the planet is limited to the mid-plane and the equations are the same as given in D'Angelo et al. (2005), which we adapted to polar coordinates with a fourth-order Runge-Kutta solver. The disk gravitational force on the planet is assumed to evolve linearly with time between two hydrodynamics time steps. The planetary potential acting on the disk is calculated accurately with a small softening given by a cubic-spline form (Kley et al. 2009).
Since the torque is extremely sensitive to the position of the planet, we adopt the corotating frame, which allows the planet to move only in the radial direction if only one planet is present. This code has been extensively tested on a number of problems. For an Earth-mass planet with constant aspect ratio h = 0.05, the torque calculated using our code matches quite well with the 3D linear theory results of Tanaka et al. (2002). The code is fully parallelized via the message-passing interface (MPI) and has very high parallel efficiency. Several numerical examples for both fixed and moving planets are provided to demonstrate the efficacy of the numerical method and code.

Li, Shengtai [Los Alamos National Laboratory; Li, Hui [Los Alamos National Laboratory

The objective of this research is to develop an efficient and accurate methodology to resolve flow non-linearity of fluid-structural interaction. To achieve this purpose, a numerical strategy to apply the detached-eddy simulation (DES) with a fully coupled fluid-structural interaction model is established for the first time. The following novel numerical algorithms are also created: a general sub-domain boundary mapping procedure for parallel computation to reduce wall clock simulation time, an efficient and low diffusion E-CUSP (LDE) scheme used as a Riemann solver to resolve discontinuities with minimal numerical dissipation, and an implicit high order accuracy weighted essentially non-oscillatory (WENO) scheme to capture shock waves. The Detached-Eddy Simulation is based on the model proposed by Spalart in 1997. Near solid walls within wall boundary layers, the Reynolds averaged Navier-Stokes (RANS) equations are solved. Outside of the wall boundary layers, the 3D filtered compressible Navier-Stokes equations are solved based on large eddy simulation (LES). The Spalart-Allmaras one equation turbulence model is solved to provide the Reynolds stresses in the RANS region and the subgrid scale stresses in the LES region. An improved 5th order finite differencing weighted essentially non-oscillatory (WENO) scheme with an optimized epsilon value is employed for the inviscid fluxes. The new LDE scheme used with the WENO scheme is able to capture crisp shock profiles and exact contact surfaces. A set of fully conservative 4th order finite central differencing schemes are used for the viscous terms. The 3D Navier-Stokes equations are discretized based on a conservative finite differencing scheme. The unfactored line Gauss-Seidel relaxation iteration is employed for time marching. A general sub-domain boundary mapping procedure is developed for arbitrary topology multi-block structured grids with grid points matched on sub-domain boundaries.
Extensive numerical experiments are conducted to test the performance of the numerical algorithms. The RANS simulation with the Spalart-Allmaras one equation turbulence model is the foundation for DES and is hence validated with other transonic flows. The predicted results agree very well with the experiments. The RANS code is then further used to study the slot size effect of a co-flow jet (CFJ) airfoil. The DES solver with fully coupled fluid-structural interaction methodology is validated with vortex induced vibration of a cylinder and a transonic forced pitching airfoil. For the cylinder, the laminar Navier-Stokes equations are solved due to the low Reynolds number. The 3D effects are observed in both stationary and oscillating cylinder simulation because of the flow separations behind the cylinder. For the transonic forced pitching airfoil DES computation, there is no flow separation in the flow field. The DES results agree well with the RANS results. These two cases indicate that the DES is more effective on predicting flow separation. The DES code is used to simulate the limited cycle oscillation of NLR7301 airfoil. For the cases computed in this research, the predicted LCO frequency, amplitudes, averaged lift and moment, all agree excellently with the experiment. The solutions appear to have bifurcation and are dependent on the initial perturbation. The developed methodology is able to capture the LCO with very small amplitudes measured in the experiment. This is attributed to the high order low diffusion schemes, fully coupled FSI model, and the turbulence model used. This research appears to be the first time that a numerical simulation of LCO matches the experiment. The DES code is also used to simulate the CFJ airfoil jet mixing at high angle of attack. 
In conclusion, the numerical strategy of the high order DES with fully coupled FSI model and parallel computing developed in this research is demonstrated to have high accuracy, robustness, and efficiency. Future work to further maturate the methodology is suggested. (Abstract shortened by UMI.)

The lowest frequency parallel fundamental band ν5 of CH3SiH3 near 700 cm-1 has been measured at a resolution of 0.004 cm-1 with Fourier transform spectroscopy to investigate vibration-torsion-rotation interactions in symmetric tops. The torsional splittings in the spectrum are increased from ~0.005 cm-1 to ~1 cm-1 by Fermi-type vibration-torsion interactions between the torsional stack (v6=0,1,2,...) in the ground vibrational state

Parallel algorithms for the triangularization of large, sparse, and unsymmetric matrices are presented. The method combines parallel reduction with a new parallel pivoting technique, control over the generation of fill-ins, and a check for numerical stability, all done in parallel with the work distributed over the active processes. The parallel pivoting technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
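The Markowitz criterion mentioned above scores a candidate pivot a_ij by (r_i - 1)(c_j - 1), where r_i and c_j count the nonzeros in its row and column; this bounds the fill-in the pivot can create. A toy sketch that selects the minimum-Markowitz pivot from a sparsity pattern (function name hypothetical; the numerical stability check and the parallel selection of compatible pivots described in the abstract are omitted):

```python
def markowitz_pivot(pattern):
    """Pick the nonzero position (i, j) minimizing (r_i - 1) * (c_j - 1).

    `pattern` is a list of (row, col) positions of nonzeros. Ties keep
    the earliest entry in the list.
    """
    rows, cols = {}, {}
    for i, j in pattern:
        rows[i] = rows.get(i, 0) + 1   # nonzeros per row
        cols[j] = cols.get(j, 0) + 1   # nonzeros per column
    return min(pattern, key=lambda ij: (rows[ij[0]] - 1) * (cols[ij[1]] - 1))
```

A pivot in a singleton row or column scores zero: eliminating it can create no fill-in at all, which is why the heuristic favors sparse rows and columns.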

As a rotor blade moves through the air, it sheds vortices. These vortices, shed along the length of the blade over time, form the wake. The strongest vortices of the wake are those trailing from the tip of the blade. When a rotating blade system moves under certain operating conditions, each blade will impinge on the tip vortices shed by itself or by other blades. This impingement is called a blade-vortex interaction, or BVI. Although the blade and trailing tip vortices interact with many different orientations, one of the two extremes, either parallel or perpendicular interaction, is usually modelled. In a perpendicular interaction, the portion of the blade that is actually interacting with the travelling vortex at any given time is very small. A parallel interaction, however, has the largest concurrent interaction with the blade; as a result, this case is given the most attention. One of the most commonly studied occurrences of blade-vortex interactions is associated with low-speed descending rotorcraft flight. BVIs occur when the tip vortices shed by the blades intersect the plane of the rotor. BVIs cause local pressure changes over the blades which are responsible, in part, for the acoustic signature of the rotorcraft. The local pressure changes also cause vibrations which lead to fatigue of both the blades and the mechanical components driving the blades.

Parallel programming tools are limited, making effective parallel programming difficult and cumbersome. Compilers that translate conventional sequential programs into parallel form would liberate programmers from the complexities of explicit, machine-oriented parallel programming. The paper discusses parallel programming with Polaris, an experimental translator of conventional Fortran programs targeting machines such as the Cray T3D

William Blume; Ramon Doallo; Rudolf Eigenmann; John Grout; Jay Hoeflinger; Thomas Lawrence; Jaejin Lee; David A. Padua; Yunheung Paek; William M. Pottenger; Lawrence Rauchwerger; Peng Tu

Recently, He and Yesha gave an algorithm for recognizing directed series parallel graphs in time O(log^2 n) with linearly many EREW processors. We give a new algorithm for this problem, based on a structural characterization of series parallel graphs in terms of their ear decompositions. Our algorithm can recognize undirected as well as directed series parallel graphs. It can be implemented
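For intuition, two-terminal series-parallel graphs can also be recognized sequentially by exhaustively applying the two defining reductions: merge parallel edges, and splice out internal degree-2 vertices. The sketch below (hypothetical name, self-loop-free input assumed) implements this classical reduction characterization, not the parallel ear-decomposition algorithm of the abstract:

```python
from collections import Counter

def is_ttsp(edge_list, s, t):
    """True iff the multigraph (no self-loops) is two-terminal
    series-parallel with terminals s and t, i.e. it reduces to a single
    s-t edge under parallel and series reductions."""
    multi = Counter(frozenset(e) for e in edge_list)  # edge multiset
    while True:
        # Parallel reduction: collapse duplicate edges into one.
        for e in multi:
            multi[e] = 1
        # Series reduction: splice out one internal degree-2 vertex.
        incident = {}
        for e in multi:
            for v in e:
                incident.setdefault(v, []).append(e)
        reduced = False
        for v, inc in incident.items():
            if v not in (s, t) and len(inc) == 2:
                (a,) = inc[0] - {v}
                (b,) = inc[1] - {v}
                del multi[inc[0]]
                del multi[inc[1]]
                multi[frozenset((a, b))] += 1  # replace path a-v-b by a-b
                reduced = True
                break
        if not reduced:
            break
    return multi == Counter({frozenset((s, t)): 1})
```

K4 (the forbidden structure for series-parallel graphs) admits no reduction at all, so the procedure correctly rejects it.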

This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts dealing with parallel processing.

Mini-tuft and smoke flow visualization techniques have been developed for the investigation of model helicopter rotor blade-vortex interaction noise at low tip speeds. These techniques allow the parameters required for calculation of the blade-vortex interaction noise using the Widnall/Wolf model to be determined. The measured acoustics are compared with the predicted acoustics for each test condition. Under the conditions tested, it is determined that the dominating acoustic pulse results from the interaction of the blade with a vortex 1-1/4 revolutions old at an interaction angle of less than 8 deg. The Widnall/Wolf model predicts the peak sound pressure level within 3 dB for blade-vortex separation distances greater than 1 semichord, but it generally overpredicts the peak SPL by over 10 dB for blade-vortex separation distances of less than 1/4 semichord.

This paper investigates the methods used for rotor rotational noise, impulsive noise from blade/vortex interaction, high speed noise, rotor broadband noise, the various types of fenestron noise, and noise from the turboshaft engines. From the helicopter m...

In the present paper, we report the detection of mutations implicated in human cystic fibrosis (CF). Nine different oligonucleotides are studied, including three possible mutations related to this specific genetic disease: a deletion of three bases, ΔF508, and two single-nucleotide polymorphisms, 1540A/G and 1716G/A. We monitor, in real time and in parallel, hybridizations of a solution of unlabeled oligonucleotide targets

Nathalie Bassil; Emmanuel Maillart; Michael Canva; Yves Lévy; Marie-Claude Millot; Serge Pissard; Rémy Narwa; Michel Goossens

Parallel mesh generation is a relatively new research area between the boundaries of two scientific computing disciplines: computational geometry and parallel computing. In this chapter we present a survey of parallel unstructured mesh generation methods. Parallel mesh generation methods decompose the original mesh generation problem into smaller sub-problems which are meshed in parallel. We organize the parallel mesh generation methods

Calculating the dipolar fields in Landau-Lifshitz-Gilbert (LLG) simulations accounts for the majority of the runtime, as these computations scale with system size faster than the other interactions. Efficient parallelization of the dipolar field is essential for a parallel LLG solution. This article presents a method to dynamically generate a parallel strategy and examines the benefits in speedup over standard parallelization.
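The O(N^2) cost being parallelized can be seen in a toy version of the dipolar sum. The sketch below splits the field evaluation by target site across a thread pool (names are hypothetical; a real LLG code would distribute sites over MPI ranks or use FFT-based convolution rather than this direct sum):

```python
from concurrent.futures import ThreadPoolExecutor

def dipolar_field_at(i, positions, moments):
    """Toy 1D dipolar sum: field at site i from every other moment,
    decaying as 1/r**3 (geometric prefactors omitted)."""
    h = 0.0
    xi = positions[i]
    for j, (xj, mj) in enumerate(zip(positions, moments)):
        if j != i:
            h += mj / abs(xj - xi) ** 3
    return h

def dipolar_fields_parallel(positions, moments, nworkers=4):
    """Parallel strategy: each worker handles a share of the target sites;
    every site costs O(N), so the total work is O(N^2)."""
    n = len(positions)
    with ThreadPoolExecutor(max_workers=nworkers) as ex:
        return list(ex.map(lambda i: dipolar_field_at(i, positions, moments),
                           range(n)))
```

Because each target site reads all moments but writes only its own field value, the decomposition needs no synchronization inside the sum, which is what makes the dipolar term a natural candidate for parallelization.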

The availability of commodity multicore and multiprocessor machines and the inherent parallelism in constraint programming search offer significant opportunities for constraint programming. Both constraint-based local search and finite-domain techniques can dramatically benefit from parallelization. Yet, currently available libraries and languages offer very limited support to exploit the inherent parallelism and the high human cost incurred to develop parallel solutions confine

Many sequential applications are difficult to parallelize because of unpredictable control flow, indirect data access, and input-dependent parallelism. These difficulties led us to build a software system for behavior oriented parallelization (BOP), which allows a program to be parallelized based on partial information about program behavior, for example, a user reading just part of the source code, or

Chen Ding; Xipeng Shen; Kirk Kelsey; Chris Tice; Ruke Huang; Chengliang Zhang

The Parallel Mandelbrot Set Model is a parallelization of the sequential MandelbrotSet model, which does all the computations on a single processor core. This parallelization is able to use a computer with more than one core (or processor) to carry out the same computation, thus speeding up the process. The parallelization is done using the model elements in the Parallel Java group. These model elements allow easy use of the Parallel Java library created by Alan Kaminsky. In particular, the parallelization used for this model is based on code in Chapters 11 and 12 of Kaminsky's book Building Parallel Java. The Parallel Mandelbrot Set Model was developed using the Easy Java Simulations (EJS) modeling tool. It is distributed as a ready-to-run (compiled) Java archive. Double-click the ejs_chaos_ParallelMandelbrotSet.jar file to run the program if Java is installed.
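The decomposition behind such a parallelization is easy to sketch: the image rows are independent, so each worker can compute whole rows of iteration counts over a fixed viewport. Below is a hedged Python sketch (hypothetical names, a thread pool standing in for Parallel Java's worker teams) rather than the model's actual Java code:

```python
from concurrent.futures import ThreadPoolExecutor

def mandelbrot_count(c, max_iter=100):
    """Iterations before |z| exceeds 2; hitting max_iter means the point
    is treated as inside the set."""
    z = 0j
    for k in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return k
    return max_iter

def mandelbrot_rows(width, height, max_iter=100, nworkers=4):
    """Row-wise decomposition over the viewport [-2, 1] x [-1.5, 1.5]:
    each worker computes entire rows, since rows share no state."""
    def row(j):
        y = -1.5 + 3.0 * j / (height - 1)
        return [mandelbrot_count(complex(-2.0 + 3.0 * i / (width - 1), y),
                                 max_iter)
                for i in range(width)]
    with ThreadPoolExecutor(max_workers=nworkers) as ex:
        return list(ex.map(row, range(height)))
```

Rows near the set's boundary take many more iterations than rows that escape quickly, which is why Kaminsky's book pairs this decomposition with load-balancing schedules.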

The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

The GPFS Archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and search the archive using standard and user-defined metadata, while maintaining security. Last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interface for the search tool. To meet security requirements, each database table is associated with a single user, stores only records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.

The extracellular matrix is constructed beyond the plasma membrane, challenging mechanisms for its control by the cell. In plants, the cell wall is highly ordered, with cellulose microfibrils aligned coherently over a scale spanning hundreds of cells. To a considerable extent, deploying aligned microfibrils determines mechanical properties of the cell wall, including strength and compliance. Cellulose microfibrils have long been seen to be aligned in parallel with an array of microtubules in the cell cortex. How do these cortical microtubules affect the cellulose synthase complex? This question has stood for as many years as the parallelism between the elements has been observed, but now an answer is emerging. Here, we review recent work establishing that the link between microtubules and microfibrils is mediated by a protein named cellulose synthase-interacting protein 1 (CSI1). The protein binds both microtubules and components of the cellulose synthase complex. In the absence of CSI1, microfibrils are synthesized but their alignment becomes uncoupled from the microtubules, an effect that is phenocopied in the wild type by depolymerizing the microtubules. The characterization of CSI1 significantly enhances knowledge of how cellulose is aligned, a process that serves as a paradigmatic example of how cells dictate the construction of their extracellular environment.

Vascular endothelial growth factor (VEGF) proximal promoter region contains a poly G/C-rich element that is essential for basal and inducible VEGF expression. The guanine-rich strand on this tract has been shown to form the DNA G-quadruplex structure, whose stabilization by small molecules can suppress VEGF expression. We report here the nuclear magnetic resonance structure of the major intramolecular G-quadruplex formed in this region in K+ solution using the 22mer VEGF promoter sequence with G-to-T mutations of two loop residues. Our results have unambiguously demonstrated that the major G-quadruplex formed in the VEGF promoter in K+ solution is a parallel-stranded structure with a 1:4:1 loop-size arrangement. A unique capping structure was shown to form in this 1:4:1 G-quadruplex. Parallel-stranded G-quadruplexes are commonly found in the human promoter sequences. The nuclear magnetic resonance structure of the major VEGF G-quadruplex shows that the 4-nt middle loop plays a central role for the specific capping structures and in stabilizing the most favored folding pattern. It is thus suggested that each parallel G-quadruplex likely adopts unique capping and loop structures by the specific middle loops and flanking segments, which together determine the overall structure and specific recognition sites of small molecules or proteins. LAY SUMMARY: The human VEGF is a key regulator of angiogenesis and plays an important role in tumor survival, growth and metastasis. VEGF overexpression is frequently found in a wide range of human tumors; the VEGF pathway has become an attractive target for cancer therapeutics. DNA G-quadruplexes have been shown to form in the proximal promoter region of VEGF and are amenable to small molecule drug targeting for VEGF suppression. 
The detailed molecular structure of the major VEGF promoter G-quadruplex reported here will provide an important basis for structure-based rational development of small molecule drugs targeting the VEGF G-quadruplex for gene suppression.

The paper describes a very simple scheme for rendering a class of parallel Orca programs fault-tolerant. It also discusses experiences with implementing this scheme on Amoeba. The approach works for parallel applications that are not interactive. The appr...

M. F. Kaashoek R. Michiels H. E. Bal A. S. Tanenbaum

The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations, to be used as a software package for real-time control of flexible space structures. A brief introduction to the state-of-the-art parallel computational capability is also presented. Time-marching strategies are developed for effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the approach presented on applications in disciplines other than the aerospace industry is assessed.

Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (editor); Liu, S. C. (editor); Li, J. C. (editor)

Time requirements for the solving of complex large-scale engineering problems can be substantially reduced by using parallel computation. Motivated by a computationally demanding biomechanical system identification problem, we introduce a parallel impleme...

A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

A comparison of recently proposed parallel text search methods to alternative available search strategies that use serial processing machines suggests parallel methods do not provide large-scale gains in either retrieval effectiveness or efficiency.

This study investigates the practice of presenting multiple supporting examples in parallel form. The elements of parallelism and its use in argument were first illustrated by Aristotle. Although real texts may depart from the ideal form for presenting multiple examples, rhetorical theory offers a rationale for minimal, parallel presentation. The…

CFD, or Computational Fluid Dynamics, is one of the scientific disciplines that has always posed new challenges to the capabilities of modern, ultra-fast supercomputers, and now to the even faster parallel computers. For applications where number crunching is of primary importance, there is perhaps no escaping parallel computers, since sequential computers are projected to top out at a few gigaflops unless some altogether new technology appears in the future. For parallel computers, on the other hand, there is no such limit, since any number of processors can be made to work in parallel. Computationally demanding CFD codes and parallel computers are therefore soul-mates, and will remain so for the foreseeable future. So much so that there is a separate and fast-emerging discipline that tackles problems specific to CFD as applied to parallel computers. For some years now there has been an international conference on parallel CFD, so one can indeed say that parallel CFD has arrived. To understand how CFD codes are parallelized, one must understand a little about how parallel computers function. Therefore, in what follows we first deal with parallel computers, what a typical CFD code (if there is one such) looks like, and then the strategies of parallelization.
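The standard strategy for parallelizing a CFD code is domain decomposition: split the grid into subdomains and exchange a thin "halo" of boundary cells each time step. As a toy sketch of that idea (not code from any particular CFD package), a 1-D explicit diffusion solver can be decomposed as follows:

```python
# Toy sketch: 1-D explicit diffusion with the grid split into subdomains
# that exchange one-cell halo boundaries each step -- the same pattern
# parallel CFD codes use to keep communication local.

def step_subdomain(u, left_halo, right_halo, alpha=0.1):
    """One explicit diffusion update on a subdomain, using ghost values."""
    ext = [left_halo] + u + [right_halo]
    return [ext[i] + alpha * (ext[i - 1] - 2 * ext[i] + ext[i + 1])
            for i in range(1, len(ext) - 1)]

def diffuse(u, steps=50, alpha=0.1, parts=2):
    """Advance the whole grid by splitting it into `parts` subdomains."""
    n = len(u) // parts
    subs = [u[i * n:(i + 1) * n] for i in range(parts)]
    for _ in range(steps):
        # Halo exchange: each subdomain reads its neighbors' edge cells
        # (outer boundaries are reflective: the edge value mirrors itself).
        halos = [(subs[p - 1][-1] if p > 0 else subs[p][0],
                  subs[p + 1][0] if p < parts - 1 else subs[p][-1])
                 for p in range(parts)]
        subs = [step_subdomain(subs[p], *halos[p], alpha) for p in range(parts)]
    return [x for s in subs for x in s]
```

With `parts=1` the result is bit-for-bit identical to the undecomposed solver, a useful sanity check that the halo exchange is correct; in a real code each subdomain's update would run on its own processor.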

The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as th...

The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in FORTRAN D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.

Cooper, Keith D.; Hall, Mary W.; Hood, Robert T.; Kennedy, Ken; Mckinley, Kathryn S.; Mellor-Crummey, John M.; Torczon, Linda; Warren, Scott K.

We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation running AVS is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate, store, visualize' post-processing approach.

Rapid changes in parallel computing technology are causing significant changes in the strategies being used for parallel algorithm development. One approach is simply to write computer code in a standard language like FORTRAN 77, with the expectation that the compiler will produce executable code that will run in parallel. The alternatives are: (1) to build explicit message passing directly into the source code; or (2) to write source code without explicit reference to message passing or parallelism, but use a general communications library to provide efficient parallel execution. Application of these strategies is illustrated with examples of codes currently under development.
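Strategy (1), explicit message passing built into the source code, can be sketched with the Python standard library standing in for a real message-passing system (a hypothetical illustration, not code from the report): a coordinator sends each worker a slice of the data and receives partial results back.

```python
from multiprocessing import Process, Pipe

def worker(conn):
    chunk = conn.recv()        # receive this worker's slice of the data
    conn.send(sum(chunk))      # send the partial result back
    conn.close()

def parallel_sum(data, nworkers=4):
    """Explicit send/receive parallel reduction over `nworkers` processes."""
    n = (len(data) + nworkers - 1) // nworkers
    pipes, procs = [], []
    for i in range(nworkers):
        parent, child = Pipe()
        p = Process(target=worker, args=(child,))
        p.start()
        parent.send(data[i * n:(i + 1) * n])   # explicit message: the work
        pipes.append(parent)
        procs.append(p)
    total = sum(conn.recv() for conn in pipes)  # explicit message: the result
    for p in procs:
        p.join()
    return total
```

The sends and receives appear directly in the application code, which is exactly what strategy (2) hides behind a general communications library.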

Aims Success of the quantitative prediction of drug–drug interactions via inhibition of CYP-mediated metabolism from the inhibitor concentration at the enzyme active site ([I]) and the in vitro inhibition constant (Ki) is variable. The aim of this study was to examine the impact of the fraction of victim drug metabolized by a particular CYP (fmCYP) and the inhibitor absorption rate constant (ka) on prediction accuracy. Methods Drug–drug interaction studies involving inhibition of CYP2C9, CYP2D6 and CYP3A4 (n = 115) were investigated. Data on fmCYP for the probe substrates of each enzyme and ka values for the inhibitors were incorporated into in vivo predictions, alone or in combination, using either the maximum hepatic input or the average systemic plasma concentration as a surrogate for [I]. The success of prediction (AUC ratio predicted within twofold of in vivo value) was compared using nominal values of fmCYP = 1 and ka = 0.1 min⁻¹. Results The incorporation of fmCYP values into in vivo predictions using the hepatic input plasma concentration resulted in 84% of studies within twofold of in vivo value. The effect of ka values alone significantly reduced the number of over-predictions for CYP2D6 and CYP3A4; however, less precision was observed compared with the fmCYP. The incorporation of both fmCYP and ka values resulted in 81% of studies within twofold of in vivo value. Conclusions The incorporation of substrate and inhibitor-related information, namely fmCYP and ka, markedly improved prediction of 115 interaction studies with CYP2C9, CYP2D6 and CYP3A4 in comparison with [I]/Ki ratio alone.
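A common static form of this prediction (the generic textbook equation, stated as background rather than taken verbatim from the paper) is AUC_ratio = 1 / (fmCYP / (1 + [I]/Ki) + (1 - fmCYP)), which reduces to 1 + [I]/Ki when fmCYP = 1. A sketch:

```python
def predicted_auc_ratio(i_conc, ki, fm_cyp=1.0):
    """Predicted fold-increase in victim-drug AUC from a static model:
    the fraction of clearance handled by the inhibited CYP (fm_cyp) is
    slowed by (1 + [I]/Ki) while the remaining clearance is untouched.
    fm_cyp = 1.0 reproduces the simple 1 + [I]/Ki prediction."""
    inhibited = fm_cyp / (1.0 + i_conc / ki)
    return 1.0 / (inhibited + (1.0 - fm_cyp))
```

Note how fm_cyp caps the predicted interaction: with fm_cyp = 0.5, even an arbitrarily potent inhibitor predicts at most a twofold AUC increase, which illustrates why incorporating fmCYP reduces over-prediction.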

Brown, Hayley S; Ito, Kiyomi; Galetin, Aleksandra; Houston, J Brian

Today's supercomputers and parallel computers provide an unprecedented amount of computational power in one machine. A basic understanding of the parallel computing techniques that assist in the capture and utilization of that computational power is essential to appreciate the capabilities and the limitations of parallel supercomputers. In addition, an understanding of technical vocabulary is critical in order to converse about parallel computers. The relevant techniques, vocabulary, currently available hardware architectures, and programming languages which provide the basic concepts of parallel computing are introduced in this document. This document updates the document entitled Introduction to Parallel Supercomputing, M88-42, October 1988. It includes a new section on languages for parallel computers, updates the hardware related sections, and includes current references.

The physically based simulation of clothes in virtual environments is a highly demanding problem. It involves both modeling the internal material properties of the textile and the interaction with the surrounding scene. We present a parallel cloth simulation approach designed for distributed memory parallel architectures, in particular clusters built of commodity components. In this paper, we focus on the parallelization

Set values for the initial position, velocity, and mass of the two particles, and click on the "Initialize Animation" button to play the animation using your specified values. Note: if m or v is too large, the particles may actually pass through one another, which will seem a little strange. Note: the interaction between the particles is a "non-contact" interaction, much like the electrostatic force on two charges. Mathematically, it is actually a Hooke's law interaction.
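The physics of the animation can be sketched under stated assumptions (one dimension, a zero-rest-length Hooke force F = -k(x1 - x2) on particle 1, semi-implicit Euler integration; the applet's actual parameters are not given here):

```python
def simulate(x1, v1, m1, x2, v2, m2, k=1.0, dt=0.01, steps=1000):
    """Two particles coupled by a Hooke's-law 'non-contact' force,
    advanced with semi-implicit Euler."""
    for _ in range(steps):
        f = -k * (x1 - x2)        # force on particle 1; particle 2 feels -f
        v1 += (f / m1) * dt
        v2 += (-f / m2) * dt
        x1 += v1 * dt
        x2 += v2 * dt
    return x1, v1, x2, v2
```

Because the forces are equal and opposite, total momentum is conserved by construction, and because a Hooke force stays finite at zero separation (unlike a true electrostatic force), nothing prevents the particles from passing through one another when m or v is large.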

This work presents two software components aimed to relieve the costs of accessing high-performance parallel computing resources within a Python programming environment: MPI for Python and PETSc for Python. MPI for Python is a general-purpose Python package that provides bindings for the Message Passing Interface (MPI) standard using any back-end MPI implementation. Its facilities allow parallel Python programs to easily exploit multiple processors using the message passing paradigm. PETSc for Python provides access to the Portable, Extensible Toolkit for Scientific Computation (PETSc) libraries. Its facilities allow sequential and parallel Python applications to exploit state-of-the-art algorithms and data structures readily available in PETSc for the solution of large-scale problems in science and engineering. MPI for Python and PETSc for Python are fully integrated with PETSc-FEM, an MPI- and PETSc-based parallel, multiphysics, finite element code developed at the CIMEC laboratory. This software infrastructure supports research activities related to simulation of fluid flows with applications ranging from the design of microfluidic devices for biochemical analysis to modeling of large-scale stream/aquifer interactions.

This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the parallel digital forensics (PDF) infrastructure architecture and its implementation.

Liebrock, Lorie M. (New Mexico Tech, Socorro, NM); Duggan, David Patrick

Contents: The Nature of Parallel Programming; Applications of Parallel Supercomputers: Scientific Results and Computer Science Lessons; Towards General-Purpose Parallel Computers; Cooperative Computation in Brains and Computers; Parallel Systems in the Ce...

The problem of writing software for multicore processors is greatly simplified if we could automatically parallelize sequential programs. Although auto-parallelization has been studied for many decades, it has succeeded only in a few application areas such as dense matrix computations. In particular, auto-parallelization of irregular programs, which are organized around large, pointer-based data structures like graphs, has seemed intractable.

Milind Kulkarni; Keshav Pingali; Bruce Walter; Ganesh Ramanarayanan; Kavita Bala; L. Paul Chew

A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge-base partitions indicate that significant speed increases, including superlinear in some cases, are possible.

Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
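A heavily simplified sketch of the adaptive-cycle idea (hypothetical code, not the SPEEDES implementation): each cycle processes every pending event earlier than the earliest new event generated during that cycle, so nothing processed in a cycle can be a causal consequence of anything else in the same cycle.

```python
import heapq

def run_time_buckets(initial_events, handler):
    """Process (time, payload) events in adaptive cycles.  The cycle's
    horizon shrinks to the earliest event spawned within it, guaranteeing
    every event handled in the cycle precedes every event it produces --
    so the whole batch could safely run in parallel."""
    pending = list(initial_events)
    heapq.heapify(pending)
    cycles = 0
    while pending:
        horizon = float("inf")   # adapts as the cycle spawns new events
        while pending and pending[0][0] < horizon:
            t, payload = heapq.heappop(pending)
            for new_t, new_payload in handler(t, payload):
                horizon = min(horizon, new_t)
                heapq.heappush(pending, (new_t, new_payload))
        cycles += 1
    return cycles
```

If no event spawns another, everything completes in a single cycle; a tight causal chain degenerates to one event per cycle, which is the fluctuating, self-adapting behavior the abstract describes.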

We present a parallel algorithm based on open ear decomposition to construct an embedding of a graph onto the plane or report that the graph is nonplanar. Our parallel algorithm runs on a CRCW PRAM in logarithmic time with a number of processors bounded by that needed for finding connected components in a graph and for performing bucket

One of the most pressing technological challenges in the development of next generation nanoscale devices is the rapid, parallel, precise and robust fabrication of nanostructures. Here, we demonstrate the possibility to parallelize thermochemical nanolithography (TCNL) by employing five nano-tips for the fabrication of conjugated polymer nanostructures and graphene-based nanoribbons. PMID:24337109

Carroll, Keith M; Lu, Xi; Kim, Suenne; Gao, Yang; Kim, Hoe-Joon; Somnath, Suhas; Polloni, Laura; Sordan, Roman; King, William P; Curtis, Jennifer E; Riedo, Elisa

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of five parallel kernels and three simulated application benchmarks. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their pencil-and-paper specification: all details of these benchmarks are

D. Bailey; E. Barszcz; J. Barton; D. Browning; R. Carter; L. Dagum

The creation of high-quality images requires new functionality and higher performance in real-time graphics architectures. In terms of functionality, texture mapping has become an integral component of graphics systems, and in terms of performance, parallel techniques are used at all stages of the graphics pipeline. In rasterization, texture caching has become prevalent for reducing texture bandwidth requirements. However, parallel

An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

The Programming Language Research Group at Sun Microsystems Laboratories seeks to apply lessons learned from the Java (TM) Programming Language to the next generation of programming languages. The Java language supports platform-independent parallel programming with explicit multithreading and explicit locks. As part of the DARPA program for High Productivity Computing Systems, we are developing Fortress, a language intended to support

A compositional parallel program is a program constructed by composing component programs in parallel, where the composed program inherits properties of its components. In this paper, we describe a small extension of C++ called Compositional C++ or CC++ which is an object-oriented notation that supports compositional parallel programming. CC++ integrates different paradigms of parallel programming: data-parallel, task-parallel and object-parallel paradigms;

We are just starting to parallelize the nearest neighbor portion of our free-Lagrange code. Our implementation of the nearest neighbor reconnection algorithm has not been parallelizable (i.e., we just flip one connection at a time). In this paper we consider what sort of nearest neighbor algorithms lend themselves to being parallelized. For example, the construction of the Voronoi mesh can be parallelized, but the construction of the Delaunay mesh (dual to the Voronoi mesh) cannot because of degenerate connections. We will show our most recent attempt to tessellate space with triangles or tetrahedrons with a new nearest neighbor construction algorithm called DAM (Dial-A-Mesh). This method has the characteristics of a parallel algorithm and produces a better tessellation of space than the Delaunay mesh. Parallel processing is becoming an everyday reality for us at Los Alamos. Our current production machines are Cray YMPs with 8 processors that can run independently or combined to work on one job. We are also exploring massive parallelism through the use of two 64K processor Connection Machines (CM2), where all the processors run in lock step mode. The effective application of 3-D computer models requires the use of parallel processing to achieve reasonable "turn around" times for our calculations.

We investigate several aspects of the numerical solution of the radiative transfer equation in the context of coal combustion: the parallel efficiency of two commonly-used opacity models, the sensitivity of turbulent radiation interaction (TRI) effects to the presence of coal particulate, and an improvement of the order of temporal convergence using the coarse mesh finite difference (CMFD) method. There are four opacity models commonly employed to evaluate the radiative transfer equation in combustion applications; line-by-line (LBL), multigroup, band, and global. Most of these models have been rigorously evaluated for serial computations of a spectrum of problem types [1]. Studies of these models for parallel computations [2] are limited. We assessed the performance of the Spectral-Line-Based weighted sum of gray gasses (SLW) model, a global method related to K-distribution methods [1], and the LBL model. The LBL model directly interpolates opacity information from large data tables. The LBL model outperforms the SLW model in almost all cases, as suggested by Wang et al. [3]. The SLW model, however, shows superior parallel scaling performance and a decreased sensitivity to load imbalancing, suggesting that for some problems, global methods such as the SLW model, could outperform the LBL model. Turbulent radiation interaction (TRI) effects are associated with the differences in the time scales of the fluid dynamic equations and the radiative transfer equations. Solving on the fluid dynamic time step size produces large changes in the radiation field over the time step. We have modified the statistically homogeneous, non-premixed flame problem of Deshmukh et al. [4] to include coal-type particulate. The addition of low mass loadings of particulate minimally impacts the TRI effects. 
Observed differences in the TRI effects from variations in the packing fractions and Stokes numbers are difficult to analyze because of the significant effect of variations in problem initialization. The TRI effects are very sensitive to the initialization of the turbulence in the system. The TRI parameters are somewhat sensitive to the treatment of particulate temperature and the particulate optical thickness, and this effect is amplified by increased particulate loading. Monte Carlo radiative heat transfer simulations of time-dependent combustion processes generally involve an explicit evaluation of the emission source because of the expense of the transport solver. Recently, Park et al. [5] have applied quasi-diffusion with Monte Carlo in high energy density radiative transfer applications. We employ a Crank-Nicholson temporal integration scheme in conjunction with the coarse mesh finite difference (CMFD) method in an effort to improve the temporal accuracy of the Monte Carlo solver. Our results show that this CMFD-CN method is an improvement over Monte Carlo with CMFD time-differenced via Backward Euler, and over Implicit Monte Carlo [6] (IMC). The increase in accuracy involves very little increase in computational cost, and the figure of merit for the CMFD-CN scheme is greater than that of IMC.
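The table look-up at the heart of an LBL model can be sketched as piecewise-linear interpolation into a precomputed opacity table (the table values below are invented for illustration; real LBL tables are vastly larger and depend on more variables than temperature alone):

```python
import bisect

def interp_opacity(temps, kappas, t):
    """Piecewise-linear interpolation of opacity kappa(T) from a sorted
    table, clamped at the table's endpoints."""
    if t <= temps[0]:
        return kappas[0]
    if t >= temps[-1]:
        return kappas[-1]
    i = bisect.bisect_right(temps, t)
    w = (t - temps[i - 1]) / (temps[i] - temps[i - 1])
    return (1 - w) * kappas[i - 1] + w * kappas[i]
```

The look-up itself is cheap; the parallel-performance cost the study examines comes from every process needing access to the large tables, whereas a global model like SLW trades table storage for more computation per evaluation.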

A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

Bailey, David (editor); Barton, John (editor); Lasinski, Thomas (editor); Simon, Horst (editor)

The methyl-cytosine binding domain 2 (MBD2)-nucleosome remodeling and deacetylase (NuRD) complex recognizes methylated DNA and silences expression of associated genes through histone deacetylase and nucleosome remodeling functions. Our previous structural work demonstrated that a coiled-coil interaction between MBD2 and GATA zinc finger domain containing 2A (GATAD2A/p66α) proteins recruits the chromodomain helicase DNA-binding protein (CHD4/Mi2β) to the NuRD complex and is necessary for MBD2-mediated DNA methylation-dependent gene silencing in vivo (Gnanapragasam, M. N., Scarsdale, J. N., Amaya, M. L., Webb, H. D., Desai, M. A., Walavalkar, N. M., Wang, S. Z., Zu Zhu, S., Ginder, G. D., and Williams, D. C., Jr. (2011) p66α-MBD2 coiled-coil interaction and recruitment of Mi-2 are critical for globin gene silencing by the MBD2-NuRD complex. Proc. Natl. Acad. Sci. U.S.A. 108, 7487-7492). The p66α-MBD2 interaction differs from most coiled-coils studied to date by forming an anti-parallel heterodimeric complex between two peptides that are largely monomeric in isolation. To further characterize unique features of this complex that drive heterodimeric specificity and high affinity binding, we carried out biophysical analyses of MBD2 and the related homologues MBD3, MBD3-like protein 1 (MBD3L1), and MBD3-like protein 2 (MBD3L2) as well as specific mutations that modify charge-charge interactions and helical propensity of the coiled-coil domains. Analytical ultracentrifugation analyses show that the individual peptides remain monomeric in isolation even at 300 μM in concentration for MBD2. Circular dichroism analyses demonstrate a direct correlation between helical content of the coiled-coil domains in isolation and binding affinity for p66α. Furthermore, complementary electrostatic surface potentials and inherent helical content of each peptide are necessary to maintain high-affinity association. These factors lead to a binding affinity hierarchy of p66α for the different MBD2 homologues (MBD2 ≈ MBD3 > MBD3L1 ≈ MBD3L2) and suggest a hierarchical regulatory model in tissue and life cycle stage-specific silencing by NuRD complexes. PMID:23239876

Walavalkar, Ninad M; Gordon, Nathaniel; Williams, David C

For the simulation of the flow through compressor stages, an interactive flow simulation system is set up on an MIMD-type parallel computer. An explicit scheme is used in order to resolve the time-dependent interaction between the blades. The 2D Navier-Stokes equations are transformed into their general moving coordinates. The parallelization of the solver is based on the idea of domain decomposition. Results are presented for a problem of fixed size (4096 grid nodes for the Hakkinen case).

This paper describes a family of parallel sorting algorithms for a multiprocessor system. These algorithms are enumeration sorts and comprise the following phases: count acquisition, in which the keys are subdivided into subsets and, for each key, the number of smaller...
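
The count-acquisition idea above can be sketched in a few lines. This is an editor's illustration, not the paper's algorithm: each key's final position is the number of keys that must precede it, and every such count is an independent task that a processor could own.

```python
def enumeration_sort(keys):
    """Enumeration sort: each key's final position is the count of keys
    that should precede it. Every rank computation is independent, so a
    multiprocessor could assign one key (or one subset) per processor."""
    n = len(keys)
    out = [None] * n
    for i in range(n):
        # Count keys smaller than keys[i]; break ties by original index
        # so that equal keys get distinct positions (stable placement).
        rank = sum(1 for j in range(n)
                   if keys[j] < keys[i] or (keys[j] == keys[i] and j < i))
        out[rank] = keys[i]
    return out

print(enumeration_sort([5, 1, 4, 1, 9]))  # -> [1, 1, 4, 5, 9]
```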

PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

Reviews the operation of the parallel RC circuit and specifically points out how to solve for branch currents and total impedance by using Ohm's law. Reviews vector representations and shows how approximate total current and phase angle are found by measuring...
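
As a worked illustration of the branch-current and impedance calculations such a lesson reviews, here is a short sketch using complex phasors. The component values (10 V source, 1 kΩ, 1 µF, about 159 Hz) are arbitrary, chosen so the two branch currents come out equal and the phase angle lands near 45°.

```python
import math

# Hypothetical component values, assumed for illustration only.
V = 10.0       # source voltage (V)
R = 1000.0     # resistance (ohms)
C = 1e-6       # capacitance (F)
f = 159.155    # frequency (Hz), giving Xc close to 1000 ohms

w = 2 * math.pi * f
Y = 1 / R + 1j * w * C   # branch admittances simply add in parallel
I_R = V / R              # resistive branch current (Ohm's law)
I_C = V * w * C          # capacitive branch current magnitude, V / Xc
I = V * Y                # total current phasor
Z = 1 / Y                # total impedance
phase_deg = math.degrees(math.atan2(I.imag, I.real))
print(abs(I), abs(Z), phase_deg)  # total current leads the source voltage
```

Note that the branch currents add vectorially, not arithmetically: |I| equals the hypotenuse of I_R and I_C, matching the vector construction the lesson describes.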

One of the most pressing technological challenges in the development of next generation nanoscale devices is the rapid, parallel, precise and robust fabrication of nanostructures. Here, we demonstrate the possibility to parallelize thermochemical nanolithography (TCNL) by employing five nano-tips for the fabrication of conjugated polymer nanostructures and graphene-based nanoribbons. Electronic supplementary information (ESI) available: Details on the cantilevers array, on the sample preparation, and on the GO AFM experiments. See DOI: 10.1039/c3nr05696a

Carroll, Keith M.; Lu, Xi; Kim, Suenne; Gao, Yang; Kim, Hoe-Joon; Somnath, Suhas; Polloni, Laura; Sordan, Roman; King, William P.; Curtis, Jennifer E.; Riedo, Elisa

In this activity, learners demonstrate and discuss simple circuits as well as the differences between parallel and serial circuit design and functions. Learners test two different circuit designs through the use of low voltage light bulbs.

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines.
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCPs running in parallel provide high bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.

Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

Assigning additional processors to a parallel application may slow it down or lead to poor computer utilization. This paper demonstrates that it is possible for an application to automatically choose its own, optimal degree of parallelism. The technique is based on a simple binary search procedure for finding the optimal number of processors, subject to one of the following criteria:
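
A minimal sketch of such a search, under an assumed Amdahl-style cost model with a per-processor coordination overhead. The model and its parameters (serial fraction, overhead constant) are the editor's illustration, not taken from the paper:

```python
def runtime(p, serial=0.05, work=1.0, overhead=0.002):
    # Hypothetical cost model: Amdahl-style serial fraction plus a
    # per-processor coordination overhead that eventually dominates.
    return serial + work / p + overhead * p

def optimal_processors(lo=1, hi=1024):
    # runtime(p) is unimodal: it falls, bottoms out, then rises as the
    # overhead term dominates. Binary-search for the first p at which
    # adding one more processor no longer helps.
    while lo < hi:
        mid = (lo + hi) // 2
        if runtime(mid + 1) >= runtime(mid):
            hi = mid          # minimum is at mid or to its left
        else:
            lo = mid + 1      # still improving: minimum lies to the right
    return lo

p_best = optimal_processors()
print(p_best)  # continuous optimum is sqrt(work/overhead) ~ 22.4
```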

An algorithm for direct two's-complement and sign-magnitude parallel multiplication is described. The partial product matrix representing the multiplication is converted to an equivalent matrix by encoding. Reduction of this matrix, which produces the final result, needs no specialized adders and can be carried out with any parallel array addition technique. The matrix contains no negative terms and no extra "correction" rows; in addition, it produces the product with fewer rows than the minimal number required for a direct multiplication process.

A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools, developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This interactive toolkit transforms a serial Fortran application code into an equivalent parallel version of the software in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This demonstrates the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.

A runtime system provides a parallel language compiler with an interface to the low-level facilities required to support interaction between concurrently executing program components. Nexus is a portable runtime system for task-parallel programming languages. Distinguishing features of Nexus include its support for multiple threads of control, dynamic processor acquisition, dynamic address space creation, a global memory model via interprocessor references, and asynchronous events. In addition, it supports heterogeneity at multiple levels, allowing a single computation to utilize different programming languages, executables, processors, and network protocols. Nexus is currently being used as a compiler target for two task-parallel languages: Fortran M and Compositional C++. In this paper, we present the Nexus design, outline techniques used to implement Nexus on parallel computers, show how it is used in compilers, and compare its performance with that of another runtime system.

Foster, I.; Tuecke, S. [Argonne National Lab., IL (United States); Kesselman, C. [Caltech, Pasadena, CA (United States). Beckman Institute

Exascale computing presents a challenge for the scientific community as new algorithms must be developed to take full advantage of the new computing paradigm. Atomistic simulation methods that offer full fidelity to the underlying potential, i.e., molecular dynamics (MD) and parallel replica dynamics, fail to use the whole machine speedup, leaving a region in time and sample size space that is unattainable with current algorithms. In this paper, we present an extension of the parallel replica dynamics algorithm [A. F. Voter, Phys. Rev. B 57, R13985 (1998), 10.1103/PhysRevB.57.R13985] by combining it with the synchronous sublattice approach of Shim and Amar [Y. Shim and J. G. Amar, Phys. Rev. B 71, 125432 (2005), 10.1103/PhysRevB.71.125432], thereby exploiting event locality to improve the algorithm scalability. This algorithm is based on a domain decomposition in which events happen independently in different regions in the sample. We develop an analytical expression for the speedup given by this sublattice parallel replica dynamics algorithm and compare it with parallel MD and traditional parallel replica dynamics. We demonstrate how this algorithm, which introduces a slight additional approximation of event locality, enables the study of physical systems unreachable with traditional methodologies and promises to better utilize the resources of current high performance and future exascale computers.

Martínez, Enrique; Uberuaga, Blas P.; Voter, Arthur F.

The PVM system is a programming environment for the development and execution of large concurrent or parallel applications that consist of many interacting, but relatively independent, components. It is intended to operate on a collection of heterogeneous computing elements interconnected by one or more networks. The participating processors may be scalar machines, multiprocessors, or special-purpose computers, enabling application components to

Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^ω), where 2 < ω ≤ 3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^ω/log N processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC
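
The row-block flavor of such a parallelization can be sketched as follows. This toy version simulates the processors with threads and only shows the domain decomposition, not the cost-optimal DMPC schedule the record describes:

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_block(A, B, rows):
    """Compute the given rows of C = A*B; one such task per processor."""
    n = len(B)
    m = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(m)]
            for i in rows]

def parallel_matmul(A, B, p=2):
    # Row-block domain decomposition: processor q owns rows q, q+p, ...
    blocks = [range(q, len(A), p) for q in range(p)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = list(pool.map(lambda r: matmul_block(A, B, r), blocks))
    # Reassemble the rows in their original order.
    C = [None] * len(A)
    for q, rows in enumerate(blocks):
        for out_row, i in zip(parts[q], rows):
            C[i] = out_row
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(parallel_matmul(A, B))  # -> [[19, 22], [43, 50]]
```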

We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin.

Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.

Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

Data parallelism is often seen as a form of explicit parallelism for SIMD and vector machines, and data parallel programming as an explicit programming paradigm for these architectures. Data parallel languages possess certain software qualities as well, which justifies their use in higher level programming and specification closer to the algorithm domain. Thus, it is interesting to study how the

Effective design of parallel matrix multiplication algorithms relies on the consideration of many interdependent issues based on the underlying parallel machine or network upon which such algorithms will be implemented, as well as, the type of methodology utilized by an algorithm. In this paper, we determine the parallel complexity of multiplying two (not necessarily square) matrices on parallel distributed-memory machines

The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore's law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with

John Nickolls; Ian Buck; Michael Garland; Kevin Skadron

The Polaris project has delivered a new parallelizing compiler that overcomes severe limitations of current compilers. While available parallelizing compilers may succeed on small kernels, they often fail to extract any meaningful parallelism from large applications. In contrast, Polaris has proven to speed up real programs significantly beyond the degree achieved by the parallelization tools available on the SGI

William Blume; Rudolf Eigenmann; Keith Faigin; John Grout; Jay Hoeflinger; David Padua; Paul Petersen; William Pottenger; Lawrence Rauchwerger; Peng Tu; Stephen Weatherford

This module teaches the principles of Fourier spectral methods, their utility in solving partial differential equations, and how to implement them in code. Performance considerations for several Fourier spectral implementations are discussed and methods for effective scaling on parallel computers are explained.

Chen, Gong; Cloutier, Brandon; Li, Ning; Muite, Benson; Rigge, Paul
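
A minimal example of the spectral idea such a module covers: differentiate a periodic function by multiplying its Fourier modes by ik. A naive O(n²) DFT is used here for self-containment; a real implementation would call an FFT library.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n)
                for j in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * j / n)
                for k in range(n)) / n for j in range(n)]

n = 16
xs = [2 * math.pi * j / n for j in range(n)]
u = [math.sin(x) for x in xs]

# Spectral differentiation: multiply mode k by i*k, using signed
# wavenumbers (k - n) for the upper half of the spectrum.
U = dft(u)
dU = [1j * (k if k <= n // 2 else k - n) * U[k] for k in range(n)]
du = [c.real for c in idft(dU)]

err = max(abs(d - math.cos(x)) for d, x in zip(du, xs))
print(err)  # tiny: sin occupies a single Fourier mode
```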

We give a parallel implementation of merge sort on a CREW PRAM that uses n processors and O(logn) time; the constant in the running time is small. We also give a more complex version of the algorithm for the EREW PRAM; it also uses n processors and O(logn) time. The constant in the running time is still moderate, though not
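
The round structure behind such algorithms can be illustrated bottom-up: O(log n) rounds in which all merges are disjoint and could run on separate processors. (The O(log n)-time PRAM bound additionally needs a parallel merge step, which this editor's sketch does not implement.)

```python
def merge(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def mergesort_rounds(keys):
    """Bottom-up merge sort organized as O(log n) rounds. Within a round
    every merge touches disjoint runs, so a PRAM could give each merge
    (or, with a parallel merge, each element) to its own processor."""
    runs = [[k] for k in keys]
    while len(runs) > 1:
        runs = [merge(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(mergesort_rounds([4, 1, 7, 3, 9, 2]))  # -> [1, 2, 3, 4, 7, 9]
```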

Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and Cthat allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs. ani.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.

Use of parallel analysis (PA), a selection rule for the number-of-factors problem, is investigated from the viewpoint of permutation assessment through a Monte Carlo simulation. Results reveal advantages and limitations of PA. Tables of sample eigenvalues are included. (SLD)

This book focuses on numerical algorithms suited for parallelization for solving systems of equations and optimization problems. Emphasis is on relaxation methods of the Jacobi and Gauss-Seidel type, and on issues of communication and synchronization. Topics covered include: algorithms for systems of linear equations and matrix inversion; iterative methods for nonlinear problems; and shortest paths and dynamic programming.
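
A small example of the Jacobi-type relaxation emphasized here; the 2x2 system is the editor's toy instance. Each component update reads only the previous iterate, which is exactly the property that lets all updates proceed in parallel:

```python
def jacobi(A, b, iters=100):
    """Jacobi relaxation for A x = b. Each component update reads only
    the previous iterate, so all n updates can proceed simultaneously:
    the property that makes Jacobi-type methods attractive in parallel."""
    n = len(A)
    x = [0.0] * n
    for _ in range(iters):
        # Build the whole new iterate from the old one (not in place).
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# Diagonally dominant example, so the iteration converges.
A = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 13.0]
x = jacobi(A, b)
print(x)  # approaches the exact solution (16/9, 17/9)
```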

The paper describes a number of distributed approaches to implementing a parallel visibility algorithm for viewshed analysis. The problem can be simplified by considering a range of domain partitioning strategies for optimizing the processor workloads. The best approaches are shown to work 22 times faster across a network of 24 processors. Such strategies allow traditional

J. Andrew Ware; David B. Kidner; Philip J. Rallings

The concept of the two-dimensional (2-D) parallel computer with square module arrays was first introduced by Unger. It is the purpose of this paper to discuss the relative merits of square and hexagonal module arrays, to propose an operational symbolism for the various basic hexagonal modular transformations which may be performed by these computers, to illustrate some logical circuit implementation,

Summary form only given. High instruction execution rates may be achieved through an array of inexpensive processors operating in parallel. The harnessing of this raw computing power to discrete event simulation applications is an active area of research. Three major approaches to the problem of assigning computational tasks to processing elements may be identified: (1) model based assignment, (2) local function based assignment, and (3) global function based assignment.

John C. Comfort; David Jefferson; Y. V. Reddy; Paul Reynolds; Sallie Sheppard

This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom- up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce inter processor communication overhead

Frank K. H. A. Dehne; Todd Eavis; Susanne E. Hambrusch; Andrew Rau-chaplin
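
The subcube-partitioning idea can be sketched on a toy dataset. The round-robin assignment below is an editor's stand-in for the paper's load-balanced partitioning strategies, and the column names are invented for illustration:

```python
from collections import defaultdict
from itertools import combinations

def group_by(rows, dims):
    agg = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        agg[key] += row["sales"]
    return dict(agg)

def data_cube(rows, dimensions, p=2):
    """Compute every group-by of the cube. Each subcube is independent,
    so assigning subcubes across p processors parallelizes the build;
    here the 'processors' are simulated sequentially."""
    subcubes = [dims for r in range(len(dimensions) + 1)
                for dims in combinations(dimensions, r)]
    assignment = {q: subcubes[q::p] for q in range(p)}  # naive balancing
    cube = {}
    for q, work in assignment.items():
        for dims in work:
            cube[dims] = group_by(rows, dims)
    return cube

rows = [{"city": "a", "item": "x", "sales": 3},
        {"city": "a", "item": "y", "sales": 2},
        {"city": "b", "item": "x", "sales": 5}]
cube = data_cube(rows, ("city", "item"))
print(cube[("city",)])  # -> {('a',): 5, ('b',): 5}
print(cube[()])         # -> {(): 10}
```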

In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user/programmer's point of view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.

Anderson, R.E. (Lawrence Livermore National Lab., CA (USA))

The traveling salesman problem is a classic optimization problem in which one seeks to minimize the path taken by a salesman in traveling between N cities, where the salesman stops at each city one and only one time, never retracing his/her route. This implementation is designed to run on UNIX systems with X-Windows, and includes parallelization using MPI.
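
A sequential sketch of the search such an implementation performs; an MPI version would split the permutation stream into chunks, one per rank, and reduce to the global minimum. The four-city square is an illustrative instance:

```python
from itertools import permutations
from math import dist

def tour_length(order, coords):
    # Closed tour: the salesman returns to the starting city.
    return sum(dist(coords[order[i]], coords[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def tsp_bruteforce(coords):
    """Exhaustive search over tours. Fixing city 0 removes rotational
    duplicates; a parallel version would partition this permutation
    stream among processes and take the minimum of the local minima."""
    cities = list(range(1, len(coords)))
    best = min(permutations(cities),
               key=lambda p: tour_length((0,) + p, coords))
    return (0,) + best, tour_length((0,) + best, coords)

coords = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour, length = tsp_bruteforce(coords)
print(tour, length)  # unit square: optimal tour length is 4.0
```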

In task-parallel programs, diverse activities can take place concurrently, and communication and synchronization patterns are complex and not easily predictable. Previous work has identified compositionality as an important design principle for task-parallel programs. In this article, we discuss alternative approaches to the realization of this principle, which holds that properties of program components should be preserved when those components are composed in parallel with other program components. We review two programming languages, Strand and Program Composition Notation, that support compositionality via a small number of simple concepts, namely, monotone operations on shared objects, a uniform addressing mechanism, and parallel composition. Both languages have been used extensively for large-scale application development, allowing us to provide an informed assessment of both their strengths and their weaknesses. We observe that while compositionality simplifies development of complex applications, the use of specialized languages hinders reuse of existing code and tools and the specification of domain decomposition strategies. This suggests an alternative approach based on small extensions to existing sequential languages. We conclude the article with a discussion of two languages that realized this strategy.

Parallel plate avalanche counters (PPACs) of 5×3 cm² (timing only) and 15×5 cm² (timing and position) are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate ...

Multivalent interactions play a critical role in a variety of biological processes on both molecular and cellular levels. We have used molecular force spectroscopy to investigate the strength of multiple parallel peptide-antibody bonds using a system that allowed us to determine the rupture forces and the number of ruptured bonds independently. In our experiments the interacting molecules were attached to the surfaces of the probe and sample of the atomic force microscope with flexible polymer tethers, and unique mechanical signature of the tethers determined the number of ruptured bonds. We show that the rupture forces increase with the number of interacting molecules and that the measured forces obey the predictions of a Markovian model for the strength of multiple parallel bonds. We also discuss the implications of our results to the interpretation of force spectroscopy measurements in multiple bond systems.

This paper presents a useful scheme to implement parallelization in 3G network planning tools. We propose a solution to the problem of interactions between the mobiles and the base stations, especially because of interference and macro-diversity in 3G systems that make direct parallelization unfeasible. The proposed solution decomposes the network into a grid of zones, and sequentially allocates independent

High interaction rate colliders impose stringent requirements on data acquisition and triggering systems. These systems can only be realized by using asynchronous parallel networks of processors. The accurate prediction of the performance of these network...

A new neural network architecture is proposed and applied in classification of remote sensing/geographic data from multiple sources. The new architecture is called the parallel consensual neural network and its relation to hierarchical and ensemble neural networks is discussed. The parallel consensual neural network architecture is based on statistical consensus theory. The input data are transformed several times and the different transformed data are applied as if they were independent inputs and are classified using stage neural networks. Finally, the outputs from the stage networks are then weighted and combined to make a decision. Experimental results based on remote sensing data and geographic data are given. The performance of the consensual neural network architecture is compared to that of a two-layer (one hidden layer) conjugate-gradient backpropagation neural network. The results with the proposed neural network architecture compare favorably in terms of classification accuracy to the backpropagation method.

Benediktsson, J. A.; Sveinsson, J. R.; Ersoy, O. K.; Swain, P. H.

These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete- Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than on the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
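
The overlap-and-save sub-convolution scheme can be demonstrated directly. The block size and tap values below are arbitrary, and the naive O(n²) DFT/IDFT stands in for the hardware DFT-IDFT pairs; what matters is that each block is an independent task:

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n)
                for j in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * j / n)
                for k in range(n)) / n for j in range(n)]

def overlap_save(x, h, nfft=8):
    """Filter x with FIR taps h by the overlap-and-save method: process
    the signal in blocks and discard the first len(h)-1 wrapped samples
    of each circular convolution. Blocks are independent, so a parallel
    architecture can assign one block per processing element."""
    m = len(h)
    step = nfft - (m - 1)                    # new samples per block
    H = dft(h + [0.0] * (nfft - m))
    padded = [0.0] * (m - 1) + x             # prime the history with zeros
    y = []
    for start in range(0, len(x), step):
        block = padded[start:start + nfft]
        block += [0.0] * (nfft - len(block))
        Y = idft([a * b for a, b in zip(dft(block), H)])
        y.extend(v.real for v in Y[m - 1:])  # keep only the valid samples
    return [round(v, 9) for v in y[:len(x)]]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
h = [1.0, 1.0, 1.0]
print(overlap_save(x, h))  # matches direct convolution of x with h
```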

Future sensor systems will utilize massively parallel computing systems for rapid analysis of two-dimensional data. The Goddard Space Flight Center has an ongoing program to develop these systems. A single-instruction multiple data computer known as the Massively Parallel Processor (MPP) is being fabricated for NASA by the Goodyear Aerospace Corporation. This processor contains 16,384 processing elements arranged in a 128 x 128 array. The MPP will be capable of adding more than 6 billion 8-bit numbers per second. Multiplication of eight-bit numbers can occur at a rate of 2 billion per second. Delivery of the MPP to Goddard Space Flight Center is scheduled for 1983.

Consideration is given to a collisionless parallel shock based on solitary-type solutions of the modified derivative nonlinear Schroedinger equation (MDNLS) for parallel Alfven waves. The standard derivative nonlinear Schroedinger equation is generalized in order to include the possible anisotropy of the plasma distribution and higher-order Korteweg-de Vries-type dispersion. Stationary solutions of MDNLS are discussed. The anisotropic nature of 'adiabatic' reflections leads to the asymmetric particle distribution in the upstream as well as in the downstream regions of the shock. As a result, nonzero heat flux appears near the front of the shock. It is shown that this causes the stochastic behavior of the nonlinear waves, which can significantly contribute to the shock thermalization.

Khabibrakhmanov, I. KH.; Galeev, A. A.; Galinskii, V. L.

PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.

A parallel bucket-sort algorithm is presented that requires time O(log n) and the use of n processors. The algorithm makes use of a technique that requires more space than the product of processors and time. A realistic model is used in which no memory contention is permitted. A procedure is also presented to sort n numbers in time O(k log
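The abstract assumes a PRAM-style machine with n synchronous processors; as a rough illustration of the bucketing idea only (not the paper's algorithm), a pool-based sketch in Python might look like the following, where `n_buckets` and the thread pool stand in for the processor array:

```python
# Illustrative parallel bucket sort; helper names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def sort_bucket(bucket):
    # Each worker sorts its own bucket independently.
    return sorted(bucket)

def parallel_bucket_sort(values, n_buckets=4):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1  # avoid zero width when all values are equal
    buckets = [[] for _ in range(n_buckets)]
    for v in values:
        idx = min(int((v - lo) / width), n_buckets - 1)
        buckets[idx].append(v)
    # Threads stand in for the parallel processors of the model.
    with ThreadPoolExecutor(max_workers=n_buckets) as ex:
        sorted_buckets = list(ex.map(sort_bucket, buckets))
    # Buckets are range-disjoint and ordered, so concatenation finishes the sort.
    return [v for b in sorted_buckets for v in b]
```

Concatenation works because every value in bucket i is no larger than any value in bucket i+1.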

Summary form only given. High instruction execution rates may be achieved through a multitude of inexpensive processors operating in parallel. The harnessing of this raw computing power to discrete event simulation applications is an active area of research. Three major approaches to the problem of assigning computational tasks to processing elements may be identified: (1) model based assignment, (2) local function based assignment, and (3) global function based assignment.

Cid is a parallel, “shared-memory” superset of C for distributed-memory machines. A major objective is to keep the entry cost low. For users, the language should be easily comprehensible to a C programmer. For implementors, it should run on standard hardware (including workstation farms); it should not require major new compilation techniques (which may not even be widely applicable); and it should

Abstract. This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce interprocessor communication overhead by

Frank K. H. A. Dehne; Todd Eavis; Susanne E. Hambrusch; Andrew Rau-Chaplin

Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. Such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α/log N processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC
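The cost-optimality claim concerns a DMPC, but the basic decomposition it rests on, splitting the output matrix into independent pieces, can be sketched in a few lines of Python; the row-strip partitioning and the `n_tasks` parameter here are illustrative assumptions, not the paper's O(log N) algorithm:

```python
# Illustrative strip-parallel matrix multiplication (not the paper's DMPC scheme).
from concurrent.futures import ThreadPoolExecutor

def _strip_product(args):
    a_rows, b = args
    # Compute a horizontal strip of C = A * B with the classical inner product.
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]

def parallel_matmul(a, b, n_tasks=2):
    # Split A into n_tasks horizontal strips; each strip of C is independent.
    chunk = (len(a) + n_tasks - 1) // n_tasks
    strips = [(a[i:i + chunk], b) for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=n_tasks) as ex:
        results = ex.map(_strip_product, strips)
    return [row for strip in results for row in strip]
```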

Accurate simulation of large parallel applications can be facilitated with the use of direct execution and parallel discrete event simulation. This paper describes the use of COMPASS, a direct execution-driven, parallel simulator for performance prediction of programs that include both communication and I\\/O intensive applications. The simulator has been used to predict the performance of such applications on both distributed

Rajive Bagrodia; Ewa Deelman; Steven Docy; Thomas Phan

Comprehensive numerical and experimental investigations of tip vortical characteristics were conducted for lateral tip jet flow over a fixed wing as a step toward reducing blade-vortex interaction noise. The tip vortex of a NACA0012 blade was measured and visualized for the fundamental study of tip vortical flow, and the results were compared with numerical data as a validation of the numerical solvers. Three-dimensional compressible Euler/Navier-Stokes codes were used to calculate the effect of jet flow from the tip of an OLS (modified BHT 540) fixed blade at various freestream velocities and jet conditions. The results show that the jet flowing from the wing tip can diffuse the tip vortex, enlarging its core and weakening its strength. When applied to the blade-vortex interaction phenomenon, this enlarged and weakened vortex can produce a lower pressure gradient on the blade surface, which means that the jet flow can effectively reduce blade-vortex interaction noise.

The application of the parallel programming methodology known as the Force was conducted. Two application issues were addressed. The first involves the efficiency of the implementation and its completeness in terms of satisfying the needs of other researchers implementing parallel algorithms. Support for, and interaction with, other Computational Structural Mechanics (CSM) researchers using the Force was the main issue, but some independent investigation of the Barrier construct, which is extremely important to overall performance, was also undertaken. Another efficiency issue which was addressed was that of relaxing the strong synchronization condition imposed on the self-scheduled parallel DO loop. The Force was extended by the addition of logical conditions to the cases of a parallel case construct and by the inclusion of a self-scheduled version of this construct. The second issue involved applying the Force to the parallelization of finite element codes such as those found in the NICE/SPAR testbed system. One of the more difficult problems encountered is the determination of what information in COMMON blocks is actually used outside of a subroutine and when a subroutine uses a COMMON block merely as scratch storage for internal temporary results.

Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture, the basic API, discuss its advantages over previous approaches, present example configurations and usage scenarios as well as scalability results. PMID:19282550

Parallel corpora have become an essential resource for work in multilingual natural language processing. In this report, we describe our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm ...

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.

Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.

Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (xml file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to check out the plug-ins, this program makes the plug-in checkouts parallel. After parsing the feature, a checkout request is issued for each plug-in in the feature. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to check out now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer's home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer's home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code.
It can be applied to any Eclipse-based repository with a similar structure. It also can apply build parameters and preferences automatically at the end of the checkout.
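The parse-then-fan-out pattern described above (digest a feature XML, then hand each plug-in to a configurable thread pool) can be sketched as follows; the `checkout` stub, the feature snippet, and the element names are hypothetical stand-ins, since PEPC's actual code is not shown here:

```python
# Sketch of PEPC's pattern: parse a feature, check out its plug-ins in parallel.
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical feature description; real Eclipse feature.xml files differ in detail.
FEATURE_XML = """
<feature id="example.feature">
  <plugin id="example.plugin.core"/>
  <plugin id="example.plugin.ui"/>
  <plugin id="example.plugin.tests"/>
</feature>
"""

def checkout(plugin_id):
    # Placeholder for a real SCM checkout; returns the id on success.
    return plugin_id

def parallel_checkout(feature_xml, max_threads=8):
    root = ET.fromstring(feature_xml)
    plugins = [p.get("id") for p in root.findall("plugin")]
    # A thread pool with a configurable number of threads handles the requests.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(checkout, plugins))
```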

Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.

We describe Fastpath, a system for speculative parallelization of sequential programs on conventional multicore processors. Our system distinguishes between the lead thread, which executes at almost-native speed, and speculative threads, which execute somewhat slower. This allows us to achieve nontrivial speedup, even on two-core machines. We present a mathematical model of potential speedup, parameterized by application characteristics and implementation constants. We also present preliminary results gleaned from two different Fastpath implementations, each derived from an implementation of software transactional memory.

Spear, Michael F.; Kelsey, Kirk; Bai, Tongxin; Dalessandro, Luke; Scott, Michael L.; Ding, Chen; Wu, Peng

A novel parallel kinetic Monte Carlo (kMC) algorithm formulated on the basis of perfect time synchronicity is presented. The algorithm provides an exact generalization of any standard serial kMC model and is trivially implemented in parallel architectures. We demonstrate the mathematical validity and parallel performance of the method by solving several well-understood problems in diffusion.
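For context, the standard serial kMC step that such a parallel algorithm generalizes can be written in a few lines; this is the textbook residence-time (BKL-style) step, not the paper's synchronous parallel formulation:

```python
# One step of the standard serial residence-time kMC algorithm.
import math
import random

def kmc_step(rates, t, rng=random):
    # Pick an event with probability proportional to its rate.
    total = sum(rates)
    r = rng.random() * total
    acc = 0.0
    event = len(rates) - 1
    for i, rate in enumerate(rates):
        acc += rate
        if r < acc:
            event = i
            break
    # Advance time by an exponentially distributed increment.
    dt = -math.log(1.0 - rng.random()) / total
    return event, t + dt
```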

The CUDA programming model provides a straightforward means of describing inherently parallel computations, and NVIDIA's Tesla GPU architecture delivers high computational throughput on massively parallel problems. This article surveys experiences gained in applying CUDA to a diverse set of problems and the parallel speedups over sequential codes running on traditional CPU architectures attained by executing key computations on the GPU.

Michael Garland; Scott Le Grand; John Nickolls; Joshua Anderson; Jim Hardwick; Scott Morton; Everett Phillips; Yao Zhang; Vasily Volkov

As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2³ is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.

DiNucci, David C.; Bailey, David H. (Technical Monitor)

A new numerical algorithm for the solution of large-order eigenproblems typically encountered in linear elastic finite element systems is presented. The architecture of parallel processing is utilized in the algorithm to achieve increased speed and efficiency of calculations. The algorithm is based on the frontal technique for the solution of linear simultaneous equations and the modified subspace eigenanalysis method for the solution of the eigenproblem. Assembly, elimination and back-substitution of degrees of freedom are performed concurrently, using a number of fronts. All fronts converge to and diverge from a predefined global front during elimination and back-substitution, respectively. In the meantime, reduction of the stiffness and mass matrices required by the modified subspace method can be completed during the convergence/divergence cycle and an estimate of the required eigenpairs obtained. Successive cycles of convergence and divergence are repeated until the desired accuracy of calculations is achieved. The advantages of this new algorithm in parallel computer architecture are discussed.

A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion-dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a triangle-inequality-obeying distance metric; the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program, and initial performance results of an end-to-end document processing workflow are reported.
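The core pruning idea, using the triangle inequality to skip distance calculations, can be sketched as follows; the pivot-assignment loop and the bound used here are a simplified illustration, not the Anchors Hierarchy implementation:

```python
# Assign each point to its nearest pivot, skipping provably useless distance calls.
def assign_to_anchors(points, pivots, dist):
    # Precompute pivot-to-pivot distances once.
    pp = [[dist(a, b) for b in pivots] for a in pivots]
    anchors = {i: [] for i in range(len(pivots))}
    n_calcs = 0
    for x in points:
        best_i, best_d = 0, dist(x, pivots[0])
        n_calcs += 1
        for j in range(1, len(pivots)):
            # Triangle inequality: if d(p_best, p_j) >= 2*d(x, p_best),
            # then d(x, p_j) >= d(x, p_best), so pivot j cannot win.
            if pp[best_i][j] >= 2 * best_d:
                continue
            d = dist(x, pivots[j])
            n_calcs += 1
            if d < best_d:
                best_i, best_d = j, d
        anchors[best_i].append(x)
    return anchors, n_calcs
```

Returning `n_calcs` makes the savings visible: points that sit close to a pivot never pay for distances to far-away pivots.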

By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of flow cytometry (FACS) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1-D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in approximately 6–10 minutes, about 30 times the speed of most current FACS systems. In 1-D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of CCD-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take.

Ehrlich, Daniel J.; McKenna, Brian K.; Evans, James G.; Belkina, Anna C.; Denis, Gerald V.; Sherr, David; Cheung, Man Ching

The computer code OVERFLOW is widely used in the aerodynamic community for the numerical solution of the Navier-Stokes equations. Current trends in computer systems and architectures are toward multiple processors and parallelism, including distributed memory. This report describes work that has been carried out by the author and others at Ames Research Center with the goal of parallelizing OVERFLOW using a variety of parallel architectures and parallelization strategies. This paper begins with a brief description of the OVERFLOW code. This description includes the basic numerical algorithm and some software engineering considerations. Next comes a description of a parallel version of OVERFLOW, OVERFLOW/PVM, using PVM (Parallel Virtual Machine). This parallel version of OVERFLOW uses the manager/worker style and is part of the standard OVERFLOW distribution. Then comes a description of a parallel version of OVERFLOW, OVERFLOW/MPI, using MPI (Message Passing Interface). This parallel version of OVERFLOW uses the SPMD (Single Program Multiple Data) style. Finally comes a discussion of alternatives to explicit message-passing in the context of parallelizing OVERFLOW.

There is now considerable evidence showing that the time to read a word out loud is influenced by an interaction between orthographic length and lexicality. Given that length effects are interpreted by advocates of dual-route models as evidence of serial processing, this would seem to pose a serious challenge to models of single word reading which postulate a common parallel processing mechanism for reading both words and nonwords (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; Rastle, Havelka, Wydell, Coltheart, & Besner, 2009). However, an alternative explanation of these data is that visual processes outside the scope of existing parallel models are responsible for generating the word-length related phenomena (Seidenberg & Plaut, 1998). Here we demonstrate that a parallel model of single word reading can account for the differential word-length effects found in the naming latencies of words and nonwords, provided that it includes a mapping from visual to orthographic representations, and that the nature of those orthographic representations is not preconstrained. The model can also simulate other supposedly “serial” effects. The overall findings were consistent with the view that visual processing contributes substantially to the word-length effects in normal reading and provided evidence to support the single-route theory which assumes words and nonwords are processed in parallel by a common mechanism.

Supervision offers a distinct opportunity to experience the interconnection of counselor-client and counselor-supervisor interactions. One product of this network of interactions is parallel process, a phenomenon by which counselors unconsciously identify with their clients and subsequently present to their supervisors in a similar fashion…

Giordano, Amanda; Clarke, Philip; Borders, L. DiAnne

The Parallel Mesh Generation (PMESH) Project is a joint LDRD effort by A Division and Engineering to develop a unique mesh generation system that can construct large calculational meshes (of up to 10⁹ elements) on massively parallel computers. Such a capability will remove a critical roadblock to unleashing the power of massively parallel processors (MPPs) for physical analysis. PMESH will support a variety of LLNL 3-D physics codes in the areas of electromagnetics, structural mechanics, thermal analysis, and hydrodynamics.

Parallel database systems attempt to exploit recent multiprocessor computer architectures in order to build high-performance and high-availability database servers at a much lower price than equivalent mainframe computers. Although there are commercial SQL-based products, a number of open problems hamper the full exploitation of the capabilities of parallel systems. These problems touch on issues ranging from those of parallel processing

The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments, which can collect observations at hundreds of bands, have become operational. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension reduction is a spectral transformation aimed at concentrating the vital information and discarding redundant data. One such transformation, which is widely used in remote sensing, is the Principal Components Analysis (PCA). This report summarizes our progress on the development of a parallel PCA and its implementation on two Beowulf cluster configurations: one with a fast Ethernet switch and the other with a Myrinet interconnection. Details of the implementation and performance results, for typical sets of multispectral and hyperspectral NASA remote sensing data, are presented and analyzed based on the algorithm requirements and the underlying machine configuration. It will be shown that the PCA application is quite challenging and hard to scale on Ethernet-based clusters. However, the measurements also show that a high-performance interconnection network, such as Myrinet, better matches the high communication demand of PCA and can lead to a more efficient PCA execution.
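The data-parallel structure of such a PCA is that each node computes partial first and second moments over its share of the data rows, which are then reduced into a single covariance matrix for eigendecomposition. A thread-based sketch of that moment-and-reduce step (illustrative only, not the Beowulf implementation) follows:

```python
# Sketch of data-parallel covariance estimation: the reduce step of a parallel PCA.
from concurrent.futures import ThreadPoolExecutor

def partial_moments(rows):
    # Each worker accumulates sum(x) and sum(x x^T) over its slice of rows.
    d = len(rows[0])
    s = [0.0] * d
    ss = [[0.0] * d for _ in range(d)]
    for x in rows:
        for i in range(d):
            s[i] += x[i]
            for j in range(d):
                ss[i][j] += x[i] * x[j]
    return s, ss, len(rows)

def parallel_covariance(X, n_workers=2):
    chunk = (len(X) + n_workers - 1) // n_workers
    slices = [X[i:i + chunk] for i in range(0, len(X), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        parts = list(ex.map(partial_moments, slices))
    # Reduce the partial moments: cov = E[x x^T] - mean mean^T.
    n = sum(p[2] for p in parts)
    d = len(X[0])
    mean = [sum(p[0][i] for p in parts) / n for i in range(d)]
    return [[sum(p[1][i][j] for p in parts) / n - mean[i] * mean[j]
             for j in range(d)] for i in range(d)]
```

Only the small per-worker moment tuples cross between workers, which is why the communication pattern (not the arithmetic) dominates scalability on a slow interconnect.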

Data locality is critical to achieving high performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus the mapping, or decomposition, of the computation and data onto the processors of a scalable parallel machine is a key issue in compiling programs for these architectures. This paper describes a compiler algorithm that

Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs

Given the importance of parallel mesh generation in large-scale scientific applications and the proliferation of multilevel SMT-based architectures, it is imperative to obtain insight on the interaction between meshing algorithms and these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level and fine-grain at the element level. This multigrain data parallel

Christos D. Antonopoulos; Xiaoning Ding; Andrey N. Chernikov; Filip Blagojevic; Dimitrios S. Nikolopoulos; Nikos Chrisochoides

Archetype data parallel or task parallel applications are well served by contemporary languages. However, for applications containing a balance of task and data parallelism the choice of language is less clear. While there are languages that enable both forms of parallelism, e.g., one can write data parallel programs using a task parallel language, there are few languages which support both.

Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

One candidate language for parallel symbolic computing is Prolog. Numerous ways for executing Prolog in parallel have been proposed, but current efforts suffer from several deficiencies. Many cannot support fundamental types of concurrency in Prolog. Other models are of purely theoretical interest, ignoring implementation costs. Detailed simulation studies of execution models are scarce; at present little is known about the costs and benefits of executing Prolog in parallel. In this thesis, a new parallel execution model for Prolog is presented: the PPP model or Parallel Prolog Processor. The PPP supports AND-parallelism, OR-parallelism, and intelligent backtracking. An implementation of the PPP is described, through the extension of an existing Prolog abstract machine architecture. Several examples of PPP execution are presented, and compilation to the PPP abstract instruction set is discussed. The performance effects of this model are reported, based on a simulation of a large benchmark set. The implications of these results for parallel Prolog systems are discussed, and directions for future work are indicated.

Two formal models for parallel computation are presented: an abstract conceptual model and a parallel-program model. The former model does not distinguish between control and data states. The latter model includes the capability for the representation of an infinite set of control states by allowing there to be arbitrarily many instruction pointers (or processes) executing the program. An induction principle

A model of computation based on random access machines operating in parallel and sharing a common memory is presented. The computational power of this model is related to that of traditional models. In particular, deterministic parallel RAM's can accept in polynomial time exactly the sets accepted by polynomial tape bounded Turing machines; nondeterministic RAM's can accept in polynomial time exactly

This paper describes a data parallel method for polygon rendering on a massively parallel machine. This method, based on a simple shading model, is targeted for applications which require very fast rendering for extremely large sets of polygons. Such sets are found in many scientific visualization applications. The renderer can handle arbitrarily complex polygons which need not be meshed. Issues

Frank A. Ortega; Charles D. Hansen; James P. Ahrens

PMC (Parallel Monte Carlo) is a system of generic interface routines that allows easy porting of Monte Carlo packages of large-scale physics simulation codes to Massively Parallel Processor (MPP) computers. By loading various versions of PMC, simulation code developers can configure their codes to run in several modes: serial, Monte Carlo runs on the same processor as the rest of the code; parallel, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on other MPP processor(s); distributed, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on a different machine. This multi-mode approach allows maintenance of a single simulation code source regardless of the target machine. PMC handles passing of messages between nodes on the MPP, passing of messages between a different machine and the MPP, distributing work between nodes, and providing independent, reproducible sequences of random numbers. Several production codes have been parallelized under the PMC system. Excellent parallel efficiency in both the distributed and parallel modes results if sufficient workload is available per processor. Experiences with a Monte Carlo photonics demonstration code and a Monte Carlo neutronics package are described.

The influence of interface boundary conditions on the ability to parallelize pseudospectral multidomain algorithms is investigated. Using the properties of spectral expansions, a novel parallel two domain procedure is generalized to an arbitrary number of domains each of which can be solved on a separate processor. This interface boundary condition considerably simplifies influence matrix techniques.

Tree search algorithms play an important role in many applications in the field of artificial intelligence. When playing board games such as chess, computers use game-tree search algorithms to evaluate a position. In this paper, we present a procedure that we call Parallel Controlled Conspiracy Number Search (Parallel CCNS). We briefly describe the principles of the sequential CCNS algorithm,

This paper extends automata-theoretic techniques to unbounded parallel behaviour, as seen for instance in Petri nets. Languages are defined to be sets of (labelled) series-parallel posets --- or, equivalently, sets of terms in an algebra with two product operations: sequential and parallel. In an earlier paper, we restricted ourselves to languages of posets having bounded width and introduced a notion of branching automaton. In

This patent describes a vector processing node for a computer of the type having a network of simultaneously operating vector processing nodes interconnected by bidirectional external busses for conveying parallel data words between the vector processing nodes. The vector processing node comprising: a bi-directional first bus for conveying parallel data words; a bi-directional second bus for conveying parallel data words; vector memory means connected for read and write access through the second bus for storing vectors comprising sequences of parallel data words conveyed on the second bus; vector processing means connected to the second bus for transmitting parallel data words to and receiving parallel data words from the vector memory means for generating output vectors comprising functions of input vectors stored in the vector memory means and for storing the output vectors in the vector memory means; and control means including a computer processor connected to the first bus, external port means controlled by the computer processor and connected between the first bus and the external busses, and local port means controlled by the computer processor connected between the first and second busses, for transmitting parallel data words to and receiving parallel data words from the first bus, the second bus, the external busses, and the vector memory.

In this paper, an investigation of the parallel arithmetic complexity of matrix inversion, solving systems of linear equations, computing determinants and computing the characteristic polynomial of a matrix is reported. The parallel arithmetic complexity of solving equations has been an open question for several years. The gap between the complexity of the best algorithms (2n + O(1), where n is

The paper deals with parallel mobile agents and a related performance evaluation framework. A model called a mobile agent network is proposed. It includes a multi-agent system consisting of co-operating and communicating mobile agents, a set of processing nodes in which the agents perform services, and a network that connects the processing nodes and allows agent mobility. Parallelism in the mobile agent network is

We generalize an earlier model of international vertical pricing to explain key features of parallel imports, or unauthorized trade in legitimate goods. When a manufacturer (or trademark owner) sells its product through an independent agent in one country, the agent may find it profitable to engage in parallel trade, selling the product to another country without the authorization of the

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

The production of noise in the interaction of a vortex with the leading edge of a profile in a transonic flow was investigated. The occurring phenomena are detected in the far field as noise (blade-vortex interaction noise). The vortex was produced as a v...

Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
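
The rsync-like comparison described above can be sketched as follows: split each node's checkpoint into fixed-size blocks, compare block checksums against the previously stored template, and keep only the compressed blocks that differ. This is a simplified single-node sketch; the function names and the 64-byte block size are illustrative, not from the patent.

```python
import hashlib
import zlib

BLOCK = 64  # block size in bytes; a real system would tune this

def block_checksums(data):
    """Checksum each fixed-size block of a byte string."""
    return [hashlib.md5(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def delta_checkpoint(node_data, template_data):
    """Return (block index, compressed block) pairs for blocks that differ
    from the template -- the only data that needs to be transmitted/stored."""
    tmpl = block_checksums(template_data)
    delta = []
    for i in range(0, len(node_data), BLOCK):
        blk = node_data[i:i + BLOCK]
        j = i // BLOCK
        if j >= len(tmpl) or hashlib.md5(blk).hexdigest() != tmpl[j]:
            delta.append((j, zlib.compress(blk)))
    return delta

def restore(template_data, delta):
    """Rebuild a node's checkpoint from the template plus its delta
    (equal-length node and template assumed in this sketch)."""
    blocks = [template_data[i:i + BLOCK]
              for i in range(0, len(template_data), BLOCK)]
    for j, comp in delta:
        blocks[j] = zlib.decompress(comp)
    return b"".join(blocks)
```

When most nodes' state is close to the template, the delta is a small fraction of the full checkpoint, which is the claimed saving in transmission and storage.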

Archer, Charles Jens (Rochester, MN); Inglett, Todd Alan (Rochester, MN)

pC++ is a language extension to C++ designed to allow programmers to compose "concurrent aggregate" collection classes which can be aligned and distributed over the memory hierarchy of a parallel machine in a manner modeled on the High Performance Fortran Forum (HPFF) directives for Fortran 90. pC++ allows the user to write portable and efficient code which will run on a wide range of scalable parallel computer systems.

A. Malony; B. Mohr; P. Beckman; D. Gannon; S. Yang; F. Bodin; S. Kesavan

We consider the problem of finding all the global (and some local) minimizers of a given nonlinear optimization function (a class of problems also known as multi-local programming problems), using a novel approach based on parallel computing. The approach, named Parallel Stretched Simulated Annealing (PSSA), combines simulated annealing with the stretching function technique in a parallel execution environment. Our PSSA software makes it possible to increase the resolution of the search domains (thus facilitating the discovery of new solutions) while keeping the search time bounded. The software was tested with a set of well-known problems and some numerical results are presented.
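
A serial sketch of the two ingredients is shown below: plain simulated annealing plus a simplified stand-in for the stretching transformation, which raises the landscape around an already-found minimizer so later (in PSSA, parallel) searches are steered toward new solutions. The Gaussian bump used here is a hedge; PSSA's actual stretching function differs.

```python
import math
import random

def anneal(f, x0, steps=5000, t0=1.0, seed=0):
    """Plain simulated annealing on a one-dimensional function."""
    rng = random.Random(seed)
    x = best = x0
    fx = fbest = f(x0)
    for k in range(steps):
        t = t0 * (1.0 - k / steps) + 1e-9     # linear cooling schedule
        y = x + rng.gauss(0.0, 0.5)           # random proposal
        fy = f(y)
        # Always accept downhill moves; accept uphill with Boltzmann probability.
        if fy < fx or rng.random() < math.exp((fx - fy) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest

def stretched(f, xstar, gamma=50.0, width=0.5):
    """Simplified stand-in for the stretching technique: add a bump
    around an already-found minimizer so new searches avoid it."""
    return lambda x: f(x) + gamma * math.exp(-((x - xstar) / width) ** 2)
```

PSSA would launch many such anneals in parallel over subdomains; here two sequential runs suffice to show that stretching pushes the second search to a different minimizer.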

Integrated urban drainage modelling is used to analyze how existing urban drainage systems respond to particular conditions. Based on these integrated models, researchers and engineers are able to, e.g., estimate long-term pollution effects, optimize the behaviour of a system by comparing the impacts of different measures on the desired target value, or gain new insights into system interactions. Although the use of simplified conceptual models reduces the computational time significantly, searching the enormous vector space spanned by the input parameters or by comparisons of different measures means that computational time is still a limiting factor. Owing to the stagnation of single-thread performance in computers and the rising number of cores, one needs to adapt algorithms to the parallel nature of new CPUs to fully utilize the available computing power. In this work a newly developed software tool named CD3 for parallel computing in integrated urban drainage systems is introduced. Of three investigated parallel strategies, two showed promising results; one achieves a speedup of up to 4.2 on an eight-way hyperthreaded quad-core CPU and yields significant run-time reductions for all investigated sewer systems. PMID:20107253

We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.

Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.

The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). 'Newton' refers to a quadratically convergent nonlinear iterat...

Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)

Designing and Building Parallel Programs [Online] is an innovative traditional print and online resource publishing project. It incorporates the content of a textbook published by Addison-Wesley into an evolving online resource.

I show the earliest neutron sources, exhibit the historical development of slow-neutron sources, trace the early technical and community developments and the origins of ICANS and of UCANS, and find parallels between them.

Shows the elements of an ac parallel circuit, examines the effects of current, and shows what the generator sees in the following circuits: where xl exceeds xc, xc exceeds xl, and xc and xl are equal.
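
The three cases in the film reduce to one relation: in a parallel AC circuit the inductive and capacitive branch currents are 180° apart, so they subtract, and the generator sees their difference combined at right angles with the resistive current. A small illustrative computation (the function name and values are hypothetical):

```python
def parallel_rlc_current(v, r, xl, xc):
    """Branch-current magnitudes in a parallel AC circuit. The inductive
    current lags and the capacitive current leads the source by 90 degrees,
    so only their difference combines with the resistive current."""
    i_r = v / r      # in phase with the source
    i_l = v / xl     # lags by 90 degrees
    i_c = v / xc     # leads by 90 degrees
    i_total = (i_r ** 2 + (i_c - i_l) ** 2) ** 0.5
    return i_r, i_l, i_c, i_total
```

When xl and xc are equal, the reactive branch currents cancel and the generator sees a purely resistive load; when xl exceeds xc, the capacitive branch draws more current and the net current leads, and vice versa.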

PVTOL is a C++ library that allows cross-platform software portability without sacrificing high performance. Researchers at MIT Lincoln Laboratory developed the Parallel Vector Tile Optimizing Library (PVTOL) to address a primary challenge faced by develo...

The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then si...

We introduce the real, discrete-time Gaussian parallel relay network. This simple network is theoretically important in the context of network information theory. We present upper and lower bounds to capacity and explain where they coincide

Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)
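
The straw analogy maps directly onto the combination rules: series resistances add, while parallel resistances add as reciprocals (a wider effective straw means lower resistance). A minimal illustration:

```python
def series(*resistances):
    """Series: one straw after another -- resistances simply add."""
    return sum(resistances)

def parallel(*resistances):
    """Parallel: straws side by side -- reciprocals add, so the
    combination is always smaller than the smallest branch."""
    return 1.0 / sum(1.0 / r for r in resistances)
```

Two identical straws end to end double the resistance; side by side they halve it.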

Our work on Fast Algorithms for Parallel Architectures led us to investigate methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory MIMD multiprocessor. We have studied only those techniques havin...

Master/Slave Speculative Parallelization (MSSP) is an execution paradigm for improving the execution rate of sequential programs by parallelizing them speculatively for execution on a multiprocessor. In MSSP, one processor---the master---executes an approximate version of the program to compute selected values that the full program's execution is expected to compute. The master's results are checked by slave processors that execute the

The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.
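
The serial kernel behind such a simulation is the explicit central-difference update for u_tt = c²u_xx; a parallel version assigns contiguous strips of the string to processors and synchronizes only at strip boundaries, which is exactly where the synchronization overhead discussed above arises. A serial sketch (grid size and time step are illustrative, not from the study):

```python
import math

def vibrate(n=50, steps=100, c=1.0, dt=0.01):
    """March the 1-D wave equation u_tt = c^2 u_xx for a string fixed at
    both ends, using explicit central differences in space and time."""
    dx = 1.0 / (n - 1)
    r2 = (c * dt / dx) ** 2   # Courant number squared; stability needs r2 <= 1
    u_prev = [math.sin(math.pi * i * dx) for i in range(n)]  # plucked shape
    # First step for zero initial velocity (half an update).
    u = [0.0] * n
    for i in range(1, n - 1):
        u[i] = u_prev[i] + 0.5 * r2 * (u_prev[i + 1] - 2 * u_prev[i] + u_prev[i - 1])
    for _ in range(steps):
        u_next = [0.0] * n    # endpoints stay pinned at zero
        for i in range(1, n - 1):
            u_next[i] = (2 * u[i] - u_prev[i]
                         + r2 * (u[i + 1] - 2 * u[i] + u[i - 1]))
        u_prev, u = u, u_next
    return u
```

Each interior point depends only on its immediate neighbors, so a strip decomposition needs just one ghost point exchanged per boundary per time step.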

Parallel algorithms for parsing expressions on mesh, shuffle, cube, and cube-connected cycle parallel computers are presented. With n processors, it requires O(√n) time on the mesh-connected model and O(log² n) time on the others. For the mesh-connected computer, the author uses a wrap-around row-major ordering. For the shuffle computer, he uses an extra connection between adjacent processors,

Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.

Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. ©2001 The Willi Hennig Society.
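
The saturation seen for branch swapping beyond 16 slave processors is the classic Amdahl pattern: once the sequential fraction dominates, extra processors stop helping. A one-function illustration (the 90% parallel fraction is hypothetical, not measured from POY):

```python
def amdahl(p, parallel_fraction):
    """Ideal speedup on p processors when only `parallel_fraction`
    of the runtime parallelizes (Amdahl's law)."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / p)
```

With f = 0.9, 16 processors already reach 6.4x out of an asymptotic limit of 10x, so adding more slave processors past that point buys little.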

In the retina, presynaptic inhibitory mechanisms that shape directionally selective (DS) responses in output ganglion cells are well established. However, the nature of inhibition-independent forms of directional selectivity remains poorly defined. Here, we describe a genetically specified set of ON-OFF DS ganglion cells (DSGCs) that code anterior motion. This entire population of DSGCs exhibits asymmetric dendritic arborizations that orientate toward the preferred direction. We demonstrate that morphological asymmetries along with nonlinear dendritic conductances generate a centrifugal (soma-to-dendrite) preference that does not critically depend upon, but works in parallel with the GABAergic circuitry. We also show that in symmetrical DSGCs, such dendritic DS mechanisms are aligned with, or are in opposition to, the inhibitory DS circuitry in distinct dendritic subfields where they differentially interact to promote or weaken directional preferences. Thus, pre- and postsynaptic DS mechanisms interact uniquely in distinct ganglion cell populations, enabling efficient DS coding under diverse conditions. PMID:21867884

Trenholm, Stuart; Johnson, Kyle; Li, Xiao; Smith, Robert G; Awatramani, Gautam B

...2009-04-01 2009-04-01 false Parallel proceedings. 12.24 Section...Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall...

...2010-04-01 2010-04-01 false Parallel proceedings. 12.24 Section...Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall...

We present an effective approach to performing data flow analysis in parallel and identify three types of parallelism inherent in this solution process: independent-problem parallelism, separate-unit parallelism and algorithmic parallelism. We present our investigations of Fortran procedures from the Perfect Benchmarks and netlib libraries, which reveal structural characteristics of program flow graphs that are amenable to algorithmic parallelism. Previously, the utility of

The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSc library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSc during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSc framework.
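
The "Krylov" ingredient means the Jacobian is never formed: J·v is approximated by a finite difference of the residual, and an inner iteration solves the Newton correction from such products alone. The sketch below uses a simple damped Richardson sweep as a stand-in for the Krylov (e.g., GMRES) inner solver, on a toy 2×2 system; it is illustrative, not the contract's solver.

```python
def jvp(F, x, v, eps=1e-7):
    """Matrix-free Jacobian-vector product: J(x)*v ~ (F(x + eps*v) - F(x)) / eps."""
    fx = F(x)
    fxe = F([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(fxe, fx)]

def newton_matrix_free(F, x, outer=20, inner=50, omega=0.2):
    """Outer Newton loop; the inner loop solves J d = -F(x) using only
    J*v products (a damped Richardson sweep standing in for GMRES)."""
    for _ in range(outer):
        fx = F(x)
        if max(abs(c) for c in fx) < 1e-10:
            break
        d = [0.0] * len(x)
        for _ in range(inner):
            # residual of the linear correction system: J d + F(x)
            r = [a + b for a, b in zip(jvp(F, x, d), fx)]
            d = [di - omega * ri for di, ri in zip(d, r)]
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

Because only residual evaluations and vector updates appear, every step is the kind of operation that distributes naturally across processors, which is the point of the NKS design.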

The history and methodology of aerodynamic noise reduction in rotary wing aircraft are presented. Thickness noise during hover tests and blade-vortex interaction noise are determined and predicted through the use of a variety of computer codes. The use of test facilities and scale models for data acquisition is discussed.

The practical aspects of an advanced schlieren technique, which has been presented by Meier (1999) and Richard et al. (2000) and in a similar form by Dalziel et al. (2000), are described in this paper. The application of the technique is demonstrated by three experimental investigations on compressible vortices. These vortices play a major role in the blade-vortex interaction

This paper presents a status of non-CFD aeroacoustic codes at NASA Langley Research Center for the prediction of helicopter harmonic and Blade-Vortex Interaction (BVI) noise. The prediction approach incorporates three primary components: CAMRAD.Mod1, a su...

This dissertation describes the development of a comprehensive aeroelastic/aeroacoustic simulation capability for the modeling of vibration and noise in rotorcraft induced by blade-vortex interaction (BVI). Subsequently this capability is applied to study vibration and noise reduction, using active and passive control approaches. The active approach employed is the actively controlled partial span trailing edge flaps (ACF), implemented in single and

The manipulation of a flow field to obtain a desired change is a subject of heightened interest. Active flow control has been one of the major research areas in fluid mechanics for the past two decades. It offers new solutions for mitigation of shock strength, sonic boom alleviation, drag minimization, reducing blade-vortex interaction noise in helicopters, stall control and the

The effect of tip-shape modification on blade-vortex interaction-induced helicopter blade-slap noise was investigated. The general rotor model system (GRMS) with a 3.148 m (10.33 ft) diameter, four-bladed fully articulated rotor was installed in the Langl...

With the increasing use of helicopters, their noise emission has become more and more important. The first HART program on rotorcraft noise has shown that blade-vortex interaction (BVI) is a major source of impulsive noise. As BVI-noise is ...

A near wake model for trailing vorticity originally proposed by Beddoes for high-resolution helicopter blade-vortex interaction computations has been implemented and compared with the usual blade element momentum models used for wind turbine calculations. The model is in principle a lifting line model for the rotating blade, where only a quarter revolution of the wake system behind the blade

A series of flight tests for noise measurement was conducted by Japan Aerospace Exploration Agency using its research helicopter MuPAL-E. An onboard microphone was installed at the tip of the nose boom to obtain precise BVI (Blade-Vortex-Interaction) nois...

A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

We address the problem of finding parallel plans for SQL queries using the two-phase approach of join ordering followed by parallelization. We focus on the parallelization phase and develop algorithms for exploiting pipelined parallelism. We formulate parallelization as scheduling a weighted operator tree to minimize response time. Our model of response time captures the fundamental tradeoff between parallel

Interactions between the proteins of the ternary soluble N-ethyl maleimide-sensitive fusion protein attachment protein receptor (SNARE) complex, synaptobrevin 2 (Sb2), syntaxin 1A (Sx1A) and synaptosome-associated protein of 25 kDa (SNAP25) can be readily assessed using force spectroscopy single-molecule measurements. We studied interactions during the disassembly of the ternary SNARE complex pre-formed by binding Sb2 in parallel or anti-parallel orientations to the binary Sx1A-SNAP25B acceptor complex. We determined the spontaneous dissociation lifetimes and found that the stability of the anti-parallel ternary SNARE complex is about one-third less than that of the parallel complex. While the free energies were very similar, within 0.5 k_BT, for both orientations, the enthalpy changes (42.1 k_BT and 39.8 k_BT, for parallel and anti-parallel orientations, respectively) indicate that the parallel ternary complex is energetically advantageous by 2.3 k_BT. Indeed, both ternary SNARE complex orientations were much more stable (by roughly 4-13 times) and energetically favorable (by roughly 9-13 k_BT) than selected binary complexes, constituents of the ternary complex, in both orientations. We propose a model which considers the geometry of the vesicle approach to the plasma membrane, with favorable energies and stability, as the basis for preferential usage of the parallel ternary SNARE complex in exocytosis. PMID:22525946

The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated.

Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
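
The map-reduce structure described above is easy to sketch: each processor tabulates its own shard, the reduce step merges counters (this merge, whose size grows with the number of distinct cells, is the communication cost the paper highlights), and statistics such as χ² follow from the merged table. Function names here are illustrative, not from the paper's implementation.

```python
from collections import Counter
from itertools import product

def local_table(pairs):
    """Map step: each processor tabulates its own shard of (x, y) pairs."""
    return Counter(pairs)

def merge_tables(tables):
    """Reduce step: a counter sum whose cost grows with the number of
    distinct cells, unlike fixed-size moment-based statistics."""
    total = Counter()
    for t in tables:
        total.update(t)
    return total

def chi2(table):
    """Pearson chi-square independence statistic from a contingency table."""
    n = sum(table.values())
    rows, cols = Counter(), Counter()
    for (x, y), c in table.items():
        rows[x] += c
        cols[y] += c
    stat = 0.0
    for x, y in product(rows, cols):
        expected = rows[x] * cols[y] / n
        observed = table.get((x, y), 0)
        stat += (observed - expected) ** 2 / expected
    return stat
```

For perfectly associated 2×2 data, χ² equals the sample size, which gives a quick sanity check on the merged table.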

Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

The recently developed techniques of parallel imaging with phased array coils are rapidly becoming accepted for magnetic resonance angiography (MRA) applications. This article reviews the various current parallel imaging techniques and their application to MRA. The increased scan efficiency provided by parallel imaging allows increased temporal or spatial resolution, and reduction of artifacts in contrast-enhanced MRA (CE-MRA). Increased temporal resolution in CE-MRA can be used to reduce the need for bolus timing and to provide hemodynamic information helpful for diagnosis. In addition, increased spatial resolution (or volume coverage) can be acquired in a breathhold (eg, in renal CE-MRA), or in otherwise limited clinically acceptable scan durations. The increased scan efficiency provided by parallel imaging has been successfully applied to CE-MRA as well as other MRA techniques such as inflow and phase contrast imaging. The large signal-to-noise ratio available in many MRA techniques lends these acquisitions to increased scan efficiency through parallel imaging. PMID:15479999

Wilson, Gregory J; Hoogeveen, Romhild M; Willinek, Winfried A; Muthupillai, Raja; Maki, Jeffrey H

Simultaneous externalization of design alternatives through multiple prototypes enables designers to see choices in context and facilitates comparative reasoning and discussion. This paper introduces two techniques for interactively manipulating multiple software alternatives. First, this work demonstrates a novel environment for constructing multiple design alternatives through (selectively) parallel editing and execution. Second, this environment's architecture introduces a mechanism for

The current authors reply to Bowers's response to their comment on the original article. Bowers (2010) mischaracterizes the goals of parallel distributed processing (PDP) research: explaining performance on cognitive tasks is the primary motivation. More important, his claim that localist models, such as the interactive…

to given workloads. The scope and interaction of applications, operating systems, communication networks, processors, and other hardware and software lead to substantial system complexity. Development of virtual prototypes in lieu of physical prototypes can result in tremendous savings, especially when created in concert with a powerful model development tool. When high-fidelity models of parallel architecture are coupled with workloads generated

Alan D. George; Ryan B. Fogarty; Jeff S. Markwell; Michael D. Miars

Bringing networked virtual game worlds and game world logic to the open Internet spawns new types of computer games. It usually deals with thousands of interactive entities among its Web servers. Game engine practitioners have used scripting technology to add soft computing capabilities to a variety of their engine modules. This article proposes a unified approach of using neural parallel

Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays.

A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.

Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)

This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted for applications which require very fast polygon rendering for extremely large sets of polygons, such as is found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.

Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.
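
The sub-grid hierarchy described above can be sketched as a simple quad-tree in two dimensions. The class and method names below are hypothetical illustrations in Python, not the PARAMESH Fortran 90 API: every block carries the same logically Cartesian mesh, and spatial resolution varies by refining blocks where the application demands it.

```python
class Block:
    """One node of the quad-tree: a logically Cartesian sub-grid block."""
    def __init__(self, x0, y0, size, nx=8):
        self.x0, self.y0, self.size = x0, y0, size
        self.nx = nx              # every block has the same logical mesh size
        self.children = []        # empty for leaf blocks

    def refine(self):
        """Split this block into four children, each half the spatial extent."""
        h = self.size / 2
        self.children = [Block(self.x0 + i * h, self.y0 + j * h, h, self.nx)
                         for j in (0, 1) for i in (0, 1)]

    def leaves(self):
        """Leaf blocks cover the domain; these would be distributed to processors."""
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

root = Block(0.0, 0.0, 1.0)
root.refine()                 # refine the whole domain once
root.children[0].refine()     # refine one corner again: resolution varies locally
```

Because each leaf has the same logical mesh, a finer block simply resolves a smaller physical region with the same number of cells.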

Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkably adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized. Large, complex parallel systems pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eliding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.

Continuous Parallel Coordinates (CPC) are a contemporary visualization technique for combining several scalar fields given over a common domain. They facilitate a continuous view for parallel coordinates by considering a smooth scalar field instead of a finite number of straight lines. We show that there are feature curves in CPC which appear to be the dominant structures of a CPC. We present methods to extract and classify them and demonstrate their usefulness for enhancing the visualization of CPCs. In particular, we show that these feature curves are related to discontinuities in Continuous Scatterplots (CSP). We show this by exploiting a curve-curve duality between parallel and Cartesian coordinates, which is a generalization of the well-known point-line duality. Furthermore, we illustrate the theoretical considerations. Concluding, we discuss relations and aspects of the CPC's/CSP's features concerning the data analysis. PMID:22034308

Many scientific computer codes involve linear systems of equations which are coupled only between nearest neighbors in a single dimension. The most common situation can be formulated as a tridiagonal matrix relating source terms and unknowns. This system of equations is commonly solved using simple forward and back substitution. The usual algorithm is spectacularly ill suited for parallel processing with distributed data, since information must be sequentially communicated across all domains. Two new tridiagonal algorithms have been implemented in FORTRAN 77. The two algorithms differ only in the form of the unknown which is to be found. The first and simplest algorithm solves for a scalar quantity evaluated at each point along the single dimension being considered. The second algorithm solves for a vector quantity evaluated at each point. The solution method is related to other recently published approaches, such as that of Bondeli. An alternative parallel tridiagonal solver, used as part of an Alternating Direction Implicit (ADI) scheme, has recently been developed at LLNL by Lambert. For a discussion of useful parallel tridiagonal solvers, see the work of Mattor, et al. Previous work appears to be concerned only with scalar unknowns. This paper presents a new technique which treats both scalar and vector unknowns. There is no restriction upon the sizes of the subdomains. Even though the usual tridiagonal formulation may not be theoretically optimal when used iteratively, it is used in so many computer codes that it appears reasonable to write a direct substitute for it. The new tridiagonal code can be used on parallel machines with a minimum of disruption to pre-existing programming. As tested on various parallel computers, the parallel code shows efficiency greater than 50% (that is, more than half of the available computer operations are used to advance the calculation) when each processor is given at least 100 unknowns for which to solve.
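
For reference, the usual serial forward elimination and back substitution that the parallel algorithms replace (the classic Thomas algorithm) can be sketched as follows. This is a generic Python illustration, not the FORTRAN 77 code described above; the comments mark the sequential data flow that makes the method ill suited to distributed data.

```python
def thomas(a, b, c, d):
    """Tridiagonal solve by forward elimination and back substitution.
    a: sub-diagonal (a[0] unused), b: main diagonal, c: super-diagonal, d: RHS."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]     # information flows sequentially forward...
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):      # ...and then sequentially backward
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Model problem: tridiag(-1, 2, -1), as arises from a 1-D Laplacian, unit RHS
x = thomas([-1.0] * 5, [2.0] * 5, [-1.0] * 5, [1.0] * 5)
```

Parallel substitutes such as those discussed above break this chain by solving reduced systems at subdomain boundaries.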

After a review of construction basics, you will learn the technique of constructing a parallel line through a point not on the line. First, review the basics of constructions in geometry: Constructions - General Rules. A review of how to copy an angle is also helpful: Constructions: Copy a Line Segment and an Angle. Now, using paper, pencil, a straight edge, and a compass, you will learn how to construct a parallel line through a point. A video demonstration is available to help you. (Windows Media ...

Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processes. User programs and their gangs of processes are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.

Developing parallel discrete event simulation code is currently very time-consuming and requires a high level of expertise. Few tools, if any, exist to aid conversion of existing sequential simulation programs to efficient parallel code. Traditional approaches to automatic parallelization, as used in many parallelizing compilers, are not well-suited for this application because of the irregular, data dependent nature of discrete

Data parallel programming languages offer ease of programming and debugging and scalability of parallel programs to increasing numbers of processors. Unfortunately, the usefulness of these languages for non-scientific programmers and loosely coupled parallel machines is currently limited. In this paper, we present the composite tree model which seeks to provide greater flexibility via parallel data types,

Our goal is to ease the parallelization of applications on distributed-memory parallel processors. Part of our team is implementing parallel kernels common to industrially significant applications using High Performance Fortran (HPF) and the Message Passing Interface (MPI). They are assisted in this activity by a second group developing an integrated tool environment, Annai, consisting of a parallelization support tool, a

C. Clemencon; K. M. Decker; V. R. Deshpande; A. Endo; J. Fritscher; P. A. R. Lorenzo; N. Masuda; A. Muller; R. Ruhl; W. Sawyer; B. J. N. Wylie; F. Zimmermann

This paper describes an integrated architecture, compiler, runtime, and operating system solution to exploiting heterogeneous parallelism. The architecture is a pipelined multi-threaded multiprocessor, enabling the execution of very fine (multiple operations within an instruction) to very coarse (multiple jobs) parallel activities. The compiler and runtime focus on managing parallelism within a job, while the operating system focuses on managing parallelism

Gail A. Alverson; Robert Alverson; David Callahan; Brian Koblenz; Allan Porterfield; Burton J. Smith

Abstract: It is the goal of the Polaris project to develop a new parallelizing compiler that will overcome limitations of current compilers. While current parallelizing compilers may succeed on small kernels, they often fail to extract any meaningful parallelism from large applications. After a study of application codes, it was concluded that by adding a few new techniques to current compilers, automatic parallelization becomes

William Blume; Rudolf Eigenmann; Keith Faigin; John Grout; Jay Hoeflinger; David A. Padua; Paul Petersen; William M. Pottenger; Lawrence Rauchwerger; Peng Tu; Stephen Weatherford

The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the attainable speedup. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations using multiple storage devices is proposed. Problem areas are also identified and discussed.

We introduce an algorithm for systematically improving the efficiency of parallel tempering Monte Carlo simulations by optimizing the simulated temperature set. Our approach is closely related to a recently introduced adaptive algorithm that optimizes the simulated statistical ensemble in generalized broad-histogram Monte Carlo simulations. Conventionally, a temperature set is chosen in such a way that the acceptance rates for
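
The acceptance rate referred to above is governed by the standard replica-exchange Metropolis criterion for swapping configurations between two temperature replicas. A minimal generic sketch in Python (not the authors' adaptive algorithm):

```python
import math
import random

def swap_accept(beta_i, beta_j, e_i, e_j, rng=random.random):
    """Metropolis criterion for exchanging configurations between replicas at
    inverse temperatures beta_i and beta_j with energies e_i and e_j:
    accept with probability min(1, exp((beta_i - beta_j) * (e_i - e_j)))."""
    delta = (beta_i - beta_j) * (e_i - e_j)
    return delta >= 0 or rng() < math.exp(delta)
```

Optimizing the temperature set amounts to spacing the betas so that these swaps succeed often enough for replicas to diffuse between the temperature extremes.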

This issue contains nine articles that provide an overview of trends and research in parallel information retrieval. Topics discussed include network design for text searching; the Connection Machine System; PThomas, an adaptive information retrieval system on the Connection Machine; algorithms for document clustering; and system architecture for…

In this paper, we describe a family of parallel-sorting algorithms for a multiprocessor system. These algorithms are enumeration sortings and comprise the following phases: 1) count acquisition: the keys are subdivided into subsets and for each key we determine the number of smaller keys (count) in every subset; 2) rank determination: the rank of a key is the sum of
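
The two phases above can be sketched serially as follows. This is a minimal Python simulation of an enumeration sort (assuming distinct keys), not the multiprocessor implementation itself; in phase 1 each (key, subset) count is independent, which is where the parallelism lies.

```python
def enumeration_sort(keys, blocks=2):
    """Enumeration sort: the rank of a key is the total number of smaller keys,
    accumulated over independently counted subsets. Assumes distinct keys."""
    n = len(keys)
    chunk = (n + blocks - 1) // blocks
    subsets = [keys[i:i + chunk] for i in range(0, n, chunk)]

    # Phase 1 (count acquisition): for each key, count smaller keys per subset.
    counts = [[sum(x < k for x in s) for s in subsets] for k in keys]

    # Phase 2 (rank determination): rank = sum of the per-subset counts.
    ranks = [sum(c) for c in counts]

    out = [None] * n
    for k, r in zip(keys, ranks):
        out[r] = k          # a key's rank is its final position
    return out
```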

The decreasing cost of computing makes it economically viable to reduce the response time of decision support queries by using parallel execution to exploit inexpensive resources. This goal poses the following query optimization problem: Minimize response time subject to constraints on throughput, which we motivate as the dual of the traditional DBMS problem. We address this novel problem

Four paradigms that can be useful in developing parallel algorithms are discussed. These include computational complexity analysis, changing the order of computation, asynchronous computation, and divide and conquer. Each is illustrated with an example from scientific computation, and it is shown that computational complexity must be used with great care or an inefficient algorithm may be selected.

Teradata's parallel DBMS has been successfully deployed in large data warehouses over the last two decades for large scale business analysis in various industries over data sets ranging from a few terabytes to multiple petabytes. However, due to the explosive data volume increase in recent years at some customer sites, some data such as web logs and sensor data are

Research supporting the idea that many basic cognitive processes can be described as fast, parallel, and automatic is reviewed. Memory retrieval/decision processes have often been ignored in the cognitive literature. However, in some cases, computationally complex processes can be replaced with simple passive processes. Cue-dependent retrieval from memory provides a straightforward example of how encoding, memory, and retrieval

This paper explores methods for extracting parallelism from a wide variety of numerical applications. We investigate communications overheads and load-balancing for networks of transputers. After a discussion of some practical strategies for constructing occam programs, two case studies are analysed in detail.

David J. Pritchard; C. R. Askew; D. B. Carpenter; Ian Glendinning; Anthony J. G. Hey; Denis A. Nicole

In this paper we describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information about the sentences that we use for calculating alignments is the number of tokens that they contain. Because we make no use of the lexical details of

This paper describes a parallel algorithm for ranking the pixels on a curve in O(log N) time using an EREW PRAM model. The algorithm accomplishes this with N processors for a √N × √N image. After applying such an algorithm to an image, we are able to move the pixels from a curve into processors having consecutive addresses. This is important on hypercube-connected machines like the Connection Machine because we can subsequently apply many algorithms to the curve using powerful segmented scan operations (i.e., parallel prefix operations). We shall illustrate this by first showing how we can find piecewise linear approximations of curves using Ramer's algorithm. This process has the effect of converting closed curves into simple polygons. As another example, we shall describe a more complicated parallel algorithm for computing the visibility graph of a simple planar polygon. The algorithm accomplishes this in O(k log N) time using O(N²/log N) processors for an N vertex polygon, where k is the link-diameter of the polygon in consideration. Both of these algorithms require only scan operations (as well as local neighbor communication) as the means of inter-processor communication. Thus, the algorithms can not only be implemented on an EREW PRAM, but also on a hypercube-connected parallel machine, which is a more practical machine model. All these algorithms were implemented on the Connection Machine, and various performance tests were conducted.
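
The parallel prefix (scan) primitive underlying the algorithm can be illustrated with an ordinary inclusive scan. The Hillis–Steele-style sketch below is a generic Python illustration, not the Connection Machine code: each of the O(log N) rounds combines every element with the one 2^d positions back, and the rounds would run as parallel steps on a PRAM.

```python
def prefix_scan(values, op=lambda a, b: a + b):
    """Inclusive parallel prefix (scan), Hillis-Steele style: O(log N) rounds;
    each round is a data-parallel step over all elements."""
    x = list(values)
    d = 1
    while d < len(x):
        x = [x[i] if i < d else op(x[i - d], x[i]) for i in range(len(x))]
        d *= 2
    return x

# Ranking curve pixels: scanning a flag of 1 per pixel yields each pixel's
# position along the curve, so pixels can be moved to consecutive addresses.
flags = [1, 1, 1, 1, 1]
ranks = [r - 1 for r in prefix_scan(flags)]   # 0-based ranks 0..4
```

A segmented scan is the same idea with the combine operation reset at segment boundaries, which is what lets one scan process many curves at once.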

The process of cytokinesis in animal cells is usually presented as a relatively simple picture: A cleavage plane is first positioned in the equatorial region by the astral microtubules of the anaphase mitotic apparatus, and a contractile ring made up of parallel filaments of actin and myosin II is formed and encircles the cortex at the division site. Active sliding

Taro Q. P. Uyeda; Akira Nagasaki; Shigehiko Yumura

Co-Array Fortran, formerly known as F--, is a small extension of Fortran 95 for parallel processing. A Co-Array Fortran program is interpreted as if it were replicated a number of times and all copies were executed asynchronously. Each copy has its own set of data objects and is termed an image. The array syntax of Fortran 95 is extended with

GRay is a massively parallel ordinary differential equation integrator that employs the "stream processing paradigm." It is designed to efficiently integrate billions of photons in curved spacetime according to Einstein's general theory of relativity. The code is implemented in CUDA C/C++.

We have designed and implemented a language called Cid for parallel applications with recursive linked data structures (e.g., lists, trees, graphs) and complex control structures (data dependent, recursion). Cid is unique in that, while targeting distributed memory machines, it attempts to preserve the traditional “MIMD threads plus lock-protected shared data” programming model that is standard on shared memory machines.

The Ejs Parallel Plate Capacitor model displays a parallel-plate capacitor which consists of two identical metal plates, placed parallel to one another. The capacitor can be charged by connecting one plate to the positive terminal of a battery and the other plate to the negative terminal. The dielectric constant and the separation of the plates can be changed via sliders. You can modify this simulation if you have Ejs installed by right-clicking within the plot and selecting "Open Ejs Model" from the pop-up menu item. Ejs Parallel Plate Capacitor model was created using the Easy Java Simulations (Ejs) modeling tool. It is distributed as a ready-to-run (compiled) Java archive. Double clicking the ejs_bu_capacitor.jar file will run the program if Java is installed. Ejs is a part of the Open Source Physics Project and is designed to make it easier to access, modify, and generate computer models. Additional Ejs models for Newtonian mechanics are available. They can be found by searching ComPADRE for Open Source Physics, OSP, or Ejs.
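
The quantity the simulation displays follows the ideal parallel-plate formula C = κ·ε₀·A/d (edge effects neglected). A minimal sketch, independent of the Ejs model:

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def capacitance(area_m2, separation_m, kappa=1.0):
    """Ideal parallel-plate capacitor: C = kappa * eps0 * A / d.
    kappa is the dielectric constant; edge effects are neglected."""
    return kappa * EPS0 * area_m2 / separation_m

# 1 cm^2 plates, 1 mm apart, vacuum between the plates
c = capacitance(1e-2, 1e-3)
```

As the sliders in the model suggest, increasing the dielectric constant raises C proportionally, while increasing the plate separation lowers it.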

Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.

Several recent studies have proposed methods to accelerate the receipt of a file by downloading its parts from different servers in parallel. This paper formulates models for an approach based on receiving only one copy of each of the data packets in a file, while different packets may be obtained from different sources. This approach guarantees

Let S:[0,1]→[0,1] be a nonsingular transformation and let P:L¹(0,1)→L¹(0,1) be the corresponding Frobenius–Perron operator. In this paper we propose a parallel algorithm for computing a fixed density of P, using Ulam's method and a modified Monte Carlo approach. Numerical results are also presented.
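
Ulam's method referred to above can be sketched as follows, assuming a simple Monte Carlo discretization; the map, bin count, and sample sizes here are illustrative choices, not the paper's. The operator P is approximated by a stochastic matrix whose rows can be estimated independently, which is what makes the scheme easy to parallelize.

```python
import random

def ulam_matrix(S, n, samples_per_bin=200, seed=1):
    """Ulam's method: discretize the Frobenius-Perron operator of S on n equal
    bins of [0,1]; row i is estimated by Monte Carlo sampling of bin i.
    Rows are independent, so they could be computed in parallel."""
    rng = random.Random(seed)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for _ in range(samples_per_bin):
            x = (i + rng.random()) / n         # a random point in bin i
            j = min(int(S(x) * n), n - 1)      # bin hit by its image S(x)
            P[i][j] += 1.0 / samples_per_bin
    return P

def fixed_density(P, iters=500):
    """Power iteration: a fixed density of P is a left eigenvector of the
    discretized matrix with eigenvalue 1."""
    n = len(P)
    f = [1.0 / n] * n
    for _ in range(iters):
        f = [sum(f[i] * P[i][j] for i in range(n)) for j in range(n)]
    return [v * n for v in f]                  # scale to a density on [0,1]

tent = lambda x: 2 * x if x < 0.5 else 2 * (1 - x)
density = fixed_density(ulam_matrix(tent, 8))  # tent map preserves Lebesgue measure
```

For the tent map the invariant density is the constant 1, so the computed vector should be approximately flat, up to Monte Carlo noise.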

The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers

Richard A. Blech; Edward J. Milner; Angela Quealy; Scott E. Townsend

By means of Parallel Coordinates, planar "graphs" of multivariate relations are obtained. Certain properties of the relationship correspond to the geometrical properties of its graph. On the plane a point ↔ line duality with several interesting properties is induced. A new duality between bounded and unbounded convex sets and hstars (a generalization of hyperbolas) and between Convex Unions and Intersections is found. This
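
The point ↔ line duality can be made concrete with the X1 axis at x = 0 and the X2 axis at x = 1 (inter-axis distance 1). A minimal Python sketch under that assumption: a data point becomes a line segment between the two axes, and all points of a Cartesian line y = m·x + c yield segments through a single dual point.

```python
def point_to_segment(a, b):
    """A data point (a, b) appears in parallel coordinates as the segment
    joining value a on the X1 axis (at x=0) to value b on the X2 axis (at x=1)."""
    return (0.0, a), (1.0, b)

def line_to_point(m, c):
    """Dually, the Cartesian line y = m*x + c (m != 1) maps to the single point
    (1/(1-m), c/(1-m)): every point on the line gives a segment through it."""
    return 1.0 / (1.0 - m), c / (1.0 - m)
```

For example, the point (1, 3) lies on y = 2x + 1, and the segment it produces passes through the dual point (-1, -1) of that line.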

During the Spring 2008 semester at Olin College, we introduced the programming language occam-pi to undergraduates as part of their first course in robotics. Students were able to explore image processing and autonomous behavioral control in a parallel programming language on a small mobile robotics platform with just two weeks of tutorial instruction. Our experiences to date suggest that the

Matthew C. Jadud; Christian L. Jacobsen; Carl G. Ritson; Jonathan Simpson

Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared me...

Originating from basic research conducted in the 1970s and 1980s, the parallel and distributed simulation field has matured over the last few decades. Today, operational systems have been fielded for applications such as military training, analysis of communication networks, and air traffic control systems, to mention a few. This tutorial gives an overview of technologies to distribute the execution

This paper describes an experiment that uses translation equivalents derived from parallel corpora to determine sense distinctions that can be used for automatic sense-tagging and other disambiguation tasks. Our results show that sense distinctions derived from cross-lingual information are at least as reliable as those made by human annotators. Because our approach is fully automated through all its steps, it

Specialized computer architectures can provide better price/performance for executing image processing and graphics applications than general purpose designs. Two processors are presented that use parallel SIMD data paths to support common graphics data structures as primitive operands in arithmetic expressions. A variant of the C language has been implemented to allow high level language coding of user applications on these

Adam Levinthal; Pat Hanrahan; Mike Paquette; Jim Lawson

Often very fundamental biochemical and biophysical problems defy simulation because of limitations in today's computers. We present and discuss a distributed system composed of two IBM-4341s and one IBM-4381, as front-end processors, and ten FPS-164 attached array processors. This parallel system, called LCAP, presently has a peak performance of about 120 MFlops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system; a description of the modifications is given. Three application programs have migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis Monte Carlo, and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. As examples and tests of these applications we report on a study of proton tunneling in DNA base pairs, very relevant to spontaneous mutations in genetics. As a second example, we present a Monte Carlo study of liquid water at room temperature where not only two- and three-body interactions are considered but, for the first time, four-body interactions are also included. Finally, we briefly summarize a molecular dynamics study where two- and three-body interactions have been considered. These examples, and a very positive performance comparison with today's supercomputers, allow us to conclude that parallel computers and programming of the type we have considered represent a pragmatic answer to many compute-intensive problems.

Clementi, E.; Corongiu, G.; Detrich, J. H.; Kahnmohammadbaigi, H.; Chin, S.; Domingo, L.; Laaksonen, A.; Nguyen, N. L.

The objective of this research is to develop computationally efficient methods for solving fluid-structural interaction problems by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures on parallel computers. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.

This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.

This paper presents a parallel active filter system implementation for utility interface of an an adjustable speed drive air-conditioner chiller application to meet IEEE 519 recommended harmonic standards. Specifications of displacement power factor, efficiency, cost, size and packaging requirements with the rectifier front-end topology are used to determine the optimal active filter solution. Design issues and interaction of parallel active

S. Bhattacharya; T. M. Frank; D. M. Divan; B. Banerjee

Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real time flight simulation. The system developed consists of a master control system, which performs all man-machine interactions and configures the hardware to emulate a given aircraft, and numerous slave compute modules (SCMs), which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parallel updates are reported. The word length and step size to be used in the SCMs is determined and the architecture of the hardware and software is described.

This paper suggests that data parallelism is more general than previously thought and that integrating support for task parallelism into a data parallel programming language is a mistake. With several proposed improvements, the data parallel programming language ZPL is surprisingly versatile. The language and its power are illustrated by the solution to several traditionally task parallel problems. In addition, limitations

Three applications of massively parallel computing to weapons development at Sandia National Laboratories are briefly described, including armor/antiarmor simulations. The numerical modeling of penetrator-armor interactions requires detailed, three-dimens...

An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used; and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID. Each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data further handled using special double buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.

Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)
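
The overrun-avoiding double buffering mentioned in the uplink-processing patent above is a standard technique that can be illustrated with a minimal sketch; the class and method names here are hypothetical, not taken from the patent:

```python
class DoubleBuffer:
    """Two buffers alternate roles: one fills while the other is consumed,
    so the producer never overwrites data the consumer is still reading."""

    def __init__(self):
        self.buffers = [[], []]
        self.write_idx = 0

    def write(self, item):
        # Producer appends only to the currently active buffer.
        self.buffers[self.write_idx].append(item)

    def swap(self):
        # Hand the filled buffer to the consumer and start filling the other.
        filled = self.buffers[self.write_idx]
        self.write_idx ^= 1
        self.buffers[self.write_idx] = []
        return filled
```

When overrun does occur, a real system (as the patent notes) would additionally discard only the oldest buffered data.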

Many important problems in computer vision can be characterized as template-matching problems on edge images. Some examples are circle detection and line detection. Two techniques for template matching are the Hough transform and correlation. There are two algorithms for correlation: a shift-and-add-based technique and a Fourier-transform-based technique. The most efficient algorithm of these three varies depending on the size of the template and the structure of the image. On different parallel architectures, the choice of algorithms for a specific problem is different. This paper describes two parallel architectures, the WARP and the Butterfly, and explains why and how the criterion for choosing among the algorithms differs between the two machines.

The paper develops and compares several fine-grained parallel algorithms to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed-memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special-purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a two-dimensional grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. It also presents a performance model and uses it to analyze the algorithms. It finds that asymptotic analysis combined with experimental measurement of parameters is accurate enough to be useful in choosing among alternative algorithms for a complicated problem.
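
The key subroutine in the approach above is a dense Cholesky factorization performed on sub-blocks. As a point of reference, a minimal serial (unblocked, non-parallel) Cholesky can be sketched in Python; this is a generic illustration, not code from the paper:

```python
import numpy as np

def cholesky(A):
    """Dense Cholesky factorization A = L @ L.T for a symmetric
    positive-definite matrix A; returns the lower-triangular factor L."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: subtract squares of previously computed row entries.
        L[j, j] = np.sqrt(A[j, j] - np.dot(L[j, :j], L[j, :j]))
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
    return L
```

A data-parallel variant would distribute such factorizations of independent dense blocks across a two-dimensional processor grid, as the abstract describes.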

Parallel simulation is an important practical technique for improving the performance of simulations. The most effective approach to parallel simulation depends on the characteristics of the system being simulated. One key characteristic is called lookahead. Another kind of lookahead, called implicit lookahead, was introduced for simulating FCFS stochastic queueing systems; implicit lookahead can be exploited to yield performance benefits even when explicit lookahead does not exist. In this paper, the authors show the feasibility of implicit lookahead for non-FCFS systems. They propose several lookahead exploiting techniques for round-robin (RR) system simulations. They design an algorithm that generates lookahead in O(1) time. Both analytical models and experiments are constructed to evaluate these techniques. The authors also evaluate a lookahead technique for preemptive priority (PP) systems using an analytical model.

Lin, Y.B.; Lazowska, E.D. (Washington Univ., Seattle, WA (USA). Dept. of Computer Science)
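
The explicit-lookahead idea underlying the abstract above can be sketched for the simple FCFS case: a server can promise that no departure will occur before a computable lower bound. This is a generic illustration under assumed names, not code from the paper:

```python
def fcfs_lookahead(now, busy_until, min_service_time):
    """Conservative lower bound on the next departure time of an FCFS server.

    If the server is busy, nothing departs before the job in service
    finishes; if the server is idle, nothing departs before a newly
    arriving job could complete its minimum possible service.
    """
    if busy_until > now:
        return busy_until
    return now + min_service_time
```

A conservative parallel simulator can safely advance other logical processes up to this bound without waiting for messages from the server.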

In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the Maspar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.

We present the new parallel version (PCRASH2) of the cosmological radiative transfer code CRASH2 for distributed memory supercomputing facilities. The code is based on a static domain decomposition strategy inspired by geometric dilution of photons in the optically thin case that ensures a favourable performance speed-up with an increasing number of computational cores. Linear speed-up is ensured as long as the number of radiation sources is equal to the number of computational cores or larger. The propagation of rays is segmented and rays are only propagated through one sub-domain per time-step to guarantee an optimal balance between communication and computation. We have extensively checked PCRASH2 with a standardized set of test cases to validate the parallelization scheme. The parallel version of CRASH2 can easily handle the propagation of radiation from a large number of sources and is ready for the extension of the ionization network to species other than hydrogen and helium.

Partl, A. M.; Maselli, A.; Ciardi, B.; Ferrara, A.; Müller, V.

To facilitate numerical study of noise and decoherence in QC algorithms, and of the efficacy of error correction schemes, we have developed a Fortran 90 quantum computer simulator with parallel processing capabilities. It permits rapid evaluation of quantum algorithms for a large number of qubits and for various "noise" scenarios. State vectors are distributed over many processors, to employ a large number of qubits. Parallel processing is implemented by the Message-Passing Interface protocol. A description of how to spread the wave function components over many processors, along with how to efficiently describe the action of general one- and two-qubit operators on these state vectors, will be delineated. Grover's search and Shor's factoring algorithms with noise will be discussed as examples. A major feature of this work is that concurrent versions of the algorithms can be evaluated with each version subject to diverse noise effects, corresponding to solving a stochastic Schrodinger equation. The density matrix for the ensemble of such noise cases is constructed using parallel distribution methods to evaluate its associated entropy. Applications of this powerful tool are made to delineate the stability and correction of QC processes using Hamiltonian based dynamics.
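
The core operation the abstract distributes over processors, applying a general one-qubit operator to a state vector, can be illustrated on a single node with NumPy; the function name and layout convention here are assumptions for illustration, not the simulator's actual interface:

```python
import numpy as np

def apply_one_qubit_gate(state, gate, target, n_qubits):
    """Apply a 2x2 gate to qubit `target` of a length-2**n amplitude vector."""
    # View the amplitude vector as an n-index tensor (one axis per qubit),
    # bring the target qubit's axis to the front, contract with the gate,
    # then restore the original axis order.
    psi = state.reshape([2] * n_qubits)
    psi = np.moveaxis(psi, target, 0)
    psi = np.tensordot(gate, psi, axes=([1], [0]))
    psi = np.moveaxis(psi, 0, target)
    return psi.reshape(-1)
```

In the distributed setting the amplitudes are partitioned across MPI ranks, so gates on high-order qubits require pairwise exchanges of amplitude blocks between processors.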

Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.

SPEFY (Scotia Programming Environment and Facility) is a new software development environment designed to simplify and accelerate the development of large-scale programs in a manner that makes the most efficient use of the supercomputers on which they run. The centerpiece of SPEFY is the Parallelism Analysis and Optimization tool, which is an interactive facility for analyzing code, detecting data dependence, and optimizing the program by parallelism-enhancing transformations. A significant feature of the analysis is that it is performed both across and within procedures, and greatly increase the precision of data flow and dependence information. The objective of this paper is to describe the Parallelism Analysis and Optimization tool of SPEFY. It discusses data dependence, interprocedural analysis by determining the relevant effects of procedure calls, data dependence analysis incorporating interprocedural information, and program restructuring optimization techniques.

The TREE method has been widely used for long-range interaction N-body problems. We have developed a parallel TREE code for two-component classical plasmas with open boundary conditions and highly non-uniform charge distributions. The program efficiently handles millions of particles evolved over long relaxation times requiring millions of time steps. Appropriate domain decomposition and dynamic data management were employed, and large-scale parallel processing was achieved using an intermediate level of granularity of domain decomposition and ghost TREE communication. Even though the computational load is not fully distributed in fine grains, high parallel efficiency was achieved for ultracold plasma systems of charged particles. As an application, we performed simulations of an ultracold neutral plasma with a half million particles and a half million time steps. For the long temporal trajectories of relaxation between heavy ions and light electrons, large configurations of ultracold plasmas can now be investigated, which was not possible in past studies.

Jeon, Byoungseon; Kress, Joel D.; Collins, Lee A.; Grønbech-Jensen, Niels
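
TREE codes of the kind described above decide, for each particle, whether a distant cell may be replaced by a single multipole or must be opened and descended into. A minimal sketch of the classic Barnes-Hut acceptance test (generic, not taken from this code):

```python
def must_open(node_size, distance, theta=0.5):
    """Barnes-Hut multipole acceptance criterion.

    Open (descend into) a tree node when the cell subtends too large an
    angle as seen from the target particle, i.e. size / distance > theta;
    otherwise approximate its charges by a single multipole expansion.
    """
    return node_size > theta * distance
```

Smaller values of the opening angle theta trade accuracy for more pairwise evaluations; the O(N log N) cost of the method comes from most distant cells passing this test.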

We consider parallel algorithms working in sequential global time, for example circuits or parallel random access machines (PRAMs). Parallel abstract state machines (parallel ASMs) are such parallel algorithms, and the parallel ASM thesis asserts that every parallel algorithm is behaviorally equivalent to a parallel ASM. In an earlier paper, we axiomatized parallel algorithms, proved the ASM thesis and proved

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

NWChem is a general purpose computational chemistry code specifically designed to run on distributed memory parallel computers. The core functionality of the code focuses on molecular dynamics, Hartree-Fock and density functional theory methods for both plane-wave basis sets as well as Gaussian basis sets, tensor contraction engine based coupled cluster capabilities and combined quantum mechanics/molecular mechanics descriptions. It was realized from the beginning that scalable implementations of these methods required a programming paradigm inherently different from what message passing approaches could offer. In response a global address space library, the Global Array Toolkit, was developed. The programming model it offers is based on using predominantly one-sided communication. This model underpins most of the functionality in NWChem and the power of it is exemplified by the fact that the code scales to tens of thousands of processors. In this paper the core capabilities of NWChem are described as well as their implementation to achieve an efficient computational chemistry code with high parallel scalability. NWChem is a modern, open source, computational chemistry code specifically designed for large scale parallel applications. To meet the challenges of developing efficient, scalable and portable programs of this nature a particular code design was adopted. This code design involved two main features. First of all, the code is built up in a modular fashion so that a large variety of functionality can be integrated easily. Secondly, to facilitate writing complex parallel algorithms the Global Array toolkit was developed. This toolkit allows one to write parallel applications in a shared memory like approach, but offers additional mechanisms to exploit data locality to lower communication overheads. This framework has proven to be very successful in computational chemistry but is applicable to any engineering domain.
Within the context created by the features above NWChem has grown into a general purpose computational chemistry code that supports a wide variety of energy expressions and capabilities to calculate properties based thereon. The main energy expressions are classical mechanics force fields, Hartree-Fock and DFT both for finite systems and condensed phase systems, coupled cluster, as well as QM/MM. For most energy expressions single point calculations, geometry optimizations, excited states, and other properties are available. Below we briefly discuss each of the main energy expressions and the critical points involved in scalable implementations thereof.

van Dam, Hubertus JJ; De Jong, Wibe A.; Bylaska, Eric J.; Govind, Niranjan; Kowalski, Karol; Straatsma, TP; Valiev, Marat

Parallelizing compilers automatically translate a sequential program into a parallel program. They simplify parallel programming by freeing the user from the need to consider the details of the parallel architecture and the parallel decomposition. A sourc...

In the last two to three decades the cost of variable speed devices has come down considerably and their energy consumption has improved to the point that they no longer waste energy. Additionally, speed control of the circulating pumps in variable volume flow hydronic systems permits matching the head generated by the pumps to the frictional resistance to flow in the system. This will improve the operation of the control valves and save energy. In view of these advantages, the use of variable speed pumps has become widespread. Nevertheless, the speed control devices are costly, and the question arises whether all pumps in a given installation need be equipped with them. This article will explore the interaction between multiple speed pumps operating in parallel and their interface with the hydronic system they serve. It will specifically address two questions: can one pump be operated at varying speed in parallel with another pump at fixed speed? Also, what is the most economical method of applying variable speed pumping in a given chilled water system? It will be seen that the benefits to be gained from unequal speed operation of parallel pumps are minimal and the benefits are outweighed by the danger inherent in such operation. This practice must be discouraged. The study also will show that in a correctly engineered and analyzed system the number of parallel pumps can be reduced and that not all need be provided with speed control. However, all pumps in parallel operation must be run at the same speed.
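
The head-matching argument in the article above rests on the pump affinity laws: flow scales linearly with speed and head with speed squared. A small sketch, assuming a hypothetical quadratic pump curve H(Q) = a - bQ²:

```python
def pump_head(flow, a, b, speed_ratio=1.0):
    """Head delivered at a given flow for a pump with full-speed curve
    H(Q) = a - b*Q**2, run at the given fraction of rated speed.

    Affinity laws: a point (Q, H) on the full-speed curve maps to
    (s*Q, s**2 * H) at speed ratio s, so the curve becomes
    H(Q) = a*s**2 - b*Q**2.
    """
    s = speed_ratio
    return a * s**2 - b * flow**2
```

Because the shutoff head drops with the square of speed, a pump slowed too far in parallel with a full-speed pump can be pushed below the common header pressure and deliver no flow at all, which is the danger of unequal-speed operation the article warns against.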

Fast parallel matrix multiplication algorithms in SIMD (Single-Instruction-Multiple-Data) and MIMD (MUltiple-Instruction-Multiple-data) modes are described for implementation in a parallel-binary matrix processing system with facilities for bit-wise paral...

The present volume on parallel CFD discusses implementations on parallel machines, numerical algorithms for parallel CFD, and performance evaluation and computer science issues. Attention is given to a parallel algorithm for compressible flows through rotor-stator combinations, a massively parallel Euler solver for unstructured grids, a fast scheme to analyze 3D disk airflow on a parallel computer, and a block implicit multigrid solution of the Euler equations. Topics addressed include a 3D ADI algorithm on distributed memory multiprocessors, clustered element-by-element computations for fluid flow, hypercube FFT and the Fourier pseudospectral method, and an investigation of parallel iterative algorithms for CFD. Also discussed are fluid dynamics using interface methods on parallel processors, sorting for particle flow simulation on the connection machine, a large grain mapping method, and efforts toward a Teraflops capability for CFD.

The historic focus of Automatic Parallelization efforts has been limited in two ways. First, parallelization has generally been attempted only on codes which can be proven to be parallelizeable. Unfortunately, the requisite dependence analysis is undecida...

In this paper we study the potential performance improvements for catastrophe modelling systems that can be achieved through parallelization on a Cell Processor. We studied and parallelized a critical section of catastrophe modelling, the so-called

Frank K. H. A. Dehne; Glenn Hickey; Andrew Rau-Chaplin; Mark Byrne

We present fast and practical parallel algorithms for the computation and evaluation of interpolating polynomials. The algorithms make use of fast parallel prefix techniques for the calculation of divided differences in the Newton representation of the in...

The report contains descriptions, discussions and comparisons of several types of parallel adders and a class of parallel multipliers. It brings together for a comprehensive view the works of many authors, using common terms as much as possible to facilit...

A number of parallel randomized algorithms have appeared recently. These algorithms typically use a large number of random bits which must be generated in a small amount of time. Nonetheless, the area of parallel random bit generation remains unexplored. ...

The objective of this research effort was to develop a tool to simulate various parallel computer systems. The tools would give users insight into the differ classes of parallel machines in terms of architecture, software, synchronization, communication, ...

Parallel programming requires task scheduling to optimize performance; this primarily involves balancing the load over the processors. In many cases, it is critical to perform task scheduling at runtime. For example, (1) in many parallel applications the ...

I. C. Wu H. T. Kung P. Steenkiste D. O'Hallaron G. Thompson

We discuss the efficiency of parallelization on graphical processing units (GPUs) for the simulation of the one-dimensional Potts model with long-range interactions via parallel tempering. We investigate the behavior of some thermodynamic properties, such as equilibrium energy and magnetization, critical temperatures as well as the separation between the first- and second-order regimes. By implementing multispin coding techniques and an efficient parallelization of the interaction energy computation among threads, the GPU-accelerated approach reached speedup factors of up to 37.
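
The parallel tempering scheme used in the study above periodically attempts to swap configurations between replicas at inverse temperatures β_i and β_j, accepting with probability min(1, exp[(β_i − β_j)(E_i − E_j)]). A minimal sketch of that acceptance step (generic, not the paper's GPU code):

```python
import math
import random

def attempt_swap(beta_i, beta_j, energy_i, energy_j, uniform=random.random):
    """Metropolis acceptance test for exchanging two replicas
    in a parallel tempering simulation."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0:
        # Swap always accepted when it moves the higher-energy
        # configuration to the hotter replica.
        return True
    return uniform() < math.exp(delta)
```

Swaps between neighbouring temperatures let configurations trapped in local minima at low temperature escape via the high-temperature replicas.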

Parallels between the dynamic response of flexible bridges under the action of wind and under the forces induced by crowds allow each field to inform the other. Wind-induced behaviour has been traditionally classified into categories such as flutter, galloping, vortex-induced vibration and buffeting. However, computational advances such as the vortex particle method have led to a more general picture where effects may occur simultaneously and interact, such that the simple semantic demarcations break down. Similarly, the modelling of individual pedestrians has progressed the understanding of human-structure interaction, particularly for large-amplitude lateral oscillations under crowd loading. In this paper, guided by the interaction of flutter and vortex-induced vibration in wind engineering, a framework is presented, which allows various human-structure interaction effects to coexist and interact, thereby providing a possible synthesis of previously disparate experimental and theoretical results. PMID:23690640

McRobie, Allan; Morgenthal, Guido; Abrams, Danny; Prendergast, John

This book examines the present state and future direction of multicomputer parallel architectures for artificial intelligence research and development of artificial intelligence applications. The book provides a survey of the large variety of parallel architectures, describing the current state of the art and suggesting promising architectures to produce artificial intelligence systems such as intelligent robots. This book integrates artificial intelligence and parallel processing research areas and discusses parallel processing from the viewpoint of artificial intelligence.

This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.

In the paper, parallelization of finite element modeling of solidification is considered. The core of this modeling is solving large sparse linear systems. The Aztec library is used for implementing the model problem on massively parallel computers. Now the complete parallel code is available. The performance results of numerical experiments carried out on the IBM SP2 parallel computer are presented.

Roman Wyrzykowski; Norbert Sczygiol; Tomasz Olas; Juri Kanevski

The use of Force, a parallel, portable FORTRAN on shared memory parallel computers, is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.

Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. Inasmuch as it seems clear that the application of such methods in nanotechnology will require powerful, highly parallel systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems.
We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to control mini robotic manipulators for positional control; scalable numerical algorithms for reliability, verifications and testability. There appears no fundamental obstacle to simulating molecular compilers and molecular computers on high performance parallel computers, just as the Boeing 777 was simulated on a computer before manufacturing it.

Saini, Subhash; Craw, James M. (Technical Monitor)

This viscometer (which can also be used as a rheometer) is designed for use with liquids over a large temperature range. The device consists of horizontally disposed, similarly sized, parallel plates with a precisely known gap. The lower plate is driven laterally with a motor to apply shear to the liquid in the gap. The upper plate is freely suspended from a double-arm pendulum with a sufficiently long radius to reduce height variations during the swing to negligible levels. A sensitive load cell measures the shear force applied by the liquid to the upper plate. Viscosity is measured by taking the ratio of shear stress to shear rate.

Tony R. Kuphaldt is the creator of All About Circuits, a collection of online textbooks about circuits and electricity. The site is split into volumes, chapters, and topics to make finding and learning about these subjects convenient. Volume 1, Chapter 7: Series-Parallel Combination Circuits digs deeper into these circuits than Chapter 5. This chapter offers a step-by-step analysis technique in order to identify all changes in voltage and current. It also offers a set of detailed instructions for component failure analysis. All in all, this is a great resource for educators or students.
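
The step-by-step reduction taught in that chapter boils down to two combination rules, which can be sketched compactly (the helper names here are illustrative, not from the textbook):

```python
def series(*resistances):
    # Series resistances add directly: R = R1 + R2 + ...
    return sum(resistances)

def parallel(*resistances):
    # Parallel resistances combine as the reciprocal of summed reciprocals:
    # R = 1 / (1/R1 + 1/R2 + ...)
    return 1.0 / sum(1.0 / r for r in resistances)
```

For example, a 2 Ω resistor in series with a 6 Ω and 3 Ω pair in parallel reduces as `series(2, parallel(6, 3))`, giving 4 Ω; a full series-parallel analysis repeats such reductions until a single equivalent resistance remains.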

The Luo and Rudy 3 cardiac cell mathematical model is implemented on the parallel supercomputer CRAY T3D. The splitting algorithm combined with variable time step and an explicit method of integration provide reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by intracellular calcium dynamics, and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.

We present a massively parallel algorithm for static and dynamic partitioning of unstructured FEM-meshes. The method consists of two parts. First a fast but inaccurate sequential clustering is determined which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system. The second part of the method uses a massively parallel algorithm to remap and

The authors investigate the modeling and analysis of time cost behavior of parallel computations. It is assumed parallel computations reside in a computer system in which there is a limited number of processors, all the processors have the same speed, and they communicate with each other through a shared memory. It has been found that the time costs of parallel

The authors describe CODE (computation-oriented display environment), which can be used to develop modular parallel programs graphically in an environment built around fill-in templates. It also lets programs written in any sequential language be incorporated into parallel programs targeted for any parallel architecture. Broad expressive power was obtained in CODE by including abstractions of all the dependency types that occur

Parallel manipulator robots have complex kinematics and present singular positions within their workspace. For these reasons, in most software simulating parallel robots, each kinematic model should be given in advance by users or programmers. In this paper we present a new tool used to design and to simulate parallel manipulator robots. Explicit kinematic equations are generated automatically depending on the

Many problems, inherent in air traffic control, weather analysis and prediction, nuclear reaction, missile tracking, and hydrodynamics have common processing characteristics that can most efficiently be solved using parallel “non-conventional” techniques. Because of high sensor data rates, these parallel problem solving techniques cannot be economically applied using the standard sequential computer. The application of special processing techniques such as parallel/associative

Parallel processing is being used to improve the catalog of earth orbiting satellites and for problems associated with the catalog. Initial efforts centered around using SIMD parallel processors to perform debris conjunction analysis and satellite dynamics studies. More recently, the availability of cheap supercomputing processors and parallel processing software such as PVM have enabled the reutilization of existing astrodynamics software

In this paper we consider productivity challenges for parallel programmers and explore ways that parallel language design might help improve end-user productivity. We offer a candidate list of desirable qualities for a parallel programming language, and describe how these qualities are addressed in the design of the Chapel language. In doing so, we provide an overview of Chapel's features and

Bradford L. Chamberlain; David Callahan; Hans P. Zima

Modern computers will increasingly rely on parallelism to achieve high computation rates. Techniques to automatically detect and exploit parallelism have shown effective for computers with vector capabilities. To employ similar techniques for asynchronous multiprocessor machines, the analysis and transformations used for vectorization must be extended to apply to entire programs rather than single loops. Three subproblems are addressed. A sequential-to-parallel

This is the generic target version of the WFPC2 Archival Pure Parallel program. The program will be used to take parallel images of random areas of the sky, following the recommendations of the Parallels Working Group chaired by Jay Frogel; see their final report at http://www.stsci.edu/hst/parallels.

This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

This paper presents the design and implementation of a RISC processor having a five stage pipelined architecture. Functional unit parallelism is exploited through the implementation of pipelining in the five stages of the RISC processor. The hazards which arise due to parallelism are data, structural, and control hazards. In order to achieve the true benefits of the parallelism through pipelining, these hazards

This paper describes the design process and performance of an optimized parallel optical transmission module. Based on a 1×12 VCSEL (Vertical Cavity Surface Emitting Laser) array, we designed and fabricated high-speed parallel optical modules. Our parallel optical module contains a 1×12 VCSEL array, a 12-channel CMOS laser driver circuit, a high-speed PCB (Printed Circuit Board), a MT

Rongxuan Shen; Hongda Chen; Chao Zuo; Weihua Pei; Yi Zhou; Jun Tang

Multigrid algorithms are a computational paradigm that enjoys widespread use in the scientific community. While parallel multigrid applications have been in use for quite some time, parallel language support for features common to multigrid algorithms has been lacking. This forces scientists either to express their computations in high-level terms without knowing the parallel impact, or to explicitly manage

Bradford L. Chamberlain; Steven Deitz; Lawrence Snyder
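As background to the multigrid paradigm discussed above, here is a minimal two-grid V-cycle for the 1D Poisson problem -u'' = f (an illustrative Python/NumPy sketch, not tied to any particular parallel language):

```python
import numpy as np

def smooth(u, f, h, sweeps=3, w=2/3):
    """Weighted-Jacobi sweeps for the discrete 1D Poisson problem -u'' = f."""
    for _ in range(sweeps):
        u[1:-1] = (1 - w) * u[1:-1] + w * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    """r = f - A u for the standard three-point Laplacian."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] + (u[:-2] - 2 * u[1:-1] + u[2:]) / (h * h)
    return r

def two_grid(f, n, cycles=10):
    """Two-grid V-cycles on [0, 1] with zero boundary values (n even)."""
    h, H = 1.0 / n, 2.0 / n
    u = np.zeros(n + 1)
    m = n // 2 - 1                            # interior coarse-grid points
    A = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / (H * H)
    for _ in range(cycles):
        u = smooth(u, f, h)                   # pre-smoothing damps oscillatory error
        rc = residual(u, f, h)[::2]           # restrict the residual by injection
        ec = np.zeros(n // 2 + 1)
        ec[1:-1] = np.linalg.solve(A, rc[1:-1])   # exact coarse-grid solve
        u += np.interp(np.linspace(0, 1, n + 1),  # prolong and correct
                       np.linspace(0, 1, n // 2 + 1), ec)
        u = smooth(u, f, h)                   # post-smoothing
    return u
```

A full multigrid solver recurses on the coarse problem instead of solving it directly; the data-parallel array updates here are exactly the operations a multigrid-aware parallel language would distribute.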

Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

A linear instability model for multiple spatially periodic supersonic rectangular jets is solved using Floquet-Bloch theory. The disturbance environment is investigated using a two-dimensional perturbation of a mean flow. For all cases large temporal growth rates are found. This work is motivated by an increase in mixing found in experimental measurements of spatially periodic supersonic rectangular jets with phase-locked screech. The results obtained in this paper suggest that phase-locked screech or edge tones may produce correlated spatially periodic jet flow downstream of the nozzles, which creates a large spanwise multi-nozzle region where a disturbance can propagate. The large temporal growth rates for eddies obtained by the model calculations herein are related to the increased mixing, since eddies are the primary mechanism that transfers energy from the mean flow to the large turbulent structures. Calculations of growth rates are presented for a range of Mach numbers and nozzle spacings corresponding to experimental test conditions where screech-synchronized phase locking was observed. The model may be of significant scientific and engineering value in the quest to understand and construct supersonic mixer-ejector nozzles that provide increased mixing and reduced noise.

We use a 2D finite difference computer program to study the effect of fault steps on dynamic ruptures. Our results indicate that a strike-slip earthquake is unlikely to jump a fault step wider than 5 km, in correlation with field observations of moderate to great-sized earthquakes. We also find that dynamically propagating ruptures can jump both compressional and dilational fault

We consider the influence of a CAS context on a learner's process of constructing a justification for the bifurcations in a logistic dynamical process. We describe how instrumentation led to cognitive constructions and how the roles of the learner and the CAS intertwine, especially close to the branching and combining of constructing actions. The…
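The bifurcation in the logistic process can be reproduced in a few lines of Python (a generic illustration of the mathematics, not the CAS activity studied above):

```python
def logistic_orbit(r, x0=0.2, burn=500, keep=4):
    """Iterate x -> r*x*(1-x), discard a long transient, and return the
    distinct attractor values (rounded for comparison)."""
    x = x0
    for _ in range(burn):
        x = r * x * (1 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1 - x)
        orbit.append(round(x, 6))
    return sorted(set(orbit))

# Below r = 3 the map settles on the fixed point x* = 1 - 1/r;
# just above r = 3 that fixed point loses stability and a 2-cycle
# appears: the first branching a learner meets in this process.
```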

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

Grinberg, Leopold [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)]; Fedosov, Dmitry A. [Institute of Complex Systems and Institute for Advanced Simulation, Forschungszentrum Jülich, Jülich 52425 (Germany)]; Karniadakis, George Em, E-mail: george_karniadakis@brown.edu [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)]

The cavity formation and propagation process of the stress wave from parallel-hole cut blasting was simulated with the ANSYS/LS-DYNA 3D nonlinear dynamic finite element software. The distribution of element plastic strain, node velocity, node acceleration time history, and the blasting cartridge volume ratio during the process were analyzed. It was found that the detonation of charged holes would cause the interaction of

Swimming at low Reynolds number in a fluid confined between two plane walls is studied for an infinite plane sheet located midway between the walls and distorted with a transverse propagating wave. It is shown that the flow pattern is closely related to that for peristaltic pumping. The hydrodynamic interaction between two flexible sheets swimming parallel in infinite space is related to the problem of peristaltic pumping in a planar channel with two wavy walls. PMID:21825515
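For context, Taylor's classical small-amplitude result for a sheet with waveform $y = b \sin k(x - ct)$ in an *unbounded* fluid (the reference point for the confined case above) gives the swimming speed, to leading order in $bk$:

```latex
U = \frac{1}{2}\, b^{2} k^{2} c
```

Confinement between walls modifies this speed through the same mechanism that drives peristaltic pumping in a channel.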

In this paper, we propose to compare parallel and sequential architectures for multi-sensor multi-target tracking. This work aims to help solve the problem of model-based body motion estimation using data coming from multiple visual sensors. To ensure accurate tracking we use the interacting multiple model (IMM) approach combined with the unscented Kalman filter (IMM-UKF). To resolve the problem of

An interactive visualization system pV3 is being developed for the investigation of advanced computational methodologies employing visualization and parallel processing for the extraction of information contained in large-scale transient engineering simulations. Visual techniques for extracting information from the data in terms of cutting planes, iso-surfaces, particle tracing and vector fields are included in this system. This paper discusses improvements to the pV3 system developed under NASA's Affordable High Performance Computing project.

The DSI3D code is designed to numerically solve electromagnetics problems involving complex objects by solving Maxwell's curl equations in the time domain and in three space dimensions. The code has been designed to run on the new parallel processing computers as well as on conventional serial computers. The DSI3D code is unique for the following reasons: it runs efficiently on a variety of parallel computers; allows the use of unstructured non-orthogonal grids; allows a variety of cell or element types; reduces to the Finite Difference Time Domain (FDTD) method when orthogonal grids are used; preserves charge or divergence locally (and globally); is non-dissipative; and is accurate for non-orthogonal grids. This method is derived using a Discrete Surface Integration (DSI) technique. As formulated, the DSI technique can be used with essentially arbitrary unstructured grids composed of convex polyhedral cells. This implementation of the DSI algorithm allows the use of unstructured grids that are composed of combinations of non-orthogonal hexahedrons, tetrahedrons, triangular prisms and pyramids. This algorithm reduces to the conventional FDTD method when applied on a structured orthogonal hexahedral grid.
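The FDTD limit mentioned above can be illustrated with a minimal 1D Yee update in Python (a generic textbook sketch, unrelated to the DSI3D implementation):

```python
import numpy as np

def fdtd_1d(steps=200, n=200):
    """Minimal 1D FDTD (Yee) leapfrog update for Ez/Hy on an orthogonal
    grid, in normalized units with the Courant number set to 1."""
    ez = np.zeros(n)        # E field at integer grid points
    hy = np.zeros(n - 1)    # H field at half-integer grid points
    for t in range(steps):
        hy += ez[1:] - ez[:-1]                        # H update from curl E
        ez[1:-1] += hy[1:] - hy[:-1]                  # E update from curl H
        ez[n // 2] += np.exp(-((t - 30) / 10) ** 2)   # soft Gaussian source
    return ez
```

The fixed zero values of `ez` at both ends act as perfectly conducting boundaries; a pulse launched at the center propagates outward and reflects losslessly, consistent with a non-dissipative scheme.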

Studying the nature of flow in confined geometries has become increasingly important due to downsizing of equipment. Examples include microfluidic devices such as lab-on-a-chip systems and flow through porous media. Here, we focus on the flow of a single drop in a matrix fluid confined between two parallel walls, where the distance between the walls is on the order of the drop diameter. To model this system a three-dimensional boundary integral method is used with the inclusion of the two parallel walls in the free-space kernels of the boundary integral method. The deformation of a drop in shear flow as a function of the capillary number and the distance between the walls is studied. The drop shapes found in the presence of the walls differ substantially from the typical ellipsoidal drops found in unbounded flows. Overall deformation, expressed in the Taylor deformation parameter, increases when reducing the distance between the walls. Furthermore, the angle of the major drop axis with the velocity direction also decreases. A detailed analysis describing the dynamics of breakup of drops in confined geometries is discussed.

This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed-memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long-term projects that there will be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

Selected topics of interest from the area of parallel processing systems are investigated. The problems concern specifically the optimal scheduling of jobs subject to a dependency structure, an analysis of the performance of a heuristic assignment schedule in a multiserver system of many competing queues, and the optimal service rate control of a parallel processing system. In general, multi-tasking leads to a stochastic scheduling problem in which n jobs subject to precedence constraints are to be processed on m processors. Of particular interest are intree forms of the precedence constraints and i.i.d. job processing times. Using an optimal stochastic control formulation, it is shown, under some conditions on the distributions, that HLF (Highest Levels First) policies, and HLF combined with LERPT (Longest Expected Remaining Processing Time) within each level, minimize expected makespan for nonpreemptive and preemptive scheduling, respectively, when m = 2. The relative performance of HLF heuristics is investigated for a model in which the job execution times are i.i.d. with an exponential distribution. Many situations in resource-sharing environments can be modeled as a multi-server system of many competing queues.
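As a concrete (hypothetical) illustration of the HLF policy on an intree, here is a deterministic, unit-time-job simplification for m = 2 processors; the job names and tree are invented for the example:

```python
def hlf_makespan(parents, m=2):
    """Nonpreemptive HLF list scheduling of unit-time jobs on m machines.
    `parents` maps each job to its single successor in the intree
    (None for the root). A job's level is its distance to the root."""
    def level(j):
        return 0 if parents[j] is None else 1 + level(parents[j])
    levels = {j: level(j) for j in parents}
    remaining = set(parents)
    done, time = set(), 0
    while remaining:
        # a job is ready when all its predecessors (its children
        # in the intree) have completed
        ready = [j for j in remaining
                 if all(k in done for k in parents if parents[k] == j)]
        ready.sort(key=lambda j: -levels[j])   # highest levels first
        step = ready[:m]                       # run up to m jobs this slot
        done |= set(step)
        remaining -= set(step)
        time += 1
    return time
```

On a five-job intree with two leaves under each of two subtrees, HLF achieves the lower-bound makespan of 3 time units on two machines.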

In this thesis a new parallel execution model for Prolog is presented: the PPP model, or Parallel Prolog Processor. The PPP supports AND-parallelism, OR-parallelism, and intelligent backtracking. An implementation of the PPP is described, through the extension of an existing Prolog abstract machine architecture. Several examples of PPP execution are presented and compilation to the PPP abstract instruction set is discussed. The performance effects of this model are reported, based on a simulation of a large benchmark set. The implications of these results for parallel Prolog systems are discussed, and directions for future work are indicated.

This project will help you to understand the different angles created by a transversal cutting across two parallel lines. Please watch for alternate exterior, alternate interior, consecutive, and corresponding angles. Here is an overview of the concepts that will be discussed in this lesson. Take notes.... Parallel Lines and the Angles they Create See if you understand these concepts by completing the following online practice page: Practice with Parallel Lines and Angles This activity from Class Zone will help you to further understand parallel line and perpendicular line theorems. Parallel and Perpendicular Lines Explore this website: Please notice that when you ...

Three-dimensional RF scattering calculations for objects of realistic complexity require parallel computational methods in order to be feasible for design trade-off studies. There are two basic approaches to obtaining parallel speed improvements: vectorization and full parallel processing. In the latter category are shared-memory multi-processor machines and distributed-memory multi-processor machines. After reviewing the approaches to parallel computation, the authors consider the changes required to adapt RF scattering analysis to each major parallel architecture. A comparison of conversion effort and execution performance is presented for representative computers.

Bedrosian, G.; D'Angelo, J.; DeBlois, A. (General Electric Co., Schenectady, NY (USA). Corporate Research and Development Center)

Interactive visualization of large time-varying 3D volume datasets has been and still is a great challenge to the modern computational world. It stretches the limits of the memory capacity, the disk space, the network bandwidth and the CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphics hardware. The proposed program combines both parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphics hardware. And last, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism system using a time-varying dataset from selected JPL applications.

Parallel programming and calculation performance were examined by using two types of MIMD parallel systems, that is, a transputer (T800) network and iPSC/860. Some interface subroutines were developed to apply the programs parallelized by using a transputer network to iPSC/860. Compatibility and performance of parallelized programs are discussed.

The goals of this paper are to design a Prolog system that automatically exploits parallelism in Prolog with low-overhead memory management and task management schemes, and to demonstrate by means of detailed simulations that such a Prolog system can indeed achieve a significant speedup over the fastest sequential Prolog systems. The authors achieve these goals by first identifying the large sources of overhead in parallel Prolog execution: side-effects caused by parallel tasks, choicepoints created by parallel tasks, task creation, task scheduling, task suspension and context switching. The authors then identify a form of parallelism, called flow parallelism, that can be exploited with low overhead because parallel execution is restricted to goals that do not cause side-effects and do not create choicepoints. The authors develop a master-slave model of parallel execution that eliminates task suspension and context switching. The model uses program partitioning and task scheduling techniques that do not require task suspension and context switching to prevent deadlock. The authors identify architectural techniques to support the parallel execution model and develop the Flow Parallel Prolog Machine (FPPM) architecture and implementation. Finally, the authors evaluate the performance of FPPM and investigate the design tradeoffs using measurements on a detailed, register-transfer-level simulator. FPPM achieves an average speedup of about a factor of 2 (as much as a factor of 5 for some programs) over the current highest performance sequential Prolog implementation, the VLSI-BAM. The speedups over other parallel Prolog systems are much larger.
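The master-slave restriction to side-effect-free goals can be caricatured in a few lines of Python (purely illustrative; FPPM is a hardware architecture for Prolog, not Python):

```python
from multiprocessing import Pool

def goal(x):
    """A side-effect-free 'goal': safe to evaluate in parallel because it
    creates no choicepoints and mutates no shared state."""
    return x * x + 1

def master(inputs, slaves=4):
    # Master-slave model: the master partitions the work and collects
    # results; slaves run to completion with no suspension or
    # context switching among themselves.
    with Pool(slaves) as pool:
        return pool.map(goal, inputs)
```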

The research described in this book addresses the semantic gap between logic programming languages and the architecture of parallel computers: the problem of how to implement logic programming languages on parallel computers in a way that can most effectively exploit the inherent parallelism of the language and efficiently utilize the parallel architecture of the computer. Following a review of other research results, the first project explores the possibilities of implementing logic programs on MIMD, nonshared-memory massively parallel computers containing 100 to 1,000 processing elements. The second investigates the possibility of implementing Prolog on a distributed processor array. The author's objectives are to define a parallel computational paradigm (the extended cellular-dataflow model) that can be used to create a parallel Prolog abstract machine.

The subject of input/output (I/O) has often been neglected in the design of parallel computer systems, although for many problems I/O rates will limit the attainable speedup. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Par
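A minimal sketch of the parallel-file idea, assuming a POSIX system with `os.pwrite`: several processes write disjoint, fixed-size blocks of one file concurrently, and no locking is needed because the offsets never overlap.

```python
import os
from multiprocessing import Process

def write_block(path, index, block_size=16):
    """Each worker writes its own fixed-size block at a distinct offset,
    so concurrent access requires no coordination between processes."""
    payload = bytes([index]) * block_size
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, payload, index * block_size)  # positioned write
    finally:
        os.close(fd)

def parallel_write(path, nworkers=4, block_size=16):
    with open(path, "wb") as f:
        f.truncate(nworkers * block_size)           # preallocate the file
    procs = [Process(target=write_block, args=(path, i, block_size))
             for i in range(nworkers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Real parallel file systems stripe such blocks across multiple disks so the writes also proceed in parallel at the hardware level.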