Goerner, Frank L.; Duong, Timothy; Stafford, R. Jason; Clarke, Geoffrey D.
2013-01-01
Purpose: To investigate the utility of five different standard measurement methods for determining image uniformity for partially parallel imaging (PPI) acquisitions in terms of consistency across a variety of pulse sequences and reconstruction strategies. Methods: Images were produced with a phantom using a 12-channel head matrix coil in a 3T MRI system (TIM TRIO, Siemens Medical Solutions, Erlangen, Germany). Images produced using echo-planar, fast spin echo, gradient echo, and balanced steady state free precession pulse sequences were evaluated. Two different PPI reconstruction methods were investigated, generalized autocalibrating partially parallel acquisition algorithm (GRAPPA) and modified sensitivity-encoding (mSENSE) with acceleration factors (R) of 2, 3, and 4. Additionally images were acquired with conventional, two-dimensional Fourier imaging methods (R = 1). Five measurement methods of uniformity, recommended by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) were considered. The methods investigated were (1) an ACR method and a (2) NEMA method for calculating the peak deviation nonuniformity, (3) a modification of a NEMA method used to produce a gray scale uniformity map, (4) determining the normalized absolute average deviation uniformity, and (5) a NEMA method that focused on 17 areas of the image to measure uniformity. Changes in uniformity as a function of reconstruction method at the same R-value were also investigated. Two-way analysis of variance (ANOVA) was used to determine whether R-value or reconstruction method had a greater influence on signal intensity uniformity measurements for partially parallel MRI. Results: Two of the methods studied had consistently negative slopes when signal intensity uniformity was plotted against R-value. The results obtained comparing mSENSE against GRAPPA found no consistent difference between GRAPPA and mSENSE with regard to signal intensity uniformity
NASA Astrophysics Data System (ADS)
Lee, Mike M.; Cho, Byung Lok
2001-11-01
In this paper, we proposed a new First Partial product Addition (FPA) architecture with new compressor (or parallel counter) to CSA tree built in the process of adding partial product for improving speed in the fast parallel multiplier to improve the speed of calculating partial product by about 20% compared with existing parallel counter using full Adder. The new circuit reduces the CLA bit finding final sum by N/2 using the novel FPA architecture. A 5.14ns of multiplication speed of the 16X16 multiplier is obtained using 0.25um CMOS technology. The architecture of the multiplier is easily opted for pipeline design and demonstrates high speed performance.
The Force Singularity for Partially Immersed Parallel Plates
NASA Astrophysics Data System (ADS)
Bhatnagar, Rajat; Finn, Robert
2016-05-01
In earlier work, we provided a general description of the forces of attraction and repulsion, encountered by two parallel vertical plates of infinite extent and of possibly differing materials, when partially immersed in an infinite liquid bath and subject to surface tension forces. In the present study, we examine some unusual details of the exotic behavior that can occur at the singular configuration separating infinite rise from infinite descent of the fluid between the plates, as the plates approach each other. In connection with this singular behavior, we present also some new estimates on meniscus height details.
Solution of partial differential equations on vector and parallel computers
NASA Technical Reports Server (NTRS)
Ortega, J. M.; Voigt, R. G.
1985-01-01
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Applicability of Parallel Computing to Partial Wave Analysis
NASA Astrophysics Data System (ADS)
Ruger, Justin; Gilfoyle, Gerard; Weygand, Dennis; CLAS Collaboration
2013-10-01
Bound states of Quantum Chromodynamics (QCD) give insights into the nature of confinement, a key element of the strong interaction. States may be identified from weak signals extracted from the analysis of high statistics data from reactions with many final state particles. One of the best tools for the analysis of these reactions is Partial Wave Analysis (PWA). PWA transforms an ensemble of experimental data from a large acceptance detector from free particle eigenstates to angular momentum eigenstates. The PWA program must be fast enough to deal with the large amounts of data available currently, as processing time scales with the number of events. The scope of this research is to study the applicability and scalability of Intel's Xeon Phi using the Many Integrated Core (MIC) architecture when applied to the existing PWA code at Jefferson Laboratory. An algorithm was developed for the Xeon Phi and scaled across 240 available threads, giving parallel functionality to the PWA which was originally written serially. This scaling can make the fitting process fifteen times faster. Supported by the US Department of Energy.
Dynamic grid refinement for partial differential equations on parallel computers
NASA Technical Reports Server (NTRS)
Mccormick, S.; Quinlan, D.
1989-01-01
The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems.
Polarization imaging apparatus with auto-calibration
Zou, Yingyin Kevin; Zhao, Hongzhi; Chen, Qiushui
2013-08-20
A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned 22.5.degree., a second variable phase retarder with its optical axis aligned 45.degree., a linear polarizer, a imaging sensor for sensing the intensity images of the sample, a controller and a computer. Two variable phase retarders were controlled independently by a computer through a controller unit which generates a sequential of voltages to control the phase retardations of the first and second variable phase retarders. A auto-calibration procedure was incorporated into the polarization imaging apparatus to correct the misalignment of first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images, I.sub.0, I.sub.1, I.sub.2 and I.sub.3 of the sample were captured by imaging sensor when the phase retardations of VPRs were set at (0,0), (.pi.,0), (.pi.,.pi.) and (.pi./2,.pi.), respectively. Then four Stokes components of a Stokes image, S.sub.0, S.sub.1, S.sub.2 and S.sub.3 were calculated using the four intensity images.
Polarization Imaging Apparatus with Auto-Calibration
NASA Technical Reports Server (NTRS)
Zou, Yingyin Kevin (Inventor); Zhao, Hongzhi (Inventor); Chen, Qiushui (Inventor)
2013-01-01
A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned 22.5 deg, a second variable phase retarder with its optical axis aligned 45 deg, a linear polarizer, a imaging sensor for sensing the intensity images of the sample, a controller and a computer. Two variable phase retarders were controlled independently by a computer through a controller unit which generates a sequential of voltages to control the phase retardations of the first and second variable phase retarders. A auto-calibration procedure was incorporated into the polarization imaging apparatus to correct the misalignment of first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images, I(sub 0), I(sub 1), I(sub 2) and I(sub 3) of the sample were captured by imaging sensor when the phase retardations of VPRs were set at (0,0), (pi,0), (pi,pi) and (pi/2,pi), respectively. Then four Stokes components of a Stokes image, S(sub 0), S(sub 1), S(sub 2) and S(sub 3) were calculated using the four intensity images.
NASA Technical Reports Server (NTRS)
Toomarian, N.; Fijany, A.; Barhen, J.
1993-01-01
Evolutionary partial differential equations are usually solved by decretization in time and space, and by applying a marching in time procedure to data and algorithms potentially parallelized in the spatial domain.
Characterization of high resolution MR images reconstructed by a GRAPPA based parallel technique
NASA Astrophysics Data System (ADS)
Banerjee, Suchandrima; Majumdar, Sharmila
2006-03-01
This work implemented an auto-calibrating parallel imaging technique and applied it to in vivo magnetic resonance imaging (MRI) of trabecular bone micro-architecture. A Generalized auto-calibrating partially parallel acquisition (GRAPPA) based reconstruction technique using modified robust data fitting was developed. The MR data was acquired with an eight channel phased array receiver on three normal volunteers on a General Electric 3 Tesla scanner. Microstructures comprising the trabecular bone architecture are of the order of 100 microns and hence their depiction requires very high imaging resolution. This work examined the effects of GRAPPA based parallel imaging on signal and noise characteristics and effective spatial resolution in high resolution (HR) images, for the range of undersampling or reduction factors 2-4. Additionally quantitative analysis was performed to obtain structural measures of trabecular bone from the images. Image quality in terms of contrast and depiction of structures was maintained in parallel images for reduction factors up to 3. Comparison between regular and parallel images suggested similar spatial resolution for both. However differences in noise characteristics in parallel images compared to regular images affected the threshholding based quantification. This suggested that GRAPPA based parallel images might require different analysis techniques. In conclusion, the study showed the feasibility of using parallel imaging techniques in HR-MRI of trabecular bone, although quantification strategies will have to be further investigated. Reduction of acquisition time using parallel techniques can improve the clinical feasibility of MRI of trabecular bone for prognosis and staging of the skeletal disorder osteoporosis.
Allidina, A.Y.; Malinowski, K.; Singh, M.G.
1982-12-01
The possibilities were explored for enhancing parallelism in the simulation of systems described by algebraic equations, ordinary differential equations and partial differential equations. These techniques, using multiprocessors, were developed to speed up simulations, e.g. for nuclear accidents. Issues involved in their design included suitable approximations to bring the problem into a numerically manageable form and a numerical procedure to perform the computations necessary to solve the problem accurately. Parallel processing techniques used as simulation procedures, and a design of a simulation scheme and simulation procedure employing parallel computer facilities, were both considered.
NASA Technical Reports Server (NTRS)
Nguyen, Howard; Willacy, Karen; Allen, Mark
2012-01-01
KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.
NASA Astrophysics Data System (ADS)
Lyu, Jingyuan; Nakarmi, Ukash; Zhang, Chaoyi; Ying, Leslie
2016-05-01
This paper presents a new approach to highly accelerated dynamic parallel MRI using low rank matrix completion, partial separability (PS) model. In data acquisition, k-space data is moderately randomly undersampled at the center kspace navigator locations, but highly undersampled at the outer k-space for each temporal frame. In reconstruction, the navigator data is reconstructed from undersampled data using structured low-rank matrix completion. After all the unacquired navigator data is estimated, the partial separable model is used to obtain partial k-t data. Then the parallel imaging method is used to acquire the entire dynamic image series from highly undersampled data. The proposed method has shown to achieve high quality reconstructions with reduction factors up to 31, and temporal resolution of 29ms, when the conventional PS method fails.
Autocalibration in hydrologic modeling: Using SWAT2005 in small-scale watersheds
Technology Transfer Automated Retrieval System (TEKTRAN)
SWAT is a physically-based model that can simulate water quality and quantity at the watershed scale. Due to many of the processes involved in the manual or auto-calibration of model parameters and the knowledge of realistic input values, calibration can become difficult. An autocalibration-sensitiv...
Adaptive methods and parallel computation for partial differential equations. Final report
Biswas, R.; Benantar, M.; Flaherty, J.E.
1992-05-01
Consider the adaptive solution of two-dimensional vector systems of hyperbolic and elliptic partial differential equations on shared-memory parallel computers. Hyperbolic systems are approximated by an explicit finite volume technique and solved by a recursive local mesh refinement procedure on a tree-structured grid. Local refinement of the time steps and spatial cells of a coarse base mesh is performed in regions where a refinement indicator exceeds a prescribed tolerance. Computational procedures that sequentially traverse the tree while processing solutions on each grid in parallel, that process solutions at the same tree level in parallel, and that dynamically assign processors to nodes of the tree have been developed and applied to an example. Computational results comparing a variety of heuristic processor load balancing techniques and refinement strategies are presented.
Fast parallel algorithms and enumeration techniques for partial k-trees
Narayanan, C.
1989-01-01
Recent research by several authors have resulted in systematic way of developing linear-time sequential algorithms for a host of problem: on a fairly general class of graphs variously known as bounded decomposable graphs, graphs of bounded treewidth, partial k-trees, etc. Partial k-trees arise in a variety of real-life applications such as network reliability, VLSI design and database systems and hence fast sequential algorithms on these graphs have been found to be desirable. The linear-time methodologies were independently developed by Bern, Lawler, and Wong ((10)), Arnborg and Proskurowski ((6)), Bodlaender ((14)), and Courcelle ((25)). Wimer ((89)) significantly extended the work of Bern, Lawler and Wong. All of these approaches share the common thread of using dynamic programming on a tree structure. In particular the methodology of Wimer uses a parse-tree as the data structure. The methodologies claim linear-time algorithms on partial k-trees for fixed k, for a number of combinatorial optimization problems given the tree structure as input. It is known that obtaining the tree structure is NP-hard. This dissertation investigates three important classes of problems: (1) Developing parallel algorithms for constructing a k-tree embedding, finding a tree decomposition and most notably obtaining a parse-tree for a partial k-tree. (2) Developing parallel algorithms for parse-tree computations, testing isomorphism of k-trees, and finding a 2-tree embedding of a cactus. (3) Obtaining techniques for counting vertex/edge subsets satisfying a certain property in some classes of partial k-trees. The parallel algorithms the author has developed are in class NC and are either new or improve upon the existing results of Bodlaender (13). The difference equations he has obtained for counting certain sub-graphs are not known in the literature so far.
Auto-calibration system of EMG sensor suit
NASA Astrophysics Data System (ADS)
Suzuki, Yousuke; Tanaka, Takayuki; Feng, Maria Q.
2005-12-01
Biogenic measurement has been studied as a robot's interface. We have studied the wearable sensor suit as a robot's interface. Some kinds of sensor disks are embedded the sensor suit to the wet suit-like material. The sensor suit measures a wearing person's joint, and muscular activity. In this report, we aim to establish an auto-calibration system for measuring joint torques by using EMG sensors based on neural network and sensor disks of a lattice. The Torque presumption was performed using the share neural network, which learned the data that formed the whole subject's teacher data. Additional training of the share neural network was carried out using the individual teaching data. As a result, that was able to do the neural network training in short time, high probability and high accuracy to training of initial neural network. Moreover, high-presumed accuracy was able to be acquired by this method Next, Sensor disks of a lattice was developed. EMG is measurable, checking the state of an electrode by that can measure biogenic impedance. That was able to measure EMG by sensor disks which has low impedance We measured EMG and joint torque by trial production sensor suit and torque measuring instrument. The predominancy of the torque presumption using the share neural network was check. We proposed Measurement system, which consists sensor disk of lattice. Experimental results show the proposed method is effective for the auto-calibration.
Active catheter tracking using parallel MRI and real-time image reconstruction.
Bock, Michael; Müller, Sven; Zuehlsdorff, Sven; Speier, Peter; Fink, Christian; Hallscheidt, Peter; Umathum, Reiner; Semmler, Wolfhard
2006-06-01
In this work active MR catheter tracking with automatic slice alignment was combined with an autocalibrated parallel imaging technique. Using an optimized generalized autocalibrating partially parallel acquisitions (GRAPPA) algorithm with an acceleration factor of 2, we were able to reduce the acquisition time per image by 34%. To accelerate real-time GRAPPA image reconstruction, the coil sensitivities were updated only after slice reorientation. For a 2D trueFISP acquisition (160 x 256 matrix, 80% phase matrix, half Fourier acquisition, TR = 3.7 ms, GRAPPA factor = 2) real-time image reconstruction was achieved with up to six imaging coils. In a single animal experiment the method was used to steer a catheter from the vena cava through the beating heart into the pulmonary vasculature at an image update rate of about five images per second. Under all slice orientations, parallel image reconstruction was accomplished with only minor image artifacts, and the increased temporal resolution provided a sharp delineation of intracardial structures, such as the papillary muscle. PMID:16683261
Analysis and Modeling of Parallel Photovoltaic Systems under Partial Shading Conditions
NASA Astrophysics Data System (ADS)
Buddala, Santhoshi Snigdha
Since the industrial revolution, fossil fuels like petroleum, coal, oil, natural gas and other non-renewable energy sources have been used as the primary energy source. The consumption of fossil fuels releases various harmful gases into the atmosphere as byproducts which are hazardous in nature and they tend to deplete the protective layers and affect the overall environmental balance. Also the fossil fuels are bounded resources of energy and rapid depletion of these sources of energy, have prompted the need to investigate alternate sources of energy called renewable energy. One such promising source of renewable energy is the solar/photovoltaic energy. This work focuses on investigating a new solar array architecture with solar cells connected in parallel configuration. By retaining the structural simplicity of the parallel architecture, a theoretical small signal model of the solar cell is proposed and modeled to analyze the variations in the module parameters when subjected to partial shading conditions. Simulations were run in SPICE to validate the model implemented in Matlab. The voltage limitations of the proposed architecture are addressed by adopting a simple dc-dc boost converter and evaluating the performance of the architecture in terms of efficiencies by comparing it with the traditional architectures. SPICE simulations are used to compare the architectures and identify the best one in terms of power conversion efficiency under partial shading conditions.
Nuclear norm-regularized k-space-based parallel imaging reconstruction
NASA Astrophysics Data System (ADS)
Xu, Lin; Liu, Xiaoyun
2014-04-01
Parallel imaging reconstruction suffers from serious noise amplification at high accelerations that can be alleviated with regularization by imposing some prior information or constraints on image. Nevertheless, point-wise interpolation of missing k-space data restricts the use of prior information in k-space-based parallel imaging reconstructions like generalized auto-calibrating partial acquisitions (GRAPPA). In this study, a regularized k-space based parallel imaging reconstruction is presented. We first formulate the reconstruction of missing data within a patch as a linear inverse problem. Instead of exploiting prior information on image or its transform domain, the proposed method exploits the rank deficiency of structured matrix consisting of vectorized patches form entire k-space, which leads to a nuclear norm-regularized problem solved by the numeric algorithms iteratively. Brain imaging studies are performed, demonstrating that the proposed method is capable of mitigating noise at high accelerations in GRAPPA reconstruction.
Parallelizing across time when solving time-dependent partial differential equations
Worley, P.H.
1991-09-01
The standard numerical algorithms for solving time-dependent partial differential equations (PDEs) are inherently sequential in the time direction. This paper describes algorithms for the time-accurate solution of certain classes of linear hyperbolic and parabolic PDEs that can be parallelized in both time and space and have serial complexities that are proportional to the serial complexities of the best known algorithms. The algorithms for parabolic PDEs are variants of the waveform relaxation multigrid method (WFMG) of Lubich and Ostermann where the scalar ordinary differential equations (ODEs) that make up the kernel of WFMG are solved using a cyclic reduction type algorithm. The algorithms for hyperbolic PDEs use the cyclic reduction algorithm to solve ODEs along characteristics. 43 refs.
NASA Technical Reports Server (NTRS)
Hunt, L. R.; Villarreal, Ramiro
1987-01-01
System theorists understand that the same mathematical objects which determine controllability for nonlinear control systems of ordinary differential equations (ODEs) also determine hypoellipticity for linear partial differentail equations (PDEs). Moreover, almost any study of ODE systems begins with linear systems. It is remarkable that Hormander's paper on hypoellipticity of second order linear p.d.e.'s starts with equations due to Kolmogorov, which are shown to be analogous to the linear PDEs. Eigenvalue placement by state feedback for a controllable linear system can be paralleled for a Kolmogorov equation if an appropriate type of feedback is introduced. Results concerning transformations of nonlinear systems to linear systems are similar to results for transforming a linear PDE to a Kolmogorov equation.
Parallel Proportion Fair Scheduling in DAS with Partial Channel State Information
NASA Astrophysics Data System (ADS)
Jiang, Zhanjun; Wu, Jiang; Wang, Dongming; You, Xiaohu
A parallel multiplexing scheduling (PMS) scheme is proposed for distributed antenna systems (DAS), which greatly improves average system throughput due to multi-user diversity and multi-user multiplexing. However, PMS has poor fairness because of the use of the “best channel selection” criteria in the scheduler. Thus we present a parallel proportional fair scheduling (PPFS) scheme, which combines PMS with proportional fair scheduling (PFS) to achieve a tradeoff between average throughput and fairness. In PPFS, the “relative signal to noise ratio (SNR)” is employed as a metric to select the user instead of the “relative throughput” in the original PFS. And only partial channel state information (CSI) is fed back to the base station (BS) in PPFS. Moreover, there are multiple users selected to transmit simultaneously at each slot in PPFS, while only one user occupies all channel resources at each slot in PFS. Consequently, PPFS improves fairness performance of PMS greatly with a relatively small loss of average throughput compared to PFS.
Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations
NASA Technical Reports Server (NTRS)
Fijany, Amir
1993-01-01
In this paper, fast time- and Space -Parallel agorithms for solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solution of the problem can be completed decoupled.
A Partial Order Reduction Technique for Parallel Timed Automaton Model Checking
NASA Astrophysics Data System (ADS)
Jianhua, Zhao; Linzhang, Wang; Xuandong, Li
We propose a partial order reduction technique for timed automaton model checking in this paper. We first show that the symbolic successors w.r.t. partial order paths can be computed using DBMs. An algorithm is presented to compute such successors incrementally. This algorithm can avoid splitting the symbolic states because of the enumeration order of independent transitions. A reachability analysis algorithm based on this successor computation algorithm is presented. Our technique can be combined with some static analysis techniques in the literate. Further more, we present a rule to avoid exploring all enabled transitions, thus the space requirements of model checking are further reduced.
Choongsang Cho; Sangkeun Lee
2016-04-01
Image smoothing has been used for image segmentation, image reconstruction, object classification, and 3D content generation. Several smoothing approaches have been used at the pre-processing step to retain the critical edge, while removing noise and small details. However, they have limited performance, especially in removing small details and smoothing discrete regions. Therefore, to provide fast and accurate smoothing, we propose an effective scheme that uses a weighted combination of the gradient, Laplacian, and diagonal derivatives of a smoothed image. In addition, to reduce computational complexity, we designed and implemented a parallel processing structure for the proposed scheme on a graphics processing unit (GPU). For an objective evaluation of the smoothing performance, the images were linearly quantized into several layers to generate experimental images, and the quantized images were smoothed using several methods for reconstructing the smoothly changed shape and intensity of the original image. Experimental results showed that the proposed scheme has higher objective scores and better successful smoothing performance than similar schemes, while preserving and removing critical and trivial details, respectively. For computational complexity, the proposed smoothing scheme running on a GPU provided 18 and 16 times lower complexity than the proposed smoothing scheme running on a CPU and the L0-based smoothing scheme, respectively. In addition, a simple noise reduction test was conducted to show the characteristics of the proposed approach; it reported that the presented algorithm outperforms the state-of-the art algorithms by more than 5.4 dB. Therefore, we believe that the proposed scheme can be a useful tool for efficient image smoothing. PMID:26886985
NASA Astrophysics Data System (ADS)
Acebrón, Juan A.; Rodríguez-Rozas, Ángel
2011-09-01
A probabilistic representation for initial value semilinear parabolic problems based on generalized random trees has been derived. Two different strategies have been proposed, both requiring generating suitable random trees combined with a Pade approximant for approximating accurately a given divergent series. Such series are obtained by summing the partial contribution to the solution coming from trees with arbitrary number of branches. The new representation greatly expands the class of problems amenable to be solved probabilistically, and was used successfully to develop a generalized probabilistic domain decomposition method. Such a method has been shown to be suited for massively parallel computers, enjoying full scalability and fault tolerance. Finally, a few numerical examples are given to illustrate the remarkable performance of the algorithm, comparing the results with those obtained with a classical method.
Technology Transfer Automated Retrieval System (TEKTRAN)
Autocalibration of a water quality model such as SWAT (Soil and Water Assessment Tool) can be a powerful, labor-saving tool. When multi-gage or multi-pollutant calibration is desired, autocalibration is essential because the time involved in manual calibration becomes prohibitive. The ArcSWAT Interf...
Prototype of an auto-calibrating, context-aware, hybrid brain-computer interface.
Faller, J; Torrellas, S; Miralles, F; Holzner, C; Kapeller, C; Guger, C; Bund, J; Müller-Putz, G R; Scherer, R
2012-01-01
We present the prototype of a context-aware framework that allows users to control smart home devices and to access internet services via a Hybrid BCI system of an auto-calibrating sensorimotor rhythm (SMR) based BCI and another assistive device (Integra Mouse mouth joystick). While there is extensive literature that describes the merit of Hybrid BCIs, auto-calibrating and co-adaptive ERD BCI training paradigms, specialized BCI user interfaces, context-awareness and smart home control, there is up to now, no system that includes all these concepts in one integrated easy-to-use framework that can truly benefit individuals with severe functional disabilities by increasing independence and social inclusion. Here we integrate all these technologies in a prototype framework that does not require expert knowledge or excess time for calibration. In a first pilot-study, 3 healthy volunteers successfully operated the system using input signals from an ERD BCI and an Integra Mouse and reached average positive predictive values (PPV) of 72 and 98% respectively. Based on what we learned here we are planning to improve the system for a test with a larger number of healthy volunteers so we can soon bring the system to benefit individuals with severe functional disability. PMID:23366267
Sparse Auto-Calibration for Radar Coincidence Imaging with Gain-Phase Errors
Zhou, Xiaoli; Wang, Hongqiang; Cheng, Yongqiang; Qin, Yuliang
2015-01-01
Radar coincidence imaging (RCI) is a high-resolution staring imaging technique without the limitation of relative motion between target and radar. The sparsity-driven approaches are commonly used in RCI, while the prior knowledge of imaging models needs to be known accurately. However, as one of the major model errors, the gain-phase error exists generally, and may cause inaccuracies of the model and defocus the image. In the present report, the sparse auto-calibration method is proposed to compensate the gain-phase error in RCI. The method can determine the gain-phase error as part of the imaging process. It uses an iterative algorithm, which cycles through steps of target reconstruction and gain-phase error estimation, where orthogonal matching pursuit (OMP) and Newton’s method are used, respectively. Simulation results show that the proposed method can improve the imaging quality significantly and estimate the gain-phase error accurately. PMID:26528981
Gerdes, Lee; Gerdes, Peter; Lee, Sung W; H Tegeler, Charles
2013-03-01
Disturbances of neural oscillation patterns have been reported with many disease states. We introduce methodology for HIRREM™ (high-resolution, relational, resonance-based electroencephalic mirroring), also known as Brainwave Optimization™, a noninvasive technology to facilitate relaxation and auto-calibration of neural oscillations. HIRREM is a precision-guided technology for allostatic therapeutics, intended to help the brain calibrate its own functional set points to optimize fitness. HIRREM technology collects electroencephalic data through two-channel recordings and delivers a series of audible musical tones in near real time. Choices of tone pitch and timing are made by mathematical algorithms, principally informed by the dominant frequency in successive instants of time, to permit resonance between neural oscillatory frequencies and the musical tones. Relaxation of neural oscillations through HIRREM appears to permit auto-calibration toward greater hemispheric symmetry and more optimized proportionation of regional spectral power. To illustrate an application of HIRREM, we present data from a randomized clinical trial of HIRREM as an intervention for insomnia (n = 19). On average, there was reduction of right-dominant temporal lobe high-frequency (23-36 Hz) EEG asymmetry over the course of eight successive HIRREM sessions. There was a trend for correlation between reduction of right temporal lobe dominance and magnitude of insomnia symptom reduction. Disturbances of neural oscillation have implications for both neuropsychiatric health and downstream peripheral (somatic) physiology. The possibility of noninvasive optimization for neural oscillatory set points through HIRREM suggests potentially multitudinous roles for this technology. Research is currently ongoing to further explore its potential applications and mechanisms of action. PMID:23532171
Gerdes, Lee; Gerdes, Peter; Lee, Sung W; H Tegeler, Charles
2013-01-01
Disturbances of neural oscillation patterns have been reported with many disease states. We introduce methodology for HIRREM™ (high-resolution, relational, resonance-based electroencephalic mirroring), also known as Brainwave Optimization™, a noninvasive technology to facilitate relaxation and auto-calibration of neural oscillations. HIRREM is a precision-guided technology for allostatic therapeutics, intended to help the brain calibrate its own functional set points to optimize fitness. HIRREM technology collects electroencephalic data through two-channel recordings and delivers a series of audible musical tones in near real time. Choices of tone pitch and timing are made by mathematical algorithms, principally informed by the dominant frequency in successive instants of time, to permit resonance between neural oscillatory frequencies and the musical tones. Relaxation of neural oscillations through HIRREM appears to permit auto-calibration toward greater hemispheric symmetry and more optimized proportionation of regional spectral power. To illustrate an application of HIRREM, we present data from a randomized clinical trial of HIRREM as an intervention for insomnia (n = 19). On average, there was reduction of right-dominant temporal lobe high-frequency (23–36 Hz) EEG asymmetry over the course of eight successive HIRREM sessions. There was a trend for correlation between reduction of right temporal lobe dominance and magnitude of insomnia symptom reduction. Disturbances of neural oscillation have implications for both neuropsychiatric health and downstream peripheral (somatic) physiology. The possibility of noninvasive optimization for neural oscillatory set points through HIRREM suggests potentially multitudinous roles for this technology. Research is currently ongoing to further explore its potential applications and mechanisms of action. PMID:23532171
Huang, Feng; Lin, Wei; Duensing, George R; Reykowski, Arne
2012-09-01
Because dynamic MR images are often sparse in x-f domain, k-t space compressed sensing (k-t CS) has been proposed for highly accelerated dynamic MRI. When a multichannel coil is used for acquisition, the combination of partially parallel imaging and k-t CS can improve the accuracy of reconstruction. In this work, an efficient combination method is presented, which is called k-t sparse Generalized GRAPPA fOr Wider readout Line. One fundamental aspect of this work is to apply partially parallel imaging and k-t CS sequentially. A partially parallel imaging technique using a Generalized GRAPPA fOr Wider readout Line operator is adopted before k-t CS reconstruction to decrease the reduction factor in a computationally efficient way while preserving temporal resolution. Channel combination and relative sensitivity maps are used in the flexible virtual coil scheme to alleviate the k-t CS computational load with increasing number of channels. Using k-t FOCUSS as a specific example of k-t CS, the experiments with Cartesian and radial data sets demonstrate that k-t sparse Generalized GRAPPA fOr Wider readout Line can produce results with two times lower root-mean-square error than conventional channel-by-channel k-t CS while consuming up to seven times less computational cost. PMID:22162191
NASA Astrophysics Data System (ADS)
Vijayalekshmy, S.; Rama Iyer, S.; Beevi, Bisharathu
2015-09-01
The output power from the photovoltaic (PV) array decreases and the array exhibit multiple peaks when it is subjected to partial shading (PS). The power loss in the PV array varies with the array configuration, physical location and the shading pattern. This paper compares the relative performance of a PV array consisting of a short string of three PV modules for two different configurations. The mismatch loss, shading loss, fill factor and the power loss due to the failure in tracking of the global maximum power point, of a series string with bypass diodes and short parallel string are analysed using MATLAB/Simulink model. The performance of the system is investigated for three different conditions of solar insolation for the same shading pattern. Results indicate that there is considerable power loss due to shading in a series string during PS than in a parallel string with same number of modules.
Auto-Calibration of SOL-ACES in the EUV Spectral Region
NASA Astrophysics Data System (ADS)
Schmidtke, G.; Brunner, R.; Eberhard, D.; Hofmann, A.; Klocke, U.; Knothe, M.; Konz, W.; Riedel, W.-J.; Wolf, H.
The Sol-ACES (SOLAR Auto-Calibrating EUV/UV Spectrometers) experiment is prepared to be flown with the ESA SOLAR payload to the International Space Station as planned for the Shuttle mission E1 in August 2006. Four grazing incidence spectrometers of planar geometry cover the wavelength range from 16-220 nm with a spectral resolution from 0.5-2.3 nm. These high-efficiency spectrometers will be re-calibrated by two three-signal ionization chambers to be operated with 44 band pass filters on routine during the mission. Re-measuring the filter transmissions with the spectrometers also allows a very accurate determination of the changing second (optical) order efficiencies of the spectrometers as well as the stray light contributions to the spectral recording in different wavelength ranges. In this context the primary requirements for measurements of high radiometric accuracy will be discussed in detail. - The absorption gases of the ionization chambers are neon, xenon and a mixture of 10 % nitric oxide and 90 % xenon. As the laboratory measurements show that by this method secondary effects can be determined to a high degree resulting in very accurate irradiance measurements that is ranging from 5 to 3 % in absolute terms depending on the wavelegth range.
NASA Astrophysics Data System (ADS)
Ma, Sangback
In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDE (Partial Differential Equation)s on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioner, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i. e., their real parts of the eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices are mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The
Automatic High-Bandwidth Calibration and Reconstruction of Arbitrarily Sampled Parallel MRI
Aelterman, Jan; Naeyaert, Maarten; Gutierrez, Shandra; Luong, Hiep; Goossens, Bart; Pižurica, Aleksandra; Philips, Wilfried
2014-01-01
Today, many MRI reconstruction techniques exist for undersampled MRI data. Regularization-based techniques inspired by compressed sensing allow for the reconstruction of undersampled data that would lead to an ill-posed reconstruction problem. Parallel imaging enables the reconstruction of MRI images from undersampled multi-coil data that leads to a well-posed reconstruction problem. Autocalibrating pMRI techniques encompass pMRI techniques where no explicit knowledge of the coil sensivities is required. A first purpose of this paper is to derive a novel autocalibration approach for pMRI that allows for the estimation and use of smooth, but high-bandwidth coil profiles instead of a compactly supported kernel. These high-bandwidth models adhere more accurately to the physics of an antenna system. The second purpose of this paper is to demonstrate the feasibility of a parameter-free reconstruction algorithm that combines autocalibrating pMRI and compressed sensing. Therefore, we present several techniques for automatic parameter estimation in MRI reconstruction. Experiments show that a higher reconstruction accuracy can be had using high-bandwidth coil models and that the automatic parameter choices yield an acceptable result. PMID:24915203
van Hees, Vincent T.; Fang, Zhou; Langford, Joss; Assah, Felix; Mohammad, Anwar; da Silva, Inacio C. M.; Trenell, Michael I.; White, Tom; Wareham, Nicholas J.
2014-01-01
Wearable acceleration sensors are increasingly used for the assessment of free-living physical activity. Acceleration sensor calibration is a potential source of error. This study aims to describe and evaluate an autocalibration method to minimize calibration error using segments within the free-living records (no extra experiments needed). The autocalibration method entailed the extraction of nonmovement periods in the data, for which the measured vector magnitude should ideally be the gravitational acceleration (1 g); this property was used to derive calibration correction factors using an iterative closest-point fitting process. The reduction in calibration error was evaluated in data from four cohorts: UK (n = 921), Kuwait (n = 120), Cameroon (n = 311), and Brazil (n = 200). Our method significantly reduced calibration error in all cohorts (P < 0.01), ranging from 16.6 to 3.0 mg in the Kuwaiti cohort to 76.7 to 8.0 mg error in the Brazil cohort. Utilizing temperature sensor data resulted in a small nonsignificant additional improvement (P > 0.05). Temperature correction coefficients were highest for the z-axis, e.g., 19.6-mg offset per 5°C. Further, application of the autocalibration method had a significant impact on typical metrics used for describing human physical activity, e.g., in Brazil average wrist acceleration was 0.2 to 51% lower than uncalibrated values depending on metric selection (P < 0.01). The autocalibration method as presented helps reduce the calibration error in wearable acceleration sensor data and improves comparability of physical activity measures across study locations. Temperature ultization seems essential when temperature deviates substantially from the average temperature in the record but not for multiday summary measures. PMID:25103964
van Hees, Vincent T; Fang, Zhou; Langford, Joss; Assah, Felix; Mohammad, Anwar; da Silva, Inacio C M; Trenell, Michael I; White, Tom; Wareham, Nicholas J; Brage, Søren
2014-10-01
Wearable acceleration sensors are increasingly used for the assessment of free-living physical activity. Acceleration sensor calibration is a potential source of error. This study aims to describe and evaluate an autocalibration method to minimize calibration error using segments within the free-living records (no extra experiments needed). The autocalibration method entailed the extraction of nonmovement periods in the data, for which the measured vector magnitude should ideally be the gravitational acceleration (1 g); this property was used to derive calibration correction factors using an iterative closest-point fitting process. The reduction in calibration error was evaluated in data from four cohorts: UK (n = 921), Kuwait (n = 120), Cameroon (n = 311), and Brazil (n = 200). Our method significantly reduced calibration error in all cohorts (P < 0.01), ranging from 16.6 to 3.0 mg in the Kuwaiti cohort to 76.7 to 8.0 mg error in the Brazil cohort. Utilizing temperature sensor data resulted in a small nonsignificant additional improvement (P > 0.05). Temperature correction coefficients were highest for the z-axis, e.g., 19.6-mg offset per 5°C. Further, application of the autocalibration method had a significant impact on typical metrics used for describing human physical activity, e.g., in Brazil average wrist acceleration was 0.2 to 51% lower than uncalibrated values depending on metric selection (P < 0.01). The autocalibration method as presented helps reduce the calibration error in wearable acceleration sensor data and improves comparability of physical activity measures across study locations. Temperature ultization seems essential when temperature deviates substantially from the average temperature in the record but not for multiday summary measures. PMID:25103964
Kaufmann, Tobias; Völker, Stefan; Gunesch, Laura; Kübler, Andrea
2012-01-01
Brain–computer interfaces (BCI) based on event-related potentials (ERP) allow for selection of characters from a visually presented character-matrix and thus provide a communication channel for users with neurodegenerative disease. Although they have been topic of research for more than 20 years and were multiply proven to be a reliable communication method, BCIs are almost exclusively used in experimental settings, handled by qualified experts. This study investigates if ERP–BCIs can be handled independently by laymen without expert support, which is inevitable for establishing BCIs in end-user’s daily life situations. Furthermore we compared the classic character-by-character text entry against a predictive text entry (PTE) that directly incorporates predictive text into the character-matrix. N = 19 BCI novices handled a user-centered ERP–BCI application on their own without expert support. The software individually adjusted classifier weights and control parameters in the background, invisible to the user (auto-calibration). All participants were able to operate the software on their own and to twice correctly spell a sentence with the auto-calibrated classifier (once with PTE, once without). Our PTE increased spelling speed and, importantly, did not reduce accuracy. In sum, this study demonstrates feasibility of auto-calibrating ERP–BCI use, independently by laymen and the strong benefit of integrating predictive text directly into the character-matrix. PMID:22833713
Brauck, Katja; Maderwald, Stefan; Vogt, Florian M; Zenge, Michael; Barkhausen, Jörg; Herborn, Christoph U
2007-01-01
We sought to compare a three-dimensional, contrast-enhanced, magnetic resonance angiogram (3D CE MRA) sequence combining parallel-imaging (generalised autocalibrating partially parallel acquisitions (GRAPPA)) with a time-resolved echo-shared angiographic technique (TREAT) in an intraindividual comparison to a standard 3D MRA sequence. Four healthy volunteers (27-32 years), and 11 patients (11-82 years) with vascular pathologies of the hand were examined on a 1.5-Tesla (T) MR system (Magnetom Avanto, Siemens, Erlangen, Germany) using two multichannel receiver coils. Following automatic injection (flow rate 2.5 cc/s) of 0.1 mmol/kg gadoterate (Dotarem, Guerbet, Roissy, France), 32 consecutive 3D data sets were collected with the TREAT sequence (TR/TE: 4.02/1.31 ms, FA: 10 degrees, GRAPPA acceleration factor: R=2, TREAT factor: 5, voxel size: 1.0 x 0.7 x 1.3 mm(3)) and a T1-wwighted 3D gradient-echo sequence (TR/TE: 5.3/1.57 ms, FA: 30 degrees, GRAPPA acceleration factor: 2, voxel size: 0.71 x 0.71 x 0.71 mm(3,)). MR data sets were evaluated and compared for image quality and visualisation of vascular details. In the volunteer group, all MR imaging was successful while technical problems prevented acquisition of the standard protocol in two patients. For the corresponding segments, the number of visible segments was equal on both sequences. Overall image quality was significantly better on the standard protocol than on the TREAT protocol. TREAT MRA provided functional information in lesions with rapid blood flow, e.g. detection of feeding and draining vessels in an haemangioma. TREAT-MRA is a robust technique that combines morphological and functional information of the hand vasculature and deals with the very special physiological demands of vascular lesions, such as quick arteriovenous transit time. PMID:16710664
NASA Astrophysics Data System (ADS)
Zhang, Y. Y.; Shao, Q. X.; Ye, A. Z.; Xing, H. T.; Xia, J.
2016-02-01
Integrated water system modeling is a feasible approach to understanding severe water crises in the world and promoting the implementation of integrated river basin management. In this study, a classic hydrological model (the time variant gain model: TVGM) was extended to an integrated water system model by coupling multiple water-related processes in hydrology, biogeochemistry, water quality, and ecology, and considering the interference of human activities. A parameter analysis tool, which included sensitivity analysis, autocalibration and model performance evaluation, was developed to improve modeling efficiency. To demonstrate the model performances, the Shaying River catchment, which is the largest highly regulated and heavily polluted tributary of the Huai River basin in China, was selected as the case study area. The model performances were evaluated on the key water-related components including runoff, water quality, diffuse pollution load (or nonpoint sources) and crop yield. Results showed that our proposed model simulated most components reasonably well. The simulated daily runoff at most regulated and less-regulated stations matched well with the observations. The average correlation coefficient and Nash-Sutcliffe efficiency were 0.85 and 0.70, respectively. Both the simulated low and high flows at most stations were improved when the dam regulation was considered. The daily ammonium-nitrogen (NH4-N) concentration was also well captured with the average correlation coefficient of 0.67. Furthermore, the diffuse source load of NH4-N and the corn yield were reasonably simulated at the administrative region scale. This integrated water system model is expected to improve the simulation performances with extension to more model functionalities, and to provide a scientific basis for the implementation in integrated river basin managements.
Tegeler, Charles H.; Tegeler, Catherine L.; Cook, Jared F.; Lee, Sung W.; Pajewski, Nicholas M.
2015-01-01
Abstract Objective Increased amplitudes in high-frequency brain electrical activity are reported with menopausal hot flashes. We report outcomes associated with the use of High-resolution, relational, resonance-based, electroencephalic mirroring—a noninvasive neurotechnology for autocalibration of neural oscillations—by women with perimenopausal and postmenopausal hot flashes. Methods Twelve women with hot flashes (median age, 56 y; range, 46-69 y) underwent a median of 13 (range, 8-23) intervention sessions for a median of 9.5 days (range, 4-32). This intervention uses algorithmic analysis of brain electrical activity and near real-time translation of brain frequencies into variable tones for acoustic stimulation. Hot flash frequency and severity were recorded by daily diary. Primary outcomes included hot flash severity score, sleep, and depressive symptoms. High-frequency amplitudes (23-36 Hz) from bilateral temporal scalp recordings were measured at baseline and during serial sessions. Self-reported symptom inventories for sleep and depressive symptoms were collected. Results The median change in hot flash severity score was −0.97 (range, −3.00 to 1.00; P = 0.015). Sleep and depression scores decreased by −8.5 points (range, −20 to −1; P = 0.022) and −5.5 points (range, −32 to 8; P = 0.015), respectively. The median sum of amplitudes for the right and left temporal high-frequency brain electrical activity was 8.44 μV (range, 6.27-16.66) at baseline and decreased by a median of −2.96 μV (range, −11.05 to −0.65; P = 0.0005) by the final session. Conclusions Hot flash frequency and severity, symptoms of insomnia and depression, and temporal high-frequency brain electrical activity decrease after High-resolution, relational, resonance-based, electroencephalic mirroring. Larger controlled trials with longer follow-up are warranted. PMID:25668305
Krogh, M.; Painter, J.; Hansen, C.
1996-10-01
Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the M.
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.
1995-05-01
Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.
Wald, Ingo; Ize, Santiago
2015-07-28
Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
Massively parallel visualization: Parallel rendering
Hansen, C.D.; Krogh, M.; White, W.
1995-12-01
This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderer use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.
Parallel machines: Parallel machine languages
Iannucci, R.A. )
1990-01-01
This book presents a framework for understanding the tradeoffs between the conventional view and the dataflow view with the objective of discovering the critical hardware structures which must be present in any scalable, general-purpose parallel computer to effectively tolerate latency and synchronization costs. The author presents an approach to scalable general purpose parallel computation. Linguistic Concerns, Compiling Issues, Intermediate Language Issues, and hardware/technological constraints are presented as a combined approach to architectural Develoement. This book presents the notion of a parallel machine language.
Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.
1995-09-01
In this paper the authors introduce the idea of parallel pipelining for water lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as N = (R/r){sup {alpha}}, where r and R are the radii of the small and large pipes, respectively, and {alpha} = 4 or 19/7 when the lubricating water flow is laminar or turbulent.
Gorda, B.C.
1992-09-01
Data locality is fundamental to performance on distributed memory parallel architectures. Application programmers know this well and go to great pains to arrange data for optimal performance. Data Parallelism, a model from the Single Instruction Multiple Data (SIMD) architecture, is finding a new home on the Multiple Instruction Multiple Data (MIMD) architectures. This style of programming, distinguished by taking the computation to the data, is what programmers have been doing by hand for a long time. Recent work in this area holds the promise of making the programmer's task easier.
Gorda, B.C.
1992-09-01
Data locality is fundamental to performance on distributed memory parallel architectures. Application programmers know this well and go to great pains to arrange data for optimal performance. Data Parallelism, a model from the Single Instruction Multiple Data (SIMD) architecture, is finding a new home on the Multiple Instruction Multiple Data (MIMD) architectures. This style of programming, distinguished by taking the computation to the data, is what programmers have been doing by hand for a long time. Recent work in this area holds the promise of making the programmer`s task easier.
Little, J.J.; Poggio, T.; Gamble, E.B. Jr.
1988-01-01
Computer algorithms have been developed for early vision processes that give separate cues to the distance from the viewer of three-dimensional surfaces, their shape, and their material properties. The MIT Vision Machine is a computer system that integrates several early vision modules to achieve high-performance recognition and navigation in unstructured environments. It is also an experimental environment for theoretical progress in early vision algorithms, their parallel implementation, and their integration. The Vision Machine consists of a movable, two-camera Eye-Head input device and an 8K Connection Machine. The authors have developed and implemented several parallel early vision algorithms that compute edge detection, stereopsis, motion, texture, and surface color in close to real time. The integration stage, based on coupled Markov random field models, leads to a cartoon-like map of the discontinuities in the scene, with partial labeling of the brightness edges in terms of their physical origin.
Parallel Information Processing.
ERIC Educational Resources Information Center
Rasmussen, Edie M.
1992-01-01
Examines parallel computer architecture and the use of parallel processors for text. Topics discussed include parallel algorithms; performance evaluation; parallel information processing; parallel access methods for text; parallel and distributed information retrieval systems; parallel hardware for text; and network models for information…
Wagner, J Y; Langemann, M; Schön, G; Kluge, S; Reuter, D A; Saugel, B
2016-05-01
The T-Line(®) system (Tensys(®) Medical Inc., San Diego, CA, USA) non-invasively estimates cardiac output (CO) using autocalibrating pulse contour analysis of the radial artery applanation tonometry-derived arterial waveform. We compared T-Line CO measurements (TL-CO) with invasively obtained CO measurements using transpulmonary thermodilution (TDCO) and calibrated pulse contour analysis (PC-CO) in patients after major gastrointestinal surgery. We compared 1) TL-CO versus TD-CO and 2) TL-CO versus PC-CO in 27 patients treated in the intensive care unit (ICU) after major gastrointestinal surgery. For the assessment of TD-CO and PC-CO we used the PiCCO(®) system (Pulsion Medical Systems SE, Feldkirchen, Germany). Per patient, we compared two sets of TD-CO and 30 minutes of PC-CO measurements with the simultaneously recorded TL-CO values using Bland-Altman analysis. The mean of differences (± standard deviation; 95% limits of agreement) between TL-CO and TD-CO was -0.8 (±1.6; -4.0 to +2.3) l/minute with a percentage error of 45%. For TL-CO versus PC-CO, we observed a mean of differences of -0.4 (±1.5; -3.4 to +2.5) l/minute with a percentage error of 43%. In ICU patients after major gastrointestinal surgery, continuous non-invasive CO measurement based on autocalibrating pulse contour analysis of the radial artery applanation tonometry-derived arterial waveform (TL-CO) is feasible in a clinical study setting. However, the agreement of TL-CO with TD-CO and PC-CO observed in our study indicates that further improvements are needed before the technology can be recommended for clinical use in these patients. PMID:27246932
2011-01-01
Introduction About 3% of people will be diagnosed with epilepsy during their lifetime, but about 70% of people with epilepsy eventually go into remission. Methods and outcomes We conducted a systematic review and aimed to answer the following clinical questions: What are the effects of starting antiepileptic drug treatment following a single seizure? What are the effects of drug monotherapy in people with partial epilepsy? What are the effects of additional drug treatments in people with drug-resistant partial epilepsy? What is the risk of relapse in people in remission when withdrawing antiepileptic drugs? What are the effects of behavioural and psychological treatments for people with epilepsy? What are the effects of surgery in people with drug-resistant temporal lobe epilepsy? We searched: Medline, Embase, The Cochrane Library, and other important databases up to July 2009 (Clinical Evidence reviews are updated periodically; please check our website for the most up-to-date version of this review). We included harms alerts from relevant organisations such as the US Food and Drug Administration (FDA) and the UK Medicines and Healthcare products Regulatory Agency (MHRA). Results We found 83 systematic reviews, RCTs, or observational studies that met our inclusion criteria. We performed a GRADE evaluation of the quality of evidence for interventions. Conclusions In this systematic review we present information relating to the effectiveness and safety of the following interventions: antiepileptic drugs after a single seizure; monotherapy for partial epilepsy using carbamazepine, gabapentin, lamotrigine, levetiracetam, phenobarbital, phenytoin, sodium valproate, or topiramate; addition of second-line drugs for drug-resistant partial epilepsy (allopurinol, eslicarbazepine, gabapentin, lacosamide, lamotrigine, levetiracetam, losigamone, oxcarbazepine, retigabine, tiagabine, topiramate, vigabatrin, or zonisamide); antiepileptic drug withdrawal for people with partial or
Li, Shu; Chan, Cheong; Stockmann, Jason P.; Tagare, Hemant; Adluru, Ganesh; Tam, Leo K.; Galiana, Gigi; Constable, R. Todd; Kozerke, Sebastian; Peters, Dana C.
2014-01-01
Purpose To investigate algebraic reconstruction technique (ART) for parallel imaging reconstruction of radial data, applied to accelerated cardiac cine. Methods A GPU-accelerated ART reconstruction was implemented and applied to simulations, point spread functions (PSF) and in twelve subjects imaged with radial cardiac cine acquisitions. Cine images were reconstructed with radial ART at multiple undersampling levels (192 Nr x Np = 96 to 16). Images were qualitatively and quantitatively analyzed for sharpness and artifacts, and compared to filtered back-projection (FBP), and conjugate gradient SENSE (CG SENSE). Results Radial ART provided reduced artifacts and mainly preserved spatial resolution, for both simulations and in vivo data. Artifacts were qualitatively and quantitatively less with ART than FBP using 48, 32, and 24 Np, although FBP provided quantitatively sharper images at undersampling levels of 48-24 Np (all p<0.05). Use of undersampled radial data for generating auto-calibrated coil-sensitivity profiles resulted in slightly reduced quality. ART was comparable to CG SENSE. GPU-acceleration increased ART reconstruction speed 15-fold, with little impact on the images. Conclusion GPU-accelerated ART is an alternative approach to image reconstruction for parallel radial MR imaging, providing reduced artifacts while mainly maintaining sharpness compared to FBP, as shown by its first application in cardiac studies. PMID:24753213
Tolerant (parallel) Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
Special parallel processing workshop
1994-12-01
This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concept detailing with parallel processing.
Iterative algorithms for large sparse linear systems on parallel computers
NASA Technical Reports Server (NTRS)
Adams, L. M.
1982-01-01
Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
Parallel rendering techniques for massively parallel visualization
Hansen, C.; Krogh, M.; Painter, J.
1995-07-01
As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory. and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP`s abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume render use a MIMD approach. Implementations for these algorithms are presented for the Thinking Ma.chines Corporation CM-5 MPP.
Parallel algorithms and architectures
Albrecht, A.; Jung, H.; Mehlhorn, K.
1987-01-01
Contents of this book are the following: Preparata: Deterministic simulation of idealized parallel computers on more realistic ones; Convex hull of randomly chosen points from a polytope; Dataflow computing; Parallel in sequence; Towards the architecture of an elementary cortical processor; Parallel algorithms and static analysis of parallel programs; Parallel processing of combinatorial search; Communications; An O(nlogn) cost parallel algorithms for the single function coarsest partition problem; Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region; and RELACS - A recursive layout computing system. Parallel linear conflict-free subtree access.
On the parallel solution of parabolic equations
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Youcef
1989-01-01
Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
Conflict-cost based random sampling design for parallel MRI with low rank constraints
NASA Astrophysics Data System (ADS)
Kim, Wan; Zhou, Yihang; Lyu, Jingyuan; Ying, Leslie
2015-05-01
In compressed sensing MRI, it is very important to design sampling pattern for random sampling. For example, SAKE (simultaneous auto-calibrating and k-space estimation) is a parallel MRI reconstruction method using random undersampling. It formulates image reconstruction as a structured low-rank matrix completion problem. Variable density (VD) Poisson discs are typically adopted for 2D random sampling. The basic concept of Poisson disc generation is to guarantee samples are neither too close to nor too far away from each other. However, it is difficult to meet such a condition especially in the high density region. Therefore the sampling becomes inefficient. In this paper, we present an improved random sampling pattern for SAKE reconstruction. The pattern is generated based on a conflict cost with a probability model. The conflict cost measures how many dense samples already assigned are around a target location, while the probability model adopts the generalized Gaussian distribution which includes uniform and Gaussian-like distributions as special cases. Our method preferentially assigns a sample to a k-space location with the least conflict cost on the circle of the highest probability. To evaluate the effectiveness of the proposed random pattern, we compare the performance of SAKEs using both VD Poisson discs and the proposed pattern. Experimental results for brain data show that the proposed pattern yields lower normalized mean square error (NMSE) than VD Poisson discs.
Calibrationless Parallel Imaging Reconstruction Based on Structured Low-Rank Matrix Completion
Shin, Peter J.; Larson, Peder E.Z.; Ohliger, Michael A.; Elad, Michael; Pauly, John M.; Vigneron, Daniel B.; Lustig, Michael
2013-01-01
Purpose A calibrationless parallel imaging reconstruction method, termed simultaneous auto-calibrating and k-space estimation (SAKE), is presented. It is a data-driven, coil-by-coil reconstruction method that does not require a separate calibration step for estimating coil sensitivity information. Methods In SAKE, an under-sampled multi-channel dataset is structured into a single data matrix. Then the reconstruction is formulated as a structured low-rank matrix completion problem. An iterative solution that implements a projection-onto-sets algorithm with singular value thresholding is described. Results Reconstruction results are demonstrated for retrospectively and prospectively under-sampled, multi-channel Cartesian data having no calibration signals. Additionally, non-Cartesian data reconstruction is presented. Finally, improved image quality is demonstrated by combining SAKE with wavelet-based compressed sensing. Conclusion As estimation of coil sensitivity information is not needed, the proposed method could potentially benefit MR applications where acquiring accurate calibration data is limiting or not possible at all. PMID:24248734
Parallel solution of partial differential equations by extrapolation methods
Leland, Robert W.; Rollett, J. S.
2015-02-01
We have found, in the ROGE algorithm, an extrapolation process which is robust, effective and practically simple to implement. It removes the difficulty of needing to make a precise estimate of the over-relaxation parameter for Successive Over-Relaxation (SOR) type methods.
NASA Technical Reports Server (NTRS)
Dorband, John E.
1987-01-01
Massively Parallel Processor (MPP) Parallel FORTH is a derivative of FORTH-83 and Unified Software Systems' Uni-FORTH. The extension of FORTH into the realm of parallel processing on the MPP is described. With few exceptions, Parallel FORTH was made to follow the description of Uni-FORTH as closely as possible. Likewise, the parallel FORTH extensions were designed as philosophically similar to serial FORTH as possible. The MPP hardware characteristics, as viewed by the FORTH programmer, is discussed. Then a description is presented of how parallel FORTH is implemented on the MPP.
Parallel flow diffusion battery
Yeh, H.C.; Cheng, Y.S.
1984-01-01
A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
Parallel flow diffusion battery
Yeh, Hsu-Chi; Cheng, Yung-Sung
1984-08-07
A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Parallel genotypic adaptation: when evolution repeats itself
Wood, Troy E.; Burke, John M.; Rieseberg, Loren H.
2008-01-01
Until recently, parallel genotypic adaptation was considered unlikely because phenotypic differences were thought to be controlled by many genes. There is increasing evidence, however, that phenotypic variation sometimes has a simple genetic basis and that parallel adaptation at the genotypic level may be more frequent than previously believed. Here, we review evidence for parallel genotypic adaptation derived from a survey of the experimental evolution, phylogenetic, and quantitative genetic literature. The most convincing evidence of parallel genotypic adaptation comes from artificial selection experiments involving microbial populations. In some experiments, up to half of the nucleotide substitutions found in independent lineages under uniform selection are the same. Phylogenetic studies provide a means for studying parallel genotypic adaptation in non-experimental systems, but conclusive evidence may be difficult to obtain because homoplasy can arise for other reasons. Nonetheless, phylogenetic approaches have provided evidence of parallel genotypic adaptation across all taxonomic levels, not just microbes. Quantitative genetic approaches also suggest parallel genotypic evolution across both closely and distantly related taxa, but it is important to note that this approach cannot distinguish between parallel changes at homologous loci versus convergent changes at closely linked non-homologous loci. The finding that parallel genotypic adaptation appears to be frequent and occurs at all taxonomic levels has important implications for phylogenetic and evolutionary studies. With respect to phylogenetic analyses, parallel genotypic changes, if common, may result in faulty estimates of phylogenetic relationships. From an evolutionary perspective, the occurrence of parallel genotypic adaptation provides increasing support for determinism in evolution and may provide a partial explanation for how species with low levels of gene flow are held together. PMID:15881688
Eclipse Parallel Tools Platform
Watson, Gregory; DeBardeleben, Nathan; Rasmussen, Craig
2005-02-18
Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs i these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices, and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to providing a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framwork by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration, support for a small number of parallel architectures, and basis
Parallel Adaptive Mesh Refinement
Diachin, L; Hornung, R; Plassmann, P; WIssink, A
2005-03-04
As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of a impulsively-sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35]. Note the
Parallel Atomistic Simulations
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Visualization and Tracking of Parallel CFD Simulations
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kremenetsky, Mark
1995-01-01
We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.
Parallel digital forensics infrastructure.
Liebrock, Lorie M.; Duggan, David Patrick
2009-10-01
This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the implementation of the parallel digital forensics (PDF) infrastructure architecture and implementation.
Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A.; Seiberlich, Nicole
2015-01-01
Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the under-sampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. PMID:22696125
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan
1994-01-01
A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also be used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results show from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
Parallel preconditioning techniques for sparse CG solvers
Basermann, A.; Reichel, B.; Schelthoff, C.
1996-12-31
Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques of the CG method. In particular for very ill-conditioned matrices, sophisticated preconditioner are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the iterations of the simply diagonally scaled CG and are shown to be well suited for massively parallel machines.
Eclipse Parallel Tools Platform
Energy Science and Technology Software Center (ESTSC)
2005-02-18
Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs i these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices,more » and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to providing a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framwork by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration, support for a small number of parallel architectures
Parallel scheduling algorithms
Dekel, E.; Sahni, S.
1983-01-01
Parallel algorithms are given for scheduling problems such as scheduling to minimize the number of tardy jobs, job sequencing with deadlines, scheduling to minimize earliness and tardiness penalties, channel assignment, and minimizing the mean finish time. The shared memory model of parallel computers is used to obtain fast algorithms. 26 references.
Massively parallel mathematical sieves
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
Not Available
1991-10-23
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
... Jacksonian seizure; Seizure - partial (focal); Temporal lobe seizure; Epilepsy - partial seizures ... Abou-Khalil BW, Gallagher MJ, Macdonald RL. Epilepsies. In: Daroff RB, ... 6th ed. Philadelphia, PA: Elsevier Saunders; 2012:chap 67. ...
... Jacksonian seizure; Seizure - partial (focal); Temporal lobe seizure; Epilepsy - partial seizures ... Abou-Khalil BW, Gallagher MJ, Macdonald RL. Epilepsies. In: Daroff ... Practice . 7th ed. Philadelphia, PA: Elsevier; 2016:chap 101. ...
NASA Technical Reports Server (NTRS)
Vranish, John M. (Inventor)
2010-01-01
A partial gear bearing including an upper half, comprising peak partial teeth, and a lower, or bottom, half, comprising valley partial teeth. The upper half also has an integrated roller section between each of the peak partial teeth with a radius equal to the gear pitch radius of the radially outwardly extending peak partial teeth. Conversely, the lower half has an integrated roller section between each of the valley half teeth with a radius also equal to the gear pitch radius of the peak partial teeth. The valley partial teeth extend radially inwardly from its roller section. The peak and valley partial teeth are exactly out of phase with each other, as are the roller sections of the upper and lower halves. Essentially, the end roller bearing of the typical gear bearing has been integrated into the normal gear tooth pattern.
Parallel adaptive wavelet collocation method for PDEs
Nejadmalayeri, Alireza; Vezolainen, Alexei; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using tree-like structure with tree roots starting at a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048{sup 3} using as many as 2048 CPU cores.
Parallel nearest neighbor calculations
NASA Astrophysics Data System (ADS)
Trease, Harold
We are just starting to parallelize the nearest neighbor portion of our free-Lagrange code. Our implementation of the nearest neighbor reconnection algorithm has not been parallelizable (i.e., we just flip one connection at a time). In this paper we consider what sort of nearest neighbor algorithms lend themselves to being parallelized. For example, the construction of the Voronoi mesh can be parallelized, but the construction of the Delaunay mesh (dual to the Voronoi mesh) cannot because of degenerate connections. We will show our most recent attempt to tessellate space with triangles or tetrahedrons with a new nearest neighbor construction algorithm called DAM (Dial-A-Mesh). This method has the characteristics of a parallel algorithm and produces a better tessellation of space than the Delaunay mesh. Parallel processing is becoming an everyday reality for us at Los Alamos. Our current production machines are Cray YMPs with 8 processors that can run independently or combined to work on one job. We are also exploring massive parallelism through the use of two 64K processor Connection Machines (CM2), where all the processors run in lock step mode. The effective application of 3-D computer models requires the use of parallel processing to achieve reasonable "turn around" times for our calculations.
Bilingual parallel programming
Foster, I.; Overbeek, R.
1990-01-01
Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides and effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.
Tai, H.M.; Saeks, R.
1984-03-01
A relaxation algorithm for solving large-scale system simulation problems in parallel is proposed. The algorithm, which is composed of both a time-step parallel algorithm and a component-wise parallel algorithm, is described. The interconnected nature of the system, which is characterized by the component connection model, is fully exploited by this approach. A technique for finding an optimal number of the time steps is also described. Finally, this algorithm is illustrated via several examples in which the possible trade-offs between the speed-up ratio, efficiency, and waiting time are analyzed.
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. In includes both tutorial and reference material. It also presents the basic concepts that underly PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).
NASA Astrophysics Data System (ADS)
2014-10-01
Adam Nelson and Stuart Warriner, from the University of Leeds, talk with Nature Chemistry about their work to develop viable synthetic strategies for preparing new chemical structures in parallel with the identification of desirable biological activity.
ERIC Educational Resources Information Center
Rogers, Pat
1972-01-01
Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclids parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)
Simplified Parallel Domain Traversal
Erickson III, David J
2011-01-01
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO{sub 2} and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.
Partitioning and parallel radiosity
NASA Astrophysics Data System (ADS)
Merzouk, S.; Winkler, C.; Paul, J. C.
1996-03-01
This paper proposes a theoretical framework, based on domain subdivision for parallel radiosity. Moreover, three various implementation approaches, taking advantage of partitioning algorithms and global shared memory architecture, are presented.
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth
NASA Technical Reports Server (NTRS)
Reif, John H.
1987-01-01
A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless test compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
Continuous parallel coordinates.
Heinrich, Julian; Weiskopf, Daniel
2009-01-01
Typical scientific data is represented on a grid with appropriate interpolation or approximation schemes,defined on a continuous domain. The visualization of such data in parallel coordinates may reveal patterns latently contained in the data and thus can improve the understanding of multidimensional relations. In this paper, we adopt the concept of continuous scatterplots for the visualization of spatially continuous input data to derive a density model for parallel coordinates. Based on the point-line duality between scatterplots and parallel coordinates, we propose a mathematical model that maps density from a continuous scatterplot to parallel coordinates and present different algorithms for both numerical and analytical computation of the resulting density field. In addition, we show how the 2-D model can be used to successively construct continuous parallel coordinates with an arbitrary number of dimensions. Since continuous parallel coordinates interpolate data values within grid cells, a scalable and dense visualization is achieved, which will be demonstrated for typical multi-variate scientific data. PMID:19834230
Parallel algorithms for the spectral transform method
Foster, I.T.; Worley, P.H.
1994-04-01
The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, we describe these different parallel algorithms and report on computational experiments that we have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations or a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. We focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional FFTs and other parallel transforms.
Parallel architectures for iterative methods on adaptive, block structured grids
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1983-01-01
A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Parallel time integration software
Energy Science and Technology Software Center (ESTSC)
2014-07-01
This package implements an optimal-scaling multigrid solver for the (non) linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integrarion techniques is limited to spatial parallelism. However, current trends in computer architectures are leading twards system with more, but not faster. processors. Therefore, faster compute speeds mustmore » come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic poerators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three sparial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.« less
Parallel time integration software
2014-07-01
This package implements an optimal-scaling multigrid solver for the (non) linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integrarion techniques is limited to spatial parallelism. However, current trends in computer architectures are leading twards system with more, but not faster. processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic poerators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three sparial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A
2014-05-20
An optical sampler includes a first and second 1.times.n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.
Bailey, David H.
2009-11-15
The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, LeoDagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage
Adaptive parallel logic networks
NASA Technical Reports Server (NTRS)
Martinez, Tony R.; Vidal, Jacques J.
1988-01-01
Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Speeding up parallel processing
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
Programming parallel vision algorithms
Shapiro, L.G.
1988-01-01
Computer vision requires the processing of large volumes of data and requires parallel architectures and algorithms to be useful in real-time, industrial applications. The INSIGHT dataflow language was designed to allow encoding of vision algorithms at all levels of the computer vision paradigm. INSIGHT programs, which are relational in nature, can be translated into a graph structure that represents an architecture for solving a particular vision problem or a configuration of a reconfigurable computational network. The authors consider here INSIGHT programs that produce a parallel net architecture for solving low-, mid-, and high-level vision tasks.
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Among the highly parallel computing architectures required for advanced scientific computation, those designated 'MIMD' and 'SIMD' have yielded the best results to date. The present development status evaluation of such architectures shown neither to have attained a decisive advantage in most near-homogeneous problems' treatment; in the cases of problems involving numerous dissimilar parts, however, such currently speculative architectures as 'neural networks' or 'data flow' machines may be entailed. Data flow computers are the most practical form of MIMD fine-grained parallel computers yet conceived; they automatically solve the problem of assigning virtual processors to the real processors in the machine.
Coarrars for Parallel Processing
NASA Technical Reports Server (NTRS)
Snyder, W. Van
2011-01-01
The design of the Coarray feature of Fortran 2008 was guided by answering the question "What is the smallest change required to convert Fortran to a robust and efficient parallel language." Two fundamental issues that any parallel programming model must address are work distribution and data distribution. In order to coordinate work distribution and data distribution, methods for communication and synchronization must be provided. Although originally designed for Fortran, the Coarray paradigm has stimulated development in other languages. X10, Chapel, UPC, Titanium, and class libraries being developed for C++ have the same conceptual framework.
Physics of Partially Ionized Plasmas
NASA Astrophysics Data System (ADS)
Krishan, Vinod
2016-05-01
Figures; Preface; 1. Partially ionized plasmas here and everywhere; 2. Multifluid description of partially ionized plasmas; 3. Equilibrium of partially ionized plasmas; 4. Waves in partially ionized plasmas; 5. Advanced topics in partially ionized plasmas; 6. Research problems in partially ionized plasmas; Supplementary matter; Index.
Energy Science and Technology Software Center (ESTSC)
2004-10-21
This is a total energy electronic structure code using Local Density Approximation (LDA) of the density funtional theory. It uses the plane wave as the wave function basis set. It can sue both the norm conserving pseudopotentials and the ultra soft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MP1.
NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Subhash, Saini; Bailey, David H.; Lasinski, T. A. (Technical Monitor)
1995-01-01
The NAS Parallel Benchmarks (NPB) were developed in 1991 at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a pencil and paper fashion i.e. the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. In this paper, we present new NPB performance results for the following systems: (a) Parallel-Vector Processors: Cray C90, Cray T'90 and Fujitsu VPP500; (b) Highly Parallel Processors: Cray T3D, IBM SP2 and IBM SP-TN2 (Thin Nodes 2); (c) Symmetric Multiprocessing Processors: Convex Exemplar SPP1000, Cray J90, DEC Alpha Server 8400 5/300, and SGI Power Challenge XL. We also present sustained performance per dollar for Class B LU, SP and BT benchmarks. We also mention NAS future plans of NPB.
High performance parallel architectures
Anderson, R.E. )
1989-09-01
In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.
Foster, I.; Tuecke, S.
1993-01-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and Cthat allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs. ani.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.
Parallel Multigrid Equation Solver
Energy Science and Technology Software Center (ESTSC)
2001-09-07
Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved problems of up to 76 mullion degrees of feedom, problems in linear elasticity on the ASCI blue pacific and ASCI red machines.
Parallel Dislocation Simulator
Energy Science and Technology Software Center (ESTSC)
2006-10-30
ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.
Optical parallel selectionist systems
NASA Astrophysics Data System (ADS)
Caulfield, H. John
1993-01-01
There are at least two major classes of computers in nature and technology: connectionist and selectionist. A subset of connectionist systems (Turing Machines) dominates modern computing, although another subset (Neural Networks) is growing rapidly. Selectionist machines have unique capabilities which should allow them to do truly creative operations. It is possible to make a parallel optical selectionist system using methods describes in this paper.
Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan
2010-01-01
We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N{sup 2}) time. The parallel time complexity estimates for our algorithms are O(N/n{sub p}) for uniform point distributions and O( (N/n{sub p}) log (N/n{sub p}) + n{sub p}log n{sub p}) for non-uniform distributions using n{sub p} CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.
Parallel hierarchical global illumination
Snell, Q.O.
1997-10-08
Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
Parallel hierarchical radiosity rendering
Carter, M.
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Why arthroscopic partial meniscectomy?
Lyu, Shaw-Ruey
2015-09-01
"Arthroscopic Partial Meniscectomy versus Sham Surgery for a Degenerative Meniscal Tear" published in the New England Journal of Medicine on December 26, 2013 draws the conclusion that arthroscopic partial medial meniscectomy provides no significant benefit over sham surgery in patients with a degenerative meniscal tear and no knee osteoarthritis. This result argues against the current practice of performing arthroscopic partial meniscectomy (APM) in patients with a degenerative meniscal tear. Since the number of APM performed has been increasing, the information provided by this study should lead to a change in clinical care of patients with a degenerative meniscus tear. PMID:26488013
Adapting implicit methods to parallel processors
Reeves, L.; McMillin, B.; Okunbor, D.; Riggins, D.
1994-12-31
When numerically solving many types of partial differential equations, it is advantageous to use implicit methods because of their better stability and more flexible parameter choice, (e.g. larger time steps). However, since implicit methods usually require simultaneous knowledge of the entire computational domain, these methods axe difficult to implement directly on distributed memory parallel processors. This leads to infrequent use of implicit methods on parallel/distributed systems. The usual implementation of implicit methods is inefficient due to the nature of parallel systems where it is common to take the computational domain and distribute the grid points over the processors so as to maintain a relatively even workload per processor. This creates a problem at the locations in the domain where adjacent points are not on the same processor. In order for the values at these points to be calculated, messages have to be exchanged between the corresponding processors. Without special adaptation, this will result in idle processors during part of the computation, and as the number of idle processors increases, the lower the effective speed improvement by using a parallel processor.
Partial knee replacement - slideshow
... page: //medlineplus.gov/ency/presentations/100225.htm Partial knee replacement - series To use the sharing features on ... A.M. Editorial team. Related MedlinePlus Health Topics Knee Replacement A.D.A.M., Inc. is accredited ...
Most people recover quickly and have much less pain than they did before surgery. People who have a partial knee replacement recover faster than those who have a total knee replacement. Many people are able to walk ...
Twisted partially pure spinors
NASA Astrophysics Data System (ADS)
Herrera, Rafael; Tellez, Ivan
2016-08-01
Motivated by the relationship between orthogonal complex structures and pure spinors, we define twisted partially pure spinors in order to characterize spinorially subspaces of Euclidean space endowed with a complex structure.
Extendability of parallel sections in vector bundles
NASA Astrophysics Data System (ADS)
Kirschner, Tim
2016-01-01
I address the following question: Given a differentiable manifold M, what are the open subsets U of M such that, for all vector bundles E over M and all linear connections ∇ on E, any ∇-parallel section in E defined on U extends to a ∇-parallel section in E defined on M? For simply connected manifolds M (among others) I describe the entirety of all such sets U which are, in addition, the complement of a C1 submanifold, boundary allowed, of M. This delivers a partial positive answer to a problem posed by Antonio J. Di Scala and Gianni Manno (2014). Furthermore, in case M is an open submanifold of Rn, n ≥ 2, I prove that the complement of U in M, not required to be a submanifold now, can have arbitrarily large n-dimensional Lebesgue measure.
Olmedo, Oscar; Zhang Jie
2010-07-20
Flux ropes are now generally accepted to be the magnetic configuration of coronal mass ejections (CMEs), which may be formed prior to or during solar eruptions. In this study, we model the flux rope as a current-carrying partial torus loop with its two footpoints anchored in the photosphere, and investigate its stability in the context of the torus instability (TI). Previous studies on TI have focused on the configuration of a circular torus and revealed the existence of a critical decay index of the overlying constraining magnetic field. Our study reveals that the critical index is a function of the fractional number of the partial torus, defined by the ratio between the arc length of the partial torus above the photosphere and the circumference of a circular torus of equal radius. We refer to this finding as the partial torus instability (PTI). It is found that a partial torus with a smaller fractional number has a smaller critical index, thus requiring a more gradually decreasing magnetic field to stabilize the flux rope. On the other hand, a partial torus with a larger fractional number has a larger critical index. In the limit of a circular torus when the fractional number approaches 1, the critical index goes to a maximum value. We demonstrate that the PTI helps us to understand the confinement, growth, and eventual eruption of a flux-rope CME.
Parallel Subconvolution Filtering Architectures
NASA Technical Reports Server (NTRS)
Gray, Andrew A.
2003-01-01
These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete- Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than on the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
Parallel Anisotropic Tetrahedral Adaptation
NASA Technical Reports Server (NTRS)
Park, Michael A.; Darmofal, David L.
2008-01-01
An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic is described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000-1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.
Homology, convergence and parallelism.
Ghiselin, Michael T
2016-01-01
Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
NASA Technical Reports Server (NTRS)
Gryphon, Coranth D.; Miller, Mark D.
1991-01-01
PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.
Parallel multilevel preconditioners
Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao.
1989-01-01
In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.
Xyce parallel electronic simulator.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Groh, E.F.; Lennox, D.H.
1963-04-23
This invention is concerned with a rigid assembly of parallel plates in which keyways are stamped out along the edges of the plates and a self-retaining key is inserted into aligned keyways. Spacers having similar keyways are included between adjacent plates. The entire assembly is locked into a rigid structure by fastening only the outermost plates to the ends of the keys. (AEC)
Adaptive parallel logic networks
Martinez, T.R.; Vidal, J.J.
1988-02-01
This paper presents a novel class of special purpose processors referred to as ASOCS (adaptive self-organizing concurrent systems). Intended applications include adaptive logic devices, robotics, process control, system malfunction management, and in general, applications of logic reasoning. ASOCS combines massive parallelism with self-organization to attain a distributed mechanism for adaptation. The ASOCS approach is based on an adaptive network composed of many simple computing elements (nodes) which operate in a combinational and asynchronous fashion. Problem specification (programming) is obtained by presenting to the system if-then rules expressed as Boolean conjunctions. New rules are added incrementally. In the current model, when conflicts occur, precedence is given to the most recent inputs. With each rule, desired network response is simply presented to the system, following which the network adjusts itself to maintain consistency and parsimony of representation. Data processing and adaptation form two separate phases of operation. During processing, the network acts as a parallel hardware circuit. Control of the adaptive process is distributed among the network nodes and efficiently exploits parallelism.
Trajectory optimization using parallel shooting method on parallel computer
Wirthman, D.J.; Park, S.Y.; Vadali, S.R.
1995-03-01
The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.
The Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.
Resistor Combinations for Parallel Circuits.
ERIC Educational Resources Information Center
McTernan, James P.
1978-01-01
To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
NASA Astrophysics Data System (ADS)
Elghariani, Ali; Zoltowski, Michael D.
2012-05-01
In this paper, partial spread OFDM system has been presented and its performance has been studied when different detection techniques are employed, such as minimum mean square error (MMSE), grouped Maximum Likelihood (ML) and approximated integer quadratic programming (IQP) techniques . The performance study also includes applying two different spreading matrices, Hadamard and Vandermonde. Extensive computer simulation have been implemented and important results show that partial spread OFDM system improves the BER performance and the frequency diversity of OFDM compared to both non spread and full spread systems. The results from this paper also show that partial spreading technique combined with suboptimal detector could be a better solution for applications that require low receiver complexity and high information detectability.
Methanol partial oxidation reformer
Ahmed, S.; Kumar, R.; Krumpelt, M.
1999-08-17
A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.
Methanol partial oxidation reformer
Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael
1999-01-01
A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.
Methanol partial oxidation reformer
Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael
2001-01-01
A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.
Methanol partial oxidation reformer
Ahmed, S.; Kumar, R.; Krumpelt, M.
1999-08-24
A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.
Oxygen partial pressure sensor
Dees, D.W.
1994-09-06
A method for detecting oxygen partial pressure and an oxygen partial pressure sensor are provided. The method for measuring oxygen partial pressure includes contacting oxygen to a solid oxide electrolyte and measuring the subsequent change in electrical conductivity of the solid oxide electrolyte. A solid oxide electrolyte is utilized that contacts both a porous electrode and a nonporous electrode. The electrical conductivity of the solid oxide electrolyte is affected when oxygen from an exhaust stream permeates through the porous electrode to establish an equilibrium of oxygen anions in the electrolyte, thereby displacing electrons throughout the electrolyte to form an electron gradient. By adapting the two electrodes to sense a voltage potential between them, the change in electrolyte conductivity due to oxygen presence can be measured. 1 fig.
Oxygen partial pressure sensor
Dees, Dennis W.
1994-01-01
A method for detecting oxygen partial pressure and an oxygen partial pressure sensor are provided. The method for measuring oxygen partial pressure includes contacting oxygen to a solid oxide electrolyte and measuring the subsequent change in electrical conductivity of the solid oxide electrolyte. A solid oxide electrolyte is utilized that contacts both a porous electrode and a nonporous electrode. The electrical conductivity of the solid oxide electrolyte is affected when oxygen from an exhaust stream permeates through the porous electrode to establish an equilibrium of oxygen anions in the electrolyte, thereby displacing electrons throughout the electrolyte to form an electron gradient. By adapting the two electrodes to sense a voltage potential between them, the change in electrolyte conductivity due to oxygen presence can be measured.
Parallel Pascal - An extended Pascal for parallel computers
NASA Technical Reports Server (NTRS)
Reeves, A. P.
1984-01-01
Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.
Parallel Eclipse Project Checkout
NASA Technical Reports Server (NTRS)
Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.
2011-01-01
Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (xml file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to checkout the plug-ins, this program makes the plug-in checkouts parallel. After parsing the feature, a request to checkout for each plug-in in the feature has been inserted. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to checkout now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer s home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer s home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.
Fastpath Speculative Parallelization
NASA Astrophysics Data System (ADS)
Spear, Michael F.; Kelsey, Kirk; Bai, Tongxin; Dalessandro, Luke; Scott, Michael L.; Ding, Chen; Wu, Peng
We describe Fastpath, a system for speculative parallelization of sequential programs on conventional multicore processors. Our system distinguishes between the lead thread, which executes at almost-native speed, and speculative threads, which execute somewhat slower. This allows us to achieve nontrivial speedup, even on two-core machines. We present a mathematical model of potential speedup, parameterized by application characteristics and implementation constants. We also present preliminary results gleaned from two different Fastpath implementations, each derived from an implementation of software transactional memory.
CSM parallel structural methods research
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1989-01-01
Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.
Synchronous Parallel Kinetic Monte Carlo
Mart?nez, E; Marian, J; Kalos, M H
2006-12-14
A novel parallel kinetic Monte Carlo (kMC) algorithm formulated on the basis of perfect time synchronicity is presented. The algorithm provides an exact generalization of any standard serial kMC model and is trivially implemented in parallel architectures. We demonstrate the mathematical validity and parallel performance of the method by solving several well-understood problems in diffusion.
Roo: A parallel theorem prover
Lusk, E.L.; McCune, W.W.; Slaney, J.K.
1991-11-01
We describe a parallel theorem prover based on the Argonne theorem-proving system OTTER. The parallel system, called Roo, runs on shared-memory multiprocessors such as the Sequent Symmetry. We explain the parallel algorithm used and give performance results that demonstrate near-linear speedups on large problems.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Partial Arc Curvilinear Direct Drive Servomotor
NASA Technical Reports Server (NTRS)
Sun, Xiuhong (Inventor)
2014-01-01
A partial arc servomotor assembly having a curvilinear U-channel with two parallel rare earth permanent magnet plates facing each other and a pivoted ironless three phase coil armature winding moves between the plates. An encoder read head is fixed to a mounting plate above the coil armature winding and a curvilinear encoder scale is curved to be co-axis with the curvilinear U-channel permanent magnet track formed by the permanent magnet plates. Driven by a set of miniaturized power electronics devices closely looped with a positioning feedback encoder, the angular position and velocity of the pivoted payload is programmable and precisely controlled.
Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G
2007-04-11
The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.
Baskin, Tobias I.; Gu, Ying
2012-01-01
The extracellular matrix is constructed beyond the plasma membrane, challenging mechanisms for its control by the cell. In plants, the cell wall is highly ordered, with cellulose microfibrils aligned coherently over a scale spanning hundreds of cells. To a considerable extent, deploying aligned microfibrils determines mechanical properties of the cell wall, including strength and compliance. Cellulose microfibrils have long been seen to be aligned in parallel with an array of microtubules in the cell cortex. How do these cortical microtubules affect the cellulose synthase complex? This question has stood for as many years as the parallelism between the elements has been observed, but now an answer is emerging. Here, we review recent work establishing that the link between microtubules and microfibrils is mediated by a protein named cellulose synthase-interacting protein 1 (CSI1). The protein binds both microtubules and components of the cellulose synthase complex. In the absence of CSI1, microfibrils are synthesized but their alignment becomes uncoupled from the microtubules, an effect that is phenocopied in the wild type by depolymerizing the microtubules. The characterization of CSI1 significantly enhances knowledge of how cellulose is aligned, a process that serves as a paradigmatic example of how cells dictate the construction of their extracellular environment. PMID:22902763
Applied Parallel Metadata Indexing
Jacobi, Michael R
2012-08-01
The GPFS Archive is parallel archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and searach the archive using standard and user-defined metadata, while maintaining security. last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interfae for the search tool. To meet security requirements, each database table is associated with a single user, which only stores records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.
Parallel ptychographic reconstruction
Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris
2014-01-01
Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174
Partial Participation Revisited.
ERIC Educational Resources Information Center
Ferguson, Dianne L.; Baumgart, Diane
1991-01-01
This article reanalyzes the principle of partial participation in integrated educational programing for students with severe or profound disabilities. The article presents four "error patterns" in how the concept has been used, some reasons why such error patterns have occurred, and strategies for avoiding these errors. (Author/JDD)
NASA Technical Reports Server (NTRS)
Capps, Stephen; Lorandos, Jason; Akhidime, Eval; Bunch, Michael; Lund, Denise; Moore, Nathan; Murakawa, Kiosuke
1989-01-01
The purpose of this study is to investigate comprehensive design requirements associated with designing habitats for humans in a partial gravity environment, then to apply them to a lunar base design. Other potential sites for application include planetary surfaces such as Mars, variable-gravity research facilities, and a rotating spacecraft. Design requirements for partial gravity environments include locomotion changes in less than normal earth gravity; facility design issues, such as interior configuration, module diameter, and geometry; and volumetric requirements based on the previous as well as psychological issues involved in prolonged isolation. For application to a lunar base, it is necessary to study the exterior architecture and configuration to insure optimum circulation patterns while providing dual egress; radiation protection issues are addressed to provide a safe and healthy environment for the crew; and finally, the overall site is studied to locate all associated facilities in context with the habitat. Mission planning is not the purpose of this study; therefore, a Lockheed scenario is used as an outline for the lunar base application, which is then modified to meet the project needs. The goal of this report is to formulate facts on human reactions to partial gravity environments, derive design requirements based on these facts, and apply the requirements to a partial gravity situation which, for this study, was a lunar base.
Partial wave analysis using graphics processing units
NASA Astrophysics Data System (ADS)
Berger, Niklaus; Beijiang, Liu; Jike, Wang
2010-04-01
Partial wave analysis is an important tool for determining resonance properties in hadron spectroscopy. For large data samples however, the un-binned likelihood fits employed are computationally very expensive. At the Beijing Spectrometer (BES) III experiment, an increase in statistics compared to earlier experiments of up to two orders of magnitude is expected. In order to allow for a timely analysis of these datasets, additional computing power with short turnover times has to be made available. It turns out that graphics processing units (GPUs) originally developed for 3D computer games have an architecture of massively parallel single instruction multiple data floating point units that is almost ideally suited for the algorithms employed in partial wave analysis. We have implemented a framework for tensor manipulation and partial wave fits called GPUPWA. The user writes a program in pure C++ whilst the GPUPWA classes handle computations on the GPU, memory transfers, caching and other technical details. In conjunction with a recent graphics processor, the framework provides a speed-up of the partial wave fit by more than two orders of magnitude compared to legacy FORTRAN code.
A systolic array parallelizing compiler
Tseng, P.S. )
1990-01-01
This book presents a completely new approach to the problem of systolic array parallelizing compiler. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler which can generate efficient parallel code for complete LINPACK routines. This book begins by analyzing the architectural strength of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. A complete listing of the source program and compiler-generated parallel code are given to clarify the overall picture of the compiler. The book concludes that systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.
Parallel node placement method by bubble simulation
NASA Astrophysics Data System (ADS)
Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang
2014-03-01
An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors is introduced. In accordance with the desired nodal density and Newton’s Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, for two distant nodes, their positions and velocities can be updated simultaneously and independently during dynamic simulation, which indicates the inherent property of parallelism, it is quite suitable for parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi linear speedup in the number of processors and high efficiency are achieved.
DeHart, Mark D; Williams, Mark L; Bowman, Stephen M
2010-01-01
The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement
McKay, Mike
2003-12-01
UPS (Unified Paralled Software is a collection of software tools libraries, scripts, executables) that assist in parallel programming. This consists of: o libups.a C/Fortran callable routines for message passing (utilities written on top of MPI) and file IO (utilities written on top of HDF). o libuserd-HDF.so EnSight user-defined reader for visualizing data files written with UPS File IO. o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl Executables/scripts to get information from data files and to simplify the use of EnSight on those data files. o ups_io_rm/ups_io_cp Manipulate data files written with UPS File IO These tools are portable to a wide variety of Unix platforms.
Parallel Polarization State Generation
She, Alan; Capasso, Federico
2016-01-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813
Energy Science and Technology Software Center (ESTSC)
2003-12-01
UPS (Unified Paralled Software is a collection of software tools libraries, scripts, executables) that assist in parallel programming. This consists of: o libups.a C/Fortran callable routines for message passing (utilities written on top of MPI) and file IO (utilities written on top of HDF). o libuserd-HDF.so EnSight user-defined reader for visualizing data files written with UPS File IO. o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl Executables/scripts to get information from data files and to simplify the use ofmore » EnSight on those data files. o ups_io_rm/ups_io_cp Manipulate data files written with UPS File IO These tools are portable to a wide variety of Unix platforms.« less
Parallel Polarization State Generation
NASA Astrophysics Data System (ADS)
She, Alan; Capasso, Federico
2016-05-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
Parallel Polarization State Generation.
She, Alan; Capasso, Federico
2016-01-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813
Parallel tridiagonal equation solvers
NASA Technical Reports Server (NTRS)
Stone, H. S.
1974-01-01
Three parallel algorithms were compared for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC 4 and CDC STAR. For array computers similar to ILLIAC 4, cyclic odd-even reduction has the least operation count for highly structured sets of equations, and recursive doubling has the least count for relatively unstructured sets of equations. Since the difference in operation counts for these two algorithms is not substantial, their relative running times may be more related to overhead operations, which are not measured in this paper. The third algorithm, based on Buneman's Poisson solver, has more arithmetic operations than the others, and appears to be the least favorable. For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.
Parallel Imaging Microfluidic Cytometer
Ehrlich, Daniel J.; McKenna, Brian K.; Evans, James G.; Belkina, Anna C.; Denis, Gerald V.; Sherr, David; Cheung, Man Ching
2011-01-01
By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of flow cytometry (FACS) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1-D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity and, (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in approximately 6–10 minutes, about 30-times the speed of most current FACS systems. In 1-D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of CCD-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. PMID:21704835
Parallelizing OVERFLOW: Experiences, Lessons, Results
NASA Technical Reports Server (NTRS)
Jespersen, Dennis C.
1999-01-01
The computer code OVERFLOW is widely used in the aerodynamic community for the numerical solution of the Navier-Stokes equations. Current trends in computer systems and architectures are toward multiple processors and parallelism, including distributed memory. This report describes work that has been carried out by the author and others at Ames Research Center with the goal of parallelizing OVERFLOW using a variety of parallel architectures and parallelization strategies. This paper begins with a brief description of the OVERFLOW code. This description includes the basic numerical algorithm and some software engineering considerations. Next comes a description of a parallel version of OVERFLOW, OVERFLOW/PVM, using PVM (Parallel Virtual Machine). This parallel version of OVERFLOW uses the manager/worker style and is part of the standard OVERFLOW distribution. Then comes a description of a parallel version of OVERFLOW, OVERFLOW/MPI, using MPI (Message Passing Interface). This parallel version of OVERFLOW uses the SPMD (Single Program Multiple Data) style. Finally comes a discussion of alternatives to explicit message-passing in the context of parallelizing OVERFLOW.
Partially coherent ultrafast spectrography
NASA Astrophysics Data System (ADS)
Bourassin-Bouchet, C.; Couprie, M.-E.
2015-03-01
Modern ultrafast metrology relies on the postulate that the pulse to be measured is fully coherent, that is, that it can be completely described by its spectrum and spectral phase. However, synthesizing fully coherent pulses is not always possible in practice, especially in the domain of emerging ultrashort X-ray sources where temporal metrology is strongly needed. Here we demonstrate how frequency-resolved optical gating (FROG), the first and one of the most widespread techniques for pulse characterization, can be adapted to measure partially coherent pulses even down to the attosecond timescale. No modification of experimental apparatuses is required; only the processing of the measurement changes. To do so, we take our inspiration from other branches of physics where partial coherence is routinely dealt with, such as quantum optics and coherent diffractive imaging. This will have important and immediate applications, such as enabling the measurement of X-ray free-electron laser pulses despite timing jitter.
Partially integrated exhaust manifold
Hayman, Alan W; Baker, Rodney E
2015-01-20
A partially integrated manifold assembly is disclosed which improves performance, reduces cost and provides efficient packaging of engine components. The partially integrated manifold assembly includes a first leg extending from a first port and terminating at a mounting flange for an exhaust gas control valve. Multiple additional legs (depending on the total number of cylinders) are integrally formed with the cylinder head assembly and extend from the ports of the associated cylinder and terminate at an exit port flange. These additional legs are longer than the first leg such that the exit port flange is spaced apart from the mounting flange. This configuration provides increased packaging space adjacent the first leg for any valving that may be required to control the direction and destination of exhaust flow in recirculation to an EGR valve or downstream to a catalytic converter.
Partially coherent ultrafast spectrography
Bourassin-Bouchet, C.; Couprie, M.-E.
2015-01-01
Modern ultrafast metrology relies on the postulate that the pulse to be measured is fully coherent, that is, that it can be completely described by its spectrum and spectral phase. However, synthesizing fully coherent pulses is not always possible in practice, especially in the domain of emerging ultrashort X-ray sources where temporal metrology is strongly needed. Here we demonstrate how frequency-resolved optical gating (FROG), the first and one of the most widespread techniques for pulse characterization, can be adapted to measure partially coherent pulses even down to the attosecond timescale. No modification of experimental apparatuses is required; only the processing of the measurement changes. To do so, we take our inspiration from other branches of physics where partial coherence is routinely dealt with, such as quantum optics and coherent diffractive imaging. This will have important and immediate applications, such as enabling the measurement of X-ray free-electron laser pulses despite timing jitter. PMID:25744080
Partial quantum logics revisited
NASA Astrophysics Data System (ADS)
Vetterlein, Thomas
2011-01-01
Partial Boolean algebras (PBAs) were introduced by Kochen and Specker as an algebraic model reflecting the mutual relationships among quantum-physical yes-no tests. The fact that not all pairs of tests are compatible was taken into special account. In this paper, we review PBAs from two sides. First, we generalise the concept, taking into account also those yes-no tests which are based on unsharp measurements. Namely, we introduce partial MV-algebras, and we define a corresponding logic. Second, we turn to the representation theory of PBAs. In analogy to the case of orthomodular lattices, we give conditions for a PBA to be isomorphic to the PBA of closed subspaces of a complex Hilbert space. Hereby, we do not restrict ourselves to purely algebraic statements; we rather give preference to conditions involving automorphisms of a PBA. We conclude by outlining a critical view on the logico-algebraic approach to the foundational problem of quantum physics.
PMESH: A parallel mesh generator
Hardin, D.D.
1994-10-21
The Parallel Mesh Generation (PMESH) Project is a joint LDRD effort by A Division and Engineering to develop a unique mesh generation system that can construct large calculational meshes (of up to 10{sup 9} elements) on massively parallel computers. Such a capability will remove a critical roadblock to unleashing the power of massively parallel processors (MPPs) for physical analysis. PMESH will support a variety of LLNL 3-D physics codes in the areas of electromagnetics, structural mechanics, thermal analysis, and hydrodynamics.
Nonlinear GRAPPA: a kernel approach to parallel MRI reconstruction.
Chang, Yuchou; Liang, Dong; Ying, Leslie
2012-09-01
GRAPPA linearly combines the undersampled k-space signals to estimate the missing k-space signals where the coefficients are obtained by fitting to some auto-calibration signals (ACS) sampled with Nyquist rate based on the shift-invariant property. At high acceleration factors, GRAPPA reconstruction can suffer from a high level of noise even with a large number of auto-calibration signals. In this work, we propose a nonlinear method to improve GRAPPA. The method is based on the so-called kernel method which is widely used in machine learning. Specifically, the undersampled k-space signals are mapped through a nonlinear transform to a high-dimensional feature space, and then linearly combined to reconstruct the missing k-space data. The linear combination coefficients are also obtained through fitting to the ACS data but in the new feature space. The procedure is equivalent to adding many virtual channels in reconstruction. A polynomial kernel with explicit mapping functions is investigated in this work. Experimental results using phantom and in vivo data demonstrate that the proposed nonlinear GRAPPA method can significantly improve the reconstruction quality over GRAPPA and its state-of-the-art derivatives. PMID:22161975
Parallel processor engine model program
NASA Technical Reports Server (NTRS)
Mclaughlin, P.
1984-01-01
The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.
Parallel computation with the force
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1985-01-01
A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiproceossor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Lau, Sonie
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Parallel Programming in the Age of Ubiquitous Parallelism
NASA Astrophysics Data System (ADS)
Pingali, Keshav
2014-04-01
Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs
General classification of partially polarized partially coherent beams
NASA Astrophysics Data System (ADS)
Martinez-Herrero, Rosario; Piquero, Gemma; Mejias, Pedro M.
2003-05-01
The behavior of the so-called generalized degree of polarization of partially coherent partially polarized beams upon free propagation is investigated. On the basis of this parameter a general classification scheme of partially polarized beams is proposed. The results are applied to certain classes of fields of special interest.
Experts' Understanding of Partial Derivatives Using the Partial Derivative Machine
ERIC Educational Resources Information Center
Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.
2015-01-01
Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of…
Parallel execution model for Prolog
Fagin, B.S.
1987-01-01
One candidate language for parallel symbolic computing is Prolog. Numerous ways for executing Prolog in parallel have been proposed, but current efforts suffer from several deficiencies. Many cannot support fundamental types of concurrency in Prolog. Other models are of purely theoretical interest, ignoring implementation costs. Detailed simulation studies of execution models are scare; at present little is known about the costs and benefits of executing Prolog in parallel. In this thesis, a new parallel execution model for Prolog is presented: the PPP model or Parallel Prolog Processor. The PPP supports AND-parallelism, OR-parallelism, and intelligent backtracking. An implementation of the PPP is described, through the extension of an existing Prolog abstract machine architecture. Several examples of PPP execution are presented, and compilation to the PPP abstract instruction set is discussed. The performance effects of this model are reported, based on a simulation of a large benchmark set. The implications of these results for parallel Prolog systems are discussed, and directions for future work are indicated.
Reordering computations for parallel execution
NASA Technical Reports Server (NTRS)
Adams, L.
1985-01-01
The computations are reordered in the SOR algorithm to maintain the same asymptotic rate of convergence as the rowwise ordering to obtain parallelism at different levels. A parallel program is written to illustrate these ideas and actual machines for implementation of this program are discussed.
Parallelizing Monte Carlo with PMC
Rathkopf, J.A.; Jones, T.R.; Nessett, D.M.; Stanberry, L.C.
1994-11-01
PMC (Parallel Monte Carlo) is a system of generic interface routines that allows easy porting of Monte Carlo packages of large-scale physics simulation codes to Massively Parallel Processor (MPP) computers. By loading various versions of PMC, simulation code developers can configure their codes to run in several modes: serial, Monte Carlo runs on the same processor as the rest of the code; parallel, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on other MPP processor(s); distributed, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on a different machine. This multi-mode approach allows maintenance of a single simulation code source regardless of the target machine. PMC handles passing of messages between nodes on the MPP, passing of messages between a different machine and the MPP, distributing work between nodes, and providing independent, reproducible sequences of random numbers. Several production codes have been parallelized under the PMC system. Excellent parallel efficiency in both the distributed and parallel modes results if sufficient workload is available per processor. Experiences with a Monte Carlo photonics demonstration code and a Monte Carlo neutronics package are described.
The Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Parallel contingency statistics with Titan.
Thompson, David C.; Pebay, Philippe Pierre
2009-09-01
This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09] which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engines is illustrated by the means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevent this new engine from exhibiting optimal parallel speed-up as the aforementioned engines do. This report therefore discusses the design trade-offs we made and study performance with up to 200 processors.
Problem size, parallel architecture and optimal speedup
NASA Technical Reports Server (NTRS)
Nicol, David M.; Willard, Frank H.
1987-01-01
The communication and synchronization overhead inherent in parallel processing can lead to situations where adding processors to the solution method actually increases execution time. Problem type, problem size, and architecture type all affect the optimal number of processors to employ. The numerical solution of an elliptic partial differential equation is examined in order to study the relationship between problem size and architecture. The equation's domain is discretized into n sup 2 grid points which are divided into partitions and mapped onto the individual processor memories. The relationships between grid size, stencil type, partitioning strategy, processor execution time, and communication network type are analytically quantified. In so doing, the optimal number of processors was determined to assign to the solution, and identified (1) the smallest grid size which fully benefits from using all available processors, (2) the leverage on performance given by increasing processor speed or communication network speed, and (3) the suitability of various architectures for large numerical problems.
Problem size, parallel architecture, and optimal speedup
NASA Technical Reports Server (NTRS)
Nicol, David M.; Willard, Frank H.
1988-01-01
The communication and synchronization overhead inherent in parallel processing can lead to situations where adding processors to the solution method actually increases execution time. Problem type, problem size, and architecture type all affect the optimal number of processors to employ. The numerical solution of an elliptic partial differential equation is examined in order to study the relationship between problem size and architecture. The equation's domain is discretized into n sup 2 grid points which are divided into partitions and mapped onto the individual processor memories. The relationships between grid size, stencil type, partitioning strategy, processor execution time, and communication network type are analytically quantified. In so doing, the optimal number of processors was determined to assign to the solution, and identified (1) the smallest grid size which fully benefits from using all available processors, (2) the leverage on performance given by increasing processor speed or communication network speed, and (3) the suitability of various architectures for large numerical problems.
Parallel Vegetation Stripe Formation Through Hydrologic Interactions
NASA Astrophysics Data System (ADS)
Cheng, Y.; Stieglitz, M.; Engel, V.; Turk, G.
2009-12-01
Vegetation in many parts of the world display intriguing patterns: from the regularly spaced stripes on hillsides to the irregular mosaics. However, it has long been a challenge to describe how these patterns develop. Recently, there have been successes in describing pattern development mathematically. The Klausmeir model (Klausmeir., 1999), which simulates vegetation stripes perpendicular to flow field, consists of two partial differential equations that describe plant and surface water dynamics on a gently sloping landscape. More recently, Rietkerk et al (2004) proposed a simple 2D advection-diffusion model which differs from earlier models in that it includes for hydraulic head interactions. The Rietkerk model simulates plant-water and plant-nutrient dynamics and generates vegetation patterns of reasonable scales: 'maze patterns' on flat ground and stripes perpendicular to flow on slopes. However, to date none of these theoretical studies have been able to simulate the development of regularly spaced vegetation stripes parallel to flow direction. Such vegetation patterns are, for example, characteristic of the ridge and slough system (S&R) in the Everglades. We employ the Rietkerk model to describe for the first time to our knowledge, the formation of parallel stripes from hydrologic interactions. To simulate the perpendicular stripes, Rietkerk et al only allowed for the local advection of water and nutrient in one direction. To simulate parallel stripes, we retain the basic equations of the Rietkerk model but allow for constant advection of water and nutrient in one direction to simulate slope conditions, with evapotranspiration driven advection of water and nutrient perpendicular to the downhill flow direction. In this model, the relatively higher rates of evapotranspiration on the vegetation patches compared to the non-vegetated areas create hydraulic gradients, which then drive the convergence of dissolved nutrients from the downhill flow to the growing
Parallel Element Agglomeration Algebraic Multigrid and Upscaling Library
Energy Science and Technology Software Center (ESTSC)
2015-02-19
ParFELAG is a parallel distributed memory C++ library for numerical upscaling of finite element discretizations. It provides optimal complesity algorithms ro build multilevel hierarchies and solvers that can be used for solving a wide class of partial differential equations (elliptic, hyperbolic, saddle point problems) on general unstructured mesh (under the assumption that the topology of the agglomerated entities is correct). Additionally, a novel multilevel solver for saddle point problems with divergence constraint is implemented.
Partially segmented deformable mirror
Bliss, Erlan S.; Smith, James R.; Salmon, J. Thaddeus; Monjes, Julio A.
1991-01-01
A partially segmented deformable mirror is formed with a mirror plate having a smooth and continuous front surface and a plurality of actuators to its back surface. The back surface is divided into triangular areas which are mutually separated by grooves. The grooves are deep enough to make the plate deformable and the actuators for displacing the mirror plate in the direction normal to its surface are inserted in the grooves at the vertices of the triangular areas. Each actuator includes a transducer supported by a receptacle with outer shells having outer surfaces. The vertices have inner walls which are approximately perpendicular to the mirror surface and make planar contacts with the outer surfaces of the outer shells. The adhesive which is used on these contact surfaces tends to contract when it dries but the outer shells can bend and serve to minimize the tendency of the mirror to warp.
Partially segmented deformable mirror
Bliss, E.S.; Smith, J.R.; Salmon, J.T.; Monjes, J.A.
1991-05-21
A partially segmented deformable mirror is formed with a mirror plate having a smooth and continuous front surface and a plurality of actuators to its back surface. The back surface is divided into triangular areas which are mutually separated by grooves. The grooves are deep enough to make the plate deformable and the actuators for displacing the mirror plate in the direction normal to its surface are inserted in the grooves at the vertices of the triangular areas. Each actuator includes a transducer supported by a receptacle with outer shells having outer surfaces. The vertices have inner walls which are approximately perpendicular to the mirror surface and make planar contacts with the outer surfaces of the outer shells. The adhesive which is used on these contact surfaces tends to contract when it dries but the outer shells can bend and serve to minimize the tendency of the mirror to warp. 5 figures.
Krumpelt, Michael; Ahmed, Shabbir; Kumar, Romesh; Doshi, Rajiv
2000-01-01
A two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion. The dehydrogenation portion is a group VIII metal and the oxide-ion conducting portion is selected from a ceramic oxide crystallizing in the fluorite or perovskite structure. There is also disclosed a method of forming a hydrogen rich gas from a source of hydrocarbon fuel in which the hydrocarbon fuel contacts a two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion at a temperature not less than about 400.degree. C. for a time sufficient to generate the hydrogen rich gas while maintaining CO content less than about 5 volume percent. There is also disclosed a method of forming partially oxidized hydrocarbons from ethanes in which ethane gas contacts a two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion for a time and at a temperature sufficient to form an oxide.
Electrical conductivity anisotropy of partially molten peridotite under shear deformation
NASA Astrophysics Data System (ADS)
Zhang, B.; Yoshino, T.; Yamazaki, D.; Manthilake, G. M.; Katsura, T.
2013-12-01
Recent ocean bottom magnetotelluric investigations have revealed a high-conductivity layer (HCL) with high anisotropy characterized by higher conductivity values in the direction parallel to the plate motion beneath the southern East Pacific Rise (Evans et al., 2005) and beneath the edge of the Cocos plate at the Middle America trench offshore of Nicaragua (Naif et al., 2013). These geophysical observations have been attributed to either hydration (water) of mantle minerals or the presence of partial melt. Currently, aligned partial melt has been regarded as the most preferable candidate for explaining the conductivity anisotropy because of the implausibility of proton conduction (Yoshino et al., 2006). In this study, we report development of the conductivity anisotropy between parallel and normal to shear direction on the shear plane in partial molten peridotite as a function of time and shear strain. Starting samples were pre-synthesized partial molten peridotite, showing homogeneous melt distribution. The partially molten peridotite samples were deformed in simple shear geometry at 1 GPa and 1723 K in a DIA-type apparatus with uniaxial deformation facility. Conductivity difference between parallel and normal to shear direction reached one order, which is equivalent to that observed beneath asthenosphere. In contrast, such anisotropic behavior was not found in the melt-free samples, suggesting that development of the conductivity anisotropy was generated under shear stress. Microstructure of the deformed partial molten peridotite shows partial melt tends to preferentially locate grain boundaries parallel to shear direction, and forms continuously thin melt layer sub-parallel to the shear direction, whereas apparently isolated distribution was observed on the section perpendicular to the shear direction. The resultant melt morphology can be approximated by tube like geometry parallel to the shear direction. This observation suggests that the development of
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Lau, Sonie; Yan, Jerry C.
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Parallel NPARC: Implementation and Performance
NASA Technical Reports Server (NTRS)
Townsend, S. E.
1996-01-01
Version 3 of the NPARC Navier-Stokes code includes support for large-grain (block level) parallelism using explicit message passing between a heterogeneous collection of computers. This capability has the potential for significant performance gains, depending upon the block data distribution. The parallel implementation uses a master/worker arrangement of processes. The master process assigns blocks to workers, controls worker actions, and provides remote file access for the workers. The processes communicate via explicit message passing using an interface library which provides portability to a number of message passing libraries, such as PVM (Parallel Virtual Machine). A Bourne shell script is used to simplify the task of selecting hosts, starting processes, retrieving remote files, and terminating a computation. This script also provides a simple form of fault tolerance. An analysis of the computational performance of NPARC is presented, using data sets from an F/A-18 inlet study and a Rocket Based Combined Cycle Engine analysis. Parallel speedup and overall computational efficiency were obtained for various NPARC run parameters on a cluster of IBM RS6000 workstations. The data show that although NPARC performance compares favorably with the estimated potential parallelism, typical data sets used with previous versions of NPARC will often need to be reblocked for optimum parallel performance. In one of the cases studied, reblocking increased peak parallel speedup from 3.2 to 11.8.
Parallel incremental compilation. Doctoral thesis
Gafter, N.M.
1990-06-01
The time it takes to compile a large program has been a bottleneck in the software development process. When an interactive programming environment with an incremental compiler is used, compilation speed becomes even more important, but existing incremental compilers are very slow for some types of program changes. We describe a set of techniques that enable incremental compilation to exploit fine-grained concurrency in a shared-memory multi-processor and achieve asymptotic improvement over sequential algorithms. Because parallel non-incremental compilation is a special case of parallel incremental compilation, the design of a parallel compiler is a corollary of our result. Instead of running the individual phases concurrently, our design specifies compiler phases that are mutually sequential. However, each phase is designed to exploit fine-grained parallelism. By allowing each phase to present its output as a complete structure rather than as a stream of data, we can apply techniques such as parallel prefix and parallel divide-and-conquer, and we can construct applicative data structures to achieve sublinear execution time. Parallel algorithms for each phase of a compiler are presented to demonstrate that a complete incremental compiler can achieve execution time that is asymptotically less than sequential algorithms.
EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS
F. PETRINI; W. FENG
1999-09-01
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of low-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.
Parallel integer sorting with medium and fine-scale parallelism
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens; Inglett, Todd Alan
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations
NASA Technical Reports Server (NTRS)
Chrisochoides, Nikos
1995-01-01
We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDE's) on multiprocessors. Multithreading is used as a means of exploring concurrency in the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used an a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDE's. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability not an easy task, and increases software complexity.
Solving unstructured grid problems on massively parallel computers
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1990-01-01
A highly parallel graph mapping technique that enables one to efficiently solve unstructured grid problems on massively parallel computers is presented. Many implicit and explicit methods for solving discretized partial differential equations require each point in the discretization to exchange data with its neighboring points every time step or iteration. The cost of this communication can negate the high performance promised by massively parallel computing. To eliminate this bottleneck, the graph of the irregular problem is mapped into the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. It is shown that using the heuristic mapping algorithm significantly reduces the communication time compared to a naive assignment of processes to processors.
Automating the parallel processing of fluid and structural dynamics calculations
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.; Cole, Gary L.
1987-01-01
The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilties to systems of partial differential equations, that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.
Automating the parallel processing of fluid and structural dynamics calculations
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.; Cole, Gary L.
1987-01-01
The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations, that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.
Parallel Architecture For Robotics Computation
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1990-01-01
Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.
Multigrid on massively parallel architectures
Falgout, R D; Jones, J E
1999-09-17
The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.
Parallel inverse iteration with reorthogonalization
Fann, G.I.; Littlefield, R.J.
1993-03-01
A parallel method for finding orthogonal eigenvectors of real symmetric tridiagonal is described. The method uses inverse iteration with repeated Modified Gram-Schmidt (MGS) reorthogonalization of the unconverged iterates for clustered eigenvalues. This approach is more parallelizable than reorthogonalizing against fully converged eigenvectors, as is done by LAPACK's current DSTEIN routine. The new method is found to provide accuracy and speed comparable to DSTEIN's and to have good parallel scalability even for matrices with large clusters of eigenvalues. We present al results for residual and orthogonality tests, plus timings on IBM RS/6000 (sequential) and Intel Touchstone DELTA (parallel) computers.
Parallel inverse iteration with reorthogonalization
Fann, G.I.; Littlefield, R.J.
1993-03-01
A parallel method for finding orthogonal eigenvectors of real symmetric tridiagonal is described. The method uses inverse iteration with repeated Modified Gram-Schmidt (MGS) reorthogonalization of the unconverged iterates for clustered eigenvalues. This approach is more parallelizable than reorthogonalizing against fully converged eigenvectors, as is done by LAPACK`s current DSTEIN routine. The new method is found to provide accuracy and speed comparable to DSTEIN`s and to have good parallel scalability even for matrices with large clusters of eigenvalues. We present al results for residual and orthogonality tests, plus timings on IBM RS/6000 (sequential) and Intel Touchstone DELTA (parallel) computers.
Appendix E: Parallel Pascal development system
NASA Technical Reports Server (NTRS)
1985-01-01
The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays which have dimensions than the hardware. Programs can be conveninetly tested with small sized arrays on the conventional computer before attempting to run on a parallel system.
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Turbomachinery CFD on parallel computers
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.
1992-01-01
The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.
Predicting performance of parallel computations
NASA Technical Reports Server (NTRS)
Mak, Victor W.; Lundstrom, Stephen F.
1990-01-01
An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described. A parallel computation is modeled as a task system with precedence relationships expressed as a series-parallel directed acyclic graph. Resources in a concurrent system are modeled as service centers in a queuing network model. Using these two models as inputs, the method outputs predictions of expected execution time of the parallel computation and the concurrent system utilization. The method is validated against both detailed simulation and actual execution on a commercial multiprocessor. Using 100 test cases, the average error of the prediction when compared to simulation statistics is 1.7 percent, with a standard deviation of 1.5 percent; the maximum error is about 10 percent.
Parallel hierarchical method in networks
NASA Astrophysics Data System (ADS)
Malinochka, Olha; Tymchenko, Leonid
2007-09-01
This method of parallel-hierarchical Q-transformation offers new approach to the creation of computing medium - of parallel -hierarchical (PH) networks, being investigated in the form of model of neurolike scheme of data processing [1-5]. The approach has a number of advantages as compared with other methods of formation of neurolike media (for example, already known methods of formation of artificial neural networks). The main advantage of the approach is the usage of multilevel parallel interaction dynamics of information signals at different hierarchy levels of computer networks, that enables to use such known natural features of computations organization as: topographic nature of mapping, simultaneity (parallelism) of signals operation, inlaid cortex, structure, rough hierarchy of the cortex, spatially correlated in time mechanism of perception and training [5].
"Feeling" Series and Parallel Resistances.
ERIC Educational Resources Information Center
Morse, Robert A.
1993-01-01
Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)
Demonstrating Forces between Parallel Wires.
ERIC Educational Resources Information Center
Baker, Blane
2000-01-01
Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)
Parallel computation using limited resources
Sugla, B.
1985-01-01
This thesis addresses itself to the task of designing and analyzing parallel algorithms when the resources of processors, communication, and time are limited. The two parts of this thesis deal with multiprocessor systems and VLSI - the two important parallel processing environments that are prevalent today. In the first part a time-processor-communication tradeoff analysis is conducted for two kinds of problems - N input, 1 output, and N input, N output computations. In the class of problems of the second kind, the problem of prefix computation, an important problem due to the number of naturally occurring computations it can model, is studied. Finally, a general methodology is given for design of parallel algorithms that can be used to optimize a given design to a wide set of architectural variations. The second part of the thesis considers the design of parallel algorithms for the VLSI model of computation when the resource of time is severely restricted.
Parallel algorithms for message decomposition
Teng, S.H.; Wang, B.
1987-06-01
The authors consider the deterministic and random parallel complexity (time and processor) of message decoding: an essential problem in communications systems and translation systems. They present an optimal parallel algorithm to decompose prefix-coded messages and uniquely decipherable-coded messages in O(n/P) time, using O(P) processors (for all P:1 less than or equal toPless than or equal ton/log n) deterministically as well as randomly on the weakest version of parallel random access machines in which concurrent read and concurrent write to a cell in the common memory are not allowed. This is done by reducing decoding to parallel finite-state automata simulation and the prefix sums.
HEATR project: ATR algorithm parallelization
NASA Astrophysics Data System (ADS)
Deardorf, Catherine E.
1998-09-01
High Performance Computing (HPC) Embedded Application for Target Recognition (HEATR) is a project funded by the High Performance Computing Modernization Office through the Common HPC Software Support Initiative (CHSSI). The goal of CHSSI is to produce portable, parallel, multi-purpose, freely distributable, support software to exploit emerging parallel computing technologies and enable application of scalable HPC's for various critical DoD applications. Specifically, the CHSSI goal for HEATR is to provide portable, parallel versions of several existing ATR detection and classification algorithms to the ATR-user community to achieve near real-time capability. The HEATR project will create parallel versions of existing automatic target recognition (ATR) detection and classification algorithms and generate reusable code that will support porting and software development process for ATR HPC software. The HEATR Team has selected detection/classification algorithms from both the model- based and training-based (template-based) arena in order to consider the parallelization requirements for detection/classification algorithms across ATR technology. This would allow the Team to assess the impact that parallelization would have on detection/classification performance across ATR technology. A field demo is included in this project. Finally, any parallel tools produced to support the project will be refined and returned to the ATR user community along with the parallel ATR algorithms. This paper will review: (1) HPCMP structure as it relates to HEATR, (2) Overall structure of the HEATR project, (3) Preliminary results for the first algorithm Alpha Test, (4) CHSSI requirements for HEATR, and (5) Project management issues and lessons learned.
Nevzorova, Y A; Tolba, R; Trautwein, C; Liedtke, C
2015-04-01
The surgical procedure of two-thirds partial hepatectomy (PH) in rodents was first described more than 80 years ago by Higgins and Anderson. Nevertheless, this technique is still a state-of-the-art method for the community of liver researchers as it allows the in-depth analysis of signalling pathways involved in liver regeneration and hepatocarcinogenesis. The importance of PH as a key method in experimental hepatology has even increased in the last decade due to the increasing availability of genetically-modified mouse strains. Here, we propose a standard operating procedure (SOP) for the implementation of PH in mice, which is based on our experience of more than 10 years. In particular, the SOP offers all relevant background information on the PH model and provides comprehensive guidelines for planning and performing PH experiments. We provide established recommendations regarding optimal age and gender of animals, use of appropriate anaesthesia and biometric calculation of the experiments. We finally present an easy-to-follow step-by-step description of the complete surgical procedure including required materials, critical steps and postoperative management. This SOP especially takes into account the latest changes in animal welfare rules in the European Union but is still in agreement with current international regulations. In summary, this article provides comprehensive information for the legal application, design and implementation of PH experiments. PMID:25835741
Partial covariate adjusted regression
Şentürk, Damla; Nguyen, Danh V.
2008-01-01
Covariate adjusted regression (CAR) is a recently proposed adjustment method for regression analysis where both the response and predictors are not directly observed (Şentürk and Müller, 2005). The available data has been distorted by unknown functions of an observable confounding covariate. CAR provides consistent estimators for the coefficients of the regression between the variables of interest, adjusted for the confounder. We develop a broader class of partial covariate adjusted regression (PCAR) models to accommodate both distorted and undistorted (adjusted/unadjusted) predictors. The PCAR model allows for unadjusted predictors, such as age, gender and demographic variables, which are common in the analysis of biomedical and epidemiological data. The available estimation and inference procedures for CAR are shown to be invalid for the proposed PCAR model. We propose new estimators and develop new inference tools for the more general PCAR setting. In particular, we establish the asymptotic normality of the proposed estimators and propose consistent estimators of their asymptotic variances. Finite sample properties of the proposed estimators are investigated using simulation studies and the method is also illustrated with a Pima Indians diabetes data set. PMID:20126296
Architectures for reasoning in parallel
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.
1989-01-01
The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.
Efficiency of parallel direct optimization.
Janies, D A; Wheeler, W C
2001-03-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. PMID:12240679
The NICMOS Parallel Observing Program
NASA Astrophysics Data System (ADS)
McCarthy, Patrick
2002-07-01
We propose to manage the default set of pure parallels with NICMOS. Our experience with both our GO NICMOS parallel program and the public parallel NICMOS programs in cycle 7 prepared us to make optimal use of the parallel opportunities. The NICMOS G141 grism remains the most powerful survey tool for HAlpha emission-line galaxies at cosmologically interesting redshifts. It is particularly well suited to addressing two key uncertainties regarding the global history of star formation: the peak rate of star formation in the relatively unexplored but critical 1<= z <= 2 epoch, and the amount of star formation missing from UV continuum-based estimates due to high extinction. Our proposed deep G141 exposures will increase the sample of known HAlpha emission- line objects at z ~ 1.3 by roughly an order of magnitude. We will also obtain a mix of F110W and F160W images along random sight-lines to examine the space density and morphologies of the reddest galaxies. The nature of the extremely red galaxies remains unclear and our program of imaging and grism spectroscopy provides unique information regarding both the incidence of obscured star bursts and the build up of stellar mass at intermediate redshifts. In addition to carrying out the parallel program we will populate a public database with calibrated spectra and images, and provide limited ground- based optical and near-IR data for the deepest parallel fields.
Partial lipodystrophy in coeliac disease.
O'Mahony, D; O'Mahony, S; Whelton, M J; McKiernan, J
1990-01-01
The association of coeliac disease and partial lipodystrophy is described. The patient also had deficiencies of serum IgA and C3 complement (the latter associated with partial lipodystrophy). In addition, there was subclinical dermatitis herpetiformis confirmed by skin biopsy. The facial wasting of fully developed partial lipodystrophy may be misinterpreted as a sign of malabsorption but the facial, upper limb, and truncal lipodystrophy contrasts with normal pelvic and lower limb appearances. Images Figure 1 Figure 2 PMID:2379878
Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective opeartion through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Trigonometric Integrals via Partial Fractions
ERIC Educational Resources Information Center
Chen, H.; Fulford, M.
2005-01-01
Parametric differentiation is used to derive the partial fractions decompositions of certain rational functions. Those decompositions enable us to integrate some new combinations of trigonometric functions.
Low partial discharge vacuum feedthrough
NASA Technical Reports Server (NTRS)
Benham, J. W.; Peck, S. R.
1979-01-01
Relatively discharge free vacuum feedthrough uses silver-plated copper conductor jacketed by carbon filled silicon semiconductor to reduce concentrated electric fields and minimize occurrence of partial discharge.
Experts' understanding of partial derivatives using the partial derivative machine
NASA Astrophysics Data System (ADS)
Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.
2015-12-01
[This paper is part of the Focused Collection on Upper Division Physics Courses.] Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of students in STEM disciplines. In this paper, we report on an initial study of expert understanding of partial derivatives across three disciplines: physics, engineering, and mathematics. We report on the central research question of how disciplinary experts understand partial derivatives, and how their concept images of partial derivatives differ, with a focus on experimentally measured quantities. Using the partial derivative machine (PDM), we probed expert understanding of partial derivatives in an experimental context without a known functional form. In particular, we investigated which representations were cued by the experts' interactions with the PDM. Whereas the physicists and engineers were quick to use measurements to find a numeric approximation for a derivative, the mathematicians repeatedly returned to speculation as to the functional form; although they were comfortable drawing qualitative conclusions about the system from measurements, they were reluctant to approximate the derivative through measurement. On a theoretical front, we found ways in which existing frameworks for the concept of derivative could be expanded to include numerical approximation.
The economics of parallel trade.
Danzon, P M
1998-03-01
The potential for parallel trade in the European Union (EU) has grown with the accession of low price countries and the harmonisation of registration requirements. Parallel trade implies a conflict between the principle of autonomy of member states to set their own pharmaceutical prices, the principle of free trade and the industrial policy goal of promoting innovative research and development (R&D). Parallel trade in pharmaceuticals does not yield the normal efficiency gains from trade because countries achieve low pharmaceutical prices by aggressive regulation, not through superior efficiency. In fact, parallel trade reduces economic welfare by undermining price differentials between markets. Pharmaceutical R&D is a global joint cost of serving all consumers worldwide; it accounts for roughly 30% of total costs. Optimal (welfare maximising) pricing to cover joint costs (Ramsey pricing) requires setting different prices in different markets, based on inverse demand elasticities. By contrast, parallel trade and regulation based on international price comparisons tend to force price convergence across markets. In response, manufacturers attempt to set a uniform 'euro' price. The primary losers from 'euro' pricing will be consumers in low income countries who will face higher prices or loss of access to new drugs. In the long run, even higher income countries are likely to be worse off with uniform prices, because fewer drugs will be developed. One policy option to preserve price differentials is to exempt on-patent products from parallel trade. An alternative is confidential contracting between individual manufacturers and governments to provide country-specific ex post discounts from the single 'euro' wholesale price, similar to rebates used by managed care in the US. This would preserve differentials in transactions prices even if parallel trade forces convergence of wholesale prices. PMID:10178655
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)
NASA Technical Reports Server (NTRS)
Straeter, T. A.; Markos, A. T.
1975-01-01
A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
Bounded Parallel-Batch Scheduling on Unrelated Parallel Machines
NASA Astrophysics Data System (ADS)
Miao, Cuixia; Zhang, Yuzhong; Wang, Chengfei
In this paper, we consider the bounded parallel-batch scheduling problem on unrelated parallel machines. Problems R m |B|F are NP-hard for any objective function F. For this reason, we discuss the special case with p ij = p i for i = 1, 2, ⋯ , m , j = 1, 2, ⋯ , n. We give optimal algorithms for the general scheduling to minimize total weighted completion time, makespan and the number of tardy jobs. And we design pseudo-polynomial time algorithms for the case with rejection penalty to minimize the makespan and the total weighted completion time plus the total penalty of the rejected jobs, respectively.
Parallelizing Timed Petri Net simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1993-01-01
The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
Forces and pressures in adsorbing partially directed walks
NASA Astrophysics Data System (ADS)
Janse van Rensburg, E. J.; Prellberg, T.
2016-05-01
Polymers in confined spaces lose conformational entropy. This induces a net repulsive entropic force on the walls of the confining space. A model for this phenomenon is a lattice walk between confining walls, and in this paper a model of an adsorbing partially directed walk is used. The walk is placed in a half square lattice {{{L}}}+2 with boundary \\partial {{{L}}}+2, and confined between two vertical parallel walls, which are vertical lines in the lattice, a distance w apart. The free energy of the walk is determined, as a function of w, for walks with endpoints in the confining walls and adsorbing in \\partial {{{L}}}+2. This gives the entropic force on the confining walls as a function of w. It is shown that there are zero force points in this model and the locations of these points are determined, in some cases exactly, and in other cases asymptotically.
Are Electron Partial Waves Real
NASA Astrophysics Data System (ADS)
Yenen, O.; McLaughlin, K. W.
2005-05-01
Experiments determining the partial wave content of electrons are uncommon. The standard approach to partial wave expansion of the wavefunction of electrons often ignores their spin. In this non-relativistic approximation the partial waves are labeled by their orbital angular momentum quantum number, e.g. d-waves. As our previous work has shown, this non-relativistic approximation usually fails for photoelectrons. Partial waves should be further specified by their total angular momentum. With d-waves for example, one would need to distinguish between d3/2 and d5/2 partial waves. Although energetically degenerate, fully relativistic d3/2 and d5/2 partial waves of photoelectrons have fundamentally different angular distributions. Using experimental and theoretical methods we have developed, we obtain partial wave probabilities of photoelectrons from polarization measurements of ionic fluorescence. We found that for selected states of the residual ion, there are energy regions where the photoelectron is in a single partial wave with predictable angular distributions.
PARAVT: Parallel Voronoi Tessellation code
NASA Astrophysics Data System (ADS)
Gonzalez, Roberto E.
2016-01-01
We present a new open source code for massive parallel computation of Voronoi tessellations(VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition take into account consistent boundary computation between tasks, and support periodic conditions. In addition, the code compute neighbors lists, Voronoi density and Voronoi cell volumes for each particle, and can compute density on a regular grid.
Massively parallel MRI detector arrays
NASA Astrophysics Data System (ADS)
Keil, Boris; Wald, Lawrence L.
2013-04-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays.
Massively Parallel MRI Detector Arrays
Keil, Boris; Wald, Lawrence L
2013-01-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758
Massively parallel MRI detector arrays.
Keil, Boris; Wald, Lawrence L
2013-04-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called "ultimate" SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758
Fast data parallel polygon rendering
Ortega, F.A.; Hansen, C.D.
1993-09-01
This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted for applications which require very fast polygon rendering for extremely large sets of polygons such as is found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables a scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.
Parallel integrated frame synchronizer chip
NASA Technical Reports Server (NTRS)
Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)
2000-01-01
A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.
Parallel Adaptive Mesh Refinement Library
NASA Technical Reports Server (NTRS)
Mac-Neice, Peter; Olson, Kevin
2005-01-01
Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.
Visualizing Parallel Computer System Performance
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.
1988-01-01
Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkedly adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized Large, complex parallel system pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eluding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.
Hybrid parallel programming with MPI and Unified Parallel C.
Dinan, J.; Balaji, P.; Lusk, E.; Sadayappan, P.; Thakur, R.; Mathematics and Computer Science; The Ohio State Univ.
2010-01-01
The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.
Parallel algorithms for mapping pipelined and parallel computations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Gang scheduling a parallel machine
Gorda, B.C.; Brooks, E.D. III.
1991-12-01
Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processes. User programs and their gangs of processes are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quantum are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory.
Gang scheduling a parallel machine
Gorda, B.C.; Brooks, E.D. III.
1991-03-01
Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processors. User program and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quantums are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.
ITER LHe Plants Parallel Operation
NASA Astrophysics Data System (ADS)
Fauve, E.; Bonneton, M.; Chalifour, M.; Chang, H.-S.; Chodimella, C.; Monneret, E.; Vincent, G.; Flavien, G.; Fabre, Y.; Grillot, D.
The ITER Cryogenic System includes three identical liquid helium (LHe) plants, with a total average cooling capacity equivalent to 75 kW at 4.5 K.The LHe plants provide the 4.5 K cooling power to the magnets and cryopumps. They are designed to operate in parallel and to handle heavy load variations.In this proceedingwe will describe the presentstatusof the ITER LHe plants with emphasis on i) the project schedule, ii) the plantscharacteristics/layout and iii) the basic principles and control strategies for a stable operation of the three LHe plants in parallel.
Medipix2 parallel readout system
NASA Astrophysics Data System (ADS)
Fanti, V.; Marzeddu, R.; Randaccio, P.
2003-08-01
A fast parallel readout system based on a PCI board has been developed in the framework of the Medipix collaboration. The readout electronics consists of two boards: the motherboard directly interfacing the Medipix2 chip, and the PCI board with digital I/O ports 32 bits wide. The device driver and readout software have been developed at low level in Assembler to allow fast data transfer and image reconstruction. The parallel readout permits a transfer rate up to 64 Mbytes/s. http://medipix.web.cern ch/MEDIPIX/
Parallelization of the SIR code
NASA Astrophysics Data System (ADS)
Thonhofer, S.; Bellot Rubio, L. R.; Utz, D.; Jurčak, J.; Hanslmeier, A.; Piantschitsch, I.; Pauritsch, J.; Lemmerer, B.; Guttenbrunner, S.
A high-resolution 3-dimensional model of the photospheric magnetic field is essential for the investigation of small-scale solar magnetic phenomena. The SIR code is an advanced Stokes-inversion code that deduces physical quantities, e.g. magnetic field vector, temperature, and LOS velocity, from spectropolarimetric data. We extended this code by the capability of directly using large data sets and inverting the pixels in parallel. Due to this parallelization it is now feasible to apply the code directly on extensive data sets. Besides, we included the possibility to use different initial model atmospheres for the inversion, which enhances the quality of the results.
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in
Parallel multiscale simulations of a brain aneurysm
NASA Astrophysics Data System (ADS)
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Parallel multiscale simulations of a brain aneurysm.
Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver εκ αr . The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers ( εκ αr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Automating parallel implementation of neural learning algorithms.
Rana, O F
2000-06-01
Neural learning algorithms generally involve a number of identical processing units, which are fully or partially connected, and involve an update function, such as a ramp, a sigmoid or a Gaussian function for instance. Some variations also exist, where units can be heterogeneous, or where an alternative update technique is employed, such as a pulse stream generator. Associated with connections are numerical values that must be adjusted using a learning rule, and and dictated by parameters that are learning rule specific, such as momentum, a learning rate, a temperature, amongst others. Usually, neural learning algorithms involve local updates, and a global interaction between units is often discouraged, except in instances where units are fully connected, or involve synchronous updates. In all of these instances, concurrency within a neural algorithm cannot be fully exploited without a suitable implementation strategy. A design scheme is described for translating a neural learning algorithm from inception to implementation on a parallel machine using PVM or MPI libraries, or onto programmable logic such as FPGAs. A designer must first describe the algorithm using a specialised Neural Language, from which a Petri net (PN) model is constructed automatically for verification, and building a performance model. The PN model can be used to study issues such as synchronisation points, resource sharing and concurrency within a learning rule. Specialised constructs are provided to enable a designer to express various aspects of a learning rule, such as the number and connectivity of neural nodes, the interconnection strategies, and information flows required by the learning algorithm. A scheduling and mapping strategy is then used to translate this PN model onto a multiprocessor template. We demonstrate our technique using a Kohonen and backpropagation learning rules, implemented on a loosely coupled workstation cluster, and a dedicated parallel machine, with PVM libraries
High Productivity Implantation ''PARTIAL IMPLANT''
Hino, Masayoshi; Miyamoto, Naoki; Sakai, Shigeki; Matsumoto, Takao
2008-11-03
The patterned ion implantation 'PARTIAL IMPLANT' has been developed as a productivity improvement tool. The Partial Implant can form several different ion dose areas on the wafer surface by controlling the speed of wafer moving and the stepwise rotation of twist axis. The Partial Implant system contains two implant methods. One method is 'DIVIDE PARTIAL IMPLANT', that is aimed at reducing the consumption of the wafer. The Divide Partial Implant evenly divides dose area on one wafer surface into two or three different dose part. Any dose can be selected in each area. So the consumption of the wafer for experimental implantation can be reduced. The second method is 'RING PARTIAL IMPLANT' that is aimed at improving yield by correcting electrical characteristic of devices. The Ring Partial Implant can form concentric ion dose areas. The dose of wafer external area can be selected to be within plus or minus 30% of dose of wafer central area. So the electrical characteristic of devices can be corrected by controlling dose at edge side on the wafer.
Computer copings for partial coverage.
Denissen, H; van der Zel, J; Reisig, J; Vlaar, S; de Ruiter, W; van Waas, R
1999-04-01
Partial coverage posterior tooth preparations are very complex surfaces for computer surface digitization, computer design, and manufacture of ceramic copings. The aim of this study was therefore to determine whether the Computer Integrated Crown Reconstruction (Cicero) system was compatible with a proposed partial coverage preparation design and capable of producing ceramic copings. Posterior teeth were prepared for partial coverage copings with deep gingival chamfers in the proximal boxes and around the functional cusps (buccal of mandibular and lingual of maxillary posterior teeth). The nonfunctional cusps (lingual of mandibular and buccal of maxillary posterior teeth) were prepared with broad bevels following the inclined occlusal plane pattern. Optical impressions were taken of stone dies by means of a fast laser-line scanning method that measured the three-dimensional geometry of the partial coverage preparation. Computers digitized the images, and designed and produced the ceramic copings. The Cicero system digitized the partial coverage preparation surfaces precisely with a minor coefficient of variance of 0.2%. The accuracy of the surface digitization, the design, and the computer aided milling showed that the system was capable of producing partial coverage copings with a mean marginal gap of 74 microns. This value was obtained before optimizing the marginal fit by means of porcelain veneering. In summary, Cicero computer technology, i.e., surface digitization, coping design, and manufacture, was compatible with the described partial coverage preparations for posterior teeth. PMID:11351490
Coherent-mode decomposition of partially polarized, partially coherent sources
NASA Astrophysics Data System (ADS)
Gori, Franco; Santarsiero, Massimo; Simon, Raja; Piquero, Gemma; Borghi, Riccardo; Guattari, Giorgio
2003-01-01
It is shown that any partially polarized, partially coherent source can be expressed in terms of a suitable superposition of transverse coherent modes with orthogonal polarization states. Such modes are determined through the solution of a system of two coupled integral equations. An example, for which the modal decomposition is obtained in closed form in terms of fully linearly polarized Hermite Gaussian modes, is given.
Coherent-mode decomposition of partially polarized, partially coherent sources.
Gori, Franco; Santarsiero, Massimo; Simon, Raja; Piquero, Gemma; Borghi, Riccardo; Guattari, Giorgio
2003-01-01
It is shown that any partially polarized, partially coherent source can be expressed in terms of a suitable superposition of transverse coherent modes with orthogonal polarization states. Such modes are determined through the solution of a system of two coupled integral equations. An example, for which the modal decomposition is obtained in closed form in terms of fully linearly polarized Hermite Gaussian modes, is given. PMID:12542320
Parallel, Distributed Scripting with Python
Miller, P J
2002-05-24
Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.
Matpar: Parallel Extensions for MATLAB
NASA Technical Reports Server (NTRS)
Springer, P. L.
1998-01-01
Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.
Coupled parallel waveguide semiconductor laser
NASA Technical Reports Server (NTRS)
Katz, J.; Kapon, E.; Lindsey, C.; Rav-Noy, Z.; Margalit, S.; Yariv, A.; Mukai, S.
1984-01-01
The operation of a new type of tunable laser, where the two separately controlled individual lasers are placed vertically in parallel, has been demonstrated. One of the cavities ('control' cavity) is operated below threshold and assists the longitudinal mode selection and tuning of the other laser. With a minor modification, the same device can operate as an independent two-wavelength laser source.
File concepts for parallel I/O
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1989-01-01
The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, organizations are suggested, using multiple storage devices. Problem areas are also identified and discussed.
Vandewalle, S.
1994-12-31
Time-stepping methods for parabolic partial differential equations are essentially sequential. This prohibits the use of massively parallel computers unless the problem on each time-level is very large. This observation has led to the development of algorithms that operate on more than one time-level simultaneously; that is to say, on grids extending in space and in time. The so-called parabolic multigrid methods solve the time-dependent parabolic PDE as if it were a stationary PDE discretized on a space-time grid. The author has investigated the use of multigrid waveform relaxation, an algorithm developed by Lubich and Ostermann. The algorithm is based on a multigrid acceleration of waveform relaxation, a highly concurrent technique for solving large systems of ordinary differential equations. Another method of this class is the time-parallel multigrid method. This method was developed by Hackbusch and was recently subject of further study by Horton. It extends the elliptic multigrid idea to the set of equations that is derived by discretizing a parabolic problem in space and in time.
Cluster-based parallel image processing toolkit
NASA Astrophysics Data System (ADS)
Squyres, Jeffery M.; Lumsdaine, Andrew; Stevenson, Robert L.
1995-03-01
Many image processing tasks exhibit a high degree of data locality and parallelism and map quite readily to specialized massively parallel computing hardware. However, as network technologies continue to mature, workstation clusters are becoming a viable and economical parallel computing resource, so it is important to understand how to use these environments for parallel image processing as well. In this paper we discuss our implementation of parallel image processing software library (the Parallel Image Processing Toolkit). The Toolkit uses a message- passing model of parallelism designed around the Message Passing Interface (MPI) standard. Experimental results are presented to demonstrate the parallel speedup obtained with the Parallel Image Processing Toolkit in a typical workstation cluster over a wide variety of image processing tasks. We also discuss load balancing and the potential for parallelizing portions of image processing tasks that seem to be inherently sequential, such as visualization and data I/O.
Mirror versus parallel bimanual reaching
2013-01-01
Background In spite of their importance to everyday function, tasks that require both hands to work together such as lifting and carrying large objects have not been well studied and the full potential of how new technology might facilitate recovery remains unknown. Methods To help identify the best modes for self-teleoperated bimanual training, we used an advanced haptic/graphic environment to compare several modes of practice. In a 2-by-2 study, we compared mirror vs. parallel reaching movements, and also compared veridical display to one that transforms the right hand’s cursor to the opposite side, reducing the area that the visual system has to monitor. Twenty healthy, right-handed subjects (5 in each group) practiced 200 movements. We hypothesized that parallel reaching movements would be the best performing, and attending to one visual area would reduce the task difficulty. Results The two-way comparison revealed that mirror movement times took an average 1.24 s longer to complete than parallel. Surprisingly, subjects’ movement times moving to one target (attending to one visual area) also took an average of 1.66 s longer than subjects moving to two targets. For both hands, there was also a significant interaction effect, revealing the lowest errors for parallel movements moving to two targets (p < 0.001). This was the only group that began and maintained low errors throughout training. Conclusion Combined with other evidence, these results suggest that the most intuitive reaching performance can be observed with parallel movements with a veridical display (moving to two separate targets). These results point to the expected levels of challenge for these bimanual training modes, which could be used to advise therapy choices in self-neurorehabilitation. PMID:23837908
Parallel shortest augmenting path algorithm for the assignment problem. Technical report
Balas, E.; Miller, D.; Pekny, J.; Toth, P.
1989-04-01
We describe a parallel version of the shortest augmenting path algorithm for the assignment problem. While generating the initial dual solution and partial assignment in parallel does not require substantive changes in the sequential algorithm, using several augmenting paths in parallel does require a new dual variable recalculation method. The parallel algorithm was tested on a 14-processor Butterfly Plus computer, on problems with up to 900 million variables. The speedup obtained increases with problem size. The algorithm was also embedded into a parallel branch and bound procedure for the traveling salesman problem on a directed graph, which was tested on the Butterfly Plus on problems involving up to 7,500 cities. To our knowledge, these are the largest assignment problems and traveling salesman problems solved so far.
Zhdanov, V. M. Stepanenko, A. A.
2013-12-15
The influence of resonant charge exchange for ion-atom interaction on the viscosity of partially ionized plasma embedded in the magnetic field is investigated. The general system of equations used to derive the viscosity coefficients for an arbitrary plasma component in the 21-moment approximation of Grad’s method is presented. The expressions for the coefficients of total and partial viscosities of a multicomponent partially ionized plasma in the magnetic field are obtained. As an example, the coefficients of the parallel and transverse viscosities for the ionic and neutral components of the partially ionized hydrogen plasma are calculated. It is shown that the account for resonant charge exchange can lead to a substantial change of the parallel and transverse viscosity of the plasma components in the region of low degrees of ionization on the order of 0.1.
Low Mach number parallel and quasi-parallel shocks
NASA Technical Reports Server (NTRS)
Omidi, N.; Quest, K. B.; Winske, D.
1990-01-01
The properties of low-Mach-number parallel and quasi-parallel shocks are studied using the results of one-dimensional hybrid simulations. It is shown that both the structure and ion dissipation at the shocks differ considerably. In the parallel limit, the shock remains coupled to the piston and consists of large-amplitude magnetosonic-whistler waves in the upstream, through the shock and into the downstream region, where the waves eventually damp out. These waves are generated by an ion beam instability due to the interaction between the incident and piston-reflected ions. The excited waves decelerate the plasma sufficiently that it becomes stable far into the downstream. The increase in ion temperature along the shock normal in the downstream region is due to superposition of incident and piston-rflected ions. These two populations of ions remain distinct through the downstream region. While they are both gyrophase-bunched, their counterstreaming nature results in a 180-deg phase shift in their perpendicular velocities.
Accuracy of different impression materials in parallel and nonparallel implants
Vojdani, Mahroo; Torabi, Kianoosh; Ansarifard, Elham
2015-01-01
Background: A precise impression is mandatory to obtain passive fit in implant-supported prostheses. The aim of this study was to compare the accuracy of three impression materials in both parallel and nonparallel implant positions. Materials and Methods: In this experimental study, two partial dentate maxillary acrylic models with four implant analogues in canines and lateral incisors areas were used. One model was simulating the parallel condition and the other nonparallel one, in which implants were tilted 30° bucally and 20° in either mesial or distal directions. Thirty stone casts were made from each model using polyether (Impregum), additional silicone (Monopren) and vinyl siloxanether (Identium), with open tray technique. The distortion values in three-dimensions (X, Y and Z-axis) were measured by coordinate measuring machine. Two-way analysis of variance (ANOVA), one-way ANOVA and Tukey tests were used for data analysis (α = 0.05). Results: Under parallel condition, all the materials showed comparable, accurate casts (P = 0.74). In the presence of angulated implants, while Monopren showed more accurate results compared to Impregum (P = 0.01), Identium yielded almost similar results to those produced by Impregum (P = 0.27) and Monopren (P = 0.26). Conclusion: Within the limitations of this study, in parallel conditions, the type of impression material cannot affect the accuracy of the implant impressions; however, in nonparallel conditions, polyvinyl siloxane is shown to be a better choice, followed by vinyl siloxanether and polyether respectively. PMID:26288620
Numerical computation on massively parallel hypercubes. [Connection machine
McBryan, O.A.
1986-01-01
We describe numerical computations on the Connection Machine, a massively parallel hypercube architecture with 65,536 single-bit processors and 32 Mbytes of memory. A parallel extension of COMMON LISP, provides access to the processors and network. The rich software environment is further enhanced by a powerful virtual processor capability, which extends the degree of fine-grained parallelism beyond 1,000,000. We briefly describe the hardware and indicate the principal features of the parallel programming environment. We then present implementations of SOR, multigrid and pre-conditioned conjugate gradient algorithms for solving partial differential equations on the Connection Machine. Despite the lack of floating point hardware, computation rates above 100 megaflops have been achieved in PDE solution. Virtual processors prove to be a real advantage, easing the effort of software development while improving system performance significantly. The software development effort is also facilitated by the fact that hypercube communications prove to be fast and essentially independent of distance. 29 refs., 4 figs.
Parallel computation with adaptive methods for elliptic and hyperbolic systems
Benantar, M.; Biswas, R.; Flaherty, J.E.; Shephard, M.S.
1990-01-01
We consider the solution of two dimensional vector systems of elliptic and hyperbolic partial differential equations on a shared memory parallel computer. For elliptic problems, the spatial domain is discretized using a finite quadtree mesh generation procedure and the differential system is discretized by a finite element-Galerkin technique with a piecewise linear polynomial basis. Resulting linear algebraic systems are solved using the conjugate gradient technique with element-by-element and symmetric successive over-relaxation preconditioners. Stiffness matrix assembly and linear system solutions are processed in parallel with computations scheduled on noncontiguous quadrants of the tree in order to minimize process synchronization. Determining noncontiguous regions by coloring the regular finite quadtree structure is far simpler than coloring elements of the unstructured mesh that the finite quadtree procedure generates. We describe linear-time complexity coloring procedures that use six and eight colors.
Josephson current in parallel SFS junctions
NASA Astrophysics Data System (ADS)
Ioselevich, Pavel; Ostrovsky, Pavel; Fominov, Yakov; Feigelman, Mikhail
We study a Josephson junction between superconductors connected by two parallel ferromagnetic arms. If the ferromagnets are fully polarised, supercurrent can only flow via Cooper pair splitting between the differently polarised arms. The disorder-average current is suppressed, but mesoscopic fluctuations lead to a significant typical current. We extract the typical current from a current-current correlator. The current is proportional to sin2 α / 2 , where α is the angle between the polarisations of the two arms, revealing the spin dependence of crossed Andreev reflection. Compared to an SNS device of the same geometry, the typical SFS current is small by a factor determined by the properties of the superconducting leads alone. The current is insensitive to the flux threading the area between the ferromagnetic arms of the junction. However, if the ferromagnetic arms are replaced by metal with magnetic impurities, or partially polarised ferromagnets, the Josephson current starts depending on the flux with a period of h / e , i.e. twice the superconducting flux quantum.
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Two Level Parallel Grammatical Evolution
NASA Astrophysics Data System (ADS)
Ošmera, Pavel
This paper describes a Two Level Parallel Grammatical Evolution (TLPGE) that can evolve complete programs using a variable length linear genome to govern the mapping of a Backus Naur Form grammar definition. To increase the efficiency of Grammatical Evolution (GE) the influence of backward processing was tested and a second level with differential evolution was added. The significance of backward coding (BC) and the comparison with standard coding of GEs is presented. The new method is based on parallel grammatical evolution (PGE) with a backward processing algorithm, which is further extended with a differential evolution algorithm. Thus a two-level optimization method was formed in attempt to take advantage of the benefits of both original methods and avoid their difficulties. Both methods used are discussed and the architecture of their combination is described. Also application is discussed and results on a real-word application are described.
Template matching on parallel architectures
Sher
1985-07-01
Many important problems in computer vision can be characterized as template-matching problems on edge images. Some examples are circle detection and line detection. Two techniques for template matching are the Hough transform and correlation. There are two algorithms for correlation: a shift-and-add-based technique and a Fourier-transform-based technique. The most efficient algorithm of these three varies depending on the size of the template and the structure of the image. On different parallel architectures, the choice of algorithms for a specific problem is different. This paper describes two parallel architectures: the WARP and the Butterfly and describes why and how the criterion for making the choice of algorithms differs between the two machines.
Parallel supercomputing with commodity components
Warren, M.S.; Goda, M.P.; Becker, D.J.
1997-09-01
We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10{sup 15} floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.
Parallel processing spacecraft communication system
NASA Technical Reports Server (NTRS)
Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)
1998-01-01
An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used; and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID. Each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data further handled using special double buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.
Parallel multiplex laser feedback interferometry
Zhang, Song; Tan, Yidong; Zhang, Shulian
2013-12-15
We present a parallel multiplex laser feedback interferometer based on spatial multiplexing which avoids the signal crosstalk in the former feedback interferometer. The interferometer outputs two close parallel laser beams, whose frequencies are shifted by two acousto-optic modulators by 2Ω simultaneously. A static reference mirror is inserted into one of the optical paths as the reference optical path. The other beam impinges on the target as the measurement optical path. Phase variations of the two feedback laser beams are simultaneously measured through heterodyne demodulation with two different detectors. Their subtraction accurately reflects the target displacement. Under typical room conditions, experimental results show a resolution of 1.6 nm and accuracy of 7.8 nm within the range of 100 μm.
Massively parallel quantum computer simulator
NASA Astrophysics Data System (ADS)
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, a SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as benchmark for testing high-end parallel computers.
A parallel graph coloring heuristic
Jones, M.T.; Plassmann, P.E. )
1993-05-01
The problem of computing good graph colorings arises in many diverse applications, such as in the estimation of sparse Jacobians and in the development of efficient, parallel iterative methods for solving sparse linear systems. This paper presents an asynchronous graph coloring heuristic well suited to distributed memory parallel computers. Experimental results obtained on an Intel iPSC/860 are presented, which demonstrate that, for graphs arising from finite element applications, the heuristic exhibits scalable performance and generates colorings usually within three or four colors of the best-known linear time sequential heuristics. For bounded degree graphs, it is shown that the expected running time of the heuristic under the P-Ram computation model is bounded by EO(log(n)/log log(n)). This bound is an improvement over the previously known best upper bound for the expected running time of a random heuristic for the graph coloring problem.
Instruction-level parallel processing.
Fisher, J A; Rau, R
1991-09-13
The performance of microprocessors has increased steadily over the past 20 years at a rate of about 50% per year. This is the cumulative result of architectural improvements as well as increases in circuit speed. Moreover, this improvement has been obtained in a transparent fashion, that is, without requiring programmers to rethink their algorithms and programs, thereby enabling the tremendous proliferation of computers that we see today. To continue this performance growth, microprocessor designers have incorporated instruction-level parallelism (ILP) into new designs. ILP utilizes the parallel execution ofthe lowest level computer operations-adds, multiplies, loads, and so on-to increase performance transparently. The use of ILP promises to make possible, within the next few years, microprocessors whose performance is many times that of a CRAY-IS. This article provides an overview of ILP, with an emphasis on ILP architectures-superscalar, VLIW, and dataflow processors-and the compiler techniques necessary to make ILP work well. PMID:17831442
Parallel supercomputing with commodity components
NASA Technical Reports Server (NTRS)
Warren, M. S.; Goda, M. P.; Becker, D. J.
1997-01-01
We have implemented a parallel computer architecture based entirely upon commodity personal computer components. Using 16 Intel Pentium Pro microprocessors and switched fast ethernet as a communication fabric, we have obtained sustained performance on scientific applications in excess of one Gigaflop. During one production astrophysics treecode simulation, we performed 1.2 x 10(sup 15) floating point operations (1.2 Petaflops) over a three week period, with one phase of that simulation running continuously for two weeks without interruption. We report on a variety of disk, memory and network benchmarks. We also present results from the NAS parallel benchmark suite, which indicate that this architecture is competitive with current commercial architectures. In addition, we describe some software written to support efficient message passing, as well as a Linux device driver interface to the Pentium hardware performance monitoring registers.
A new parallel simulation technique
NASA Astrophysics Data System (ADS)
Blanco-Pillado, Jose J.; Olum, Ken D.; Shlaer, Benjamin
2012-01-01
We develop a "semi-parallel" simulation technique suggested by Pretorius and Lehner, in which the simulation spacetime volume is divided into a large number of small 4-volumes that have only initial and final surfaces. Thus there is no two-way communication between processors, and the 4-volumes can be simulated independently and potentially at different times. This technique allows us to simulate much larger volumes than we otherwise could, because we are not limited by total memory size. No processor time is lost waiting for other processors. We compare a cosmic string simulation we developed using the semi-parallel technique with our previous MPI-based code for several test cases and find a factor of 2.6 improvement in the total amount of processor time required to accomplish the same job for strings evolving in the matter-dominated era.
Scans as primitive parallel operations
Blelloch, G.E. . Dept. of Computer Science)
1989-11-01
In most parallel random access machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can execute in no more time than these parallel memory references. This paper outlines an extensive study of the effect of including, in the PRAM models, such scan operations as unit-time primitives. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(log n) factor greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. The authors argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design. These all run on an EREW PRAM with the addition of two scan primitives.
Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring
NASA Technical Reports Server (NTRS)
Padovan, Joe; Kwang, Abel
1994-01-01
This paper develops a parallelizable multilevel multiple constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure,_sequential, as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort due both to updating and inversion.
Parallel Power Grid Simulation Toolkit
Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol
2015-09-14
ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGid, named FSKIT, is intended to support the coupling multiple continuous and discrete even parallel simulations. The code is designed using modern object oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.
Efficient, massively parallel eigenvalue computation
NASA Technical Reports Server (NTRS)
Huo, Yan; Schreiber, Robert
1993-01-01
In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the Maspar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
Massively parallel femtosecond laser processing.
Hasegawa, Satoshi; Ito, Haruyasu; Toyoda, Haruyoshi; Hayasaki, Yoshio
2016-08-01
Massively parallel femtosecond laser processing with more than 1000 beams was demonstrated. Parallel beams were generated by a computer-generated hologram (CGH) displayed on a spatial light modulator (SLM). The key to this technique is to optimize the CGH in the laser processing system using a scheme called in-system optimization. It was analytically demonstrated that the number of beams is determined by the horizontal number of pixels in the SLM N_{SLM} that is imaged at the pupil plane of an objective lens and a distance parameter p_{d} obtained by dividing the distance between adjacent beams by the diffraction-limited beam diameter. A performance limitation of parallel laser processing in our system was estimated at N_{SLM} of 250 and p_{d} of 7.0. Based on these parameters, the maximum number of beams in a hexagonal close-packed structure was calculated to be 1189 by using an analytical equation. PMID:27505815
Highly parallel sparse Cholesky factorization
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
Laparoscopic total and partial nephrectomy.
Lee, Benjamin R
2002-01-01
Laparoscopic radical nephrectomy has established its role as a standard of care for the management of renal neoplasms. Long term follow-up has demonstrated laparoscopic radical nephrectomy has shorter patient hospitalization and effective cancer control, with no significant difference in survival compared with open radical nephrectomy. For renal masses less than 4cm, partial nephrectomy is indicated for patients with a solitary kidney or who demonstrate impairment of contralateral renal function. The major technical issue for success of laparoscopic partial nephrectomy is bleeding control and several techniques have been developed to achieve better hemostatic control. Development of new laparoscopic techniques for partial nephrectomy can be divided into 2 categories: hilar control and warm ischemia vs. no hilar control. Development of a laparoscopic Satinsky clamp has achieved en bloc control of the renal hilum in order to allow cold knife excision of the mass, with laparoscopic repair of the collecting system, if needed. Combination of laparoscopic partial nephrectomy with ablative techniques has achieved successful excision of renal masses with adequate hemostasis without hilar clamping. Other techniques without hilar control have been investigated and included the use of a microwave tissue coagulator. In conclusion, laparoscopic radical nephrectomy for renal cell carcinoma has clearly demonstrated low morbidity and equivalent cancer control. The rates for local recurrences and metastatic spread are low and actuarial survival high. Furthermore, laparoscopic partial nephrectomy has demonstrated to be technically feasible, with low morbidity. With short term outcomes demonstrating laparoscopic partial nephrectomy as an efficacious procedure, the role of laparoscopic partial nephrectomy should continue to increase. PMID:15748397
Complete characterization of partially coherent and partially polarized optical fields.
Basso, Gabriel; Oliveira, Luimar; Vidal, Itamar
2014-03-01
We suggest a method to access the second-order, or two-point, Stokes parameters of a partially coherent and partially polarized Gaussian model optical field from an intensity interferometry experiment. Through a remarkably simple experimental arrangement, it is possible to measure the two-point and one-point Stokes parameters simultaneously, allowing the reconstruction of the coherence matrix and the polarization matrix, thus completely characterizing the optical field both statistically and locally on the observation plane. Developments, automation, and applications are pointed out. PMID:24690711
NWChem: scalable parallel computational chemistry
van Dam, Hubertus JJ; De Jong, Wibe A.; Bylaska, Eric J.; Govind, Niranjan; Kowalski, Karol; Straatsma, TP; Valiev, Marat
2011-11-01
NWChem is a general purpose computational chemistry code specifically designed to run on distributed memory parallel computers. The core functionality of the code focuses on molecular dynamics, Hartree-Fock and density functional theory methods for both plane-wave basis sets as well as Gaussian basis sets, tensor contraction engine based coupled cluster capabilities and combined quantum mechanics/molecular mechanics descriptions. It was realized from the beginning that scalable implementations of these methods required a programming paradigm inherently different from what message passing approaches could offer. In response a global address space library, the Global Array Toolkit, was developed. The programming model it offers is based on using predominantly one-sided communication. This model underpins most of the functionality in NWChem and the power of it is exemplified by the fact that the code scales to tens of thousands of processors. In this paper the core capabilities of NWChem are described as well as their implementation to achieve an efficient computational chemistry code with high parallel scalability. NWChem is a modern, open source, computational chemistry code1 specifically designed for large scale parallel applications2. To meet the challenges of developing efficient, scalable and portable programs of this nature a particular code design was adopted. This code design involved two main features. First of all, the code is build up in a modular fashion so that a large variety of functionality can be integrated easily. Secondly, to facilitate writing complex parallel algorithms the Global Array toolkit was developed. This toolkit allows one to write parallel applications in a shared memory like approach, but offers additional mechanisms to exploit data locality to lower communication overheads. This framework has proven to be very successful in computational chemistry but is applicable to any engineering domain. Within the context created by the features
Parallel micromanipulation method for microassembly
NASA Astrophysics Data System (ADS)
Sin, Jeongsik; Stephanou, Harry E.
2001-09-01
Microassembly deals with micron or millimeter scale objects where the tolerance requirements are in the micron range. Typical applications include electronics components (silicon fabricated circuits), optoelectronics components (photo detectors, emitters, amplifiers, optical fibers, microlenses, etc.), and MEMS (Micro-Electro-Mechanical-System) dies. The assembly processes generally require not only high precision but also high throughput at low manufacturing cost. While conventional macroscale assembly methods have been utilized in scaled down versions for microassembly applications, they exhibit limitations on throughput and cost due to the inherently serialized process. Since the assembly process depends heavily on the manipulation performance, an efficient manipulation method for small parts will have a significant impact on the manufacturing of miniaturized products. The objective of this study on 'parallel micromanipulation' is to achieve these three requirements through the handling of multiple small parts simultaneously (in parallel) with high precision (micromanipulation). As a step toward this objective, a new manipulation method is introduced. The method uses a distributed actuation array for gripper free and parallel manipulation, and a centralized, shared actuator for simplified controls. The method has been implemented on a testbed 'Piezo Active Surface (PAS)' in which an actively generated friction force field is the driving force for part manipulation. Basic motion primitives, such as translation and rotation of objects, are made possible with the proposed method. This study discusses the design of the proposed manipulation method PAS, and the corresponding manipulation mechanism. The PAS consists of two piezoelectric actuators for X and Y motion, two linear motion guides, two sets of nozzle arrays, and solenoid valves to switch the pneumatic suction force on and off in individual nozzles. One array of nozzles is fixed relative to the surface on
Computationally efficient implementation of combustion chemistry in parallel PDF calculations
NASA Astrophysics Data System (ADS)
Lu, Liuyan; Lantz, Steven R.; Ren, Zhuyin; Pope, Stephen B.
2009-08-01
In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
Landsliding in partially saturated materials
Godt, J.W.; Baum, R.L.; Lu, N.
2009-01-01
[1] Rainfall-induced landslides are pervasive in hillslope environments around the world and among the most costly and deadly natural hazards. However, capturing their occurrence with scientific instrumentation in a natural setting is extremely rare. The prevailing thinking on landslide initiation, particularly for those landslides that occur under intense precipitation, is that the failure surface is saturated and has positive pore-water pressures acting on it. Most analytic methods used for landslide hazard assessment are based on the above perception and assume that the failure surface is located beneath a water table. By monitoring the pore water and soil suction response to rainfall, we observed shallow landslide occurrence under partially saturated conditions for the first time in a natural setting. We show that the partially saturated shallow landslide at this site is predictable using measured soil suction and water content and a novel unified effective stress concept for partially saturated earth materials. Copyright 2009 by the American Geophysical Union.
Partial integration raises antitrust concerns.
Brock, T H; Kamoie, B E
2000-11-01
Recently, providers have begun to explore a new model of integrated delivery system, the partially integrated IDS. Typically, a partially integrated IDS is a joint venture, owned by a core group of providers that maintains complete financial and operational independence outside the joint venture. The IDS contracts with other providers to furnish services that the part-owners do not furnish. A partially integrated IDS raises antitrust concerns because the participating providers may be seen as competitors banding together to set prices jointly for healthcare services. Therefore, to minimize their antitrust exposure, providers that are considering this model should be careful to structure the IDS in accordance with the relevant Federal antitrust laws (i.e., Section 1 of the Sherman Act), taking into account the Federal antitrust agencies' various guidelines and enforcement policies. PMID:11688054
Partial Priapism Treated with Pentoxifylline
Cooper, Meghan A.; Carrion, Rafael E.; Yang, Christopher
2015-01-01
ABSTRACT Main findings: A 26-year-old man suffering from partial priapism was successfully treated with a regimen including pentoxifylline, a nonspecific phosphodiesterase inhibitor that is often used to conservatively treat Peyronie's disease. Case hypothesis: Partial priapism is an extremely rare urological condition that is characterized by thrombosis within the proximal segment of a single corpus cavernosum. There have only been 36 reported cases to date. Although several factors have been associated with this unusual disorder, such as trauma or bicycle riding, the etiology is still not completely understood. Treatment is usually conservative and consists of a non-steroidal anti-inflammatory and anti-thrombotic. Promising future implications: This case report supports the utilization of pentoxifylline in patients with partial priapism due to its anti-fibrogenic and anti-thrombotic properties. PMID:26401875
Implementing clips on a parallel computer
NASA Technical Reports Server (NTRS)
Riley, Gary
1987-01-01
The C language integrated production system (CLIPS) is a forward chaining rule based language to provide training and delivery for expert systems. Conceptually, rule based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism also can be employed for use with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.
Parallelizing alternating direction implicit solver on GPUs
Technology Transfer Automated Retrieval System (TEKTRAN)
We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
Shirzad, A.
2007-08-15
Gauge fixing may be done in different ways. We show that using the chain structure to describe a constrained system enables us to use either a full gauge, in which all gauged degrees of freedom are determined, or a partial gauge, in which some first class constraints remain as subsidiary conditions to be imposed on the solutions of the equations of motion. We also show that the number of constants of motion depends on the level in a constraint chain in which the gauge fixing condition is imposed. The relativistic point particle, electromagnetism, and the Polyakov string are discussed as examples and full or partial gauges are distinguished.
Partial pressure analysis of plasmas
Dylla, H.F.
1984-11-01
The application of partial pressure analysis for plasma diagnostic measurements is reviewed. A comparison is made between the techniques of plasma flux analysis and partial pressure analysis for mass spectrometry of plasmas. Emphasis is given to the application of quadrupole mass spectrometers (QMS). The interface problems associated with the coupling of a QMS to a plasma device are discussed including: differential-pumping requirements, electromagnetic interferences from the plasma environment, the detection of surface-active species, ion source interactions, and calibration procedures. Example measurements are presented from process monitoring of glow discharge plasmas which are useful for cleaning and conditioning vacuum vessels.
Apparatus for generating partially coherent radiation
Naulleau, Patrick P.
2005-02-22
Techniques for generating partially coherent radiation and particularly for converting effectively coherent radiation from a synchrotron to partially coherent EUV radiation suitable for projection lithography.
Global Arrays Parallel Programming Toolkit
Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel
2011-01-01
The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message-passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message-passing or one-sided communication, offer performance and scalability but they are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving
Parallel machine architecture and compiler design facilities
NASA Technical Reports Server (NTRS)
Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex
1990-01-01
The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.
Force user's manual: A portable, parallel FORTRAN
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.
1990-01-01
The use of Force, a parallel, portable FORTRAN on shared memory parallel computers is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, Alliant computers on which it is installed.
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.
High Performance Parallel Computational Nanotechnology
NASA Technical Reports Server (NTRS)
Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics; and simulation methods for diamondoid structures. In as much as it seems clear that the application of such methods in nanotechnology will require powerful, highly powerful systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to
Scalable Parallel Algebraic Multigrid Solvers
Bank, R; Lu, S; Tong, C; Vassilevski, P
2005-03-23
The authors propose a parallel algebraic multilevel algorithm (AMG), which has the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved; and the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.
Fault-tolerant parallel processor
Harper, R.E.; Lala, J.H. )
1991-06-01
This paper addresses issues central to the design and operation of an ultrareliable, Byzantine resilient parallel computer. Interprocessor connectivity requirements are met by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity. Redundant groups are synchronized solely by message transmissions and receptions, which aslo provide input data consistency and output voting. Reliability analysis results are presented that demonstrate the reduced failure probability of such a system. Performance analysis results are presented that quantify the temporal overhead involved in executing such fault-tolerance-specific operations. Empirical performance measurements of prototypes of the architecture are presented. 30 refs.
Parallel Assembly of LIGA Components
Christenson, T.R.; Feddema, J.T.
1999-03-04
In this paper, a prototype robotic workcell for the parallel assembly of LIGA components is described. A Cartesian robot is used to press 386 and 485 micron diameter pins into a LIGA substrate and then place a 3-inch diameter wafer with LIGA gears onto the pins. Upward and downward looking microscopes are used to locate holes in the LIGA substrate, pins to be pressed in the holes, and gears to be placed on the pins. This vision system can locate parts within 3 microns, while the Cartesian manipulator can place the parts within 0.4 microns.
True Shear Parallel Plate Viscometer
NASA Technical Reports Server (NTRS)
Ethridge, Edwin; Kaukler, William
2010-01-01
This viscometer (which can also be used as a rheometer) is designed for use with liquids over a large temperature range. The device consists of horizontally disposed, similarly sized, parallel plates with a precisely known gap. The lower plate is driven laterally with a motor to apply shear to the liquid in the gap. The upper plate is freely suspended from a double-arm pendulum with a sufficiently long radius to reduce height variations during the swing to negligible levels. A sensitive load cell measures the shear force applied by the liquid to the upper plate. Viscosity is measured by taking the ratio of shear stress to shear rate.
Exploring Parallel Concordancing in English and Chinese.
ERIC Educational Resources Information Center
Lixun, Wang
2001-01-01
Investigates the value of computer technology as a medium for the delivery of parallel texts in English and Chinese for language learning. A English-Chinese parallel corpus was created for use in parallel concordancing--a technique that has been developed to respond to the desire to study language in its natural contexts of use. (Author/VWL)
Parallel Computing Using Web Servers and "Servlets".
ERIC Educational Resources Information Center
Lo, Alfred; Bloor, Chris; Choi, Y. K.
2000-01-01
Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
Parallel Computation Of Forward Dynamics Of Manipulators
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1993-01-01
Report presents parallel algorithms and special parallel architecture for computation of forward dynamics of robotics manipulators. Products of effort to find best method of parallel computation to achieve required computational efficiency. Significant speedup of computation anticipated as well as cost reduction.
NASA Astrophysics Data System (ADS)
Fliegner, D.; Rétey, A.; Vermaseren, J. A. M.
2001-08-01
The parallel version of the symbolic manipulation program FORM for clusters of workstations and massive parallel systems is presented. We discuss various cluster architectures and the implementation of the parallel program using message passing (MPI). Performance results for real physics applications are shown.
Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism
ERIC Educational Resources Information Center
Agarwal, Mayank
2009-01-01
The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…
Parallel Processing at the High School Level.
ERIC Educational Resources Information Center
Sheary, Kathryn Anne
This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…
17 CFR 12.24 - Parallel proceedings.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 17 Commodity and Securities Exchanges 1 2010-04-01 2010-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...
17 CFR 12.24 - Parallel proceedings.
Code of Federal Regulations, 2014 CFR
2014-04-01
... 17 Commodity and Securities Exchanges 1 2014-04-01 2014-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...
17 CFR 12.24 - Parallel proceedings.
Code of Federal Regulations, 2013 CFR
2013-04-01
... 17 Commodity and Securities Exchanges 1 2013-04-01 2013-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...
17 CFR 12.24 - Parallel proceedings.
Code of Federal Regulations, 2011 CFR
2011-04-01
... 17 Commodity and Securities Exchanges 1 2011-04-01 2011-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...
17 CFR 12.24 - Parallel proceedings.
Code of Federal Regulations, 2012 CFR
2012-04-01
... 17 Commodity and Securities Exchanges 1 2012-04-01 2012-04-01 false Parallel proceedings. 12.24... REPARATIONS General Information and Preliminary Consideration of Pleadings § 12.24 Parallel proceedings. (a) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration...
A role for partial endothelial-mesenchymal transitions in angiogenesis?
Welch-Reardon, Katrina M; Wu, Nan; Hughes, Christopher C W
2015-02-01
The contribution of epithelial-to-mesenchymal transitions (EMT) in both developmental and pathological conditions has been widely recognized and studied. In a parallel process, governed by a similar set of signaling and transcription factors, endothelial-to-mesenchymal transitions (EndoMT) contribute to heart valve formation and the generation of cancer-associated fibroblasts. During angiogenic sprouting, endothelial cells express many of the same genes and break down basement membrane; however, they retain intercellular junctions and migrate as a connected train of cells rather than as individual cells. This has been termed a partial endothelial-to-mesenchymal transition. A key regulatory check-point determines whether cells undergo a full or a partial epithelial-to-mesenchymal transitions/endothelial-to-mesenchymal transition; however, very little is known about how this switch is controlled. Here we discuss these developmental/pathological pathways, with a particular focus on their role in vascular biology. PMID:25425619
A role for partial endothelial-mesenchymal transitions in angiogenesis?
Welch-Reardon, Katrina M.; Wu, Nan; Hughes, Christopher C.W.
2016-01-01
The contribution of epithelial-to-mesenchymal transitions (EMT) in both developmental and pathological conditions has been widely recognized and studied. In a parallel process, governed by a similar set of signaling and transcription factors, endothelial-to-mesenchymal transitions (EndoMT) contribute to heart valve formation and the generation of cancer-associated-fibroblasts. During angiogenic sprouting endothelial cells express many of the same genes and break down basement membrane, however they retain intercellular junctions and migrate as a connected “train” of cells rather than as individual cells. This has been termed a partial EndoMT. A key regulatory check-point determines whether cells undergo a full or a partial EMT/EndoMT, however, very little is known about how this switch is controlled. Here we discuss these developmental/pathologic pathways, with a particular focus on their role in vascular biology. PMID:25425619
Leadership in Partially Distributed Teams
ERIC Educational Resources Information Center
Plotnick, Linda
2009-01-01
Inter-organizational collaboration is becoming more common. When organizations collaborate they often do so in partially distributed teams (PDTs). A PDT is a hybrid team that has at least one collocated subteam and at least two subteams that are geographically distributed and communicate primarily through electronic media. While PDTs share many…
Partially molten magma ocean model
Shirley, D.N.
1983-02-15
The properties of the lunar crust and upper mantle can be explained if the outer 300-400 km of the moon was initially only partially molten rather than fully molten. The top of the partially molten region contained about 20% melt and decreased to 0% at 300-400 km depth. Nuclei of anorthositic crust formed over localized bodies of magma segregated from the partial melt, then grew peripherally until they coverd the moon. Throughout most of its growth period the anorthosite crust floated on a layer of magma a few km thick. The thickness of this layer is regulated by the opposing forces of loss of material by fractional crystallization and addition of magma from the partial melt below. Concentrations of Sr, Eu, and Sm in pristine ferroan anorthosites are found to be consistent with this model, as are trends for the ferroan anorthosites and Mg-rich suites on a diagram of An in plagioclase vs. mg in mafics. Clustering of Eu, Sr, and mg values found among pristine ferroan anorthosites are predicted by this model.
Xyce parallel electronic simulator design.
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to insure a high level of code quality and robustness are essential. Version control, issue tracking customer support, C++ style guildlines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathmaticians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Series-connected shaded modules to address partial shading conditions in SPV systems
NASA Astrophysics Data System (ADS)
Pareek, Smita; Dahiya, Ratna
2016-03-01
With the progress of technology and reduced cost of PV cells, the PV systems are being installed in many countries, including India. Even though this method of power generation has sufficient potential but its effective utilization is still lacking. This is because the output power of PV cells depends on many factors like insolation, temperature, climate conditions prevailing nearby, aging, using modules from different technologies/manufacturers or partial shading conditions. Among these factors, partial shading causes major reduction in output power despite the size of PV systems. As a result, the produced power is lower than the expected value. The connection of modules to each other has great impact on output power if they are prone to partial shading conditions. In this paper, PV arrays are investigated under partial shading conditions. The results show that partial shading losses can be minimized by connecting shaded modules in series rather than in parallel.
1998-06-01
In Wisconsin, physicians stopped performing abortions when a Federal District Court Judge refused to issue a temporary restraining order against the state's newly enacted "partial birth" abortion ban that was couched in such vague language it actually covered all abortions. While ostensibly attempting to ban late-term "intact dilation and extraction," the language of the law did not refer to that procedure or to late terms. Instead, it prohibited all abortions in which a physician "partially vaginally delivers a living child, causes the death of the partially delivered child with the intent to kill the child and then completes the delivery of the child." The law also defined "child" as "a human being from the time of fertilization" until birth. It is clear that this abortion ban is unconstitutional under Row v. Wade, and this unconstitutionality is compounded by the fact that the law allowed no exception to protect a woman's health, which is required by Roe for abortion bans after fetal viability. Wisconsin is only one of about 28 states that have enacted similar laws, and only two have restricted the ban to postviability abortions. Many of these laws have been struck down in court, and President Clinton has continued to veto the Federal partial-birth bill. The Wisconsin Judge acknowledged that opponents of the ban will likely prevail when the case is heard, but his action in denying the temporary injunction means that many women in Wisconsin will not receive timely medical care. The partial birth strategy is really only another anti-abortion strategy. PMID:12348556
Efficient parallel global garbage collection on massively parallel computers
Kamada, Tomio; Matsuoka, Satoshi; Yonezawa, Akinori
1994-12-31
On distributed-memory high-performance MPPs where processors are interconnected by an asynchronous network, efficient Garbage Collection (GC) becomes difficult due to inter-node references and references within pending, unprocessed messages. The parallel global GC algorithm (1) takes advantage of reference locality, (2) efficiently traverses references over nodes, (3) admits minimum pause time of ongoing computations, and (4) has been shown to scale up to 1024 node MPPs. The algorithm employs a global weight counting scheme to substantially reduce message traffic. The two methods for confirming the arrival of pending messages are used: one counts numbers of messages and the other uses network `bulldozing.` Performance evaluation in actual implementations on a multicomputer with 32-1024 nodes, Fujitsu AP1000, reveals various favorable properties of the algorithm.
A parallel execution model for Prolog
Fagin, B.
1987-01-01
In this thesis a new parallel execution model for Prolog is presented: The PPP model or Parallel Prolog Processor. The PPP supports AND-parallelism, OR- parallelism, and intelligent backtracking. An implementation of the PPP is described, through the extension of an existing Prolog abstract machine architecture. Several examples of PPP execution are presented and compilation to the PPP abstract instructions set is discussed. The performance effects of this model are reported, based on a simulation of a large benchmark set. The implications of these results for parallel Prolog systems are discussed, and directions for future work are indicated.
[Parallel PLS algorithm using MapReduce and its aplication in spectral modeling].
Yang, Hui-Hua; Du, Ling-Ling; Li, Ling-Qiao; Tang, Tian-Biao; Guo, Tuo; Liang, Qiong-Lin; Wang, Yi-Ming; Luo, Guo-An
2012-09-01
Partial least squares (PLS) has been widely used in spectral analysis and modeling, and it is computation-intensive and time-demanding when dealing with massive data To solve this problem effectively, a novel parallel PLS using MapReduce is proposed, which consists of two procedures, the parallelization of data standardizing and the parallelization of principal component computing. Using NIR spectral modeling as an example, experiments were conducted on a Hadoop cluster, which is a collection of ordinary computers. The experimental results demonstrate that the parallel PLS algorithm proposed can handle massive spectra, can significantly cut down the modeling time, and gains a basically linear speedup, and can be easily scaled up. PMID:23240405
Parallel block schemes for large scale least squares computations
Golub, G.H.; Plemmons, R.J.; Sameh, A.
1986-04-01
Large scale least squares computations arise in a variety of scientific and engineering problems, including geodetic adjustments and surveys, medical image analysis, molecular structures, partial differential equations and substructuring methods in structural engineering. In each of these problems, matrices often arise which possess a block structure which reflects the local connection nature of the underlying physical problem. For example, such super-large nonlinear least squares computations arise in geodesy. Here the coordinates of positions are calculated by iteratively solving overdetermined systems of nonlinear equations by the Gauss-Newton method. The US National Geodetic Survey will complete this year (1986) the readjustment of the North American Datum, a problem which involves over 540 thousand unknowns and over 6.5 million observations (equations). The observation matrix for these least squares computations has a block angular form with 161 diagnonal blocks, each containing 3 to 4 thousand unknowns. In this paper parallel schemes are suggested for the orthogonal factorization of matrices in block angular form and for the associated backsubstitution phase of the least squares computations. In addition, a parallel scheme for the calculation of certain elements of the covariance matrix for such problems is described. It is shown that these algorithms are ideally suited for multiprocessors with three levels of parallelism such as the Cedar system at the University of Illinois. 20 refs., 7 figs.
Domain decomposition methods for the parallel computation of reacting flows
NASA Technical Reports Server (NTRS)
Keyes, David E.
1988-01-01
Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
Information hiding in parallel programs
Foster, I.
1992-01-30
A fundamental principle in program design is to isolate difficult or changeable design decisions. Application of this principle to parallel programs requires identification of decisions that are difficult or subject to change, and the development of techniques for hiding these decisions. We experiment with three complex applications, and identify mapping, communication, and scheduling as areas in which decisions are particularly problematic. We develop computational abstractions that hide such decisions, and show that these abstractions can be used to develop elegant solutions to programming problems. In particular, they allow us to encode common structures, such as transforms, reductions, and meshes, as software cells and templates that can reused in different applications. An important characteristic of these structures is that they do not incorporate mapping, communication, or scheduling decisions: these aspects of the design are specified separately, when composing existing structures to form applications. This separation of concerns allows the same cells and templates to be reused in different contexts.
Nanocapillary Adhesion between Parallel Plates.
Cheng, Shengfeng; Robbins, Mark O
2016-08-01
Molecular dynamics simulations are used to study capillary adhesion from a nanometer scale liquid bridge between two parallel flat solid surfaces. The capillary force, Fcap, and the meniscus shape of the bridge are computed as the separation between the solid surfaces, h, is varied. Macroscopic theory predicts the meniscus shape and the contribution of liquid/vapor interfacial tension to Fcap quite accurately for separations as small as two or three molecular diameters (1-2 nm). However, the total capillary force differs in sign and magnitude from macroscopic theory for h ≲ 5 nm (8-10 diameters) because of molecular layering that is not included in macroscopic theory. For these small separations, the pressure tensor in the fluid becomes anisotropic. The components in the plane of the surface vary smoothly and are consistent with theory based on the macroscopic surface tension. Capillary adhesion is affected by only the perpendicular component, which has strong oscillations as the molecular layering changes. PMID:27413872
Embodied and Distributed Parallel DJing.
Cappelen, Birgitta; Andersson, Anders-Petter
2016-01-01
Everyone has a right to take part in cultural events and activities, such as music performances and music making. Enforcing that right, within Universal Design, is often limited to a focus on physical access to public areas, hearing aids etc., or groups of persons with special needs performing in traditional ways. The latter might be people with disabilities, being musicians playing traditional instruments, or actors playing theatre. In this paper we focus on the innovative potential of including people with special needs, when creating new cultural activities. In our project RHYME our goal was to create health promoting activities for children with severe disabilities, by developing new musical and multimedia technologies. Because of the users' extreme demands and rich contribution, we ended up creating both a new genre of musical instruments and a new art form. We call this new art form Embodied and Distributed Parallel DJing, and the new genre of instruments for Empowering Multi-Sensorial Things. PMID:27534347
Parallel discovery of Alzheimer's therapeutics.
Lo, Andrew W; Ho, Carole; Cummings, Jayna; Kosik, Kenneth S
2014-06-18
As the prevalence of Alzheimer's disease (AD) grows, so do the costs it imposes on society. Scientific, clinical, and financial interests have focused current drug discovery efforts largely on the single biological pathway that leads to amyloid deposition. This effort has resulted in slow progress and disappointing outcomes. Here, we describe a "portfolio approach" in which multiple distinct drug development projects are undertaken simultaneously. Although a greater upfront investment is required, the probability of at least one success should be higher with "multiple shots on goal," increasing the efficiency of this undertaking. However, our portfolio simulations show that the risk-adjusted return on investment of parallel discovery is insufficient to attract private-sector funding. Nevertheless, the future cost savings of an effective AD therapy to Medicare and Medicaid far exceed this investment, suggesting that government funding is both essential and financially beneficial. PMID:24944190
Parallel Network Simulations with NEURON
Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.
2009-01-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488
A highly parallel signal processor
NASA Astrophysics Data System (ADS)
Bigham, Jackson D., Jr.
There is an increasing need for signal processors functional across a broad range of problems, from radar systems to E-O and ESM applications. To meet this challenge, a signal processing system capable of efficiently meeting the processing requirements over a broad range of avionics sensor systems has been developed. The CDC Parallel Modular Signal Processor (PMSP) is a complete MIL/E-5400-qualified digital signal processing system capable of computation rates greater than 600 MOPS (million operations per second). The signal processing element of the PMSP is the Micro-AFP. It is an all-VLSI processor capable of executing multiple simultaneous operations. Up to five Micro-AFPs and 12 MB of main store memory (MSM), along with associated control and I/O functions, are contained in the PMSP's standard ATR enclosure.
NASA Astrophysics Data System (ADS)
McKague, Matthew
2016-04-01
Self-testing allows us to determine, through classical interaction only, whether some players in a non-local game share particular quantum states. Most work on self-testing has concentrated on developing tests for small states like one pair of maximally entangled qubits, or on tests where there is a separate player for each qubit, as in a graph state. Here we consider the case of testing many maximally entangled pairs of qubits shared between two players. Previously such a test was shown where testing is sequential, i.e., one pair is tested at a time. Here we consider the parallel case where all pairs are tested simultaneously, giving considerably more power to dishonest players. We derive sufficient conditions for a self-test for many maximally entangled pairs of qubits shared between two players and also two constructions for self-tests where all pairs are tested simultaneously.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1991-01-01
The main contribution of the effort in the last two years is the introduction of the MOPPS system. After doing extensive literature search, we introduced the system which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.
Device for balancing parallel strings
Mashikian, Matthew S.
1985-01-01
A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.
Hybrid Optimization Parallel Search PACKage
Energy Science and Technology Software Center (ESTSC)
2009-11-10
HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework providesmore » a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.« less
Parallel computing in enterprise modeling.
Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.
2008-08-01
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Wettability of partially suspended graphene.
Ondarçuhu, Thierry; Thomas, Vincent; Nuñez, Marc; Dujardin, Erik; Rahman, Atikur; Black, Charles T; Checco, Antonio
2016-01-01
The dependence of the wettability of graphene on the nature of the underlying substrate remains only partially understood. Here, we systematically investigate the role of liquid-substrate interactions on the wettability of graphene by varying the area fraction of suspended graphene from 0 to 95% by means of nanotextured substrates. We find that completely suspended graphene exhibits the highest water contact angle (85° ± 5°) compared to partially suspended or supported graphene, regardless of the hydrophobicity (hydrophilicity) of the substrate. Further, 80% of the long-range water-substrate interactions are screened by the graphene monolayer, the wettability of which is primarily determined by short-range graphene-liquid interactions. By its well-defined chemical and geometrical properties, supported graphene therefore provides a model system to elucidate the relative contribution of short and long range interactions to the macroscopic contact angle. PMID:27072195
Wettability of partially suspended graphene
Ondarçuhu, Thierry; Thomas, Vincent; Nuñez, Marc; Dujardin, Erik; Rahman, Atikur; Black, Charles T.; Checco, Antonio
2016-04-13
Dependence on the wettability of graphene on the nature of the underlying substrate remains only partially understood. We systematically investigate the role of liquid-substrate interactions on the wettability of graphene by varying the area fraction of suspended graphene from 0 to 95% by means of nanotextured substrates. We find that completely suspended graphene exhibits the highest water contact angle (85° ± 5°) compared to partially suspended or supported graphene, regardless of the hydrophobicity (hydrophilicity) of the substrate. Moreover, 80% of the long-range water-substrate interactions are screened by the graphene monolayer, the wettability of which is primarily determined by short-range graphene-liquidmore » interactions. By its well-defined chemical and geometrical properties, supported graphene therefore provides a model system to elucidate the relative contribution of short and long range interactions to the macroscopic contact angle.« less
Wettability of partially suspended graphene
Ondarçuhu, Thierry; Thomas, Vincent; Nuñez, Marc; Dujardin, Erik; Rahman, Atikur; Black, Charles T.; Checco, Antonio
2016-01-01
The dependence of the wettability of graphene on the nature of the underlying substrate remains only partially understood. Here, we systematically investigate the role of liquid-substrate interactions on the wettability of graphene by varying the area fraction of suspended graphene from 0 to 95% by means of nanotextured substrates. We find that completely suspended graphene exhibits the highest water contact angle (85° ± 5°) compared to partially suspended or supported graphene, regardless of the hydrophobicity (hydrophilicity) of the substrate. Further, 80% of the long-range water-substrate interactions are screened by the graphene monolayer, the wettability of which is primarily determined by short-range graphene-liquid interactions. By its well-defined chemical and geometrical properties, supported graphene therefore provides a model system to elucidate the relative contribution of short and long range interactions to the macroscopic contact angle. PMID:27072195
Partial hydatidiform mole: ultrasonographic features.
Woo, J S; Hsu, C; Fung, L L; Ma, H K
1983-05-01
Four patients with partial hyatidiform mole managed at the Queen Mary Hospital, Hong Kong, are described. The diagnosis of blighted ovum or missed abortion was made on the sonographic findings prior to suction evacuation. The dominant features in these cases consisted of a relatively large central transonic area bearing the appearance of an empty gestational sac and surrounded by a thick rim of low-level placenta-like echoes; in contrast with the case of the blighted ovum, a well-defined echogenic sac wall is absent. In another 9 patients with molar pregnancy managed during the same period, the more typical 'snow-storm' vesicular appearance was present. It was concluded that the anembryonic appearance described should alert the sonologist and clinician to the possible diagnosis of partial hydatitiform mole. The evacuated material from the uterine cavity should be examined morphologically and if possible cytogenetically. PMID:6578773
Wettability of partially suspended graphene
NASA Astrophysics Data System (ADS)
Ondarçuhu, Thierry; Thomas, Vincent; Nuñez, Marc; Dujardin, Erik; Rahman, Atikur; Black, Charles T.; Checco, Antonio
2016-04-01
The dependence of the wettability of graphene on the nature of the underlying substrate remains only partially understood. Here, we systematically investigate the role of liquid-substrate interactions on the wettability of graphene by varying the area fraction of suspended graphene from 0 to 95% by means of nanotextured substrates. We find that completely suspended graphene exhibits the highest water contact angle (85° ± 5°) compared to partially suspended or supported graphene, regardless of the hydrophobicity (hydrophilicity) of the substrate. Further, 80% of the long-range water-substrate interactions are screened by the graphene monolayer, the wettability of which is primarily determined by short-range graphene-liquid interactions. By its well-defined chemical and geometrical properties, supported graphene therefore provides a model system to elucidate the relative contribution of short and long range interactions to the macroscopic contact angle.
Partial stabilization-based guidance.
Shafiei, M H; Binazadeh, T
2012-01-01
A novel nonlinear missile guidance law against maneuvering targets is designed based on the principles of partial stability. It is demonstrated that in a real approach which is adopted with actual situations, each state of the guidance system must have a special behavior and asymptotic stability or exponential stability of all states is not realistic. Thus, a new guidance law is developed based on the partial stability theorem in such a way that the behaviors of states in the closed-loop system are in conformity with a real guidance scenario that leads to collision. The performance of the proposed guidance law in terms of interception time and control effort is compared with the sliding mode guidance law by means of numerical simulations. PMID:21963401
Integrated Task and Data Parallel Programming
NASA Technical Reports Server (NTRS)
Grimshaw, A. S.
1998-01-01
This research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated
The ParaScope parallel programming environment
NASA Technical Reports Server (NTRS)
Cooper, Keith D.; Hall, Mary W.; Hood, Robert T.; Kennedy, Ken; Mckinley, Kathryn S.; Mellor-Crummey, John M.; Torczon, Linda; Warren, Scott K.
1993-01-01
The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in FORTRAN D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2014-10-01
Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2013-10-01
Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Preliminary results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2015-11-01
Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fix boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.
Computer-Aided Parallelizer and Optimizer
NASA Technical Reports Server (NTRS)
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL
Adaptive self-calibrating iterative GRAPPA reconstruction.
Park, Suhyung; Park, Jaeseok
2012-06-01
Parallel magnetic resonance imaging in k-space such as generalized auto-calibrating partially parallel acquisition exploits spatial correlation among neighboring signals over multiple coils in calibration to estimate missing signals in reconstruction. It is often challenging to achieve accurate calibration information due to data corruption with noises and spatially varying correlation. The purpose of this work is to address these problems simultaneously by developing a new, adaptive iterative generalized auto-calibrating partially parallel acquisition with dynamic self-calibration. With increasing iterations, under a framework of the Kalman filter spatial correlation is estimated dynamically updating calibration signals in a measurement model and using fixed-point state transition in a process model while missing signals outside the step-varying calibration region are reconstructed, leading to adaptive self-calibration and reconstruction. Noise statistic is incorporated in the Kalman filter models, yielding coil-weighted de-noising in reconstruction. Numerical and in vivo studies are performed, demonstrating that the proposed method yields highly accurate calibration and thus reduces artifacts and noises even at high acceleration. PMID:21994010
Auto-calibration of ultrasonic lubricant-film thickness measurements
NASA Astrophysics Data System (ADS)
Reddyhoff, T.; Dwyer-Joyce, R. S.; Zhang, J.; Drinkwater, B. W.
2008-04-01
The measurement of oil film thickness in a lubricated component is essential information for performance monitoring and design. It is well established that such measurements can be made ultrasonically if the lubricant film is modelled as a collection of small springs. The ultrasonic method requires that component faces are separated and a reference reflection recorded in order to obtain a reflection coefficient value from which film thickness is calculated. The novel and practically useful approach put forward in this paper and validated experimentally allows reflection coefficient measurement without the requirement for a reference. This involves simultaneously measuring the amplitude and phase of an ultrasonic pulse reflected from a layer. Provided that the acoustic properties of the substrate are known, the theoretical relationship between the two can be fitted to the data in order to yield reflection coefficient amplitude and phase for an infinitely thick layer. This is equivalent to measuring a reference signal directly, but importantly does not require the materials to be separated. The further valuable aspect of this approach, which is demonstrated experimentally, is its ability to be used as a self-calibrating routine, inherently compensating for temperature effects. This is due to the relationship between the amplitude and phase being unaffected by changes in temperature which cause unwanted changes to the incident pulse. Finally, error analysis is performed showing how the accuracy of the results can be optimized. A finding of particular significance is the strong dependence of the accuracy of the technique on the amplitude of reflection coefficient input data used. This places some limitations on the applicability of the technique.
Autocalibrating Tiled Projectors on Piecewise Smooth Vertically Extruded Surfaces.
Sajadi, Behzad; Majumder, Aditi
2011-09-01
In this paper, we present a novel technique to calibrate multiple casually aligned projectors on fiducial-free piecewise smooth vertically extruded surfaces using a single camera. Such surfaces include cylindrical displays and CAVEs, common in immersive virtual reality systems. We impose two priors to the display surface. We assume the surface is a piecewise smooth vertically extruded surface for which the aspect ratio of the rectangle formed by the four corners of the surface is known and the boundary is visible and segmentable. Using these priors, we can estimate the display's 3D geometry and camera extrinsic parameters using a nonlinear optimization technique from a single image without any explicit display to camera correspondences. Using the estimated camera and display properties, the intrinsic and extrinsic parameters of each projector are recovered using a single projected pattern seen by the camera. This in turn is used to register the images on the display from any arbitrary viewpoint making it appropriate for virtual reality systems. The fast convergence and robustness of this method is achieved via a novel dimension reduction technique for camera parameter estimation and a novel deterministic technique for projector property estimation. This simplicity, efficiency, and robustness of our method enable several coveted features for nonplanar projection-based displays. First, it allows fast recalibration in the face of projector, display or camera movements and even change in display shape. Second, this opens up, for the first time, the possibility of allowing multiple projectors to overlap on the corners of the CAVE-a popular immersive VR display system. Finally, this opens up the possibility of easily deploying multiprojector displays on aesthetic novel shapes for edutainment and digital signage applications. PMID:21301026
How to: applying and interpreting the SWAT autocalibration tools
Technology Transfer Automated Retrieval System (TEKTRAN)
Watershed-level modelers have expressed a need, through ongoing discussions within the USDA-ARS Conservation Effects Assessment Program and the broader international research community, for a better understanding of uncertainty related to hard-to-measure input parameters and to the remaining interna...
A generic fine-grained parallel C
NASA Technical Reports Server (NTRS)
Hamet, L.; Dorband, John E.
1988-01-01
With the present availability of parallel processors of vastly different architectures, there is a need for a common language interface to multiple types of machines. The parallel C compiler, currently under development, is intended to be such a language. This language is based on the belief that an algorithm designed around fine-grained parallelism can be mapped relatively easily to different parallel architectures, since a large percentage of the parallelism has been identified. The compiler generates a FORTH-like machine-independent intermediate code. A machine-dependent translator will reside on each machine to generate the appropriate executable code, taking advantage of the particular architectures. The goal of this project is to allow a user to run the same program on such machines as the Massively Parallel Processor, the CRAY, the Connection Machine, and the CYBER 205 as well as serial machines such as VAXes, Macintoshes and Sun workstations.
Parallel automated adaptive procedures for unstructured meshes
NASA Technical Reports Server (NTRS)
Shephard, M. S.; Flaherty, J. E.; Decougny, H. L.; Ozturan, C.; Bottasso, C. L.; Beall, M. W.
1995-01-01
Consideration is given to the techniques required to support adaptive analysis of automatically generated unstructured meshes on distributed memory MIMD parallel computers. The key areas of new development are focused on the support of effective parallel computations when the structure of the numerical discretization, the mesh, is evolving, and in fact constructed, during the computation. All the procedures presented operate in parallel on already distributed mesh information. Starting from a mesh definition in terms of a topological hierarchy, techniques to support the distribution, redistribution and communication among the mesh entities over the processors is given, and algorithms to dynamically balance processor workload based on the migration of mesh entities are given. A procedure to automatically generate meshes in parallel, starting from CAD geometric models, is given. Parallel procedures to enrich the mesh through local mesh modifications are also given. Finally, the combination of these techniques to produce a parallel automated finite element analysis procedure for rotorcraft aerodynamics calculations is discussed and demonstrated.
Design considerations for parallel graphics libraries
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Linearly exact parallel closures for slab geometry
Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun
2013-08-15
Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients)
Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Runtime volume visualization for parallel CFD
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu
1995-01-01
This paper discusses some aspects of design of a data distributed, massively parallel volume rendering library for runtime visualization of parallel computational fluid dynamics simulations in a message-passing environment. Unlike the traditional scheme in which visualization is a postprocessing step, the rendering is done in place on each node processor. Computational scientists who run large-scale simulations on a massively parallel computer can thus perform interactive monitoring of their simulations. The current library provides an interface to handle volume data on rectilinear grids. The same design principles can be generalized to handle other types of grids. For demonstration, we run a parallel Navier-Stokes solver making use of this rendering library on the Intel Paragon XP/S. The interactive visual response achieved is found to be very useful. Performance studies show that the parallel rendering process is scalable with the size of the simulation as well as with the parallel computer.
Linearly exact parallel closures for slab geometry
NASA Astrophysics Data System (ADS)
Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun
2013-08-01
Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).
Algorithmically Specialized Parallel Architecture For Robotics
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1991-01-01
Computing system called Robot Mathematics Processor (RMP) contains large number of processor elements (PE's) connected in various parallel and serial combinations reconfigurable via software. Special-purpose architecture designed for solving diverse computational problems in robot control, simulation, trajectory generation, workspace analysis, and like. System an MIMD-SIMD parallel architecture capable of exploiting parallelism in different forms and at several computational levels. Major advantage lies in design of cells, which provides flexibility and reconfigurability superior to previous SIMD processors.
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.
Six-Degree-Of-Freedom Parallel Minimanipulator
NASA Technical Reports Server (NTRS)
Tahmasebi, Farhad; Tsai, Lung-Wen
1994-01-01
Six-degree-of-freedom parallel minimanipulator stiffer and simpler than earlier six-degree-of-freedom manipulators. Includes only three inextensible limbs with universal joints at ends. Limbs have equal lengths and act in parallel as they share load on manipulated platform. Designed to provide high resolution and high stiffness for fine control of position and force in hybrid serial/parallel-manipulator system.
Running Geant on T. Node parallel computer
Jejcic, A.; Maillard, J.; Silva, J. ); Mignot, B. )
1990-08-01
AnInmos transputer-based computer has been utilized to overcome the difficulties due to the limitations on the processing abilities of event parallelism and multiprocessor farms (i.e., the so called bus-crisis) and the concern regarding the growing sizes of databases typical in High Energy Physics. This study was done on the T.Node parallel computer manufactured by TELMAT. Detailed figures are reported concerning the event parallelization. (AIP)
Racing in parallel: Quantum versus Classical
NASA Astrophysics Data System (ADS)
Steiger, Damian S.; Troyer, Matthias
In a fair comparison of the performance of a quantum algorithm to a classical one it is important to treat them on equal footing, both regarding resource usage and parallelism. We show how one may otherwise mistakenly attribute speedup due to parallelism as quantum speedup. We apply such an analysis both to analog quantum devices (quantum annealers) and gate model algorithms and give several examples where a careful analysis of parallelism makes a significant difference in the comparison between classical and quantum algorithms.
A parallel algorithm for global routing
NASA Technical Reports Server (NTRS)
Brouwer, Randall J.; Banerjee, Prithviraj
1990-01-01
A Parallel Hierarchical algorithm for Global Routing (PHIGURE) is presented. The router is based on the work of Burstein and Pelavin, but has many extensions for general global routing and parallel execution. Main features of the algorithm include structured hierarchical decomposition into separate independent tasks which are suitable for parallel execution and adaptive simplex solution for adding feedthroughs and adjusting channel heights for row-based layout. Alternative decomposition methods and the various levels of parallelism available in the algorithm are examined closely. The algorithm is described and results are presented for a shared-memory multiprocessor implementation.
Parallel execution and scriptability in micromagnetic simulations
NASA Astrophysics Data System (ADS)
Fischbacher, Thomas; Franchin, Matteo; Bordignon, Giuliano; Knittel, Andreas; Fangohr, Hans
2009-04-01
We demonstrate the feasibility of an "encapsulated parallelism" approach toward micromagnetic simulations that combines offering a high degree of flexibility to the user with the efficient utilization of parallel computing resources. While parallelization is obviously desirable to address the high numerical effort required for realistic micromagnetic simulations through utilizing now widely available multiprocessor systems (including desktop multicore CPUs and computing clusters), conventional approaches toward parallelization impose strong restrictions on the structure of programs: numerical operations have to be executed across all processors in a synchronized fashion. This means that from the user's perspective, either the structure of the entire simulation is rigidly defined from the beginning and cannot be adjusted easily, or making modifications to the computation sequence requires advanced knowledge in parallel programming. We explain how this dilemma is resolved in the NMAG simulation package in such a way that the user can utilize without any additional effort on his side both the computational power of multiple CPUs and the flexibility to tailor execution sequences for specific problems: simulation scripts written for single-processor machines can just as well be executed on parallel machines and behave in precisely the same way, up to increased speed. We provide a simple instructive magnetic resonance simulation example that demonstrates utilizing both custom execution sequences and parallelism at the same time. Furthermore, we show that this strategy of encapsulating parallelism even allows to benefit from speed gains through parallel execution in simulations controlled by interactive commands given at a command line interface.
MMS Observations of Parallel Electric Fields
NASA Astrophysics Data System (ADS)
Ergun, R.; Goodrich, K.; Wilder, F. D.; Sturner, A. P.; Holmes, J.; Stawarz, J. E.; Malaspina, D.; Usanova, M.; Torbert, R. B.; Lindqvist, P. A.; Khotyaintsev, Y. V.; Burch, J. L.; Strangeway, R. J.; Russell, C. T.; Pollock, C. J.; Giles, B. L.; Hesse, M.; Goldman, M. V.; Drake, J. F.; Phan, T.; Nakamura, R.
2015-12-01
Parallel electric fields are a necessary condition for magnetic reconnection with non-zero guide field and are ultimately accountable for topological reconfiguration of a magnetic field. Parallel electric fields also play a strong role in charged particle acceleration and turbulence. The Magnetospheric Multiscale (MMS) mission targets these three universal plasma processes. The MMS satellites have an accurate three-dimensional electric field measurement, which can identify parallel electric fields as low as 1 mV/m at four adjacent locations. We present preliminary observations of parallel electric fields from MMS and provide an early interpretation of their impact on magnetic reconnection, in particular, where the topological change occurs. We also examine the role of parallel electric fields in particle acceleration. Direct particle acceleration by parallel electric fields is well established in the auroral region. Observations of double layers in by the Van Allan Probes suggest that acceleration by parallel electric fields may be significant in energizing some populations of the radiation belts. THEMIS observations also indicate that some of the largest parallel electric fields are found in regions of strong field-aligned currents associated with turbulence, suggesting a highly non-linear dissipation mechanism. We discuss how the MMS observations extend our understanding of the role of parallel electric fields in some of the most critical processes in the magnetosphere.
A Parallel Multigrid Method for Neutronics Applications
Alcouffe, Raymond E.
2001-01-01
The multigrid method has been shown to be the most effective general method for solving the multi-dimensional diffusion equation encountered in neutronics. This being the method of choice, we develop a strategy for implementing the multigrid method on computers of massively parallel architecture. This leads us to strategies for parallelizing the relaxation, contraction (interpolation), and prolongation operators involved in the method. We then compare the efficiency of our parallel multigrid with other parallel methods for solving the diffusion equation on selected problems encountered in reactor physics.
Conformal pure radiation with parallel rays
NASA Astrophysics Data System (ADS)
Leistner, Thomas; Nurowski, Paweł
2012-03-01
We define pure radiation metrics with parallel rays to be n-dimensional pseudo-Riemannian metrics that admit a parallel null line bundle K and whose Ricci tensor vanishes on vectors that are orthogonal to K. We give necessary conditions in terms of the Weyl, Cotton and Bach tensors for a pseudo-Riemannian metric to be conformal to a pure radiation metric with parallel rays. Then, we derive conditions in terms of the tractor calculus that are equivalent to the existence of a pure radiation metric with parallel rays in a conformal class. We also give analogous results for n-dimensional pseudo-Riemannian pp-waves.
Parallel auto-correlative statistics with VTK.
Pebay, Philippe Pierre; Bennett, Janine Camille
2013-08-01
This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.
Remarks on parallel computations in MATLAB environment
NASA Astrophysics Data System (ADS)
Opalska, Katarzyna; Opalski, Leszek
2013-10-01
The paper attempts to summarize author's investigation of parallel computation capability of MATLAB environment in solving large ordinary differential equations (ODEs). Two MATLAB versions were tested and two parallelization techniques: one used multiple processors-cores, the other - CUDA compatible Graphics Processing Units (GPUs). A set of parameterized test problems was specially designed to expose different capabilities/limitations of the different variants of the parallel computation environment tested. Presented results illustrate clearly the superiority of the newer MATLAB version and, elapsed time advantage of GPU-parallelized computations for large dimensionality problems over the multiple processor-cores (with speed-up factor strongly dependent on the problem structure).
A scalable 2-D parallel sparse solver
Kothari, S.C.; Mitra, S.
1995-12-01
Scalability beyond a small number of processors, typically 32 or less, is known to be a problem for existing parallel general sparse (PGS) direct solvers. This paper presents a parallel general sparse PGS direct solver for general sparse linear systems on distributed memory machines. The algorithm is based on the well-known sequential sparse algorithm Y12M. To achieve efficient parallelization, a 2-D scattered decomposition of the sparse matrix is used. The proposed algorithm is more scalable than existing parallel sparse direct solvers. Its scalability is evaluated on a 256 processor nCUBE2s machine using Boeing/Harwell benchmark matrices.
Parallel computations and control of adaptive structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)
1991-01-01
The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer and the impact of the approach presented for applications in other disciplines than aerospace industry is assessed.
Applications of Parallel Processing to Astrodynamics
NASA Astrophysics Data System (ADS)
Coffey, S.; Healy, L.; Neal, H.
1996-03-01
Parallel processing is being used to improve the catalog of earth orbiting satellites and for problems associated with the catalog. Initial efforts centered around using SIMD parallel processors to perform debris conjunction analysis and satellite dynamics studies. More recently, the availability of cheap supercomputing processors and parallel processing software such as PVM have enabled the reutilization of existing astrodynamics software in distributed parallel processing environments, Computations once taking many days with traditional mainframes are now being performed in only a few hours. Efforts underway for the US Naval Space Command include conjunction prediction, uncorrelated target processing and a new space object catalog based on orbit determination and prediction with special perturbations methods.
Parallel pivoting combined with parallel reduction and fill-in control
NASA Technical Reports Server (NTRS)
Alaghband, Gita
1989-01-01
Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generation of fill-ins and check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel pivoting technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
Matsuda, Yoshimasa; Utsuzawa, Shin; Kurimoto, Takeaki; Haishi, Tomoyuki; Yamazaki, Yukako; Kose, Katsumi; Anno, Izumi; Marutani, Mitsuhiro
2003-07-01
A super-parallel MR microscope in which multiple (up to 100) samples can be imaged simultaneously at high spatial resolution is described. The system consists of a multichannel transmitter-receiver system and a gradient probe array housed in a large-bore magnet. An eight-channel MR microscope was constructed for verification of the system concept, and a four-channel MR microscope was constructed for a practical application. Eight chemically fixed mouse fetuses were simultaneously imaged at the 200 micro m(3) voxel resolution in a 1.5 T superconducting magnet of a whole-body MRI, and four chemically fixed human embryos were simultaneously imaged at 120 micro m(3) voxel resolution in a 2.35 T superconducting magnet. Although the spatial resolutions achieved were not strictly those of MR microscopy, the system design proposed here can be used to attain a much higher spatial resolution imaging of multiple samples, because higher magnetic field gradients can be generated at multiple positions in a homogeneous magnetic field. PMID:12815693
Parallel processing in immune networks
NASA Astrophysics Data System (ADS)
Agliari, Elena; Barra, Adriano; Bartolucci, Silvia; Galluzzi, Andrea; Guerra, Francesco; Moauro, Francesco
2013-04-01
In this work, we adopt a statistical-mechanics approach to investigate basic, systemic features exhibited by adaptive immune systems. The lymphocyte network made by B cells and T cells is modeled by a bipartite spin glass, where, following biological prescriptions, links connecting B cells and T cells are sparse. Interestingly, the dilution performed on links is shown to make the system able to orchestrate parallel strategies to fight several pathogens at the same time; this multitasking capability constitutes a remarkable, key property of immune systems as multiple antigens are always present within the host. We also define the stochastic process ruling the temporal evolution of lymphocyte activity and show its relaxation toward an equilibrium measure allowing statistical-mechanics investigations. Analytical results are compared with Monte Carlo simulations and signal-to-noise outcomes showing overall excellent agreement. Finally, within our model, a rationale for the experimentally well-evidenced correlation between lymphocytosis and autoimmunity is achieved; this sheds further light on the systemic features exhibited by immune networks.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1995-01-01
The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
Vectoring of parallel synthetic jets
NASA Astrophysics Data System (ADS)
Berk, Tim; Ganapathisubramani, Bharathram; Gomit, Guillaume
2015-11-01
A pair of parallel synthetic jets can be vectored by applying a phase difference between the two driving signals. The resulting jet can be merged or bifurcated and either vectored towards the actuator leading in phase or the actuator lagging in phase. In the present study, the influence of phase difference and Strouhal number on the vectoring behaviour is examined experimentally. Phase-locked vorticity fields, measured using Particle Image Velocimetry (PIV), are used to track vortex pairs. The physical mechanisms that explain the diversity in vectoring behaviour are observed based on the vortex trajectories. For a fixed phase difference, the vectoring behaviour is shown to be primarily influenced by pinch-off time of vortex rings generated by the synthetic jets. Beyond a certain formation number, the pinch-off timescale becomes invariant. In this region, the vectoring behaviour is determined by the distance between subsequent vortex rings. We acknowledge the financial support from the European Research Council (ERC grant agreement no. 277472).
Positive partial transpose from spectra
Hildebrand, Roland
2007-11-15
In this paper we solve the following problem. Let H{sub nm} be a Hilbert space of dimension nm, and let A be a positive semidefinite self-adjoint linear operator on H{sub nm}. Under which conditions on the spectrum has A a positive partial transpose (is PPT) with respect to any partition H{sub n} x H{sub m} of the space H{sub nm} as a tensor product of an n-dimensional and an m-dimensional Hilbert space? We show that the necessary and sufficient conditions can be expressed as a set of linear matrix inequalities (LMIs) on the eigenvalues of A.
Partial coalescence of soap bubbles
NASA Astrophysics Data System (ADS)
Harris, Daniel M.; Pucci, Giuseppe; Bush, John W. M.
2015-11-01
We present the results of an experimental investigation of the merger of a soap bubble with a planar soap film. When gently deposited onto a horizontal film, a bubble may interact with the underlying film in such a way as to decrease in size, leaving behind a smaller daughter bubble with approximately half the radius of its progenitor. The process repeats up to three times, with each partial coalescence event occurring over a time scale comparable to the inertial-capillary time. Our results are compared to the recent numerical simulations of Martin and Blanchette and to the coalescence cascade of droplets on a fluid bath.
NASA Astrophysics Data System (ADS)
Herrera, I.; Herrera, G. S.
2015-12-01
Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized super-computers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers using 'non-overlapping discretizations' have produced the DVS-Software which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM)REFERENCES [1]. Herrera Ismael and George F. Pinder, Mathematical Modelling in Science and Engineering: An axiomatic approach", John Wiley, 243p., 2012. [2]. Herrera, I., de la Cruz L.M. and Rosas-Medina A. "Non Overlapping Discretization Methods for Partial, Differential Equations". NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num 21852. (Open source) [3]. Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)
A partially duplicated discoid lateral meniscus.
Kim, S J; Lee, Y T; Choi, C H; Kim, D W
1998-01-01
Partially duplicated discoid lateral meniscus has not been previously reported. We present a case of a partially duplicated discoid lateral meniscus with a peripheral tear of the meniscus and a concomitant cartilage lesion of the lateral femoral condyle. PMID:9681547
The language parallel Pascal and other aspects of the massively parallel processor
NASA Technical Reports Server (NTRS)
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Parallel-End-Point Drafting Compass
NASA Technical Reports Server (NTRS)
Cronander, J.
1986-01-01
Parallelogram linkage ensures greater accuracy in drafting and scribing. Two members of arm of compass remain parallel for all angles pair makes with hub axis. They maintain opposing end members in parallelism. Parallelogram-linkage principle used on dividers as well as on compasses.
EPIC: E-field Parallel Imaging Correlator
NASA Astrophysics Data System (ADS)
Thyagarajan, Nithyanandan; Beardsley, Adam P.; Bowman, Judd D.; Morales, Miguel F.
2015-11-01
E-field Parallel Imaging Correlator (EPIC), a highly parallelized Object Oriented Python package, implements the Modular Optimal Frequency Fourier (MOFF) imaging technique. It also includes visibility-based imaging using the software holography technique and a simulator for generating electric fields from a sky model. EPIC can accept dual-polarization inputs and produce images of all four instrumental cross-polarizations.
Parallel Computing Strategies for Irregular Algorithms
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
MULTIOBJECTIVE PARALLEL GENETIC ALGORITHM FOR WASTE MINIMIZATION
In this research we have developed an efficient multiobjective parallel genetic algorithm (MOPGA) for waste minimization problems. This MOPGA integrates PGAPack (Levine, 1996) and NSGA-II (Deb, 2000) with novel modifications. PGAPack is a master-slave parallel implementation of a...
Bayer image parallel decoding based on GPU
NASA Astrophysics Data System (ADS)
Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua
2012-11-01
In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.
National Combustion Code: Parallel Implementation and Performance
NASA Technical Reports Server (NTRS)
Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.
2000-01-01
The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.
Parallel language support on shared memory multiprocessors
Sah, A.
1991-01-01
The study of general purpose parallel computing requires efficient and inexpensive platforms for parallel program execution. This helps in ascertaining tradeoff choices between hardware complexity and software solutions for massively parallel systems design. In this paper, the authors present an implementation of an efficient parallel execution model on shared memory multiprocessors based on a Threaded Abstract Machine. The authors discuss a k-way generalized locking strategy suitable for our model. The authors study the performance gains obtained by a queuing strategy which uses multiple gueues with reduced access contention. The authors also present performance models in shared memory machines, related to lock contention and serialization in shared memory allocation. A bin-based memory management technique which reduces the serialization is presented. These issues are critical for obtaining an efficient parallel execution environment.
A parallel variable metric optimization algorithm
NASA Technical Reports Server (NTRS)
Straeter, T. A.
1973-01-01
An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
Differences Between Distributed and Parallel Systems
Brightwell, R.; Maccabe, A.B.; Rissen, R.
1998-10-01
Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.
Parallel Algebraic Multigrid Methods - High Performance Preconditioners
Yang, U M
2004-11-11
The development of high performance, massively parallel computers and the increasing demands of computationally challenging applications have necessitated the development of scalable solvers and preconditioners. One of the most effective ways to achieve scalability is the use of multigrid or multilevel techniques. Algebraic multigrid (AMG) is a very efficient algorithm for solving large problems on unstructured grids. While much of it can be parallelized in a straightforward way, some components of the classical algorithm, particularly the coarsening process and some of the most efficient smoothers, are highly sequential, and require new parallel approaches. This chapter presents the basic principles of AMG and gives an overview of various parallel implementations of AMG, including descriptions of parallel coarsening schemes and smoothers, some numerical results as well as references to existing software packages.
Configuration space representation in parallel coordinates
NASA Technical Reports Server (NTRS)
Fiorini, Paolo; Inselberg, Alfred
1989-01-01
By means of a system of parallel coordinates, a nonprojective mapping from R exp N to R squared is obtained for any positive integer N. In this way multivariate data and relations can be represented in the Euclidean plane (embedded in the projective plane). Basically, R squared with Cartesian coordinates is augmented by N parallel axes, one for each variable. The N joint variables of a robotic device can be represented graphically by using parallel coordinates. It is pointed out that some properties of the relation are better perceived visually from the parallel coordinate representation, and that new algorithms and data structures can be obtained from this representation. The main features of parallel coordinates are described, and an example is presented of their use for configuration space representation of a mechanical arm (where Cartesian coordinates cannot be used).
A paradigm for parallel unstructured grid generation
Gaither, A.; Marcum, D.; Reese, D.
1996-12-31
In this paper, a sequential 2D unstructured grid generator based on iterative point insertion and local reconnection is coupled with a Delauney tessellation domain decomposition scheme to create a scalable parallel unstructured grid generator. The Message Passing Interface (MPI) is used for distributed communication in the parallel grid generator. This work attempts to provide a generic framework to enable the parallelization of fast sequential unstructured grid generators in order to compute grand-challenge scale grids for Computational Field Simulation (CFS). Motivation for moving from sequential to scalable parallel grid generation is presented. Delaunay tessellation and iterative point insertion and local reconnection (advancing front method only) unstructured grid generation techniques are discussed with emphasis on how these techniques can be utilized for parallel unstructured grid generation. Domain decomposition techniques are discussed for both Delauney and advancing front unstructured grid generation with emphasis placed on the differences needed for both grid quality and algorithmic efficiency.
Broadcasting a message in a parallel computer
Berg, Jeremy E.; Faraj, Ahmad A.
2011-08-02
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
Randomized parallel speedups for list ranking
Vishkin, U.
1987-06-01
The following problem is considered: given a linked list of length n, compute the distance of each element of the linked list from the end of the list. The problem has two standard deterministic algorithms: a linear time serial algorithm, and an O(n log n)/ rho + log n) time parallel algorithm using rho processors. The authors present a randomized parallel algorithm for the problem. The algorithm is designed for an exclusive-read exclusive-write parallel random access machine (EREW PRAM). It runs almost surely in time O(n/rho + log n log* n) using rho processors. Using a recently published parallel prefix sums algorithm the list-ranking algorithm can be adapted to run on a concurrent-read concurrent-write parallel random access machine (CRCW PRAM) almost surely in time O(n/rho + log n) using rho processors.
Implementation and performance of parallel Prolog interpreter
Wei, S.; Kale, L.V.; Balkrishna, R. . Dept. of Computer Science)
1988-01-01
In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.
Parallel fast Fourier transforms for non power of two data
Semeraro, B.D.
1994-09-01
This report deals with parallel algorithms for computing discrete Fourier transforms of real sequences of length N not equal to a power of two. The method described is an extension of existing power of two transforms to sequences with N a product of small primes. In particular, this implementation requires N = 2{sup p}3{sup q}5{sup r}. The communication required is the same as for a transform of length N = 2{sup p}. The algorithm presented is intended for use in the solution of partial differential equations, or in any situation in which a large number of forward and backward transforms must be performed and in which the Fourier Coefficients need not be ordered. This implementation is a one dimensional FFT but the techniques are applicable to multidimensional transforms as well. The algorithm has been implemented on a 128 node Intel Ipsc/860.
Parallel preconditioning for the solution of nonsymmetric banded linear systems
Amodio, P.; Mazzia, F.
1994-12-31
Many computational techniques require the solution of banded linear systems. Common examples derive from the solution of partial differential equations and of boundary value problems. In particular the authors are interested in the parallel solution of block Hessemberg linear systems Gx = f, arising from the solution of ordinary differential equations by means of boundary value methods (BVMs), even if the considered preconditioning may be applied to any block banded linear system. BVMs have been extensively investigated in the last few years and their stability properties give promising results. A new class of BVMs called Reverse Adams, which are BV-A-stable for orders up to 6, and BV-A{sub 0}-stable for orders up to 9, have been studied.
Decoding of QOSTBC concatenates RS code using parallel interference cancellation
NASA Astrophysics Data System (ADS)
Yan, Zhenghang; Lu, Yilong; Ma, Maode; Yang, Yuhang
2010-02-01
Comparing with orthogonal space time block code (OSTBC), quasi orthogonal space time block code (QOSTBC) can achieve high transmission rate with partial diversity. In this paper, we present a QOSTBC concatenated Reed-Solomon (RS) error correction code structure. At the receiver, pairwise detection and error correction are first implemented. The decoded data are regrouped. Parallel interference cancellation (PIC) and dual orthogonal space time block code (OSTBC) maximum likelihood decoding are deployed to the regrouped data. The pure concatenated scheme is shown to have higher diversity order and have better error performance at high signal-to-noise ratio (SNR) scenario than both QOSTBC and OSTBC schemes. The PIC and dual OSTBC decoding algorithm can further obtain more than 1.3 dB gains than the pure concatenated scheme at 10-6 bit error probability.
A parallel multigrid method for data-driven multiprocessor systems
Lin, C.H.; Gaudiot, J.L.; Proskurowski, W.
1989-12-31
The multigrid algorithm (MG) is recognized as an efficient and rapidly converging method to solve a wide family of partial differential equations (PDE). When this method is implemented on a multiprocessor system, its major drawback is the low utilization of processors. Due to the sequentiality of the standard algorithm, the fine grid levels cannot start relaxation until the coarse grid levels complete their own relaxation. Indeed, of all processors active on the fine two dimensional grid level only one fourth will be active at the coarse grid level, leaving full 75% idle. In this paper, a novel parallel V-cycle multigrid (PVM) algorithm is proposed to cure the idle processors` problem. Highly programmable systems such as data-flow architectures are then applied to support this new algorithm. The experiments based on the proposed architecture show that the convergence rate of the new algorithm is about twice faster than that of the standard method and twice as efficient system utilization is achieved.
ALPS: A framework for parallel adaptive PDE solution
NASA Astrophysics Data System (ADS)
Burstedde, Carsten; Burtscher, Martin; Ghattas, Omar; Stadler, Georg; Tu, Tiankai; Wilcox, Lucas C.
2009-07-01
Adaptive mesh refinement and coarsening (AMR) is essential for the numerical solution of partial differential equations (PDEs) that exhibit behavior over a wide range of length and time scales. Because of the complex dynamic data structures and communication patterns and frequent data exchange and redistribution, scaling dynamic AMR to tens of thousands of processors has long been considered a challenge. We are developing ALPS, a library for dynamic mesh adaptation of PDEs that is designed to scale to hundreds of thousands of compute cores. Our approach uses parallel forest-of-octree-based hexahedral finite element meshes and dynamic load balancing based on space-filling curves. ALPS supports arbitrary-order accurate continuous and discontinuous finite element/spectral element discretizations on general geometries. We present scalability and performance results for two applications from geophysics: seismic wave propagation and mantle convection.
Beard, R.A.
1990-03-01
The purpose of this thesis is to explore the methods used to parallelize NP-complete problems and the degree of improvement that can be realized using a distributed parallel processor to solve these combinatoric problems. Common NP-complete problem characteristics such as a priori reductions, use of partial-state information, and inhomogeneous searches are identified and studied. The set covering problem (SCP) is implemented for this research because many applications such as information retrieval, task scheduling, and VLSI expression simplification can be structured as an SCP problem. In addition, its generic NP-complete common characteristics are well documented and a parallel implementation has not been reported. Parallel programming design techniques involve decomposing the problem and developing the parallel algorithms. The major components of a parallel solution are developed in a four phase process. First, a meta-level design is accomplished using an appropriate design language such as UNITY. Then, the UNITY design is transformed into an algorithm and implementation specific to a distributed architecture. Finally, a complexity analysis of the algorithm is performed. the a priori reductions are divided-and-conquer algorithms; whereas, the search for the optimal set cover is accomplished with a branch-and-bound algorithm. The search utilizes a global best cost maintained at a central location for distribution to all processors. Three methods of load balancing are implemented and studied: coarse grain with static allocation of the search space, fine grain with dynamic allocation, and dynamic load balancing.
Partial coalescence of soap bubbles
NASA Astrophysics Data System (ADS)
Pucci, G.; Harris, D. M.; Bush, J. W. M.
2015-06-01
We present the results of an experimental investigation of the merger of a soap bubble with a planar soap film. When gently deposited onto a horizontal film, a bubble may interact with the underlying film in such a way as to decrease in size, leaving behind a smaller daughter bubble with approximately half the radius of its progenitor. The process repeats up to three times, with each partial coalescence event occurring over a time scale comparable to the inertial-capillary time. Our results are compared to the recent numerical simulations of Martin and Blanchette ["Simulations of surfactant effects on the dynamics of coalescing drops and bubbles," Phys. Fluids 27, 012103 (2015)] and to the coalescence cascade of droplets on a fluid bath.
Witte H.; Plate, S
2013-05-03
The international Muon Ionization Cooling Experiment (MICE) is a large scale experiment which is presently assembled at the Rutherford Appleton Laboratory in Didcot, UK. The purpose of MICE is to demonstrate the concept of ionization cooling experimentally. Ionization cooling is an important accelerator concept which will be essential for future HEP experiments such as a potential Muon Collider or a Neutrino Factory. The MICE experiment will house up to 18 superconducting solenoids, all of which produce a substantial amount of magnetic flux. Recently it was realized that this magnetic flux leads to a considerable stray magnetic field in the MICE hall. This is a concern as technical equipment in the MICE hall may may be compromised by this. In July 2012 a concept called partial return yoke was presented to the MICE community, which reduces the stray field in the MICE hall to a safe level. This report summarizes the general concept, engineering considerations and the expected shielding performance.
Partially coherent lensfree tomographic microscopy⋄
Isikman, Serhan O.; Bishara, Waheb; Ozcan, Aydogan
2012-01-01
Optical sectioning of biological specimens provides detailed volumetric information regarding their internal structure. To provide a complementary approach to existing three-dimensional (3D) microscopy modalities, we have recently demonstrated lensfree optical tomography that offers high-throughput imaging within a compact and simple platform. In this approach, in-line holograms of objects at different angles of partially coherent illumination are recorded using a digital sensor-array, which enables computing pixel super-resolved tomographic images of the specimen. This imaging modality, which forms the focus of this review, offers micrometer-scale 3D resolution over large imaging volumes of, for example, 10–15 mm3, and can be assembled in light weight and compact architectures. Therefore, lensfree optical tomography might be particularly useful for lab-on-a-chip applications as well as for microscopy needs in resource-limited settings. PMID:22193016
Gooding, Thomas Michael; McCarthy, Patrick Joseph
2010-03-02
A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
Modulated heat pulse propagation and partial transport barriers in chaotic magnetic fields
del-Castillo-Negrete, Diego; Blazevski, Daniel
2016-04-01
Direct numerical simulations of the time dependent parallel heat transport equation modeling heat pulses driven by power modulation in 3-dimensional chaotic magnetic fields are presented. The numerical method is based on the Fourier formulation of a Lagrangian-Green's function method that provides an accurate and efficient technique for the solution of the parallel heat transport equation in the presence of harmonic power modulation. The numerical results presented provide conclusive evidence that even in the absence of magnetic flux surfaces, chaotic magnetic field configurations with intermediate levels of stochasticity exhibit transport barriers to modulated heat pulse propagation. In particular, high-order islands and remnants of destroyed flux surfaces (Cantori) act as partial barriers that slow down or even stop the propagation of heat waves at places where the magnetic field connection length exhibits a strong gradient. The key parameter ismore » $$\\gamma=\\sqrt{\\omega/2 \\chi_\\parallel}$$ that determines the length scale, $$1/\\gamma$$, of the heat wave penetration along the magnetic field line. For large perturbation frequencies, $$\\omega \\gg 1$$, or small parallel thermal conductivities, $$\\chi_\\parallel \\ll 1$$, parallel heat transport is strongly damped and the magnetic field partial barriers act as robust barriers where the heat wave amplitude vanishes and its phase speed slows down to a halt. On the other hand, in the limit of small $$\\gamma$$, parallel heat transport is largely unimpeded, global transport is observed and the radial amplitude and phase speed of the heat wave remain finite. Results on modulated heat pulse propagation in fully stochastic fields and across magnetic islands are also presented. In qualitative agreement with recent experiments in LHD and DIII-D, it is shown that the elliptic (O) and hyperbolic (X) points of magnetic islands have a direct impact on the spatio-temporal dependence of the amplitude and the time delay
Goulet, T.; Keszei, E.; Jay-Gerin, J.
1988-03-15
We present a three-dimensional probabilistic model of particle transport in a medium where the particles suffer quasielastic collisions. The model accounts for bulk and surface scattering, as well as partial reflections at the boundaries of the medium. We give analytical and numerical methods for the evaluation of the particle transmission probability in the case of a medium with a plane-parallel geometry. The influence of the various parameters of the model on this probability is also discussed.
The delayed coupling method: An algorithm for solving banded diagonal matrix problems in parallel
Mattor, N.; Williams, T.J.; Hewett, D.W.; Dimits, A.M.
1997-09-01
We present a new algorithm for solving banded diagonal matrix problems efficiently on distributed-memory parallel computers, designed originally for use in dynamic alternating-direction implicit partial differential equation solvers. The algorithm optimizes efficiency with respect to the number of numerical operations and to the amount of interprocessor communication. This is called the ``delayed coupling method`` because the communication is deferred until needed. We focus here on tridiagonal and periodic tridiagonal systems.
Parallelism extraction and program restructuring for parallel simulation of digital systems
Vellandi, B.L.
1990-01-01
Two topics currently of interest to the computer aided design (CADF) for the very-large-scale integrated circuit (VLSI) community are using the VHSIC Hardware Description Language (VHDL) effectively and decreasing simulation times of VLSI designs through parallel execution of the simulator. The goal of this research is to increase the degree of parallelism obtainable in VHDL simulation, and consequently to decrease simulation times. The research targets simulation on massively parallel architectures. Experimentation and instrumentation were done on the SIMD Connection Machine. The author discusses her method used to extract parallelism and restructure a VHDL program, experimental results using this method, and requirements for a parallel architecture for fast simulation.
Super-resolved parallel MRI by spatiotemporal encoding.
Schmidt, Rita; Baishya, Bikash; Ben-Eliezer, Noam; Seginer, Amir; Frydman, Lucio
2014-01-01
Recent studies described an "ultrafast" scanning method based on spatiotemporal (SPEN) principles. SPEN demonstrates numerous potential advantages over EPI-based alternatives, at no additional expense in experimental complexity. An important aspect that SPEN still needs to achieve for providing a competitive ultrafast MRI acquisition alternative, entails exploiting parallel imaging algorithms without compromising its proven capabilities. The present work introduces a combination of multi-band frequency-swept pulses simultaneously encoding multiple, partial fields-of-view, together with a new algorithm merging a Super-Resolved SPEN image reconstruction and SENSE multiple-receiving methods. This approach enables one to reduce both the excitation and acquisition times of sub-second SPEN acquisitions by the customary acceleration factor R, without compromises in either the method's spatial resolution, SAR deposition, or capability to operate in multi-slice mode. The performance of these new single-shot imaging sequences and their ancillary algorithms were explored and corroborated on phantoms and human volunteers at 3 T. The gains of the parallelized approach were particularly evident when dealing with heterogeneous systems subject to major T2/T2* effects, as is the case upon single-scan imaging near tissue/air interfaces. PMID:24120293
Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU
Xia, Yong; Wang, Kuanquan; Zhang, Henggui
2015-01-01
Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations. PMID:26581957
On combining computational differentiation and toolkits for parallel scientific computing.
Bischof, C. H.; Buecker, H. M.; Hovland, P. D.
2000-06-08
Automatic differentiation is a powerful technique for evaluating derivatives of functions given in the form of a high-level programming language such as Fortran, C, or C++. The program is treated as a potentially very long sequence of elementary statements to which the chain rule of differential calculus is applied over and over again. Combining automatic differentiation and the organizational structure of toolkits for parallel scientific computing provides a mechanism for evaluating derivatives by exploiting mathematical insight on a higher level. In these toolkits, algorithmic structures such as BLAS-like operations, linear and nonlinear solvers, or integrators for ordinary differential equations can be identified by their standardized interfaces and recognized as high-level mathematical objects rather than as a sequence of elementary statements. In this note, the differentiation of a linear solver with respect to some parameter vector is taken as an example. Mathematical insight is used to reformulate this problem into the solution of multiple linear systems that share the same coefficient matrix but differ in their right-hand sides. The experiments reported here use ADIC, a tool for the automatic differentiation of C programs, and PETSC, an object-oriented toolkit for the parallel solution of scientific problems modeled by partial differential equations.
Parallel computation of manipulator inverse dynamics
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1991-01-01
In this article, parallel computation of manipulator inverse dynamics is investigated. A hierarchical graph-based mapping approach is devised to analyze the inherent parallelism in the Newton-Euler formulation at several computational levels, and to derive the features of an abstract architecture for exploitation of parallelism. At each level, a parallel algorithm represents the application of a parallel model of computation that transforms the computation into a graph whose structure defines the features of an abstract architecture, i.e., number of processors, communication structure, etc. Data-flow analysis is employed to derive the time lower bound in the computation as well as the sequencing of the abstract architecture. The features of the target architecture are defined by optimization of the abstract architecture to exploit maximum parallelism while minimizing architectural complexity. An architecture is designed and implemented that is capable of efficient exploitation of parallelism at several computational levels. The computation time of the Newton-Euler formulation for a 6-degree-of-freedom (dof) general manipulator is measured as 187 microsec. The increase in computation time for each additional dof is 23 microsec, which leads to a computation time of less than 500 microsec, even for a 12-dof redundant arm.
On the Scalability of Parallel UCT
NASA Astrophysics Data System (ADS)
Segal, Richard B.
The parallelization of MCTS across multiple-machines has proven surprisingly difficult. The limitations of existing algorithms were evident in the 2009 Computer Olympiad where Zen using a single four-core machine defeated both Fuego with ten eight-core machines, and Mogo with twenty thirty-two core machines. This paper investigates the limits of parallel MCTS in order to understand why distributed parallelism has proven so difficult and to pave the way towards future distributed algorithms with better scaling. We first analyze the single-threaded scaling of Fuego and find that there is an upper bound on the play-quality improvements which can come from additional search. We then analyze the scaling of an idealized N-core shared memory machine to determine the maximum amount of parallelism supported by MCTS. We show that parallel speedup depends critically on how much time is given to each player. We use this relationship to predict parallel scaling for time scales beyond what can be empirically evaluated due to the immense computation required. Our results show that MCTS can scale nearly perfectly to at least 64 threads when combined with virtual loss, but without virtual loss scaling is limited to just eight threads. We also find that for competition time controls scaling to thousands of threads is impossible not necessarily due to MCTS not scaling, but because high levels of parallelism can start to bump up against the upper performance bound of Fuego itself.
Performance of the Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance 1/O to applications the applications that rely on them. In Section 3 we describe that access data in patterns that have been observed to be common.
Code Parallelization with CAPO: A User Manual
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2001-01-01
A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.
A simple hyperbolic model for communication in parallel processing environments
NASA Technical Reports Server (NTRS)
Stoica, Ion; Sultan, Florin; Keyes, David
1994-01-01
We introduce a model for communication costs in parallel processing environments called the 'hyperbolic model,' which generalizes two-parameter dedicated-link models in an analytically simple way. Dedicated interprocessor links parameterized by a latency and a transfer rate that are independent of load are assumed by many existing communication models; such models are unrealistic for workstation networks. The communication system is modeled as a directed communication graph in which terminal nodes represent the application processes that initiate the sending and receiving of the information and in which internal nodes, called communication blocks (CBs), reflect the layered structure of the underlying communication architecture. The direction of graph edges specifies the flow of the information carried through messages. Each CB is characterized by a two-parameter hyperbolic function of the message size that represents the service time needed for processing the message. The parameters are evaluated in the limits of very large and very small messages. Rules are given for reducing a communication graph consisting of many to an equivalent two-parameter form, while maintaining an approximation for the service time that is exact in both large and small limits. The model is validated on a dedicated Ethernet network of workstations by experiments with communication subprograms arising in scientific applications, for which a tight fit of the model predictions with actual measurements of the communication and synchronization time between end processes is demonstrated. The model is then used to evaluate the performance of two simple parallel scientific applications from partial differential equations: domain decomposition and time-parallel multigrid. In an appropriate limit, we also show the compatibility of the hyperbolic model with the recently proposed LogP model.
Domain decomposition methods for the parallel computation of reacting flows
NASA Astrophysics Data System (ADS)
Keyes, David E.
1989-05-01
Domain decomposition is a natural route to parallel computing for partial differential equation solvers. In this procedure, subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, we make comparisons between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demostrate for it approximately 10-fold speedup on 16 processsors. The three special features of reacting flow models in relation to these linear systems are: the possibly large number of degrees of freedom per gridpoint, the dominance of dense intra-point source-term coupling over inter-point convective-diffusive coupling throughout significant portions of the flow-field and strong nonlinearities which restrict the time step to small values (independent of linear algebraic considerations) throughout significant portions of the iteration history. Though these features are exploited to advantage herein, many aspects of the paper are applicable to the modeling of general convective-diffusive systems.
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1998-01-01
A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation
NASA Technical Reports Server (NTRS)
Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.
1996-01-01
We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and, economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
Run-time recognition of task parallelism within the P++ parallel array class library
Parsons, R.; Quinlan, D.
1993-11-01
This paper explores the use of a run-time system to recognize task parallelism with a C++ array class library. Run-time systems currently support data parallelism in P++, FORTRAN 90 D, and High Performance FORTRAN. But data parallelism in insufficient for many applications, including adaptive mesh refinement. Without access to both data and task parallelism such applications exhibit several orders of magnitude more message passing and poor performance. In this work, a C++ array class library is used to implement deferred evaluation and run-time dependence for task parallelism recognition, tp obtain task parallelism through a data flow interpretation of data parallel array statements. Performance results show that that analysis and optimizations are both efficient and practical, allowing us to consider more substantial optimizations.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Xyce parallel electronic simulator : users' guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique
SLAPP: A systolic linear algebra parallel processor
Drake, B.L.; Luk, F.T.; Speiser, J.M.; Symanski, J.J.
1987-07-01
Systolic array computer architectures provide a means for fast computation of the linear algebra algorithms that form the building blocks of many signal-processing algorithms, facilitating their real-time computation. For applications to signal processing, the systolic array operates on matrices, an inherently parallel view of the data, using numerical linear algebra algorithms that have been suitably parallelized to efficiently utilize the available hardware. This article describes work currently underway at the Naval Ocean Systems Center, San Diego, California, to build a two-dimensional systolic array, SLAPP, demonstrating efficient and modular parallelization of key matric computations for real-time signal- and image-processing problems.
Language constructs for modular parallel programs
Foster, I.
1996-03-01
We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrence, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.
Parallelization of the Implicit RPLUS Algorithm
NASA Technical Reports Server (NTRS)
Orkwis, Paul D.
1997-01-01
The multiblock reacting Navier-Stokes flow solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.
Parallelization of the Implicit RPLUS Algorithm
NASA Technical Reports Server (NTRS)
Orkwis, Paul D.
1994-01-01
The multiblock reacting Navier-Stokes flow-solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.
Massively parallel neurocomputing for aerospace applications
NASA Technical Reports Server (NTRS)
Fijany, Amir; Barhen, Jacob; Toomarian, Nikzad
1993-01-01
An innovative hybrid, analog-digital charge-domain technology, for the massively parallel VLSI implementation of certain large scale matrix-vector operations, has recently been introduced. It employs arrays of Charge Coupled/Charge Injection Device cells holding an analog matrix of charge, which process digital vectors in parallel by means of binary, non-destructive charge transfer operations. The impact of this technology on massively parallel processing is discussed. Fundamentally new classes of algorithms, specifically designed for this emerging technology, as applied to signal processing, are derived.
Knowledge representation into Ada parallel processing
NASA Technical Reports Server (NTRS)
Masotto, Tom; Babikyan, Carol; Harper, Richard
1990-01-01
The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.
Parallel Climate Analysis Toolkit (ParCAT)
Energy Science and Technology Software Center (ESTSC)
2013-06-30
The parallel analysis toolkit (ParCAT) provides parallel statistical processing of large climate model simulation datasets. ParCAT provides parallel point-wise average calculations, frequency distributions, sum/differences of two datasets, and difference-of-average and average-of-difference for two datasets for arbitrary subsets of simulation time. ParCAT is a command-line utility that can be easily integrated in scripts or embedded in other application. ParCAT supports CMIP5 post-processed datasets as well as non-CMIP5 post-processed datasets. ParCAT reads and writes standard netCDF files.
Distributed parallel messaging for multiprocessor systems
Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka
2013-06-04
A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.
Parallel optical memories for very large databases
NASA Astrophysics Data System (ADS)
Mitkas, Pericles A.; Berra, P. B.
1993-02-01
The steady increase in volume of current and future databases dictates the development of massive secondary storage devices that allow parallel access and exhibit high I/O data rates. Optical memories, such as parallel optical disks and holograms, can satisfy these requirements because they combine high recording density and parallel one- or two-dimensional output. Several configurations for database storage involving different types of optical memory devices are investigated. All these approaches include some level of optical preprocessing in the form of data filtering in an attempt to reduce the amount of data per transaction that reach the electronic front-end.
Formation of parallel two-phase flow in nanochannel and application to solvent extraction
NASA Astrophysics Data System (ADS)
Kazoe, Yutaka; Ugajin, Takuya; Ohta, Ryoichi; Mawatari, Kazuma; Kitamori, Takehiko; The University of Tokyo Team
2015-11-01
Micro chemical systems have realized high-throughput analysis in ultra small volumes. Our group has established unit operations such as extraction, separation and reaction, and a concept of integration of chemical processes using parallel multi-phase flows in microchannels. Recently, the research field has been extended to 10-1000 nm space (extended-nanospace). Exploiting extended-nanospace, we developed ultra high performance chemical operations such as aL-chromatography and single molecule immunoassay. However, formation of parallel multi-phase flow in nanochannels has been difficult. The challenge is to control liquid-liquid/gas-liquid interfaces in 100 nm-scale. For this purpose, this study developed a partial surface modification method of nanochannel and verified formation of parallel two-phase flow. We achieved partial hydrophobic modification using focused ion beam (FIB). Using this method, formation of parallel water/dodecane two-phase flow in a nanochannel of 1500 nm width and 890 nm depth was succeeded. Solvent extraction of lipid, which is a basic separation in bioanalysis, was achieved in 25 fL volume much smaller than single cell. This study will greatly contribute to develop novel nanofluidic devices for chemical analysis and chemical synthesis. This work was supported by Japan Science and Technology Agency, Core Research for Evolutional Science and Technology.
An observational and thermodynamic investigation of carbonate partial melting
NASA Astrophysics Data System (ADS)
Floess, David; Baumgartner, Lukas P.; Vonlanthen, Pierre
2015-01-01
Melting experiments available in the literature show that carbonates and pelites melt at similar conditions in the crust. While partial melting of pelitic rocks is common and well-documented, reports of partial melting in carbonates are rare and ambiguous, mainly because of intensive recrystallization and the resulting lack of criteria for unequivocal identification of melting. Here we present microstructural, textural, and geochemical evidence for partial melting of calcareous dolomite marbles in the contact aureole of the Tertiary Adamello Batholith. Petrographic observations and X-ray micro-computed tomography (X-ray μCT) show that calcite crystallized either in cm- to dm-scale melt pockets, or as an interstitial phase forming an interconnected network between dolomite grains. Calcite-dolomite thermometry yields a temperature of at least 670 °C, which is well above the minimum melting temperature of ∼600 °C reported for the CaO-MgO-CO2-H2O system. Rare-earth element (REE) partition coefficients (KDcc/do) range between 9-35 for adjacent calcite-dolomite pairs. These KD values are 3-10 times higher than equilibrium values between dolomite and calcite reported in the literature. They suggest partitioning of incompatible elements into a melt phase. The δ18O and δ13C isotopic values of calcite and dolomite support this interpretation. Crystallographic orientations measured by electron backscattered diffraction (EBSD) show a clustering of c-axes for dolomite and interstitial calcite normal to the foliation plane, a typical feature for compressional deformation, whereas calcite crystallized in pockets shows a strong clustering of c-axes parallel to the pocket walls, suggesting that it crystallized after deformation had stopped. All this together suggests the formation of partial melts in these carbonates. A Schreinemaker analysis of the experimental data for a CO2-H2O fluid-saturated system indeed predicts formation of calcite-rich melt between 650-880 °C, in
Partial and full-thickness neuroretinal transplants.
Ghosh, F; Juliusson, B; Arnér, K; Ehinger, B
1999-01-01
Adult and embryonic rabbit retinal sheets were transplanted into the subretinal space of adult rabbits. The transplants were either full-thickness with intact layering, or gelatin embedded and vibratome sectioned with the inner retina removed. The full-thickness grafts were positioned subretinally by means of a glass capillary in which they were partially folded. The vibratome sectioned ones were placed using a plastic injector in which the gelatin embedded graft was flat. The embryonic full-thickness grafts were followed clinically up to 3 months, and the other 3 transplant types up to 1 month postoperatively, after which the retina was sectioned and stained for light microscopy. Surgical complications were more common in eyes receiving vibratome sectioned grafts with 10 out of 34 eyes displaying blood in the vitreous. Four of these eyes also developed total retinal detachment. Out of 17 eyes receiving full-thickness grafts, only one displayed these complications. Histologically, 11 out of 13 embryonic full-thickness transplants revealed straight, laminated transplants with correct polarity, and with all normal retinal layers present. In these transplants, fusion with the host increased in time. Of the adult full-thickness transplants, only 1 out of 4 survived, and this graft showed signs of degeneration. The vibratome sectioned adult transplants in a few cases survived the first two postoperative weeks. In these grafts, both inner and outer retina were present, indicating an incomplete vibratome sectioning. With longer postoperative times, the number of surviving transplants in this group diminished considerably. All vibratome sectioned embryonic transplants developed into rosettes and sometimes also into laminated sections with reversed polarity. It can be concluded that in rabbits, the surgical technique used for vibratome sectioned transplants requires a larger sclerotomy and retinotomy, since they have to be kept flat in the transplanting instrument due to
DNAPL invasion into a partially saturated dead-end fracture
gwsu@lbl.gov
2004-06-17
The critical height for DNAPL entry into a partially watersaturated, dead-end fracture is derived and compared to laboratoryobservations. Experiments conducted in an analog, parallel-plate fracturedemonstrate that DNAPL accumulates above the water until the height ofthe DNAPL overcomes the sum of the capillary forces at the DNAPL-airinterface and at the DNAPL-water interface. These experiments also showthat DNAPL preferentially enters the water at locations where DNAPL haspreviously entered, and the entry heights for these subsequent entriesare lower than the heights measured for the initial invasion. The wettingcontact angle at the DNAPL-water interface becomes larger at thelocations where the DNAPL has already entered the water because ofresidual DNAPL on the fracture walls, which results in lowering thecritical entry height at those locations. The experiments alsodemonstrate that a DNAPL lens can remain nearly immobile above the waterfor a period of time before eventually redistributing itself and enteringthe water.
Channeled partial Mueller matrix polarimetry
NASA Astrophysics Data System (ADS)
Alenin, Andrey S.; Tyo, J. S.
2015-09-01
In prior work,1,2 we introduced methods to treat channeled systems in a way that is similar to Data Reduction Method (DRM), by focusing attention on the Fourier content of the measurement conditions. Introduction of Q enabled us to more readily extract the performance of the system and thereby optimize it to obtain reconstruction with the least noise. The analysis tools developed for that exercise can be expanded to be applicable to partial Mueller Matrix Polarimeters (pMMPs), which were a topic of prior discussion as well. In this treatment, we combine the principles involved in both of those research trajectories and identify a set of channeled pMMP families. As a result, the measurement structure of such systems is completely known and the design of a channeled pMMP intended for any given task becomes a search over a finite set of possibilities, with the additional channel rotation allowing for a more desirable Mueller element mixing.
GLSMs for partial flag manifolds
NASA Astrophysics Data System (ADS)
Donagi, Ron; Sharpe, Eric
2008-12-01
In this paper we outline some aspects of nonabelian gauged linear sigma models. First, we review how partial flag manifolds (generalizing Grassmannians) are described physically by nonabelian gauged linear sigma models, paying attention to realizations of tangent bundles and other aspects pertinent to (0, 2) models. Second, we review constructions of Calabi-Yau complete intersections within such flag manifolds, and properties of the gauged linear sigma models. We discuss a number of examples of nonabelian GLSMs in which the Kähler phases are not birational, and in which at least one phase is realized in some fashion other than as a complete intersection, extending previous work of Hori-Tong. We also review an example of an abelian GLSM exhibiting the same phenomenon. We tentatively identify the mathematical relationship between such non-birational phases, as examples of Kuznetsov's homological projective duality. Finally, we discuss linear sigma model moduli spaces in these gauged linear sigma models. We argue that the moduli spaces being realized physically by these GLSMs are precisely Quot and hyperquot schemes, as one would expect mathematically.
Runtime support for parallelizing data mining algorithms
NASA Astrophysics Data System (ADS)
Jin, Ruoming; Agrawal, Gagan
2002-03-01
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the technique we have developed starting from a common specification of the algorithm.
Parallel Implementation of the Discontinuous Galerkin Method
NASA Technical Reports Server (NTRS)
Baggag, Abdalkader; Atkins, Harold; Keyes, David
1999-01-01
This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in all object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.
Improved chopper circuit uses parallel transistors
NASA Technical Reports Server (NTRS)
1966-01-01
Parallel transistor chopper circuit operates with one transistor in the forward mode and the other in the inverse mode. By using this method, it acts as a single, symmetrical, bidirectional transistor, and reduces and stabilizes the offset voltage.
Parallel processor programs in the Federal Government
NASA Technical Reports Server (NTRS)
Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.
1985-01-01
In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.
Parallelism In Rule-Based Systems
NASA Astrophysics Data System (ADS)
Sabharwal, Arvind; Iyengar, S. Sitharama; de Saussure, G.; Weisbin, C. R.
1988-03-01
Rule-based systems, which have proven to be extremely useful for several Artificial Intelligence and Expert Systems applications, currently face severe limitations due to the slow speed of their execution. To achieve the desired speed-up, this paper addresses the problem of parallelization of production systems and explores the various architectural and algorithmic possibilities. The inherent sources of parallelism in the production system structure are analyzed and the trade-offs, limitations and feasibility of exploitation of these sources of parallelism are presented. Based on this analysis, we propose a dedicated, coarse-grained, n-ary tree multiprocessor architecture for the parallel implementation of rule-based systems and then present algorithms for partitioning of rules in this architecture.
Parallel programming with PCN. Revision 1
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. In includes both tutorial and reference material. It also presents the basic concepts that underly PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).
The PISCES 2 parallel programming environment
NASA Technical Reports Server (NTRS)
Pratt, Terrence W.
1987-01-01
PISCES 2 is a programming environment for scientific and engineering computations on MIMD parallel computers. It is currently implemented on a flexible FLEX/32 at NASA Langley, a 20 processor machine with both shared and local memories. The environment provides an extended Fortran for applications programming, a configuration environment for setting up a run on the parallel machine, and a run-time environment for monitoring and controlling program execution. This paper describes the overall design of the system and its implementation on the FLEX/32. Emphasis is placed on several novel aspects of the design: the use of a carefully defined virtual machine, programmer control of the mapping of virtual machine to actual hardware, forces for medium-granularity parallelism, and windows for parallel distribution of data. Some preliminary measurements of storage use are included.
Parallel line scanning ophthalmoscope for retinal imaging.
Vienola, Kari V; Damodaran, Mathi; Braaf, Boy; Vermeer, Koenraad A; de Boer, Johannes F
2015-11-15
A parallel line scanning ophthalmoscope (PLSO) is presented using a digital micromirror device (DMD) for parallel confocal line imaging of the retina. The posterior part of the eye is illuminated using up to seven parallel lines, which were projected at 100 Hz. The DMD offers a high degree of parallelism in illuminating the retina compared to traditional scanning laser ophthalmoscope systems utilizing scanning mirrors. The system operated at the shot-noise limit with a signal-to-noise ratio of 28 for an optical power measured at the cornea of 100 μW. To demonstrate the imaging capabilities of the system, the macula and the optic nerve head of a healthy volunteer were imaged. Confocal images show good contrast and lateral resolution with a 10°×10° field of view. PMID:26565868
Parallel algorithms for dynamically partitioning unstructured grids
Diniz, P.; Plimpton, S.; Hendrickson, B.; Leland, R.
1994-10-01
Grid partitioning is the method of choice for decomposing a wide variety of computational problems into naturally parallel pieces. In problems where computational load on the grid or the grid itself changes as the simulation progresses, the ability to repartition dynamically and in parallel is attractive for achieving higher performance. We describe three algorithms suitable for parallel dynamic load-balancing which attempt to partition unstructured grids so that computational load is balanced and communication is minimized. The execution time of algorithms and the quality of the partitions they generate are compared to results from serial partitioners for two large grids. The integration of the algorithms into a parallel particle simulation is also briefly discussed.
Massively Parallel Computing: A Sandia Perspective
Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.
1999-05-06
The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
Social Problems and Deviance: Some Parallel Issues
ERIC Educational Resources Information Center
Kitsuse, John I.; Spector, Malcolm
1975-01-01
Explores parallel developments in labeling theory and in the value conflict approach to social problems. Similarities in their critiques of functionalism and etiological theory as well as their emphasis on the definitional process are noted. (Author)
Parallel supercomputing today and the cedar approach.
Kuck, D J; Davidson, E S; Lawrie, D H; Sameh, A H
1986-02-28
More and more scientists and engineers are becoming interested in using supercomputers. Earlier barriers to using these machines are disappearing as software for their use improves. Meanwhile, new parallel supercomputer architectures are emerging that may provide rapid growth in performance. These systems may use a large number of processors with an intricate memory system that is both parallel and hierarchical; they will require even more advanced software. Compilers that restructure user programs to exploit the machine organization seem to be essential. A wide range of algorithms and applications is being developed in an effort to provide high parallel processing performance in many fields. The Cedar supercomputer, presently operating with eight processors in parallel, uses advanced system and applications software developed at the University of Illinois during the past 12 years. This software should allow the number of processors in Cedar to be doubled annually, providing rapid performance advances in the next decade. PMID:17740294
Feature Clustering for Accelerating Parallel Coordinate Descent
Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.
2012-12-06
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
Finite element computation with parallel VLSI
NASA Technical Reports Server (NTRS)
Mcgregor, J.; Salama, M.
1983-01-01
This paper describes a parallel processing computer consisting of a 16-bit microcomputer as a master processor which controls and coordinates the activities of 8086/8087 VLSI chip set slave processors working in parallel. The hardware is inexpensive and can be flexibly configured and programmed to perform various functions. This makes it a useful research tool for the development of, and experimentation with parallel mathematical algorithms. Application of the hardware to computational tasks involved in the finite element analysis method is demonstrated by the generation and assembly of beam finite element stiffness matrices. A number of possible schemes for the implementation of N-elements on N- or n-processors (N is greater than n) are described, and the speedup factors of their time consumption are determined as a function of the number of available parallel processors.
NAS Parallel Benchmarks, Multi-Zone Versions
NASA Technical Reports Server (NTRS)
vanderWijngaart, Rob F.; Haopiang, Jin
2003-01-01
We describe an extension of the NAS Parallel Benchmarks (NPB) suite that involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy, which is common among structured-mesh production flow solver codes in use at NASA Ames and elsewhere, provides relatively easily exploitable coarse-grain parallelism between meshes. Since the individual application benchmarks also allow fine-grain parallelism themselves, this NPB extension, named NPB Multi-Zone (NPB-MZ), is a good candidate for testing hybrid and multi-level parallelization tools and strategies.
Data parallel sorting for particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimun performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
The Nexus task-parallel runtime system
Foster, I.; Tuecke, S.; Kesselman, C.
1994-12-31
A runtime system provides a parallel language compiler with an interface to the low-level facilities required to support interaction between concurrently executing program components. Nexus is a portable runtime system for task-parallel programming languages. Distinguishing features of Nexus include its support for multiple threads of control, dynamic processor acquisition, dynamic address space creation, a global memory model via interprocessor references, and asynchronous events. In addition, it supports heterogeneity at multiple levels, allowing a single computation to utilize different programming languages, executables, processors, and network protocols. Nexus is currently being used as a compiler target for two task-parallel languages: Fortran M and Compositional C++. In this paper, we present the Nexus design, outline techniques used to implement Nexus on parallel computers, show how it is used in compilers, and compare its performance with that of another runtime system.
Parallel processing of a rotating shaft simulation
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.
1989-01-01
A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parellelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.
Modified mesh-connected parallel computers
Carlson, D.A. )
1988-10-01
The mesh-connected parallel computer is an important parallel processing organization that has been used in the past for the design of supercomputing systems. In this paper, the authors explore modifications of a mesh-connected parallel computer for the purpose of increasing the efficiency of executing important application programs. These modifications are made by adding one or more global mesh structures to the processing array. They show how our modifications allow asymptotic improvements in the efficiency of executing computations having low to medium interprocessor communication requirements (e.g., tree computations, prefix computations, finding the connected components of a graph). For computations with high interprocessor communication requirements such as sorting, they show that they offer no speedup. They also compare the modified mesh-connected parallel computer to other similar organizations including the pyramid, the X-tree, and the mesh-of-trees.
Fast and practical parallel polynomial interpolation
Egecioglu, O.; Gallopoulos, E.; Koc, C.K.
1987-01-01
We present fast and practical parallel algorithms for the computation and evaluation of interpolating polynomials. The algorithms make use of fast parallel prefix techniques for the calculation of divided differences in the Newton representation of the interpolating polynomial. For n + 1 given input pairs the proposed interpolation algorithm requires 2 (log (n + 1)) + 2 parallel arithmetic steps and circuit size O(n/sup 2/). The algorithms are numerically stable and their floating-point implementation results in error accumulation similar to that of the widely used serial algorithms. This is in contrast to other fast serial and parallel interpolation algorithms which are subject to much larger roundoff. We demonstrate that in a distributed memory environment context, a cube connected system is very suitable for the algorithms' implementation, exhibiting very small communication cost. As further advantages we note that our techniques do not require equidistant points, preconditioning, or use of the Fast Fourier Transform. 21 refs., 4 figs.
Asynchronous parallel pattern search for nonlinear optimization
P. D. Hough; T. G. Kolda; V. J. Torczon
2000-01-01
Parallel pattern search (PPS) can be quite useful for engineering optimization problems characterized by a small number of variables (say 10--50) and by expensive objective function evaluations such as complex simulations that take from minutes to hours to run. However, PPS, which was originally designed for execution on homogeneous and tightly-coupled parallel machine, is not well suited to the more heterogeneous, loosely-coupled, and even fault-prone parallel systems available today. Specifically, PPS is hindered by synchronization penalties and cannot recover in the event of a failure. The authors introduce a new asynchronous and fault tolerant parallel pattern search (AAPS) method and demonstrate its effectiveness on both simple test problems as well as some engineering optimization problems
Massive parallelism in the future of science
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
Massive parallelism appears in three domains of action of concern to scientists, where it produces collective action that is not possible from any individual agent's behavior. In the domain of data parallelism, computers comprising very large numbers of processing agents, one for each data item in the result will be designed. These agents collectively can solve problems thousands of times faster than current supercomputers. In the domain of distributed parallelism, computations comprising large numbers of resource attached to the world network will be designed. The network will support computations far beyond the power of any one machine. In the domain of people parallelism collaborations among large groups of scientists around the world who participate in projects that endure well past the sojourns of individuals within them will be designed. Computing and telecommunications technology will support the large, long projects that will characterize big science by the turn of the century. Scientists must become masters in these three domains during the coming decade.
A join algorithm for combining AND parallel solutions in AND/OR parallel systems
Ramkumar, B. ); Kale, L.V. )
1992-02-01
When two or more literals in the body of a Prolog clause are solved in (AND) parallel, their solutions need to be joined to compute solutions for the clause. This is often a difficult problem in parallel Prolog systems that exploit OR and independent AND parallelism in Prolog programs. In several AND/OR parallel systems proposed recently, this problem is side-stepped at the cost of unexploited OR parallelism in the program, in part due to the complexity of the backtracking algorithm beneath AND parallel branches. In some cases, the data dependency graphs used by these systems cannot represent all the exploitable independent AND parallelism known at compile time. In this paper, we describe the compile time analysis for an optimized join algorithm for supporting independent AND parallelism in logic programs efficiently without leaving and OR parallelism unexploited. We then discuss how this analysis can be used to yield very efficient runtime behavior. We also discuss problems associated with a tree representation of the search space when arbitrarily complex data dependency graphs are permitted. We describe how these problems can be resolved by mapping the search space onto data dependency graphs themselves. The algorithm has been implemented in a compiler for parallel Prolog based on the reduce-OR process model. The algorithm is suitable for the implementation of AND/OR systems on both shared and nonshared memory machines. Performance on benchmark programs.
A survey of parallel programming tools
NASA Technical Reports Server (NTRS)
Cheng, Doreen Y.
1991-01-01
This survey examines 39 parallel programming tools. Focus is placed on those tool capabilites needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions, tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
Fast Parallel Computation Of Multibody Dynamics
NASA Technical Reports Server (NTRS)
Fijany, Amir; Kwan, Gregory L.; Bagherzadeh, Nader
1996-01-01
Constraint-force algorithm fast, efficient, parallel-computation algorithm for solving forward dynamics problem of multibody system like robot arm or vehicle. Solves problem in minimum time proportional to log(N) by use of optimal number of processors proportional to N, where N is number of dynamical degrees of freedom: in this sense, constraint-force algorithm both time-optimal and processor-optimal parallel-processing algorithm.
LDV Measurement of Confined Parallel Jet Mixing
R.F. Kunz; S.W. D'Amico; P.F. Vassallo; M.A. Zaccaria
2001-01-31
Laser Doppler Velocimetry (LDV) measurements were taken in a confinement, bounded by two parallel walls, into which issues a row of parallel jets. Two-component measurements were taken of two mean velocity components and three Reynolds stress components. As observed in isolated three dimensional wall bounded jets, the transverse diffusion of the jets is quite large. The data indicate that this rapid mixing process is due to strong secondary flows, transport of large inlet intensities and Reynolds stress anisotropy effects.
Enhancing Scalability of Parallel Structured AMR Calculations
Wissink, A M; Hysom, D; Hornung, R D
2003-02-10
This paper discusses parallel scaling performance of large scale parallel structured adaptive mesh refinement (SAMR) calculations in SAMRAI. Previous work revealed that poor scaling qualities in the adaptive gridding operations in SAMR calculations cause them to become dominant for cases run on up to 512 processors. This work describes algorithms we have developed to enhance the efficiency of the adaptive gridding operations. Performance of the algorithms is evaluated for two adaptive benchmarks run on up 512 processors of an IBM SP system.
HOPSPACK: Hybrid Optimization Parallel Search Package.
Gray, Genetha A.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica
2008-12-01
In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel SearchPackage), a new software platform which facilitates combining multiple optimization routines into asingle, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The frameworkis designed such that existing optimization source code can be easily incorporated with minimalcode modification. By maintaining the integrity of each individual solver, the strengths and codesophistication of the original optimization package are retained and exploited.4
Computational electromagnetics and parallel dense matrix computations
Forsman, K.; Kettunen, L.; Gropp, W.; Levine, D.
1995-06-01
We present computational results using CORAL, a parallel, three-dimensional, nonlinear magnetostatic code based on a volume integral equation formulation. A key feature of CORAL is the ability to solve, in parallel, the large, dense systems of linear equations that are inherent in the use of integral equation methods. Using the Chameleon and PSLES libraries ensures portability and access to the latest linear algebra solution technology.
Partitioning And Packing Equations For Parallel Processing
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.; Milner, Edward J.
1989-01-01
Algorithm developed to identify parallelism in set of coupled ordinary differential equations that describe physical system and to divide set into parallel computational paths, along with parts of solution proceeds independently of others during at least part of time. Path-identifying algorithm creates number of paths consisting of equations that must be computed serially and table that gives dependent and independent arguments and "can start," "can end," and "must end" times of each equation. "Must end" time used subsequently by packing algorithm.
Parallel algorithms for unconstrained optimizations by multisplitting
He, Qing
1994-12-31
In this paper a new parallel iterative algorithm for unconstrained optimization using the idea of multisplitting is proposed. This algorithm uses the existing sequential algorithms without any parallelization. Some convergence and numerical results for this algorithm are presented. The experiments are performed on an Intel iPSC/860 Hyper Cube with 64 nodes. It is interesting that the sequential implementation on one node shows that if the problem is split properly, the algorithm converges much faster than one without splitting.
Acoustic simulation in architecture with parallel algorithm
NASA Astrophysics Data System (ADS)
Li, Xiaohong; Zhang, Xinrong; Li, Dan
2004-03-01
In allusion to complexity of architecture environment and Real-time simulation of architecture acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in scene is solved with this method. And then the impulse response between sources and receivers at frequency segment, which are calculated with multi-process, are combined into whole frequency response. The numerical experiment shows that parallel arithmetic can improve the acoustic simulating efficiency of complex scene.
A parallel PCG solver for MODFLOW.
Dong, Yanhui; Li, Guomin
2009-01-01
In order to simulate large-scale ground water flow problems more efficiently with MODFLOW, the OpenMP programming paradigm was used to parallelize the preconditioned conjugate-gradient (PCG) solver with in this study. Incremental parallelization, the significant advantage supported by OpenMP on a shared-memory computer, made the solver transit to a parallel program smoothly one block of code at a time. The parallel PCG solver, suitable for both MODFLOW-2000 and MODFLOW-2005, is verified using an 8-processor computer. Both the impact of compilers and different model domain sizes were considered in the numerical experiments. Based on the timing results, execution times using the parallel PCG solver are typically about 1.40 to 5.31 times faster than those using the serial one. In addition, the simulation results are the exact same as the original PCG solver, because the majority of serial codes were not changed. It is worth noting that this parallelizing approach reduces cost in terms of software maintenance because only a single source PCG solver code needs to be maintained in the MODFLOW source tree. PMID:19563427
Modeling Parallel System Workloads with Temporal Locality
NASA Astrophysics Data System (ADS)
Minh, Tran Ngoc; Wolters, Lex
In parallel systems, similar jobs tend to arrive within bursty periods. This fact leads to the existence of the locality phenomenon, a persistent similarity between nearby jobs, in real parallel computer workloads. This important phenomenon deserves to be taken into account and used as a characteristic of any workload model. Regrettably, this property has received little if any attention of researchers and synthetic workloads used for performance evaluation to date often do not have locality. With respect to this research trend, Feitelson has suggested a general repetition approach to model locality in synthetic workloads [6]. Using this approach, Li et al. recently introduced a new method for modeling temporal locality in workload attributes such as run time and memory [14]. However, with the assumption that each job in the synthetic workload requires a single processor, the parallelism has not been taken into account in their study. In this paper, we propose a new model for parallel computer workloads based on their result. In our research, we firstly improve their model to control locality of a run time process better and then model the parallelism. The key idea for modeling the parallelism is to control the cross-correlation between the run time and the number of processors. Experimental results show that not only the cross-correlation is controlled well by our model, but also the marginal distribution can be fitted nicely. Furthermore, the locality feature is also obtained in our model.
Parallel object-oriented adaptive mesh refinement
Balsara, D.; Quinlan, D.J.
1997-04-01
In this paper we study adaptive mesh refinement (AMR) for elliptic and hyperbolic systems. We use the Asynchronous Fast Adaptive Composite Grid Method (AFACX), a parallel algorithm based upon the of Fast Adaptive Composite Grid Method (FAC) as a test case of an adaptive elliptic solver. For our hyperbolic system example we use TVD and ENO schemes for solving the Euler and MHD equations. We use the structured grid load balancer MLB as a tool for obtaining a load balanced distribution in a parallel environment. Parallel adaptive mesh refinement poses difficulties in expressing both the basic single grid solver, whether elliptic or hyperbolic, in a fashion that parallelizes seamlessly. It also requires that these basic solvers work together within the adaptive mesh refinement algorithm which uses the single grid solvers as one part of its adaptive solution process. We show that use of AMR++, an object-oriented library within the OVERTURE Framework, simplifies the development of AMR applications. Parallel support is provided and abstracted through the use of the P++ parallel array class.
The Xyce Parallel Electronic Simulator - An Overview
HUTCHINSON,SCOTT A.; KEITER,ERIC R.; HOEKSTRA,ROBERT J.; WATTS,HERMAN A.; WATERS,ARLON J.; SCHELLS,REGINA L.; WIX,STEVEN D.
2000-12-08
The Xyce{trademark} Parallel Electronic Simulator has been written to support the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on providing the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). In addition, they are providing improved performance for numerical kernels using state-of-the-art algorithms, support for modeling circuit phenomena at a variety of abstraction levels and using object-oriented and modern coding-practices that ensure the code will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows.
Efficient communication in massively parallel computers
Cypher, R.E.
1989-01-01
A fundamental operation in parallel computation is sorting. Sorting is important not only because it is required by many algorithms, but also because it can be used to implement irregular, pointer-based communication. The author studies two algorithms for sorting in massively parallel computers. First, he examines Shellsort. Shellsort is a sorting algorithm that is based on a sequence of parameters called increments. Shellsort can be used to create a parallel sorting device known as a sorting network. Researchers have suggested that if the correct increment sequence is used, an optimal size sorting network can be obtained. All published increment sequences have been monotonically decreasing. He shows that no monotonically decreasing increment sequence will yield an optimal size sorting network. Second, he presents a sorting algorithm called Cubesort. Cubesort is the fastest known sorting algorithm for a variety of parallel computers aver a wide range of parameters. He also presents a paradigm for developing parallel algorithms that have efficient communication. The paradigm, called the data reduction paradigm, consists of using a divide-and-conquer strategy. Both the division and combination phases of the divide-and-conquer algorithm may require irregular, pointer-based communication between processors. However, the problem is divided so as to limit the amount of data that must be communicated. As a result the communication can be performed efficiently. He presents data reduction algorithms for the image component labeling problem, the closest pair problem and four versions of the parallel prefix problem.
Quantum states with strong positive partial transpose
Chruscinski, Dariusz; Jurkowski, Jacek; Kossakowski, Andrzej
2008-02-15
We construct a large class of bipartite M x N quantum states which defines a proper subset of states with positive partial transposes (PPTs). Any state from this class has PPT but the positivity of its partial transposition is recognized with respect to canonical factorization of the original density operator. We propose to call elements from this class states with strong positive partial transposes (SPPTs). We conjecture that all SPPT states are separable.
20% PARTIAL SIBERIAN SNAKE IN THE AGS.
Huang, H; Bai, M; Brown, K A; Glenn, W; Luccio, A U; Mackay, W W; Montag, C; Ptitsyn, V; Roser, T; Tsoupas, N; Zeno, K; Ranjbar, V; Spinka, H; Underwood, D
2002-11-06
An 11.4% partial Siberian snake was used to successfully accelerate polarized proton through a strong intrinsic depolarizing spin resonance in the AGS. No noticeable depolarization was observed. This opens up the possibility of using a 20% to 30% partial Siberian snake in the AGS to overcome all weak and strong depolarizing spin resonances. Some design and operation issues of the new partial Siberian snake are discussed.