Goerner, Frank L.; Duong, Timothy; Stafford, R. Jason; Clarke, Geoffrey D.
2013-01-01
Purpose: To investigate the utility of five different standard measurement methods for determining image uniformity for partially parallel imaging (PPI) acquisitions in terms of consistency across a variety of pulse sequences and reconstruction strategies. Methods: Images were produced with a phantom using a 12-channel head matrix coil in a 3T MRI system (TIM TRIO, Siemens Medical Solutions, Erlangen, Germany). Images produced using echo-planar, fast spin echo, gradient echo, and balanced steady state free precession pulse sequences were evaluated. Two different PPI reconstruction methods were investigated, the generalized autocalibrating partially parallel acquisition algorithm (GRAPPA) and modified sensitivity-encoding (mSENSE), with acceleration factors (R) of 2, 3, and 4. Additionally, images were acquired with conventional, two-dimensional Fourier imaging methods (R = 1). Five measurement methods of uniformity, recommended by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA), were considered. The methods investigated were (1) an ACR method and (2) a NEMA method for calculating the peak deviation nonuniformity, (3) a modification of a NEMA method used to produce a gray scale uniformity map, (4) determining the normalized absolute average deviation uniformity, and (5) a NEMA method that focused on 17 areas of the image to measure uniformity. Changes in uniformity as a function of reconstruction method at the same R-value were also investigated. Two-way analysis of variance (ANOVA) was used to determine whether R-value or reconstruction method had a greater influence on signal intensity uniformity measurements for partially parallel MRI. Results: Two of the methods studied had consistently negative slopes when signal intensity uniformity was plotted against R-value. The results obtained comparing mSENSE against GRAPPA found no consistent difference between GRAPPA and mSENSE with regard to signal intensity uniformity
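As a rough illustration of the quantities named in the abstract above, the sketch below computes a peak deviation nonuniformity, the corresponding percent integral uniformity, and a normalized absolute average deviation uniformity from a flat list of pixel values. The formulas are the commonly stated forms of these metrics; the function names are hypothetical, and region-of-interest selection and any smoothing that the ACR and NEMA procedures specify are omitted, so this should not be read as the authors' implementation.

```python
from statistics import mean


def peak_deviation_nonuniformity(pixels):
    """Percent nonuniformity: 100 * (max - min) / (max + min)."""
    s_max, s_min = max(pixels), min(pixels)
    return 100.0 * (s_max - s_min) / (s_max + s_min)


def percent_integral_uniformity(pixels):
    """Percent uniformity, the complement of the peak deviation nonuniformity."""
    return 100.0 - peak_deviation_nonuniformity(pixels)


def naad_uniformity(pixels):
    """Normalized absolute average deviation uniformity in percent."""
    m = mean(pixels)
    total_abs_dev = sum(abs(p - m) for p in pixels)
    return 100.0 * (1.0 - total_abs_dev / (len(pixels) * m))


# Toy signal values standing in for pixels inside a measurement region (assumed data).
roi = [90, 100, 110]
print(peak_deviation_nonuniformity(roi))
print(percent_integral_uniformity(roi))
print(naad_uniformity(roi))
```

The peak-based metrics depend only on the two extreme values, whereas the average-deviation metric uses every pixel, which is one reason the methods can respond differently to the same image.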
NASA Astrophysics Data System (ADS)
Lee, Mike M.; Cho, Byung Lok
2001-11-01
In this paper, we propose a new First Partial product Addition (FPA) architecture that adds a new compressor (or parallel counter) to the CSA tree built during partial-product addition in a fast parallel multiplier. It improves the speed of calculating partial products by about 20% compared with an existing parallel counter using full adders. Using the novel FPA architecture, the new circuit reduces the number of CLA bits needed to find the final sum by N/2. A multiplication time of 5.14 ns for the 16x16 multiplier is obtained using 0.25 um CMOS technology. The multiplier architecture is easily adapted to pipelined design and demonstrates high-speed performance.
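For readers unfamiliar with carry-save reduction, the following minimal sketch shows a generic 3:2 compressor (a full adder applied bitwise) reducing the partial products of an unsigned multiplication to two words, followed by one carry-propagate addition, which is the role a CLA plays in hardware. It is a textbook illustration only, not the FPA architecture or the new compressor described in the abstract; the function names and the sequential reduction order are assumptions.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole words: returns (sum_word, carry_word)
    such that a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def multiply(x, y, width=16):
    """Unsigned multiply of x by a width-bit y using carry-save reduction."""
    # One shifted copy of x per set bit of y: the partial products.
    rows = [x << i for i in range(width) if (y >> i) & 1]
    # Compress three rows into two until at most two rows remain.
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(a, b, c))
    # Final carry-propagate addition (a CLA in a hardware multiplier).
    return sum(rows)


print(multiply(12345, 54321) == 12345 * 54321)
```

The compressor never propagates a carry across bit positions, so each reduction step has constant depth; only the last addition needs a fast carry-propagate adder.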
The Force Singularity for Partially Immersed Parallel Plates
NASA Astrophysics Data System (ADS)
Bhatnagar, Rajat; Finn, Robert
2016-05-01
In earlier work, we provided a general description of the forces of attraction and repulsion, encountered by two parallel vertical plates of infinite extent and of possibly differing materials, when partially immersed in an infinite liquid bath and subject to surface tension forces. In the present study, we examine some unusual details of the exotic behavior that can occur at the singular configuration separating infinite rise from infinite descent of the fluid between the plates, as the plates approach each other. In connection with this singular behavior, we present also some new estimates on meniscus height details.
Solution of partial differential equations on vector and parallel computers
NASA Technical Reports Server (NTRS)
Ortega, J. M.; Voigt, R. G.
1985-01-01
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Applicability of Parallel Computing to Partial Wave Analysis
NASA Astrophysics Data System (ADS)
Ruger, Justin; Gilfoyle, Gerard; Weygand, Dennis; CLAS Collaboration
2013-10-01
Bound states of Quantum Chromodynamics (QCD) give insights into the nature of confinement, a key element of the strong interaction. States may be identified from weak signals extracted from the analysis of high statistics data from reactions with many final state particles. One of the best tools for the analysis of these reactions is Partial Wave Analysis (PWA). PWA transforms an ensemble of experimental data from a large acceptance detector from free particle eigenstates to angular momentum eigenstates. The PWA program must be fast enough to deal with the large amounts of data available currently, as processing time scales with the number of events. The scope of this research is to study the applicability and scalability of Intel's Xeon Phi using the Many Integrated Core (MIC) architecture when applied to the existing PWA code at Jefferson Laboratory. An algorithm was developed for the Xeon Phi and scaled across 240 available threads, giving parallel functionality to the PWA which was originally written serially. This scaling can make the fitting process fifteen times faster. Supported by the US Department of Energy.
Dynamic grid refinement for partial differential equations on parallel computers
NASA Technical Reports Server (NTRS)
Mccormick, S.; Quinlan, D.
1989-01-01
The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems.
Polarization imaging apparatus with auto-calibration
Zou, Yingyin Kevin; Zhao, Hongzhi; Chen, Qiushui
2013-08-20
A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned at 22.5°, a second variable phase retarder with its optical axis aligned at 45°, a linear polarizer, an imaging sensor for sensing the intensity images of the sample, a controller and a computer. The two variable phase retarders were controlled independently by the computer through a controller unit, which generates a sequence of voltages to control the phase retardations of the first and second variable phase retarders. An auto-calibration procedure was incorporated into the polarization imaging apparatus to correct the misalignment of the first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images, I_0, I_1, I_2 and I_3, of the sample was captured by the imaging sensor when the phase retardations of the VPRs were set at (0,0), (π,0), (π,π) and (π/2,π), respectively. The four Stokes components of a Stokes image, S_0, S_1, S_2 and S_3, were then calculated using the four intensity images.
Polarization Imaging Apparatus with Auto-Calibration
NASA Technical Reports Server (NTRS)
Zou, Yingyin Kevin (Inventor); Zhao, Hongzhi (Inventor); Chen, Qiushui (Inventor)
2013-01-01
A polarization imaging apparatus measures the Stokes image of a sample. The apparatus consists of an optical lens set, a first variable phase retarder (VPR) with its optical axis aligned at 22.5°, a second variable phase retarder with its optical axis aligned at 45°, a linear polarizer, an imaging sensor for sensing the intensity images of the sample, a controller and a computer. The two variable phase retarders were controlled independently by the computer through a controller unit, which generates a sequence of voltages to control the phase retardations of the first and second variable phase retarders. An auto-calibration procedure was incorporated into the polarization imaging apparatus to correct the misalignment of the first and second VPRs, as well as the half-wave voltage of the VPRs. A set of four intensity images, I_0, I_1, I_2 and I_3, of the sample was captured by the imaging sensor when the phase retardations of the VPRs were set at (0,0), (π,0), (π,π) and (π/2,π), respectively. The four Stokes components of a Stokes image, S_0, S_1, S_2 and S_3, were then calculated using the four intensity images.
NASA Technical Reports Server (NTRS)
Toomarian, N.; Fijany, A.; Barhen, J.
1993-01-01
Evolutionary partial differential equations are usually solved by discretization in time and space, and by applying a marching-in-time procedure to data and algorithms potentially parallelized in the spatial domain.
Characterization of high resolution MR images reconstructed by a GRAPPA based parallel technique
NASA Astrophysics Data System (ADS)
Banerjee, Suchandrima; Majumdar, Sharmila
2006-03-01
This work implemented an auto-calibrating parallel imaging technique and applied it to in vivo magnetic resonance imaging (MRI) of trabecular bone micro-architecture. A generalized auto-calibrating partially parallel acquisition (GRAPPA) based reconstruction technique using modified robust data fitting was developed. The MR data was acquired with an eight channel phased array receiver on three normal volunteers on a General Electric 3 Tesla scanner. Microstructures comprising the trabecular bone architecture are of the order of 100 microns and hence their depiction requires very high imaging resolution. This work examined the effects of GRAPPA based parallel imaging on signal and noise characteristics and effective spatial resolution in high resolution (HR) images, for the range of undersampling or reduction factors 2-4. Additionally, quantitative analysis was performed to obtain structural measures of trabecular bone from the images. Image quality in terms of contrast and depiction of structures was maintained in parallel images for reduction factors up to 3. Comparison between regular and parallel images suggested similar spatial resolution for both. However, differences in noise characteristics in parallel images compared to regular images affected the thresholding based quantification. This suggested that GRAPPA based parallel images might require different analysis techniques. In conclusion, the study showed the feasibility of using parallel imaging techniques in HR-MRI of trabecular bone, although quantification strategies will have to be further investigated. Reduction of acquisition time using parallel techniques can improve the clinical feasibility of MRI of trabecular bone for prognosis and staging of the skeletal disorder osteoporosis.
Allidina, A.Y.; Malinowski, K.; Singh, M.G.
1982-12-01
The possibilities were explored for enhancing parallelism in the simulation of systems described by algebraic equations, ordinary differential equations and partial differential equations. These techniques, using multiprocessors, were developed to speed up simulations, e.g. for nuclear accidents. Issues involved in their design included suitable approximations to bring the problem into a numerically manageable form and a numerical procedure to perform the computations necessary to solve the problem accurately. Parallel processing techniques used as simulation procedures, and a design of a simulation scheme and simulation procedure employing parallel computer facilities, were both considered.
NASA Technical Reports Server (NTRS)
Nguyen, Howard; Willacy, Karen; Allen, Mark
2012-01-01
KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.
NASA Astrophysics Data System (ADS)
Lyu, Jingyuan; Nakarmi, Ukash; Zhang, Chaoyi; Ying, Leslie
2016-05-01
This paper presents a new approach to highly accelerated dynamic parallel MRI using low-rank matrix completion and the partial separability (PS) model. In data acquisition, k-space data is moderately randomly undersampled at the central k-space navigator locations, but highly undersampled at the outer k-space for each temporal frame. In reconstruction, the navigator data is reconstructed from undersampled data using structured low-rank matrix completion. After all the unacquired navigator data is estimated, the partial separable model is used to obtain partial k-t data. Then the parallel imaging method is used to obtain the entire dynamic image series from highly undersampled data. The proposed method has been shown to achieve high quality reconstructions with reduction factors up to 31, and temporal resolution of 29 ms, when the conventional PS method fails.
Autocalibration in hydrologic modeling: Using SWAT2005 in small-scale watersheds
Technology Transfer Automated Retrieval System (TEKTRAN)
SWAT is a physically-based model that can simulate water quality and quantity at the watershed scale. Due to many of the processes involved in the manual or auto-calibration of model parameters and the knowledge of realistic input values, calibration can become difficult. An autocalibration-sensitiv...
Adaptive methods and parallel computation for partial differential equations. Final report
Biswas, R.; Benantar, M.; Flaherty, J.E.
1992-05-01
Consider the adaptive solution of two-dimensional vector systems of hyperbolic and elliptic partial differential equations on shared-memory parallel computers. Hyperbolic systems are approximated by an explicit finite volume technique and solved by a recursive local mesh refinement procedure on a tree-structured grid. Local refinement of the time steps and spatial cells of a coarse base mesh is performed in regions where a refinement indicator exceeds a prescribed tolerance. Computational procedures that sequentially traverse the tree while processing solutions on each grid in parallel, that process solutions at the same tree level in parallel, and that dynamically assign processors to nodes of the tree have been developed and applied to an example. Computational results comparing a variety of heuristic processor load balancing techniques and refinement strategies are presented.
Fast parallel algorithms and enumeration techniques for partial k-trees
Narayanan, C.
1989-01-01
Recent research by several authors has resulted in a systematic way of developing linear-time sequential algorithms for a host of problems on a fairly general class of graphs variously known as bounded decomposable graphs, graphs of bounded treewidth, partial k-trees, etc. Partial k-trees arise in a variety of real-life applications such as network reliability, VLSI design and database systems, and hence fast sequential algorithms on these graphs have been found to be desirable. The linear-time methodologies were independently developed by Bern, Lawler, and Wong (10), Arnborg and Proskurowski (6), Bodlaender (14), and Courcelle (25). Wimer (89) significantly extended the work of Bern, Lawler and Wong. All of these approaches share the common thread of using dynamic programming on a tree structure. In particular, the methodology of Wimer uses a parse-tree as the data structure. The methodologies claim linear-time algorithms on partial k-trees for fixed k, for a number of combinatorial optimization problems, given the tree structure as input. It is known that obtaining the tree structure is NP-hard. This dissertation investigates three important classes of problems: (1) Developing parallel algorithms for constructing a k-tree embedding, finding a tree decomposition and, most notably, obtaining a parse-tree for a partial k-tree. (2) Developing parallel algorithms for parse-tree computations, testing isomorphism of k-trees, and finding a 2-tree embedding of a cactus. (3) Obtaining techniques for counting vertex/edge subsets satisfying a certain property in some classes of partial k-trees. The parallel algorithms the author has developed are in class NC and are either new or improve upon the existing results of Bodlaender (13). The difference equations he has obtained for counting certain sub-graphs are not known in the literature so far.
Auto-calibration system of EMG sensor suit
NASA Astrophysics Data System (ADS)
Suzuki, Yousuke; Tanaka, Takayuki; Feng, Maria Q.
2005-12-01
Biogenic measurement has been studied as an interface for robots. We have studied a wearable sensor suit as such an interface. Several kinds of sensor disks are embedded in the wetsuit-like material of the sensor suit. The sensor suit measures the wearer's joint and muscular activity. In this report, we aim to establish an auto-calibration system for measuring joint torques using EMG sensors, based on a neural network and lattice-type sensor disks. Torque estimation was performed using a shared neural network, which was trained on the teaching data of all subjects. Additional training of the shared neural network was carried out using the individual teaching data. As a result, the neural network could be trained in a short time, with high probability and high accuracy, compared with training an initial neural network. Moreover, high estimation accuracy was obtained by this method. Next, lattice-type sensor disks were developed. Because the disks can measure biogenic impedance, EMG can be measured while checking the state of each electrode. EMG could thus be measured with sensor disks that have low impedance. We measured EMG and joint torque with a prototype sensor suit and a torque measuring instrument. The superiority of torque estimation using the shared neural network was confirmed. We proposed a measurement system consisting of lattice-type sensor disks. Experimental results show the proposed method is effective for the auto-calibration.
Active catheter tracking using parallel MRI and real-time image reconstruction.
Bock, Michael; Müller, Sven; Zuehlsdorff, Sven; Speier, Peter; Fink, Christian; Hallscheidt, Peter; Umathum, Reiner; Semmler, Wolfhard
2006-06-01
In this work active MR catheter tracking with automatic slice alignment was combined with an autocalibrated parallel imaging technique. Using an optimized generalized autocalibrating partially parallel acquisitions (GRAPPA) algorithm with an acceleration factor of 2, we were able to reduce the acquisition time per image by 34%. To accelerate real-time GRAPPA image reconstruction, the coil sensitivities were updated only after slice reorientation. For a 2D trueFISP acquisition (160 x 256 matrix, 80% phase matrix, half Fourier acquisition, TR = 3.7 ms, GRAPPA factor = 2) real-time image reconstruction was achieved with up to six imaging coils. In a single animal experiment the method was used to steer a catheter from the vena cava through the beating heart into the pulmonary vasculature at an image update rate of about five images per second. Under all slice orientations, parallel image reconstruction was accomplished with only minor image artifacts, and the increased temporal resolution provided a sharp delineation of intracardial structures, such as the papillary muscle. PMID:16683261
Analysis and Modeling of Parallel Photovoltaic Systems under Partial Shading Conditions
NASA Astrophysics Data System (ADS)
Buddala, Santhoshi Snigdha
Since the industrial revolution, fossil fuels like petroleum, coal, oil, natural gas and other non-renewable energy sources have been used as the primary energy source. The consumption of fossil fuels releases various harmful gases into the atmosphere as byproducts which are hazardous in nature and they tend to deplete the protective layers and affect the overall environmental balance. Also the fossil fuels are bounded resources of energy and rapid depletion of these sources of energy, have prompted the need to investigate alternate sources of energy called renewable energy. One such promising source of renewable energy is the solar/photovoltaic energy. This work focuses on investigating a new solar array architecture with solar cells connected in parallel configuration. By retaining the structural simplicity of the parallel architecture, a theoretical small signal model of the solar cell is proposed and modeled to analyze the variations in the module parameters when subjected to partial shading conditions. Simulations were run in SPICE to validate the model implemented in Matlab. The voltage limitations of the proposed architecture are addressed by adopting a simple dc-dc boost converter and evaluating the performance of the architecture in terms of efficiencies by comparing it with the traditional architectures. SPICE simulations are used to compare the architectures and identify the best one in terms of power conversion efficiency under partial shading conditions.
Nuclear norm-regularized k-space-based parallel imaging reconstruction
NASA Astrophysics Data System (ADS)
Xu, Lin; Liu, Xiaoyun
2014-04-01
Parallel imaging reconstruction suffers from serious noise amplification at high accelerations, which can be alleviated with regularization by imposing some prior information or constraints on the image. Nevertheless, point-wise interpolation of missing k-space data restricts the use of prior information in k-space-based parallel imaging reconstructions like generalized autocalibrating partially parallel acquisitions (GRAPPA). In this study, a regularized k-space-based parallel imaging reconstruction is presented. We first formulate the reconstruction of missing data within a patch as a linear inverse problem. Instead of exploiting prior information on the image or its transform domain, the proposed method exploits the rank deficiency of a structured matrix consisting of vectorized patches from the entire k-space, which leads to a nuclear norm-regularized problem solved iteratively by numerical algorithms. Brain imaging studies are performed, demonstrating that the proposed method is capable of mitigating noise at high accelerations in GRAPPA reconstruction.
Parallelizing across time when solving time-dependent partial differential equations
Worley, P.H.
1991-09-01
The standard numerical algorithms for solving time-dependent partial differential equations (PDEs) are inherently sequential in the time direction. This paper describes algorithms for the time-accurate solution of certain classes of linear hyperbolic and parabolic PDEs that can be parallelized in both time and space and have serial complexities that are proportional to the serial complexities of the best known algorithms. The algorithms for parabolic PDEs are variants of the waveform relaxation multigrid method (WFMG) of Lubich and Ostermann where the scalar ordinary differential equations (ODEs) that make up the kernel of WFMG are solved using a cyclic reduction type algorithm. The algorithms for hyperbolic PDEs use the cyclic reduction algorithm to solve ODEs along characteristics. 43 refs.
NASA Technical Reports Server (NTRS)
Hunt, L. R.; Villarreal, Ramiro
1987-01-01
System theorists understand that the same mathematical objects which determine controllability for nonlinear control systems of ordinary differential equations (ODEs) also determine hypoellipticity for linear partial differential equations (PDEs). Moreover, almost any study of ODE systems begins with linear systems. It is remarkable that Hormander's paper on hypoellipticity of second order linear PDEs starts with equations due to Kolmogorov, which are shown to be analogous to the linear PDEs. Eigenvalue placement by state feedback for a controllable linear system can be paralleled for a Kolmogorov equation if an appropriate type of feedback is introduced. Results concerning transformations of nonlinear systems to linear systems are similar to results for transforming a linear PDE to a Kolmogorov equation.
Parallel Proportion Fair Scheduling in DAS with Partial Channel State Information
NASA Astrophysics Data System (ADS)
Jiang, Zhanjun; Wu, Jiang; Wang, Dongming; You, Xiaohu
A parallel multiplexing scheduling (PMS) scheme is proposed for distributed antenna systems (DAS), which greatly improves average system throughput due to multi-user diversity and multi-user multiplexing. However, PMS has poor fairness because of the use of the “best channel selection” criteria in the scheduler. Thus we present a parallel proportional fair scheduling (PPFS) scheme, which combines PMS with proportional fair scheduling (PFS) to achieve a tradeoff between average throughput and fairness. In PPFS, the “relative signal to noise ratio (SNR)” is employed as a metric to select the user instead of the “relative throughput” in the original PFS. And only partial channel state information (CSI) is fed back to the base station (BS) in PPFS. Moreover, there are multiple users selected to transmit simultaneously at each slot in PPFS, while only one user occupies all channel resources at each slot in PFS. Consequently, PPFS improves fairness performance of PMS greatly with a relatively small loss of average throughput compared to PFS.
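To make the baseline concrete, the sketch below illustrates the original single-user proportional fair rule that the abstract contrasts with PPFS: in each slot the user with the largest ratio of instantaneous rate to average throughput is served, and the averages are updated with an exponential window. It is only an illustration of conventional PFS; the function names and the window length are assumptions, and the multi-user selection and relative-SNR metric of PPFS are not modeled.

```python
def pf_select(rates, avg_throughput):
    """Return the index of the user with the largest rate-to-average ratio."""
    return max(range(len(rates)), key=lambda k: rates[k] / avg_throughput[k])


def pf_update(avg_throughput, rates, chosen, window=100.0):
    """Exponentially weighted update of the per-user average throughput.
    Only the served user receives credit for its rate in this slot."""
    decay = 1.0 - 1.0 / window
    return [
        decay * t + (rates[k] / window if k == chosen else 0.0)
        for k, t in enumerate(avg_throughput)
    ]


# Toy slot: user 0 has the better channel, but user 1 has been served less.
rates = [10.0, 4.0]
averages = [20.0, 2.0]
served = pf_select(rates, averages)
averages = pf_update(averages, rates, served, window=10.0)
print(served, averages)
```

A pure best-channel rule would pick user 0 here; the ratio metric instead favors the user whose current rate is high relative to what it has received so far, which is the source of the fairness gain.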
Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations
NASA Technical Reports Server (NTRS)
Fijany, Amir
1993-01-01
In this paper, fast time- and space-parallel algorithms for solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solution of the problem can be completely decoupled.
A Partial Order Reduction Technique for Parallel Timed Automaton Model Checking
NASA Astrophysics Data System (ADS)
Jianhua, Zhao; Linzhang, Wang; Xuandong, Li
We propose a partial order reduction technique for timed automaton model checking in this paper. We first show that the symbolic successors w.r.t. partial order paths can be computed using DBMs. An algorithm is presented to compute such successors incrementally. This algorithm can avoid splitting the symbolic states because of the enumeration order of independent transitions. A reachability analysis algorithm based on this successor computation algorithm is presented. Our technique can be combined with some static analysis techniques in the literature. Furthermore, we present a rule to avoid exploring all enabled transitions, thus the space requirements of model checking are further reduced.
Choongsang Cho; Sangkeun Lee
2016-04-01
Image smoothing has been used for image segmentation, image reconstruction, object classification, and 3D content generation. Several smoothing approaches have been used at the pre-processing step to retain the critical edge, while removing noise and small details. However, they have limited performance, especially in removing small details and smoothing discrete regions. Therefore, to provide fast and accurate smoothing, we propose an effective scheme that uses a weighted combination of the gradient, Laplacian, and diagonal derivatives of a smoothed image. In addition, to reduce computational complexity, we designed and implemented a parallel processing structure for the proposed scheme on a graphics processing unit (GPU). For an objective evaluation of the smoothing performance, the images were linearly quantized into several layers to generate experimental images, and the quantized images were smoothed using several methods for reconstructing the smoothly changed shape and intensity of the original image. Experimental results showed that the proposed scheme has higher objective scores and better smoothing performance than similar schemes, while preserving critical details and removing trivial ones. For computational complexity, the proposed smoothing scheme running on a GPU provided 18 and 16 times lower complexity than the proposed smoothing scheme running on a CPU and the L0-based smoothing scheme, respectively. In addition, a simple noise reduction test was conducted to show the characteristics of the proposed approach; it reported that the presented algorithm outperforms the state-of-the-art algorithms by more than 5.4 dB. Therefore, we believe that the proposed scheme can be a useful tool for efficient image smoothing. PMID:26886985
NASA Astrophysics Data System (ADS)
Acebrón, Juan A.; Rodríguez-Rozas, Ángel
2011-09-01
A probabilistic representation for initial value semilinear parabolic problems based on generalized random trees has been derived. Two different strategies have been proposed, both requiring generating suitable random trees combined with a Pade approximant for approximating accurately a given divergent series. Such series are obtained by summing the partial contribution to the solution coming from trees with arbitrary number of branches. The new representation greatly expands the class of problems amenable to be solved probabilistically, and was used successfully to develop a generalized probabilistic domain decomposition method. Such a method has been shown to be suited for massively parallel computers, enjoying full scalability and fault tolerance. Finally, a few numerical examples are given to illustrate the remarkable performance of the algorithm, comparing the results with those obtained with a classical method.
Technology Transfer Automated Retrieval System (TEKTRAN)
Autocalibration of a water quality model such as SWAT (Soil and Water Assessment Tool) can be a powerful, labor-saving tool. When multi-gage or multi-pollutant calibration is desired, autocalibration is essential because the time involved in manual calibration becomes prohibitive. The ArcSWAT Interf...
Prototype of an auto-calibrating, context-aware, hybrid brain-computer interface.
Faller, J; Torrellas, S; Miralles, F; Holzner, C; Kapeller, C; Guger, C; Bund, J; Müller-Putz, G R; Scherer, R
2012-01-01
We present the prototype of a context-aware framework that allows users to control smart home devices and to access internet services via a Hybrid BCI system of an auto-calibrating sensorimotor rhythm (SMR) based BCI and another assistive device (Integra Mouse mouth joystick). While there is extensive literature that describes the merit of Hybrid BCIs, auto-calibrating and co-adaptive ERD BCI training paradigms, specialized BCI user interfaces, context-awareness and smart home control, there is up to now, no system that includes all these concepts in one integrated easy-to-use framework that can truly benefit individuals with severe functional disabilities by increasing independence and social inclusion. Here we integrate all these technologies in a prototype framework that does not require expert knowledge or excess time for calibration. In a first pilot-study, 3 healthy volunteers successfully operated the system using input signals from an ERD BCI and an Integra Mouse and reached average positive predictive values (PPV) of 72 and 98% respectively. Based on what we learned here we are planning to improve the system for a test with a larger number of healthy volunteers so we can soon bring the system to benefit individuals with severe functional disability. PMID:23366267
Sparse Auto-Calibration for Radar Coincidence Imaging with Gain-Phase Errors
Zhou, Xiaoli; Wang, Hongqiang; Cheng, Yongqiang; Qin, Yuliang
2015-01-01
Radar coincidence imaging (RCI) is a high-resolution staring imaging technique without the limitation of relative motion between target and radar. The sparsity-driven approaches are commonly used in RCI, while the prior knowledge of imaging models needs to be known accurately. However, as one of the major model errors, the gain-phase error exists generally, and may cause inaccuracies of the model and defocus the image. In the present report, the sparse auto-calibration method is proposed to compensate the gain-phase error in RCI. The method can determine the gain-phase error as part of the imaging process. It uses an iterative algorithm, which cycles through steps of target reconstruction and gain-phase error estimation, where orthogonal matching pursuit (OMP) and Newton’s method are used, respectively. Simulation results show that the proposed method can improve the imaging quality significantly and estimate the gain-phase error accurately. PMID:26528981
Gerdes, Lee; Gerdes, Peter; Lee, Sung W; H Tegeler, Charles
2013-03-01
Disturbances of neural oscillation patterns have been reported with many disease states. We introduce methodology for HIRREM™ (high-resolution, relational, resonance-based electroencephalic mirroring), also known as Brainwave Optimization™, a noninvasive technology to facilitate relaxation and auto-calibration of neural oscillations. HIRREM is a precision-guided technology for allostatic therapeutics, intended to help the brain calibrate its own functional set points to optimize fitness. HIRREM technology collects electroencephalic data through two-channel recordings and delivers a series of audible musical tones in near real time. Choices of tone pitch and timing are made by mathematical algorithms, principally informed by the dominant frequency in successive instants of time, to permit resonance between neural oscillatory frequencies and the musical tones. Relaxation of neural oscillations through HIRREM appears to permit auto-calibration toward greater hemispheric symmetry and more optimized proportionation of regional spectral power. To illustrate an application of HIRREM, we present data from a randomized clinical trial of HIRREM as an intervention for insomnia (n = 19). On average, there was reduction of right-dominant temporal lobe high-frequency (23-36 Hz) EEG asymmetry over the course of eight successive HIRREM sessions. There was a trend for correlation between reduction of right temporal lobe dominance and magnitude of insomnia symptom reduction. Disturbances of neural oscillation have implications for both neuropsychiatric health and downstream peripheral (somatic) physiology. The possibility of noninvasive optimization for neural oscillatory set points through HIRREM suggests potentially multitudinous roles for this technology. Research is currently ongoing to further explore its potential applications and mechanisms of action. PMID:23532171
Huang, Feng; Lin, Wei; Duensing, George R; Reykowski, Arne
2012-09-01
Because dynamic MR images are often sparse in x-f domain, k-t space compressed sensing (k-t CS) has been proposed for highly accelerated dynamic MRI. When a multichannel coil is used for acquisition, the combination of partially parallel imaging and k-t CS can improve the accuracy of reconstruction. In this work, an efficient combination method is presented, which is called k-t sparse Generalized GRAPPA fOr Wider readout Line. One fundamental aspect of this work is to apply partially parallel imaging and k-t CS sequentially. A partially parallel imaging technique using a Generalized GRAPPA fOr Wider readout Line operator is adopted before k-t CS reconstruction to decrease the reduction factor in a computationally efficient way while preserving temporal resolution. Channel combination and relative sensitivity maps are used in the flexible virtual coil scheme to alleviate the k-t CS computational load with increasing number of channels. Using k-t FOCUSS as a specific example of k-t CS, the experiments with Cartesian and radial data sets demonstrate that k-t sparse Generalized GRAPPA fOr Wider readout Line can produce results with two times lower root-mean-square error than conventional channel-by-channel k-t CS while consuming up to seven times less computational cost. PMID:22162191
NASA Astrophysics Data System (ADS)
Vijayalekshmy, S.; Rama Iyer, S.; Beevi, Bisharathu
2015-09-01
The output power from a photovoltaic (PV) array decreases and the array exhibits multiple peaks when it is subjected to partial shading (PS). The power loss in the PV array varies with the array configuration, physical location and the shading pattern. This paper compares the relative performance of a PV array consisting of a short string of three PV modules for two different configurations. The mismatch loss, shading loss, fill factor and the power loss due to failure in tracking the global maximum power point are analysed for a series string with bypass diodes and for a short parallel string using a MATLAB/Simulink model. The performance of the system is investigated for three different conditions of solar insolation for the same shading pattern. Results indicate that shading causes considerably more power loss in a series string during PS than in a parallel string with the same number of modules.
Auto-Calibration of SOL-ACES in the EUV Spectral Region
NASA Astrophysics Data System (ADS)
Schmidtke, G.; Brunner, R.; Eberhard, D.; Hofmann, A.; Klocke, U.; Knothe, M.; Konz, W.; Riedel, W.-J.; Wolf, H.
The Sol-ACES (SOLAR Auto-Calibrating EUV/UV Spectrometers) experiment is prepared to be flown with the ESA SOLAR payload to the International Space Station as planned for the Shuttle mission E1 in August 2006. Four grazing incidence spectrometers of planar geometry cover the wavelength range from 16-220 nm with a spectral resolution from 0.5-2.3 nm. These high-efficiency spectrometers will be re-calibrated by two three-signal ionization chambers to be operated routinely with 44 band pass filters during the mission. Re-measuring the filter transmissions with the spectrometers also allows a very accurate determination of the changing second (optical) order efficiencies of the spectrometers as well as the stray light contributions to the spectral recording in different wavelength ranges. In this context the primary requirements for measurements of high radiometric accuracy will be discussed in detail. The absorption gases of the ionization chambers are neon, xenon and a mixture of 10% nitric oxide and 90% xenon. Laboratory measurements show that with this method secondary effects can be determined to a high degree, resulting in very accurate irradiance measurements, ranging from 5 to 3% in absolute terms depending on the wavelength range.
NASA Astrophysics Data System (ADS)
Ma, Sangback
In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDEs (Partial Differential Equations) on structured grids. Point-SSOR is well known, and ILU(0) is one of the most popular preconditioners, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wavefronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates compared to that of the natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with a direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using the block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by a least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large mesh sizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i.e., the real parts of their eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices is mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The
Automatic High-Bandwidth Calibration and Reconstruction of Arbitrarily Sampled Parallel MRI
Aelterman, Jan; Naeyaert, Maarten; Gutierrez, Shandra; Luong, Hiep; Goossens, Bart; Pižurica, Aleksandra; Philips, Wilfried
2014-01-01
Today, many MRI reconstruction techniques exist for undersampled MRI data. Regularization-based techniques inspired by compressed sensing allow for the reconstruction of undersampled data that would lead to an ill-posed reconstruction problem. Parallel imaging enables the reconstruction of MRI images from undersampled multi-coil data that leads to a well-posed reconstruction problem. Autocalibrating pMRI techniques encompass pMRI techniques where no explicit knowledge of the coil sensitivities is required. A first purpose of this paper is to derive a novel autocalibration approach for pMRI that allows for the estimation and use of smooth, but high-bandwidth coil profiles instead of a compactly supported kernel. These high-bandwidth models adhere more accurately to the physics of an antenna system. The second purpose of this paper is to demonstrate the feasibility of a parameter-free reconstruction algorithm that combines autocalibrating pMRI and compressed sensing. Therefore, we present several techniques for automatic parameter estimation in MRI reconstruction. Experiments show that a higher reconstruction accuracy can be achieved using high-bandwidth coil models and that the automatic parameter choices yield an acceptable result. PMID:24915203
van Hees, Vincent T; Fang, Zhou; Langford, Joss; Assah, Felix; Mohammad, Anwar; da Silva, Inacio C M; Trenell, Michael I; White, Tom; Wareham, Nicholas J; Brage, Søren
2014-10-01
Wearable acceleration sensors are increasingly used for the assessment of free-living physical activity. Acceleration sensor calibration is a potential source of error. This study aims to describe and evaluate an autocalibration method to minimize calibration error using segments within the free-living records (no extra experiments needed). The autocalibration method entailed the extraction of nonmovement periods in the data, for which the measured vector magnitude should ideally be the gravitational acceleration (1 g); this property was used to derive calibration correction factors using an iterative closest-point fitting process. The reduction in calibration error was evaluated in data from four cohorts: UK (n = 921), Kuwait (n = 120), Cameroon (n = 311), and Brazil (n = 200). Our method significantly reduced calibration error in all cohorts (P < 0.01), with reductions ranging from 16.6 to 3.0 mg in the Kuwaiti cohort to 76.7 to 8.0 mg in the Brazil cohort. Utilizing temperature sensor data resulted in a small nonsignificant additional improvement (P > 0.05). Temperature correction coefficients were highest for the z-axis, e.g., 19.6-mg offset per 5°C. Further, application of the autocalibration method had a significant impact on typical metrics used for describing human physical activity, e.g., in Brazil average wrist acceleration was 0.2 to 51% lower than uncalibrated values depending on metric selection (P < 0.01). The autocalibration method as presented helps reduce the calibration error in wearable acceleration sensor data and improves comparability of physical activity measures across study locations. Temperature utilization seems essential when temperature deviates substantially from the average temperature in the record but not for multiday summary measures. PMID:25103964
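The fitting idea described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the synthetic `gain` and `bias` values, the fixed iteration count, and the per-axis linear correction model are all assumptions made for illustration, and temperature correction is omitted. Each pass maps every corrected nonmovement sample to its closest point on the 1 g sphere, then refits a per-axis scale and offset by least squares.

```python
import math

def magnitude(v):
    """Euclidean norm of a 3-axis sample (in g)."""
    return math.sqrt(sum(c * c for c in v))

def rms_deviation_from_1g(samples, scale, offset):
    """Root-mean-square deviation of the corrected vector magnitude from 1 g."""
    sq = 0.0
    for s in samples:
        corrected = [scale[k] * s[k] + offset[k] for k in range(3)]
        sq += (magnitude(corrected) - 1.0) ** 2
    return math.sqrt(sq / len(samples))

def fit_line(x, y):
    """Ordinary least squares fit y ~ a*x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def autocalibrate(samples, iterations=20):
    """Alternate between closest-point projection and per-axis refitting."""
    scale, offset = [1.0] * 3, [0.0] * 3
    for _ in range(iterations):
        targets = []
        for s in samples:
            c = [scale[k] * s[k] + offset[k] for k in range(3)]
            norm = magnitude(c)
            targets.append([ck / norm for ck in c])  # closest point on the unit sphere
        for k in range(3):
            scale[k], offset[k] = fit_line([s[k] for s in samples],
                                           [t[k] for t in targets])
    return scale, offset

# Synthetic nonmovement samples: six static orientations seen through a
# hypothetical miscalibrated sensor (gain and bias values are made up).
orientations = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
gain, bias = (1.05, 0.97, 1.02), (0.02, -0.03, 0.01)
raw = [[gain[k] * o[k] + bias[k] for k in range(3)] for o in orientations]

before = rms_deviation_from_1g(raw, [1.0] * 3, [0.0] * 3)
scale, offset = autocalibrate(raw)
after = rms_deviation_from_1g(raw, scale, offset)
print(f"RMS deviation from 1 g: before={before:.4f}, after={after:.4f}")
```

Each of the two alternating steps can only lower the summed squared distance to the sphere, so the deviation does not increase from pass to pass; the sketch makes no claim about recovering the true gain and bias exactly.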
Kaufmann, Tobias; Völker, Stefan; Gunesch, Laura; Kübler, Andrea
2012-01-01
Brain–computer interfaces (BCI) based on event-related potentials (ERP) allow for selection of characters from a visually presented character-matrix and thus provide a communication channel for users with neurodegenerative disease. Although they have been a topic of research for more than 20 years and have repeatedly been proven to be a reliable communication method, BCIs are almost exclusively used in experimental settings, handled by qualified experts. This study investigates whether ERP–BCIs can be handled independently by laymen without expert support, which is essential for establishing BCIs in end-users' daily life situations. Furthermore, we compared the classic character-by-character text entry against a predictive text entry (PTE) that directly incorporates predictive text into the character-matrix. N = 19 BCI novices handled a user-centered ERP–BCI application on their own without expert support. The software individually adjusted classifier weights and control parameters in the background, invisible to the user (auto-calibration). All participants were able to operate the software on their own and to twice correctly spell a sentence with the auto-calibrated classifier (once with PTE, once without). Our PTE increased spelling speed and, importantly, did not reduce accuracy. In sum, this study demonstrates the feasibility of auto-calibrating ERP–BCI use, independently by laymen, and the strong benefit of integrating predictive text directly into the character-matrix. PMID:22833713
Brauck, Katja; Maderwald, Stefan; Vogt, Florian M; Zenge, Michael; Barkhausen, Jörg; Herborn, Christoph U
2007-01-01
We sought to compare a three-dimensional, contrast-enhanced, magnetic resonance angiogram (3D CE MRA) sequence combining parallel imaging (generalised autocalibrating partially parallel acquisitions (GRAPPA)) with a time-resolved echo-shared angiographic technique (TREAT) in an intraindividual comparison to a standard 3D MRA sequence. Four healthy volunteers (27-32 years) and 11 patients (11-82 years) with vascular pathologies of the hand were examined on a 1.5-Tesla (T) MR system (Magnetom Avanto, Siemens, Erlangen, Germany) using two multichannel receiver coils. Following automatic injection (flow rate 2.5 cc/s) of 0.1 mmol/kg gadoterate (Dotarem, Guerbet, Roissy, France), 32 consecutive 3D data sets were collected with the TREAT sequence (TR/TE: 4.02/1.31 ms, FA: 10 degrees, GRAPPA acceleration factor: R=2, TREAT factor: 5, voxel size: 1.0 x 0.7 x 1.3 mm³) and a T1-weighted 3D gradient-echo sequence (TR/TE: 5.3/1.57 ms, FA: 30 degrees, GRAPPA acceleration factor: 2, voxel size: 0.71 x 0.71 x 0.71 mm³). MR data sets were evaluated and compared for image quality and visualisation of vascular details. In the volunteer group, all MR imaging was successful, while technical problems prevented acquisition of the standard protocol in two patients. For the corresponding segments, the number of visible segments was equal on both sequences. Overall image quality was significantly better on the standard protocol than on the TREAT protocol. TREAT MRA provided functional information in lesions with rapid blood flow, e.g. detection of feeding and draining vessels in a haemangioma. TREAT MRA is a robust technique that combines morphological and functional information of the hand vasculature and deals with the very special physiological demands of vascular lesions, such as quick arteriovenous transit time. PMID:16710664
NASA Astrophysics Data System (ADS)
Zhang, Y. Y.; Shao, Q. X.; Ye, A. Z.; Xing, H. T.; Xia, J.
2016-02-01
Integrated water system modeling is a feasible approach to understanding severe water crises in the world and promoting the implementation of integrated river basin management. In this study, a classic hydrological model (the time variant gain model: TVGM) was extended to an integrated water system model by coupling multiple water-related processes in hydrology, biogeochemistry, water quality, and ecology, and considering the interference of human activities. A parameter analysis tool, which included sensitivity analysis, autocalibration and model performance evaluation, was developed to improve modeling efficiency. To demonstrate the model performances, the Shaying River catchment, which is the largest highly regulated and heavily polluted tributary of the Huai River basin in China, was selected as the case study area. The model performances were evaluated on the key water-related components including runoff, water quality, diffuse pollution load (or nonpoint sources) and crop yield. Results showed that our proposed model simulated most components reasonably well. The simulated daily runoff at most regulated and less-regulated stations matched well with the observations. The average correlation coefficient and Nash-Sutcliffe efficiency were 0.85 and 0.70, respectively. Both the simulated low and high flows at most stations were improved when the dam regulation was considered. The daily ammonium-nitrogen (NH4-N) concentration was also well captured with the average correlation coefficient of 0.67. Furthermore, the diffuse source load of NH4-N and the corn yield were reasonably simulated at the administrative region scale. This integrated water system model is expected to improve the simulation performances with extension to more model functionalities, and to provide a scientific basis for the implementation in integrated river basin managements.
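The Nash-Sutcliffe efficiency reported in this abstract is a standard goodness-of-fit score for simulated versus observed runoff. A minimal sketch of the usual definition follows; the function name and the sample values are illustrative assumptions and do not come from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 means a perfect match; 0.0 means the simulation is no better than
    always predicting the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

# Made-up daily runoff values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe(obs, obs)
mean_only = nash_sutcliffe(obs, [2.5, 2.5, 2.5, 2.5])
print(perfect, mean_only)
```

On this scale, the average efficiency of 0.70 quoted in the abstract means the model errors carry about 30% of the variance of the observed runoff.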
Tegeler, Charles H.; Tegeler, Catherine L.; Cook, Jared F.; Lee, Sung W.; Pajewski, Nicholas M.
2015-01-01
Objective: Increased amplitudes in high-frequency brain electrical activity are reported with menopausal hot flashes. We report outcomes associated with the use of High-resolution, relational, resonance-based, electroencephalic mirroring (a noninvasive neurotechnology for autocalibration of neural oscillations) by women with perimenopausal and postmenopausal hot flashes. Methods: Twelve women with hot flashes (median age, 56 y; range, 46-69 y) underwent a median of 13 (range, 8-23) intervention sessions for a median of 9.5 days (range, 4-32). This intervention uses algorithmic analysis of brain electrical activity and near real-time translation of brain frequencies into variable tones for acoustic stimulation. Hot flash frequency and severity were recorded by daily diary. Primary outcomes included hot flash severity score, sleep, and depressive symptoms. High-frequency amplitudes (23-36 Hz) from bilateral temporal scalp recordings were measured at baseline and during serial sessions. Self-reported symptom inventories for sleep and depressive symptoms were collected. Results: The median change in hot flash severity score was −0.97 (range, −3.00 to 1.00; P = 0.015). Sleep and depression scores decreased by −8.5 points (range, −20 to −1; P = 0.022) and −5.5 points (range, −32 to 8; P = 0.015), respectively. The median sum of amplitudes for the right and left temporal high-frequency brain electrical activity was 8.44 μV (range, 6.27-16.66) at baseline and decreased by a median of −2.96 μV (range, −11.05 to −0.65; P = 0.0005) by the final session. Conclusions: Hot flash frequency and severity, symptoms of insomnia and depression, and temporal high-frequency brain electrical activity decrease after High-resolution, relational, resonance-based, electroencephalic mirroring. Larger controlled trials with longer follow-up are warranted. PMID:25668305
Krogh, M.; Painter, J.; Hansen, C.
1996-10-01
Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the M.
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Krogh, M.; Hansen, C.; Painter, J.; de Verdiere, G.C.
1995-05-01
Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.
Wald, Ingo; Ize, Santiago
2015-07-28
Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
Massively parallel visualization: Parallel rendering
Hansen, C.D.; Krogh, M.; White, W.
1995-12-01
This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderer use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.
Parallel machines: Parallel machine languages
Iannucci, R.A.
1990-01-01
This book presents a framework for understanding the tradeoffs between the conventional view and the dataflow view, with the objective of discovering the critical hardware structures that must be present in any scalable, general-purpose parallel computer to effectively tolerate latency and synchronization costs. The author presents an approach to scalable general-purpose parallel computation. Linguistic concerns, compiling issues, intermediate language issues, and hardware/technological constraints are presented as a combined approach to architectural development. This book presents the notion of a parallel machine language.
Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.
1995-09-01
In this paper the authors introduce the idea of parallel pipelining for water lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as $N = (R/r)^{\alpha}$, where $r$ and $R$ are the radii of the small and large pipes, respectively, and $\alpha = 4$ or $19/7$ when the lubricating water flow is laminar or turbulent, respectively.
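As a worked illustration of the pipe-count estimate quoted in this abstract, the sketch below evaluates N = (R/r)^α for both flow regimes. The function name and the example radii are assumptions chosen for illustration; only the formula and the two exponents come from the abstract.

```python
import math

def pipes_needed(large_radius, small_radius, turbulent):
    """Estimate N = (R/r)**alpha, with alpha = 19/7 (turbulent) or 4 (laminar)."""
    alpha = 19 / 7 if turbulent else 4
    return (large_radius / small_radius) ** alpha

# Hypothetical case: replace one pipe by pipes of half its radius.
n_laminar = pipes_needed(2.0, 1.0, turbulent=False)
n_turbulent = pipes_needed(2.0, 1.0, turbulent=True)
print(n_laminar, math.ceil(n_turbulent))
```

Halving the radius therefore calls for 16 small pipes when the lubricating water flow is laminar, but only about 7 (rounding up from roughly 6.6) when it is turbulent.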
Gorda, B.C.
1992-09-01
Data locality is fundamental to performance on distributed memory parallel architectures. Application programmers know this well and go to great pains to arrange data for optimal performance. Data Parallelism, a model from the Single Instruction Multiple Data (SIMD) architecture, is finding a new home on the Multiple Instruction Multiple Data (MIMD) architectures. This style of programming, distinguished by taking the computation to the data, is what programmers have been doing by hand for a long time. Recent work in this area holds the promise of making the programmer's task easier.
Little, J.J.; Poggio, T.; Gamble, E.B. Jr.
1988-01-01
Computer algorithms have been developed for early vision processes that give separate cues to the distance from the viewer of three-dimensional surfaces, their shape, and their material properties. The MIT Vision Machine is a computer system that integrates several early vision modules to achieve high-performance recognition and navigation in unstructured environments. It is also an experimental environment for theoretical progress in early vision algorithms, their parallel implementation, and their integration. The Vision Machine consists of a movable, two-camera Eye-Head input device and an 8K Connection Machine. The authors have developed and implemented several parallel early vision algorithms that compute edge detection, stereopsis, motion, texture, and surface color in close to real time. The integration stage, based on coupled Markov random field models, leads to a cartoon-like map of the discontinuities in the scene, with partial labeling of the brightness edges in terms of their physical origin.
Parallel Information Processing.
ERIC Educational Resources Information Center
Rasmussen, Edie M.
1992-01-01
Examines parallel computer architecture and the use of parallel processors for text. Topics discussed include parallel algorithms; performance evaluation; parallel information processing; parallel access methods for text; parallel and distributed information retrieval systems; parallel hardware for text; and network models for information…
Wagner, J Y; Langemann, M; Schön, G; Kluge, S; Reuter, D A; Saugel, B
2016-05-01
The T-Line® system (Tensys® Medical Inc., San Diego, CA, USA) non-invasively estimates cardiac output (CO) using autocalibrating pulse contour analysis of the radial artery applanation tonometry-derived arterial waveform. We compared T-Line CO measurements (TL-CO) with invasively obtained CO measurements using transpulmonary thermodilution (TD-CO) and calibrated pulse contour analysis (PC-CO) in patients after major gastrointestinal surgery. We compared 1) TL-CO versus TD-CO and 2) TL-CO versus PC-CO in 27 patients treated in the intensive care unit (ICU) after major gastrointestinal surgery. For the assessment of TD-CO and PC-CO we used the PiCCO® system (Pulsion Medical Systems SE, Feldkirchen, Germany). Per patient, we compared two sets of TD-CO and 30 minutes of PC-CO measurements with the simultaneously recorded TL-CO values using Bland-Altman analysis. The mean of differences (± standard deviation; 95% limits of agreement) between TL-CO and TD-CO was -0.8 (±1.6; -4.0 to +2.3) l/minute with a percentage error of 45%. For TL-CO versus PC-CO, we observed a mean of differences of -0.4 (±1.5; -3.4 to +2.5) l/minute with a percentage error of 43%. In ICU patients after major gastrointestinal surgery, continuous non-invasive CO measurement based on autocalibrating pulse contour analysis of the radial artery applanation tonometry-derived arterial waveform (TL-CO) is feasible in a clinical study setting. However, the agreement of TL-CO with TD-CO and PC-CO observed in our study indicates that further improvements are needed before the technology can be recommended for clinical use in these patients. PMID:27246932
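The Bland-Altman figures quoted in this abstract follow from simple arithmetic on paired differences. The sketch below is illustrative only: the paired readings are made-up numbers, not study data, the function names are hypothetical, and the percentage-error formula (1.96 × SD of the differences divided by the mean CO of both methods) is one common definition that the abstract does not spell out.

```python
from statistics import mean, stdev

def limits_of_agreement(bias, sd):
    """95% limits of agreement: bias +/- 1.96 * SD of the differences."""
    return bias - 1.96 * sd, bias + 1.96 * sd

def bland_altman(test, reference):
    """Return bias, SD of differences, limits of agreement, and percentage error."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    # Assumed definition: 1.96 * SD relative to the mean CO of both methods.
    mean_co = mean((t + r) / 2 for t, r in zip(test, reference))
    pct_error = 100 * 1.96 * sd / mean_co
    return bias, sd, limits_of_agreement(bias, sd), pct_error

# Hypothetical paired readings in l/minute (NOT data from the study).
tl_co = [4.1, 5.0, 6.2, 3.8, 5.5, 7.0]
td_co = [5.0, 5.4, 7.5, 4.9, 5.6, 8.3]
bias, sd, (lower, upper), pct_error = bland_altman(tl_co, td_co)

# Plugging in the rounded values reported for TL-CO versus TD-CO:
reported_lower, reported_upper = limits_of_agreement(-0.8, 1.6)
```

With the rounded bias and SD from the abstract, `limits_of_agreement(-0.8, 1.6)` gives roughly -3.9 to +2.3 l/minute, which agrees with the reported -4.0 to +2.3 up to rounding of the inputs.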
2011-01-01
Introduction About 3% of people will be diagnosed with epilepsy during their lifetime, but about 70% of people with epilepsy eventually go into remission. Methods and outcomes We conducted a systematic review and aimed to answer the following clinical questions: What are the effects of starting antiepileptic drug treatment following a single seizure? What are the effects of drug monotherapy in people with partial epilepsy? What are the effects of additional drug treatments in people with drug-resistant partial epilepsy? What is the risk of relapse in people in remission when withdrawing antiepileptic drugs? What are the effects of behavioural and psychological treatments for people with epilepsy? What are the effects of surgery in people with drug-resistant temporal lobe epilepsy? We searched: Medline, Embase, The Cochrane Library, and other important databases up to July 2009 (Clinical Evidence reviews are updated periodically; please check our website for the most up-to-date version of this review). We included harms alerts from relevant organisations such as the US Food and Drug Administration (FDA) and the UK Medicines and Healthcare products Regulatory Agency (MHRA). Results We found 83 systematic reviews, RCTs, or observational studies that met our inclusion criteria. We performed a GRADE evaluation of the quality of evidence for interventions. Conclusions In this systematic review we present information relating to the effectiveness and safety of the following interventions: antiepileptic drugs after a single seizure; monotherapy for partial epilepsy using carbamazepine, gabapentin, lamotrigine, levetiracetam, phenobarbital, phenytoin, sodium valproate, or topiramate; addition of second-line drugs for drug-resistant partial epilepsy (allopurinol, eslicarbazepine, gabapentin, lacosamide, lamotrigine, levetiracetam, losigamone, oxcarbazepine, retigabine, tiagabine, topiramate, vigabatrin, or zonisamide); antiepileptic drug withdrawal for people with partial or
Li, Shu; Chan, Cheong; Stockmann, Jason P.; Tagare, Hemant; Adluru, Ganesh; Tam, Leo K.; Galiana, Gigi; Constable, R. Todd; Kozerke, Sebastian; Peters, Dana C.
2014-01-01
Purpose To investigate algebraic reconstruction technique (ART) for parallel imaging reconstruction of radial data, applied to accelerated cardiac cine. Methods A GPU-accelerated ART reconstruction was implemented and applied to simulations, point spread functions (PSF) and in twelve subjects imaged with radial cardiac cine acquisitions. Cine images were reconstructed with radial ART at multiple undersampling levels (192 Nr x Np = 96 to 16). Images were qualitatively and quantitatively analyzed for sharpness and artifacts, and compared to filtered back-projection (FBP), and conjugate gradient SENSE (CG SENSE). Results Radial ART provided reduced artifacts and mainly preserved spatial resolution, for both simulations and in vivo data. Artifacts were qualitatively and quantitatively less with ART than FBP using 48, 32, and 24 Np, although FBP provided quantitatively sharper images at undersampling levels of 48-24 Np (all p<0.05). Use of undersampled radial data for generating auto-calibrated coil-sensitivity profiles resulted in slightly reduced quality. ART was comparable to CG SENSE. GPU-acceleration increased ART reconstruction speed 15-fold, with little impact on the images. Conclusion GPU-accelerated ART is an alternative approach to image reconstruction for parallel radial MR imaging, providing reduced artifacts while mainly maintaining sharpness compared to FBP, as shown by its first application in cardiac studies. PMID:24753213
Tolerant (parallel) Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
Special parallel processing workshop
1994-12-01
This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts dealing with parallel processing.
Iterative algorithms for large sparse linear systems on parallel computers
NASA Technical Reports Server (NTRS)
Adams, L. M.
1982-01-01
Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
Parallel rendering techniques for massively parallel visualization
Hansen, C.; Krogh, M.; Painter, J.
1995-07-01
As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, sphere, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderers use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.
Parallel algorithms and architectures
Albrecht, A.; Jung, H.; Mehlhorn, K.
1987-01-01
Contents of this book are the following: Preparata: Deterministic simulation of idealized parallel computers on more realistic ones; Convex hull of randomly chosen points from a polytope; Dataflow computing; Parallel in sequence; Towards the architecture of an elementary cortical processor; Parallel algorithms and static analysis of parallel programs; Parallel processing of combinatorial search; Communications; An O(nlogn) cost parallel algorithms for the single function coarsest partition problem; Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region; and RELACS - A recursive layout computing system. Parallel linear conflict-free subtree access.
On the parallel solution of parabolic equations
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Youcef
1989-01-01
Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
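The partial fraction idea in this abstract can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: it uses the [1/2] Padé approximant of exp(z), r(z) = (1 + z/3) / (1 - 2z/3 + z²/6), whose poles are the roots of z² - 4z + 6, and it relies on a small hypothetical dense solver (`solve`) written for this sketch. Writing r(z) as a sum of terms α_i / (z - θ_i) turns r(A)v into independent shifted linear solves, which is where the parallelism comes from.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (complex allowed)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def shifted(A, theta):
    """Return the matrix A - theta * I."""
    n = len(A)
    return [[A[i][j] - (theta if i == j else 0) for j in range(n)] for i in range(n)]

# Poles of the [1/2] Pade approximant: roots of z^2 - 4z + 6 = 0.
poles = [complex(2, 2 ** 0.5), complex(2, -(2 ** 0.5))]
# Residues alpha = p(theta) / q'(theta), with p(z) = 1 + z/3 and q'(z) = z/3 - 2/3.
residues = [(1 + t / 3) / (t / 3 - 2 / 3) for t in poles]

def pade_exp_apply(A, v):
    """Approximate exp(A) v as sum_i alpha_i * (A - theta_i I)^{-1} v."""
    # Each shifted solve is independent, so each could go to its own processor.
    terms = [solve(shifted(A, t), v) for t in poles]
    return [sum(a * x[k] for a, x in zip(residues, terms)).real
            for k in range(len(v))]

A = [[-0.2, 0.1], [0.1, -0.3]]   # hypothetical small test matrix
v = [1.0, 2.0]
approx = pade_exp_apply(A, v)
```

For a real matrix and vector the two poles are complex conjugates, so the imaginary parts of the two terms cancel and only the real part is kept.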
Conflict-cost based random sampling design for parallel MRI with low rank constraints
NASA Astrophysics Data System (ADS)
Kim, Wan; Zhou, Yihang; Lyu, Jingyuan; Ying, Leslie
2015-05-01
In compressed sensing MRI, the design of the sampling pattern for random sampling is very important. For example, SAKE (simultaneous auto-calibrating and k-space estimation) is a parallel MRI reconstruction method using random undersampling. It formulates image reconstruction as a structured low-rank matrix completion problem. Variable density (VD) Poisson discs are typically adopted for 2D random sampling. The basic concept of Poisson disc generation is to guarantee samples are neither too close to nor too far away from each other. However, it is difficult to meet such a condition, especially in the high-density region, so the sampling becomes inefficient. In this paper, we present an improved random sampling pattern for SAKE reconstruction. The pattern is generated based on a conflict cost with a probability model. The conflict cost measures how many dense samples already assigned are around a target location, while the probability model adopts the generalized Gaussian distribution which includes uniform and Gaussian-like distributions as special cases. Our method preferentially assigns a sample to a k-space location with the least conflict cost on the circle of the highest probability. To evaluate the effectiveness of the proposed random pattern, we compare the performance of SAKEs using both VD Poisson discs and the proposed pattern. Experimental results for brain data show that the proposed pattern yields lower normalized mean square error (NMSE) than VD Poisson discs.
Calibrationless Parallel Imaging Reconstruction Based on Structured Low-Rank Matrix Completion
Shin, Peter J.; Larson, Peder E.Z.; Ohliger, Michael A.; Elad, Michael; Pauly, John M.; Vigneron, Daniel B.; Lustig, Michael
2013-01-01
Purpose A calibrationless parallel imaging reconstruction method, termed simultaneous auto-calibrating and k-space estimation (SAKE), is presented. It is a data-driven, coil-by-coil reconstruction method that does not require a separate calibration step for estimating coil sensitivity information. Methods In SAKE, an under-sampled multi-channel dataset is structured into a single data matrix. Then the reconstruction is formulated as a structured low-rank matrix completion problem. An iterative solution that implements a projection-onto-sets algorithm with singular value thresholding is described. Results Reconstruction results are demonstrated for retrospectively and prospectively under-sampled, multi-channel Cartesian data having no calibration signals. Additionally, non-Cartesian data reconstruction is presented. Finally, improved image quality is demonstrated by combining SAKE with wavelet-based compressed sensing. Conclusion As estimation of coil sensitivity information is not needed, the proposed method could potentially benefit MR applications where acquiring accurate calibration data is limiting or not possible at all. PMID:24248734
Parallel solution of partial differential equations by extrapolation methods
Leland, Robert W.; Rollett, J. S.
2015-02-01
We have found, in the ROGE algorithm, an extrapolation process which is robust, effective and practically simple to implement. It removes the difficulty of needing to make a precise estimate of the over-relaxation parameter for Successive Over-Relaxation (SOR) type methods.
NASA Technical Reports Server (NTRS)
Dorband, John E.
1987-01-01
Massively Parallel Processor (MPP) Parallel FORTH is a derivative of FORTH-83 and Unified Software Systems' Uni-FORTH. The extension of FORTH into the realm of parallel processing on the MPP is described. With few exceptions, Parallel FORTH was made to follow the description of Uni-FORTH as closely as possible. Likewise, the parallel FORTH extensions were designed as philosophically similar to serial FORTH as possible. The MPP hardware characteristics, as viewed by the FORTH programmer, are discussed. Then a description is presented of how parallel FORTH is implemented on the MPP.
Parallel flow diffusion battery
Yeh, H.C.; Cheng, Y.S.
1984-01-01
A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
Parallel flow diffusion battery
Yeh, Hsu-Chi; Cheng, Yung-Sung
1984-08-07
A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Parallel genotypic adaptation: when evolution repeats itself
Wood, Troy E.; Burke, John M.; Rieseberg, Loren H.
2008-01-01
Until recently, parallel genotypic adaptation was considered unlikely because phenotypic differences were thought to be controlled by many genes. There is increasing evidence, however, that phenotypic variation sometimes has a simple genetic basis and that parallel adaptation at the genotypic level may be more frequent than previously believed. Here, we review evidence for parallel genotypic adaptation derived from a survey of the experimental evolution, phylogenetic, and quantitative genetic literature. The most convincing evidence of parallel genotypic adaptation comes from artificial selection experiments involving microbial populations. In some experiments, up to half of the nucleotide substitutions found in independent lineages under uniform selection are the same. Phylogenetic studies provide a means for studying parallel genotypic adaptation in non-experimental systems, but conclusive evidence may be difficult to obtain because homoplasy can arise for other reasons. Nonetheless, phylogenetic approaches have provided evidence of parallel genotypic adaptation across all taxonomic levels, not just microbes. Quantitative genetic approaches also suggest parallel genotypic evolution across both closely and distantly related taxa, but it is important to note that this approach cannot distinguish between parallel changes at homologous loci versus convergent changes at closely linked non-homologous loci. The finding that parallel genotypic adaptation appears to be frequent and occurs at all taxonomic levels has important implications for phylogenetic and evolutionary studies. With respect to phylogenetic analyses, parallel genotypic changes, if common, may result in faulty estimates of phylogenetic relationships. From an evolutionary perspective, the occurrence of parallel genotypic adaptation provides increasing support for determinism in evolution and may provide a partial explanation for how species with low levels of gene flow are held together. PMID:15881688
Eclipse Parallel Tools Platform
Watson, Gregory; DeBardeleben, Nathan; Rasmussen, Craig
2005-02-18
Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices, and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to provide a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration, support for a small number of parallel architectures, and basis
Parallel Adaptive Mesh Refinement
Diachin, L; Hornung, R; Plassmann, P; WIssink, A
2005-03-04
As large-scale, parallel computers have become more widely available and numerical models and algorithms have advanced, the range of physical phenomena that can be simulated has expanded dramatically. Many important science and engineering problems exhibit solutions with localized behavior where highly-detailed salient features or large gradients appear in certain regions which are separated by much larger regions where the solution is smooth. Examples include chemically-reacting flows with radiative heat transfer, high Reynolds number flows interacting with solid objects, and combustion problems where the flame front is essentially a two-dimensional sheet occupying a small part of a three-dimensional domain. Modeling such problems numerically requires approximating the governing partial differential equations on a discrete domain, or grid. Grid spacing is an important factor in determining the accuracy and cost of a computation. A fine grid may be needed to resolve key local features while a much coarser grid may suffice elsewhere. Employing a fine grid everywhere may be inefficient at best and, at worst, may make an adequately resolved simulation impractical. Moreover, the location and resolution of fine grid required for an accurate solution is a dynamic property of a problem's transient features and may not be known a priori. Adaptive mesh refinement (AMR) is a technique that can be used with both structured and unstructured meshes to adjust local grid spacing dynamically to capture solution features with an appropriate degree of resolution. Thus, computational resources can be focused where and when they are needed most to efficiently achieve an accurate solution without incurring the cost of a globally-fine grid. Figure 1.1 shows two example computations using AMR; on the left is a structured mesh calculation of an impulsively-sheared contact surface and on the right is the fuselage and volume discretization of an RAH-66 Comanche helicopter [35]. Note the
Parallel Atomistic Simulations
Heffelfinger, Grant S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Visualization and Tracking of Parallel CFD Simulations
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kremenetsky, Mark
1995-01-01
We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.
Parallel digital forensics infrastructure.
Liebrock, Lorie M.; Duggan, David Patrick
2009-10-01
This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the architecture and implementation of the parallel digital forensics (PDF) infrastructure.
Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A.; Seiberlich, Nicole
2015-01-01
Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the under-sampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. PMID:22696125
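The link between undersampling and aliasing, and the SENSE-type unfolding described in this abstract, can be illustrated in one dimension. The sketch below makes several assumptions that are not in the source: a made-up 8-sample object, two hypothetical linear coil sensitivity profiles, an acceleration factor of 2, and a naive DFT helper. Keeping every second k-space sample folds pixel n onto pixel n + N/2; with two coils of known sensitivity, each folded pixel yields a 2×2 linear system for the two true pixel values.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

N, R = 8, 2                                  # image size and acceleration factor
obj = [0, 1, 3, 2, 0, 5, 1, 0]               # hypothetical 1D "object"
coil1 = [1.0 - 0.1 * n for n in range(N)]    # hypothetical coil sensitivities
coil2 = [0.3 + 0.1 * n for n in range(N)]

def aliased_image(sens):
    """Sample every R-th k-space point of the coil-weighted object, then reconstruct."""
    kspace = dft([s * o for s, o in zip(sens, obj)])
    return dft(kspace[::R], inverse=True)    # length N/R: a folded (aliased) image

y1, y2 = aliased_image(coil1), aliased_image(coil2)

# SENSE-type unfolding: solve one 2x2 system per aliased pixel.
half = N // R
recon = [0.0] * N
for n in range(half):
    a, b = coil1[n], coil1[n + half]
    c, d = coil2[n], coil2[n + half]
    det = a * d - b * c
    recon[n] = ((d * y1[n] - b * y2[n]) / det).real
    recon[n + half] = ((a * y2[n] - c * y1[n]) / det).real
```

The unfolding only works when the 2×2 sensitivity matrix is well conditioned, that is, when the two coils see the two overlapping pixels differently.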
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan
1994-01-01
A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C³I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
Parallel preconditioning techniques for sparse CG solvers
Basermann, A.; Reichel, B.; Schelthoff, C.
1996-12-31
Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques for the CG method. In particular for very ill-conditioned matrices, sophisticated preconditioners are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the iterations of the simply diagonally scaled CG and are shown to be well suited for massively parallel machines.
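For orientation, the "simply diagonally scaled CG" that this abstract uses as its baseline can be sketched as follows. This is a minimal dense, serial illustration with a hypothetical function name and a made-up test matrix; it does not implement the polynomial or incomplete Cholesky preconditioners the authors investigate, and a real solver would use sparse storage.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient with diagonal (Jacobi) preconditioning for an SPD matrix A."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]   # the preconditioner M^{-1}
    x = [0.0] * n
    r = b[:]                                       # residual for the zero initial guess
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pk for xi, pk in zip(x, p)]
        r = [ri - alpha * ak for ri, ak in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pk for zi, pk in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical SPD test matrix: tridiagonal with a varying diagonal.
n = 10
A = [[(2.0 + i) if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
b = [1.0] * n
x, iterations = pcg_jacobi(A, b)
```

The diagonal preconditioner is attractive on parallel machines because applying it needs no communication; stronger preconditioners trade some of that locality for fewer iterations.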
Eclipse Parallel Tools Platform
Energy Science and Technology Software Center (ESTSC)
2005-02-18
Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices, and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to provide a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration, support for a small number of parallel architectures
Parallel scheduling algorithms
Dekel, E.; Sahni, S.
1983-01-01
Parallel algorithms are given for scheduling problems such as scheduling to minimize the number of tardy jobs, job sequencing with deadlines, scheduling to minimize earliness and tardiness penalties, channel assignment, and minimizing the mean finish time. The shared memory model of parallel computers is used to obtain fast algorithms. 26 references.
Massively parallel mathematical sieves
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
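One way to see how a sieve can be split across processors is sketched below. Note the substitution: this uses a simple contiguous block decomposition of the range, which is not necessarily the scattered decomposition the abstract refers to, and the blocks are processed in a serial loop standing in for independent processors. All function names are hypothetical. Every block needs only the shared base primes up to √n, so no block depends on another.

```python
from math import isqrt

def base_primes(limit):
    """Serial Sieve of Eratosthenes returning all primes <= limit."""
    flags = [True] * (limit + 1)
    flags[:2] = [False] * min(2, limit + 1)      # 0 and 1 are not prime
    for q in range(2, isqrt(limit) + 1):
        if flags[q]:
            for m in range(q * q, limit + 1, q):
                flags[m] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]

def sieve_block(lo, hi, primes):
    """Sieve the half-open block [lo, hi) using the shared base primes."""
    flags = [True] * (hi - lo)
    for q in primes:
        # First multiple of q inside the block, never q itself.
        start = max(q * q, ((lo + q - 1) // q) * q)
        for m in range(start, hi, q):
            flags[m - lo] = False
    return [lo + i for i, is_prime in enumerate(flags) if is_prime]

def blocked_sieve(n, workers=4):
    """Primes <= n (n >= 2), with the range [2, n] split into `workers` blocks."""
    primes = base_primes(isqrt(n))
    size = -(-(n - 1) // workers)                # ceil((n - 1) / workers)
    blocks = [(lo, min(lo + size, n + 1)) for lo in range(2, n + 1, size)]
    # Each sieve_block call is independent and could run on its own processor.
    return [p for lo, hi in blocks for p in sieve_block(lo, hi, primes)]

primes_to_100 = blocked_sieve(100)
```

A contiguous split like this gives the low blocks more marking work per base prime start-up than the high blocks, which is one reason alternative decompositions are worth comparing.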
Not Available
1991-10-23
An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
... Jacksonian seizure; Seizure - partial (focal); Temporal lobe seizure; Epilepsy - partial seizures ... Abou-Khalil BW, Gallagher MJ, Macdonald RL. Epilepsies. In: Daroff RB, ... 6th ed. Philadelphia, PA: Elsevier Saunders; 2012:chap 67. ...
... Jacksonian seizure; Seizure - partial (focal); Temporal lobe seizure; Epilepsy - partial seizures ... Abou-Khalil BW, Gallagher MJ, Macdonald RL. Epilepsies. In: Daroff ... Practice . 7th ed. Philadelphia, PA: Elsevier; 2016:chap 101. ...
NASA Technical Reports Server (NTRS)
Vranish, John M. (Inventor)
2010-01-01
A partial gear bearing including an upper half, comprising peak partial teeth, and a lower, or bottom, half, comprising valley partial teeth. The upper half also has an integrated roller section between each of the peak partial teeth with a radius equal to the gear pitch radius of the radially outwardly extending peak partial teeth. Conversely, the lower half has an integrated roller section between each of the valley half teeth with a radius also equal to the gear pitch radius of the peak partial teeth. The valley partial teeth extend radially inwardly from its roller section. The peak and valley partial teeth are exactly out of phase with each other, as are the roller sections of the upper and lower halves. Essentially, the end roller bearing of the typical gear bearing has been integrated into the normal gear tooth pattern.
Parallel adaptive wavelet collocation method for PDEs
Nejadmalayeri, Alireza; Vezolainen, Alexei; Brown-Dymkoski, Eric; Vasilyev, Oleg V.
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048{sup 3} using as many as 2048 CPU cores.
Parallel nearest neighbor calculations
NASA Astrophysics Data System (ADS)
Trease, Harold
We are just starting to parallelize the nearest neighbor portion of our free-Lagrange code. Our implementation of the nearest neighbor reconnection algorithm has not been parallelizable (i.e., we just flip one connection at a time). In this paper we consider what sort of nearest neighbor algorithms lend themselves to being parallelized. For example, the construction of the Voronoi mesh can be parallelized, but the construction of the Delaunay mesh (dual to the Voronoi mesh) cannot because of degenerate connections. We will show our most recent attempt to tessellate space with triangles or tetrahedrons with a new nearest neighbor construction algorithm called DAM (Dial-A-Mesh). This method has the characteristics of a parallel algorithm and produces a better tessellation of space than the Delaunay mesh. Parallel processing is becoming an everyday reality for us at Los Alamos. Our current production machines are Cray YMPs with 8 processors that can run independently or combined to work on one job. We are also exploring massive parallelism through the use of two 64K processor Connection Machines (CM2), where all the processors run in lock step mode. The effective application of 3-D computer models requires the use of parallel processing to achieve reasonable "turn around" times for our calculations.
Bilingual parallel programming
Foster, I.; Overbeek, R.
1990-01-01
Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.
Tai, H.M.; Saeks, R.
1984-03-01
A relaxation algorithm for solving large-scale system simulation problems in parallel is proposed. The algorithm, which is composed of both a time-step parallel algorithm and a component-wise parallel algorithm, is described. The interconnected nature of the system, which is characterized by the component connection model, is fully exploited by this approach. A technique for finding an optimal number of the time steps is also described. Finally, this algorithm is illustrated via several examples in which the possible trade-offs between the speed-up ratio, efficiency, and waiting time are analyzed.
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).
NASA Astrophysics Data System (ADS)
2014-10-01
Adam Nelson and Stuart Warriner, from the University of Leeds, talk with Nature Chemistry about their work to develop viable synthetic strategies for preparing new chemical structures in parallel with the identification of desirable biological activity.
ERIC Educational Resources Information Center
Rogers, Pat
1972-01-01
Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclid's parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)
Simplified Parallel Domain Traversal
Erickson III, David J
2011-01-01
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO{sub 2} and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.
Partitioning and parallel radiosity
NASA Astrophysics Data System (ADS)
Merzouk, S.; Winkler, C.; Paul, J. C.
1996-03-01
This paper proposes a theoretical framework, based on domain subdivision, for parallel radiosity. Moreover, three different implementation approaches, taking advantage of partitioning algorithms and a global shared memory architecture, are presented.
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth
NASA Technical Reports Server (NTRS)
Reif, John H.
1987-01-01
A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
Continuous parallel coordinates.
Heinrich, Julian; Weiskopf, Daniel
2009-01-01
Typical scientific data is represented on a grid with appropriate interpolation or approximation schemes, defined on a continuous domain. The visualization of such data in parallel coordinates may reveal patterns latently contained in the data and thus can improve the understanding of multidimensional relations. In this paper, we adopt the concept of continuous scatterplots for the visualization of spatially continuous input data to derive a density model for parallel coordinates. Based on the point-line duality between scatterplots and parallel coordinates, we propose a mathematical model that maps density from a continuous scatterplot to parallel coordinates and present different algorithms for both numerical and analytical computation of the resulting density field. In addition, we show how the 2-D model can be used to successively construct continuous parallel coordinates with an arbitrary number of dimensions. Since continuous parallel coordinates interpolate data values within grid cells, a scalable and dense visualization is achieved, which will be demonstrated for typical multi-variate scientific data. PMID:19834230
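The point-line duality that the abstract above builds on can be checked numerically. The sketch below assumes the two parallel axes sit at horizontal positions u = 0 (for x) and u = 1 (for y), so that a data point (x, y) becomes the segment v(u) = x + u (y - x). Under that assumption, all points on a scatterplot line y = m x + b (with m != 1) map to segments that meet at u = 1 / (1 - m), v = b / (1 - m). The function names are hypothetical and the code is not the density model of the paper, only the underlying geometric fact.

```python
def segment_height(x, y, u):
    """Height at horizontal position u of the parallel-coordinates segment
    joining (0, x) on the first axis to (1, y) on the second axis."""
    return x + u * (y - x)


def dual_point(m, b):
    """Point where the segments of all data points on y = m*x + b intersect.

    Assumes m != 1; for m == 1 the segments are parallel and never meet.
    """
    if m == 1:
        raise ValueError("slope 1 has no finite dual point")
    return 1.0 / (1.0 - m), b / (1.0 - m)


def check_duality(m, b, xs, tol=1e-12):
    """Return True if every sample point on the line passes through the dual point."""
    u, v = dual_point(m, b)
    return all(abs(segment_height(x, m * x + b, u) - v) <= tol for x in xs)
```

For a negative slope the dual point lies between the two axes, which is why negatively correlated data shows up as a visible crossing in a parallel-coordinates plot.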
Parallel algorithms for the spectral transform method
Foster, I.T.; Worley, P.H.
1994-04-01
The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, we describe these different parallel algorithms and report on computational experiments that we have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. We focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional FFTs and other parallel transforms.
Parallel architectures for iterative methods on adaptive, block structured grids
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1983-01-01
A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Parallel time integration software
Energy Science and Technology Software Center (ESTSC)
2014-07-01
This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A
2014-05-20
An optical sampler includes first and second 1×n optical beam splitters splitting an input optical sampling signal and an optical analog input signal into n parallel channels, respectively, a plurality of optical delay elements providing n parallel delayed input optical sampling signals, n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals, and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals, and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode interconnected Mach-Zehnder Modulator. A method of sampling the optical analog input signal is disclosed.
Bailey, David H.
2009-11-15
The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage
Adaptive parallel logic networks
NASA Technical Reports Server (NTRS)
Martinez, Tony R.; Vidal, Jacques J.
1988-01-01
Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Speeding up parallel processing
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
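The contrast between the fixed-size and scalable results quoted above can be made concrete with two standard formulas. The sketch below computes the fixed-size bound of Amdahl's law, 1 / (s + (1 - s) / N), and the scaled speedup usually attributed to Gustafson, s + (1 - s) N, where s is a serial fraction and N is the processor count. The serial fractions used in any call are illustrative assumptions, not figures from the report, and the two formulas measure s differently: Amdahl's s is a fraction of the single-processor run time, while the scaled-speedup s is a fraction of the parallel run time.

```python
def amdahl_speedup(serial_fraction, nprocs):
    """Fixed-size speedup bound: the serial part does not shrink with nprocs."""
    if not 0.0 <= serial_fraction <= 1.0 or nprocs < 1:
        raise ValueError("need 0 <= serial_fraction <= 1 and nprocs >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nprocs)


def scaled_speedup(serial_fraction, nprocs):
    """Scaled speedup: the parallel part of the work grows with nprocs."""
    if not 0.0 <= serial_fraction <= 1.0 or nprocs < 1:
        raise ValueError("need 0 <= serial_fraction <= 1 and nprocs >= 1")
    return serial_fraction + (1.0 - serial_fraction) * nprocs


if __name__ == "__main__":
    # Hypothetical serial fractions, chosen only to show the trend.
    for s in (0.001, 0.01, 0.1):
        print(s, amdahl_speedup(s, 1024), scaled_speedup(s, 1024))
```

Under the fixed-size formula even a small serial fraction caps the speedup far below the processor count, whereas under the scaled formula the speedup stays close to N, which is one way to read the difference between the two sets of results reported for the 1024-node hypercube.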
Programming parallel vision algorithms
Shapiro, L.G.
1988-01-01
Computer vision requires the processing of large volumes of data and requires parallel architectures and algorithms to be useful in real-time, industrial applications. The INSIGHT dataflow language was designed to allow encoding of vision algorithms at all levels of the computer vision paradigm. INSIGHT programs, which are relational in nature, can be translated into a graph structure that represents an architecture for solving a particular vision problem or a configuration of a reconfigurable computational network. The authors consider here INSIGHT programs that produce a parallel net architecture for solving low-, mid-, and high-level vision tasks.
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Among the highly parallel computing architectures required for advanced scientific computation, those designated 'MIMD' and 'SIMD' have yielded the best results to date. The present development status evaluation of such architectures shows neither to have attained a decisive advantage in the treatment of most near-homogeneous problems; in the cases of problems involving numerous dissimilar parts, however, such currently speculative architectures as 'neural networks' or 'data flow' machines may be required. Data flow computers are the most practical form of MIMD fine-grained parallel computers yet conceived; they automatically solve the problem of assigning virtual processors to the real processors in the machine.
Coarrays for Parallel Processing
NASA Technical Reports Server (NTRS)
Snyder, W. Van
2011-01-01
The design of the Coarray feature of Fortran 2008 was guided by answering the question "What is the smallest change required to convert Fortran to a robust and efficient parallel language?" Two fundamental issues that any parallel programming model must address are work distribution and data distribution. In order to coordinate work distribution and data distribution, methods for communication and synchronization must be provided. Although originally designed for Fortran, the Coarray paradigm has stimulated development in other languages. X10, Chapel, UPC, Titanium, and class libraries being developed for C++ have the same conceptual framework.
Physics of Partially Ionized Plasmas
NASA Astrophysics Data System (ADS)
Krishan, Vinod
2016-05-01
Figures; Preface; 1. Partially ionized plasmas here and everywhere; 2. Multifluid description of partially ionized plasmas; 3. Equilibrium of partially ionized plasmas; 4. Waves in partially ionized plasmas; 5. Advanced topics in partially ionized plasmas; 6. Research problems in partially ionized plasmas; Supplementary matter; Index.
Energy Science and Technology Software Center (ESTSC)
2004-10-21
This is a total energy electronic structure code using the Local Density Approximation (LDA) of density functional theory. It uses plane waves as the wave function basis set. It can use both norm-conserving pseudopotentials and ultrasoft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MPI.
NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Subhash, Saini; Bailey, David H.; Lasinski, T. A. (Technical Monitor)
1995-01-01
The NAS Parallel Benchmarks (NPB) were developed in 1991 at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a pencil and paper fashion, i.e., the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. In this paper, we present new NPB performance results for the following systems: (a) Parallel-Vector Processors: Cray C90, Cray T90 and Fujitsu VPP500; (b) Highly Parallel Processors: Cray T3D, IBM SP2 and IBM SP-TN2 (Thin Nodes 2); (c) Symmetric Multiprocessing Processors: Convex Exemplar SPP1000, Cray J90, DEC Alpha Server 8400 5/300, and SGI Power Challenge XL. We also present sustained performance per dollar for Class B LU, SP and BT benchmarks. We also mention NAS future plans for NPB.
High performance parallel architectures
Anderson, R.E.
1989-09-01
In this paper the author describes current high performance parallel computer architectures. A taxonomy is presented to show computer architecture from the user programmer's point-of-view. The effects of the taxonomy upon the programming model are described. Some current architectures are described with respect to the taxonomy. Finally, some predictions about future systems are presented. 5 refs., 1 fig.
Foster, I.; Tuecke, S.
1993-01-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.
Parallel Multigrid Equation Solver
Energy Science and Technology Software Center (ESTSC)
2001-09-07
Prometheus is a fully parallel multigrid equation solver for matrices that arise in unstructured grid finite element applications. It includes a geometric and an algebraic multigrid method and has solved linear elasticity problems of up to 76 million degrees of freedom on the ASCI Blue Pacific and ASCI Red machines.
Parallel Dislocation Simulator
Energy Science and Technology Software Center (ESTSC)
2006-10-30
ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.
Optical parallel selectionist systems
NASA Astrophysics Data System (ADS)
Caulfield, H. John
1993-01-01
There are at least two major classes of computers in nature and technology: connectionist and selectionist. A subset of connectionist systems (Turing Machines) dominates modern computing, although another subset (Neural Networks) is growing rapidly. Selectionist machines have unique capabilities which should allow them to do truly creative operations. It is possible to make a parallel optical selectionist system using methods described in this paper.
Sampath, Rahul S; Sundar, Hari; Veerapaneni, Shravan
2010-01-01
We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N^2) time. The parallel time complexity estimates for our algorithms are O(N/n_p) for uniform point distributions and O((N/n_p) log(N/n_p) + n_p log n_p) for non-uniform distributions using n_p CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits 'diagonal translation'. We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer. Our implementation is 'kernel-independent' and can handle other 'Gaussian-type' kernels even when an explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.
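For orientation, the direct O(N^2) evaluation that the abstract uses as its baseline can be sketched as follows. This is only an illustrative sketch, not the authors' algorithm: the kernel form exp(-|x - y|^2 / delta), the bandwidth parameter `delta`, and the function name `direct_gauss_sum` are assumptions introduced here.

```python
import math


def direct_gauss_sum(sources, targets, weights, delta):
    """Directly evaluate G(x) = sum_i w_i * exp(-|x - y_i|^2 / delta).

    Hypothetical baseline: every target visits every source, so the
    cost is O(N^2) when there are N sources and N targets.
    """
    result = []
    for x in targets:
        total = 0.0
        for y, w in zip(sources, weights):
            # Squared Euclidean distance between target x and source y.
            r2 = sum((a - b) ** 2 for a, b in zip(x, y))
            total += w * math.exp(-r2 / delta)
        result.append(total)
    return result
```

The doubly nested loop is what the fast algorithms described above are designed to avoid.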
Parallel hierarchical global illumination
Snell, Q.O.
1997-10-08
Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
Parallel hierarchical radiosity rendering
Carter, M.
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Why arthroscopic partial meniscectomy?
Lyu, Shaw-Ruey
2015-09-01
"Arthroscopic Partial Meniscectomy versus Sham Surgery for a Degenerative Meniscal Tear" published in the New England Journal of Medicine on December 26, 2013 draws the conclusion that arthroscopic partial medial meniscectomy provides no significant benefit over sham surgery in patients with a degenerative meniscal tear and no knee osteoarthritis. This result argues against the current practice of performing arthroscopic partial meniscectomy (APM) in patients with a degenerative meniscal tear. Since the number of APM performed has been increasing, the information provided by this study should lead to a change in clinical care of patients with a degenerative meniscus tear. PMID:26488013
Adapting implicit methods to parallel processors
Reeves, L.; McMillin, B.; Okunbor, D.; Riggins, D.
1994-12-31
When numerically solving many types of partial differential equations, it is advantageous to use implicit methods because of their better stability and more flexible parameter choice (e.g., larger time steps). However, since implicit methods usually require simultaneous knowledge of the entire computational domain, these methods are difficult to implement directly on distributed memory parallel processors. This leads to infrequent use of implicit methods on parallel/distributed systems. The usual implementation of implicit methods is inefficient due to the nature of parallel systems, where it is common to take the computational domain and distribute the grid points over the processors so as to maintain a relatively even workload per processor. This creates a problem at the locations in the domain where adjacent points are not on the same processor. In order for the values at these points to be calculated, messages have to be exchanged between the corresponding processors. Without special adaptation, this will result in idle processors during part of the computation, and the more processors are idle, the smaller the effective speed improvement gained by using a parallel processor.
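To illustrate why an implicit step needs "simultaneous knowledge of the entire computational domain", here is a minimal serial sketch of one backward Euler step for the 1D heat equation u_t = u_xx. The equation, the zero Dirichlet boundaries, and the names `thomas_solve` and `backward_euler_step` are assumptions chosen for illustration; the abstract does not specify them.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] (lower[0] is unused) and
    upper[i] multiplies x[i+1] (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: each row depends on the row before it.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution: each unknown depends on the one after it.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def backward_euler_step(u, r):
    """One implicit step for interior values u, with r = dt / dx**2."""
    n = len(u)
    lower = [0.0] + [-r] * (n - 1)
    diag = [1.0 + 2.0 * r] * n
    upper = [-r] * (n - 1) + [0.0]
    return thomas_solve(lower, diag, upper, u)
```

Both sweeps carry a dependency across the whole grid, so when the grid points are split across processors each processor has to wait for data from its neighbour, which is the idle-time problem the abstract describes.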
Partial knee replacement - slideshow
Page: //medlineplus.gov/ency/presentations/100225.htm (Partial knee replacement - series)
Most people recover quickly and have much less pain than they did before surgery. People who have a partial knee replacement recover faster than those who have a total knee replacement. Many people are able to walk ...
Twisted partially pure spinors
NASA Astrophysics Data System (ADS)
Herrera, Rafael; Tellez, Ivan
2016-08-01
Motivated by the relationship between orthogonal complex structures and pure spinors, we define twisted partially pure spinors in order to characterize spinorially subspaces of Euclidean space endowed with a complex structure.
Extendability of parallel sections in vector bundles
NASA Astrophysics Data System (ADS)
Kirschner, Tim
2016-01-01
I address the following question: Given a differentiable manifold M, what are the open subsets U of M such that, for all vector bundles E over M and all linear connections ∇ on E, any ∇-parallel section in E defined on U extends to a ∇-parallel section in E defined on M? For simply connected manifolds M (among others) I describe the entirety of all such sets U which are, in addition, the complement of a C^1 submanifold, boundary allowed, of M. This delivers a partial positive answer to a problem posed by Antonio J. Di Scala and Gianni Manno (2014). Furthermore, in case M is an open submanifold of R^n, n ≥ 2, I prove that the complement of U in M, not required to be a submanifold now, can have arbitrarily large n-dimensional Lebesgue measure.
Olmedo, Oscar; Zhang, Jie
2010-07-20
Flux ropes are now generally accepted to be the magnetic configuration of coronal mass ejections (CMEs), which may be formed prior to or during solar eruptions. In this study, we model the flux rope as a current-carrying partial torus loop with its two footpoints anchored in the photosphere, and investigate its stability in the context of the torus instability (TI). Previous studies on TI have focused on the configuration of a circular torus and revealed the existence of a critical decay index of the overlying constraining magnetic field. Our study reveals that the critical index is a function of the fractional number of the partial torus, defined by the ratio between the arc length of the partial torus above the photosphere and the circumference of a circular torus of equal radius. We refer to this finding as the partial torus instability (PTI). It is found that a partial torus with a smaller fractional number has a smaller critical index, thus requiring a more gradually decreasing magnetic field to stabilize the flux rope. On the other hand, a partial torus with a larger fractional number has a larger critical index. In the limit of a circular torus when the fractional number approaches 1, the critical index goes to a maximum value. We demonstrate that the PTI helps us to understand the confinement, growth, and eventual eruption of a flux-rope CME.
Parallel Subconvolution Filtering Architectures
NASA Technical Reports Server (NTRS)
Gray, Andrew A.
2003-01-01
These architectures are based on methods of vector processing and the discrete-Fourier-transform/inverse-discrete-Fourier-transform (DFT-IDFT) overlap-and-save method, combined with time-block separation of digital filters into frequency-domain subfilters implemented by use of sub-convolutions. The parallel-processing method implemented in these architectures enables the use of relatively small DFT-IDFT pairs, while filter tap lengths are theoretically unlimited. The size of a DFT-IDFT pair is determined by the desired reduction in processing rate, rather than by the order of the filter that one seeks to implement. The emphasis in this report is on those aspects of the underlying theory and design rules that promote computational efficiency, parallel processing at reduced data rates, and simplification of the designs of very-large-scale integrated (VLSI) circuits needed to implement high-order filters and correlators.
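The DFT-IDFT overlap-and-save method mentioned above can be sketched in its textbook form as follows. This is a plain serial illustration under stated assumptions, not the sub-convolution architecture of the report: the naive O(n^2) `dft` helper and the names `overlap_save` and `block_size` are introduced here only for the example.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def overlap_save(signal, taps, block_size):
    """Filter `signal` with FIR `taps` using fixed-size DFT blocks."""
    m = len(taps)
    step = block_size - (m - 1)  # new output samples produced per block
    if step <= 0:
        raise ValueError("block_size must exceed len(taps) - 1")
    h_freq = dft(list(taps) + [0.0] * (block_size - m))
    # Prepend m - 1 zeros so the first block has a full history.
    padded = [0.0] * (m - 1) + list(signal)
    output = []
    pos = 0
    while pos < len(signal):
        block = padded[pos:pos + block_size]
        block = block + [0.0] * (block_size - len(block))
        spectrum = [a * b for a, b in zip(dft(block), h_freq)]
        y = dft(spectrum, inverse=True)
        # The first m - 1 samples are corrupted by circular wrap-around
        # and are discarded; the remaining samples are saved.
        output.extend(v.real for v in y[m - 1:])
        pos += step
    return output[:len(signal)]
```

In this textbook form the block size must exceed the filter length minus one, which is the coupling between DFT size and filter order that the architectures described above are intended to remove.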
Parallel Anisotropic Tetrahedral Adaptation
NASA Technical Reports Server (NTRS)
Park, Michael A.; Darmofal, David L.
2008-01-01
An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000-to-1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.
Homology, convergence and parallelism.
Ghiselin, Michael T
2016-01-01
Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. PMID:26598721
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
NASA Technical Reports Server (NTRS)
Gryphon, Coranth D.; Miller, Mark D.
1991-01-01
PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.
Parallel multilevel preconditioners
Bramble, J.H.; Pasciak, J.E.; Xu, Jinchao
1989-01-01
In this paper, we shall report on some techniques for the development of preconditioners for the discrete systems which arise in the approximation of solutions to elliptic boundary value problems. Here we shall only state the resulting theorems. It has been demonstrated that preconditioned iteration techniques often lead to the most computationally effective algorithms for the solution of the large algebraic systems corresponding to boundary value problems in two and three dimensional Euclidean space. The use of preconditioned iteration will become even more important on computers with parallel architecture. This paper discusses an approach for developing completely parallel multilevel preconditioners. In order to illustrate the resulting algorithms, we shall describe the simplest application of the technique to a model elliptic problem.
Xyce parallel electronic simulator.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Groh, E.F.; Lennox, D.H.
1963-04-23
This invention is concerned with a rigid assembly of parallel plates in which keyways are stamped out along the edges of the plates and a self-retaining key is inserted into aligned keyways. Spacers having similar keyways are included between adjacent plates. The entire assembly is locked into a rigid structure by fastening only the outermost plates to the ends of the keys. (AEC)
Adaptive parallel logic networks
Martinez, T.R.; Vidal, J.J.
1988-02-01
This paper presents a novel class of special purpose processors referred to as ASOCS (adaptive self-organizing concurrent systems). Intended applications include adaptive logic devices, robotics, process control, system malfunction management, and in general, applications of logic reasoning. ASOCS combines massive parallelism with self-organization to attain a distributed mechanism for adaptation. The ASOCS approach is based on an adaptive network composed of many simple computing elements (nodes) which operate in a combinational and asynchronous fashion. Problem specification (programming) is obtained by presenting to the system if-then rules expressed as Boolean conjunctions. New rules are added incrementally. In the current model, when conflicts occur, precedence is given to the most recent inputs. With each rule, desired network response is simply presented to the system, following which the network adjusts itself to maintain consistency and parsimony of representation. Data processing and adaptation form two separate phases of operation. During processing, the network acts as a parallel hardware circuit. Control of the adaptive process is distributed among the network nodes and efficiently exploits parallelism.
Trajectory optimization using parallel shooting method on parallel computer
Wirthman, D.J.; Park, S.Y.; Vadali, S.R.
1995-03-01
The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.
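As a quick worked figure, the reported speedup can be converted to parallel efficiency (speedup divided by processor count). The sketch below takes the abstract's "nearly 7 to 1" as exactly 7.0, which is an approximation made here for illustration; `parallel_efficiency` is a hypothetical helper name.

```python
def parallel_efficiency(speedup, processors):
    """Return parallel efficiency, i.e. speedup per processor."""
    return speedup / processors


# Approximate figures from the abstract: speedup of about 7 on 16 processors.
efficiency = parallel_efficiency(7.0, 16)
print(f"efficiency = {efficiency:.4f}")  # 7 / 16 = 0.4375
```

An efficiency below one half is consistent with the abstract's suggestion that further parallelization, in the state domain, could improve performance.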
The Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.
Resistor Combinations for Parallel Circuits.
ERIC Educational Resources Information Center
McTernan, James P.
1978-01-01
To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
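A table of this kind can be generated with a short script. The sketch below is an assumption about how such a table might be built, not the article's own tables: it lists pairs of whole-number resistors whose parallel combination 1 / (1/R1 + 1/R2) is also a whole number, and the function names are hypothetical.

```python
from fractions import Fraction


def parallel(*resistors):
    """Exact total resistance of integer-valued resistors in parallel."""
    return 1 / sum(Fraction(1, r) for r in resistors)


def whole_number_pairs(max_ohms):
    """List (R1, R2, R_total) with R1 <= R2 <= max_ohms and integer total."""
    rows = []
    for r1 in range(1, max_ohms + 1):
        for r2 in range(r1, max_ohms + 1):
            total = parallel(r1, r2)
            if total.denominator == 1:
                rows.append((r1, r2, int(total)))
    return rows


for r1, r2, total in whole_number_pairs(6):
    print(f"{r1} ohm || {r2} ohm = {total} ohm")
```

Using exact fractions rather than floating point avoids misclassifying a combination because of rounding.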
NASA Astrophysics Data System (ADS)
Elghariani, Ali; Zoltowski, Michael D.
2012-05-01
In this paper, a partial spread OFDM system is presented and its performance is studied when different detection techniques are employed, such as minimum mean square error (MMSE), grouped Maximum Likelihood (ML) and approximated integer quadratic programming (IQP) techniques. The performance study also includes applying two different spreading matrices, Hadamard and Vandermonde. Extensive computer simulations have been implemented, and important results show that the partial spread OFDM system improves the BER performance and the frequency diversity of OFDM compared to both non-spread and full spread systems. The results from this paper also show that the partial spreading technique combined with a suboptimal detector could be a better solution for applications that require low receiver complexity and high information detectability.
Methanol partial oxidation reformer
Ahmed, S.; Kumar, R.; Krumpelt, M.
1999-08-17
A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.
Methanol partial oxidation reformer
Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael
1999-01-01
A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.
Methanol partial oxidation reformer
Ahmed, Shabbir; Kumar, Romesh; Krumpelt, Michael
2001-01-01
A partial oxidation reformer comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell.
Methanol partial oxidation reformer
Ahmed, S.; Kumar, R.; Krumpelt, M.
1999-08-24
A partial oxidation reformer is described comprising a longitudinally extending chamber having a methanol, water and an air inlet and an outlet. An igniter mechanism is near the inlets for igniting a mixture of methanol and air, while a partial oxidation catalyst in the chamber is spaced from the inlets and converts methanol and oxygen to carbon dioxide and hydrogen. Controlling the oxygen to methanol mole ratio provides continuous slightly exothermic partial oxidation reactions of methanol and air producing hydrogen gas. The liquid is preferably injected in droplets having diameters less than 100 micrometers. The reformer is useful in a propulsion system for a vehicle which supplies a hydrogen-containing gas to the negative electrode of a fuel cell. 7 figs.
Oxygen partial pressure sensor
Dees, D.W.
1994-09-06
A method for detecting oxygen partial pressure and an oxygen partial pressure sensor are provided. The method for measuring oxygen partial pressure includes contacting oxygen to a solid oxide electrolyte and measuring the subsequent change in electrical conductivity of the solid oxide electrolyte. A solid oxide electrolyte is utilized that contacts both a porous electrode and a nonporous electrode. The electrical conductivity of the solid oxide electrolyte is affected when oxygen from an exhaust stream permeates through the porous electrode to establish an equilibrium of oxygen anions in the electrolyte, thereby displacing electrons throughout the electrolyte to form an electron gradient. By adapting the two electrodes to sense a voltage potential between them, the change in electrolyte conductivity due to oxygen presence can be measured. 1 fig.
Oxygen partial pressure sensor
Dees, Dennis W.
1994-01-01
A method for detecting oxygen partial pressure and an oxygen partial pressure sensor are provided. The method for measuring oxygen partial pressure includes contacting oxygen to a solid oxide electrolyte and measuring the subsequent change in electrical conductivity of the solid oxide electrolyte. A solid oxide electrolyte is utilized that contacts both a porous electrode and a nonporous electrode. The electrical conductivity of the solid oxide electrolyte is affected when oxygen from an exhaust stream permeates through the porous electrode to establish an equilibrium of oxygen anions in the electrolyte, thereby displacing electrons throughout the electrolyte to form an electron gradient. By adapting the two electrodes to sense a voltage potential between them, the change in electrolyte conductivity due to oxygen presence can be measured.
Parallel Pascal - An extended Pascal for parallel computers
NASA Technical Reports Server (NTRS)
Reeves, A. P.
1984-01-01
Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.
Parallel Eclipse Project Checkout
NASA Technical Reports Server (NTRS)
Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.
2011-01-01
Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (xml file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to check out the plug-ins, this program makes the plug-in checkouts parallel. After the feature is parsed, a checkout request is created for each plug-in in the feature. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to check out now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer's home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer's home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. It can be applied to any
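The parse-then-dispatch pattern described in this abstract (parse the feature file, then hand one checkout request per plug-in to a thread pool of configurable size) can be sketched as follows. This is not PEPC's code: the assumption that the feature file lists plug-ins as `<plugin id="...">` elements, the `checkout` callable, and the function names are all hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def plugin_ids(feature_xml):
    """Return the ids of all <plugin> elements in a feature description."""
    root = ET.fromstring(feature_xml)
    return [plugin.get("id") for plugin in root.iter("plugin")]


def checkout_all(feature_xml, checkout, max_workers=4):
    """Run checkout(plugin_id) for every plug-in on a thread pool.

    `checkout` is a caller-supplied function (hypothetical here) that
    performs the actual version-control checkout for one plug-in.
    """
    ids = plugin_ids(feature_xml)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with ids.
        return dict(zip(ids, pool.map(checkout, ids)))
```

Because checkouts are dominated by network waits rather than computation, a thread pool is a reasonable fit for this kind of workload; `max_workers` plays the role of the configurable thread count mentioned in the abstract.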
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are most suitable. The architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.
Fastpath Speculative Parallelization
NASA Astrophysics Data System (ADS)
Spear, Michael F.; Kelsey, Kirk; Bai, Tongxin; Dalessandro, Luke; Scott, Michael L.; Ding, Chen; Wu, Peng
We describe Fastpath, a system for speculative parallelization of sequential programs on conventional multicore processors. Our system distinguishes between the lead thread, which executes at almost-native speed, and speculative threads, which execute somewhat slower. This allows us to achieve nontrivial speedup, even on two-core machines. We present a mathematical model of potential speedup, parameterized by application characteristics and implementation constants. We also present preliminary results gleaned from two different Fastpath implementations, each derived from an implementation of software transactional memory.
CSM parallel structural methods research
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1989-01-01
Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.
Synchronous Parallel Kinetic Monte Carlo
Martínez, E; Marian, J; Kalos, M H
2006-12-14
A novel parallel kinetic Monte Carlo (kMC) algorithm formulated on the basis of perfect time synchronicity is presented. The algorithm provides an exact generalization of any standard serial kMC model and is trivially implemented in parallel architectures. We demonstrate the mathematical validity and parallel performance of the method by solving several well-understood problems in diffusion.
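For orientation, the kind of standard serial kMC step that the abstract says the parallel algorithm generalizes can be sketched as follows. This is a minimal residence-time step, not the authors' synchronous parallel method; the function name `kmc_step` and the example rates are illustrative assumptions.

```python
import math
import random

def kmc_step(rates, rng):
    """One serial kMC step (hypothetical helper, not from the paper).

    Picks an event with probability proportional to its rate and draws
    an exponentially distributed waiting time with mean 1 / sum(rates).
    """
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    # Select the event whose cumulative-rate interval contains the target.
    target = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback in case of floating-point round-off
    for i, rate in enumerate(rates):
        cumulative += rate
        if target < cumulative:
            chosen = i
            break
    # rng.random() lies in [0, 1), so the argument of log lies in (0, 1].
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt

rng = random.Random(0)
event, dt = kmc_step([1.0, 0.0, 3.0], rng)  # illustrative rates
```

Events with zero rate are never selected, and the simulated clock advances by `dt` after each step.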
Roo: A parallel theorem prover
Lusk, E.L.; McCune, W.W.; Slaney, J.K.
1991-11-01
We describe a parallel theorem prover based on the Argonne theorem-proving system OTTER. The parallel system, called Roo, runs on shared-memory multiprocessors such as the Sequent Symmetry. We explain the parallel algorithm used and give performance results that demonstrate near-linear speedups on large problems.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
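The role of a discrete-event simulator in modeling message-passing behavior can be illustrated with a very small serial sketch. It is not LAPSE and makes no claim about LAPSE's design; the function `simulate_ping_pong` and the `latency` and `compute_time` parameters are assumptions chosen for illustration. Two simulated processes exchange messages, and a priority queue orders events by simulated time.

```python
import heapq

def simulate_ping_pong(rounds, latency, compute_time):
    """Simulate `rounds` messages bounced between processes 0 and 1.

    Hypothetical model: each message takes `latency` to arrive, and the
    receiver computes for `compute_time` before sending the next one.
    Returns the final simulated time and the ordered event log.
    """
    queue = [(0.0, 0, "send", 0)]  # (time, sequence number, kind, process)
    seq = 1                        # tie-breaker so the heap never compares strings
    sent = 0
    clock = 0.0
    log = []
    while queue:
        clock, _, kind, proc = heapq.heappop(queue)
        log.append((clock, kind, proc))
        if kind == "send":
            # The message reaches the other process after the network latency.
            heapq.heappush(queue, (clock + latency, seq, "recv", 1 - proc))
            seq += 1
            sent += 1
        elif kind == "recv" and sent < rounds:
            # The receiver computes, then sends the next message.
            heapq.heappush(queue, (clock + compute_time, seq, "send", proc))
            seq += 1
    return clock, log

end_time, events = simulate_ping_pong(rounds=2, latency=1.0, compute_time=0.5)
```

In a direct-execution setting the compute time would come from running the real application code rather than from a fixed parameter.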
Partial Arc Curvilinear Direct Drive Servomotor
NASA Technical Reports Server (NTRS)
Sun, Xiuhong (Inventor)
2014-01-01
A partial arc servomotor assembly has a curvilinear U-channel with two parallel rare earth permanent magnet plates facing each other and a pivoted ironless three phase coil armature winding that moves between the plates. An encoder read head is fixed to a mounting plate above the coil armature winding and a curvilinear encoder scale is curved to be coaxial with the curvilinear U-channel permanent magnet track formed by the permanent magnet plates. Driven by a set of miniaturized power electronics devices closely looped with a positioning feedback encoder, the angular position and velocity of the pivoted payload is programmable and precisely controlled.
Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G
2007-04-11
The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.
Baskin, Tobias I.; Gu, Ying
2012-01-01
The extracellular matrix is constructed beyond the plasma membrane, challenging mechanisms for its control by the cell. In plants, the cell wall is highly ordered, with cellulose microfibrils aligned coherently over a scale spanning hundreds of cells. To a considerable extent, deploying aligned microfibrils determines mechanical properties of the cell wall, including strength and compliance. Cellulose microfibrils have long been seen to be aligned in parallel with an array of microtubules in the cell cortex. How do these cortical microtubules affect the cellulose synthase complex? This question has stood for as many years as the parallelism between the elements has been observed, but now an answer is emerging. Here, we review recent work establishing that the link between microtubules and microfibrils is mediated by a protein named cellulose synthase-interacting protein 1 (CSI1). The protein binds both microtubules and components of the cellulose synthase complex. In the absence of CSI1, microfibrils are synthesized but their alignment becomes uncoupled from the microtubules, an effect that is phenocopied in the wild type by depolymerizing the microtubules. The characterization of CSI1 significantly enhances knowledge of how cellulose is aligned, a process that serves as a paradigmatic example of how cells dictate the construction of their extracellular environment. PMID:22902763
Applied Parallel Metadata Indexing
Jacobi, Michael R
2012-08-01
The GPFS Archive is a parallel archive used by hundreds of users in the Turquoise collaboration network. It houses 4+ petabytes of data in more than 170 million files. Currently, users must navigate the file system to retrieve their data, requiring them to remember file paths and names. A better solution might allow users to tag data with meaningful labels and search the archive using standard and user-defined metadata, while maintaining security. Last summer, I developed the backend to a tool that adheres to these design goals. The backend works by importing GPFS metadata into a MongoDB cluster, which is then indexed on each attribute. This summer, the author implemented security and developed the user interface for the search tool. To meet security requirements, each database table is associated with a single user, which only stores records that the user may read, and requires a set of credentials to access. The interface to the search tool is implemented using FUSE (Filesystem in USErspace). FUSE is an intermediate layer that intercepts file system calls and allows the developer to redefine how those calls behave. In the case of this tool, FUSE interfaces with MongoDB to issue queries and populate output. A FUSE implementation is desirable because it allows users to interact with the search tool using commands they are already familiar with. These security and interface additions are essential for a usable product.
Parallel ptychographic reconstruction
Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; Deng, Junjing; Ross, Rob; Jacobsen, Chris
2014-01-01
Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It can be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source. PMID:25607174
Partial Participation Revisited.
ERIC Educational Resources Information Center
Ferguson, Dianne L.; Baumgart, Diane
1991-01-01
This article reanalyzes the principle of partial participation in integrated educational programing for students with severe or profound disabilities. The article presents four "error patterns" in how the concept has been used, some reasons why such error patterns have occurred, and strategies for avoiding these errors. (Author/JDD)
NASA Technical Reports Server (NTRS)
Capps, Stephen; Lorandos, Jason; Akhidime, Eval; Bunch, Michael; Lund, Denise; Moore, Nathan; Murakawa, Kiosuke
1989-01-01
The purpose of this study is to investigate comprehensive design requirements associated with designing habitats for humans in a partial gravity environment, then to apply them to a lunar base design. Other potential sites for application include planetary surfaces such as Mars, variable-gravity research facilities, and a rotating spacecraft. Design requirements for partial gravity environments include locomotion changes in less than normal earth gravity; facility design issues, such as interior configuration, module diameter, and geometry; and volumetric requirements based on the previous as well as psychological issues involved in prolonged isolation. For application to a lunar base, it is necessary to study the exterior architecture and configuration to insure optimum circulation patterns while providing dual egress; radiation protection issues are addressed to provide a safe and healthy environment for the crew; and finally, the overall site is studied to locate all associated facilities in context with the habitat. Mission planning is not the purpose of this study; therefore, a Lockheed scenario is used as an outline for the lunar base application, which is then modified to meet the project needs. The goal of this report is to formulate facts on human reactions to partial gravity environments, derive design requirements based on these facts, and apply the requirements to a partial gravity situation which, for this study, was a lunar base.
Partial wave analysis using graphics processing units
NASA Astrophysics Data System (ADS)
Berger, Niklaus; Beijiang, Liu; Jike, Wang
2010-04-01
Partial wave analysis is an important tool for determining resonance properties in hadron spectroscopy. For large data samples however, the un-binned likelihood fits employed are computationally very expensive. At the Beijing Spectrometer (BES) III experiment, an increase in statistics compared to earlier experiments of up to two orders of magnitude is expected. In order to allow for a timely analysis of these datasets, additional computing power with short turnover times has to be made available. It turns out that graphics processing units (GPUs) originally developed for 3D computer games have an architecture of massively parallel single instruction multiple data floating point units that is almost ideally suited for the algorithms employed in partial wave analysis. We have implemented a framework for tensor manipulation and partial wave fits called GPUPWA. The user writes a program in pure C++ whilst the GPUPWA classes handle computations on the GPU, memory transfers, caching and other technical details. In conjunction with a recent graphics processor, the framework provides a speed-up of the partial wave fit by more than two orders of magnitude compared to legacy FORTRAN code.
A systolic array parallelizing compiler
Tseng, P.S.
1990-01-01
This book presents a completely new approach to the problem of building a parallelizing compiler for systolic arrays. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler which can generate efficient parallel code for complete LINPACK routines. This book begins by analyzing the architectural strength of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. A complete listing of the source program and compiler-generated parallel code are given to clarify the overall picture of the compiler. The book concludes that a systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.
Parallel node placement method by bubble simulation
NASA Astrophysics Data System (ADS)
Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang
2014-03-01
An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors, is introduced. In accordance with the desired nodal density and Newton’s Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, the positions and velocities of two distant nodes can be updated simultaneously and independently during dynamic simulation. This inherent parallelism makes the method well suited to parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi-linear speedup in the number of processors and high efficiency are achieved.
DeHart, Mark D; Williams, Mark L; Bowman, Stephen M
2010-01-01
The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement
McKay, Mike
2003-12-01
UPS (Unified Parallel Software) is a collection of software tools (libraries, scripts, executables) that assist in parallel programming. This consists of:
o libups.a: C/Fortran callable routines for message passing (utilities written on top of MPI) and file IO (utilities written on top of HDF).
o libuserd-HDF.so: EnSight user-defined reader for visualizing data files written with UPS File IO.
o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl: Executables/scripts to get information from data files and to simplify the use of EnSight on those data files.
o ups_io_rm/ups_io_cp: Manipulate data files written with UPS File IO.
These tools are portable to a wide variety of Unix platforms.
Parallel Polarization State Generation
She, Alan; Capasso, Federico
2016-01-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security. PMID:27184813
Energy Science and Technology Software Center (ESTSC)
2003-12-01
UPS (Unified Parallel Software) is a collection of software tools (libraries, scripts, executables) that assist in parallel programming. This consists of:
o libups.a: C/Fortran callable routines for message passing (utilities written on top of MPI) and file IO (utilities written on top of HDF).
o libuserd-HDF.so: EnSight user-defined reader for visualizing data files written with UPS File IO.
o ups_libuserd_query, ups_libuserd_prep.pl, ups_libuserd_script.pl: Executables/scripts to get information from data files and to simplify the use of EnSight on those data files.
o ups_io_rm/ups_io_cp: Manipulate data files written with UPS File IO.
These tools are portable to a wide variety of Unix platforms.
Parallel Polarization State Generation
NASA Astrophysics Data System (ADS)
She, Alan; Capasso, Federico
2016-05-01
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
Parallel tridiagonal equation solvers
NASA Technical Reports Server (NTRS)
Stone, H. S.
1974-01-01
Three parallel algorithms were compared for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC 4 and CDC STAR. For array computers similar to ILLIAC 4, cyclic odd-even reduction has the least operation count for highly structured sets of equations, and recursive doubling has the least count for relatively unstructured sets of equations. Since the difference in operation counts for these two algorithms is not substantial, their relative running times may be more related to overhead operations, which are not measured in this paper. The third algorithm, based on Buneman's Poisson solver, has more arithmetic operations than the others, and appears to be the least favorable. For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.
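As an illustration of the first algorithm named above, the following is a minimal serial sketch of cyclic odd-even reduction for a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]. It is not taken from the report; the function name and the convention a[0] = c[n-1] = 0 are assumptions, and no pivoting is done, so it is only appropriate for well-conditioned (for example diagonally dominant) systems. Each loop iterates over independent equations, which is what makes the method attractive on parallel hardware.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive odd-even reduction.

    a: sub-diagonal (a[0] must be 0), b: diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Eliminate the even-indexed unknowns from each odd-indexed equation.
    # Every iteration is independent of the others.
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        has_next = i + 1 < n
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_next else 0.0))
    # The reduced system in the odd-indexed unknowns is again tridiagonal.
    odd = cyclic_reduction(ra, rb, rc, rd)
    x = [0.0] * n
    for k, i in enumerate(range(1, n, 2)):
        x[i] = odd[k]
    # Back-substitute for the even-indexed unknowns (also independent).
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x

# Example system whose exact solution is [1, 2, 3, 4, 5].
solution = cyclic_reduction(
    [0.0, 1.0, 1.0, 1.0, 1.0],
    [4.0, 4.0, 4.0, 4.0, 4.0],
    [1.0, 1.0, 1.0, 1.0, 0.0],
    [6.0, 12.0, 18.0, 24.0, 24.0],
)
```

The recursion halves the number of unknowns at each level, so about log2(n) levels are needed.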
Parallel Imaging Microfluidic Cytometer
Ehrlich, Daniel J.; McKenna, Brian K.; Evans, James G.; Belkina, Anna C.; Denis, Gerald V.; Sherr, David; Cheung, Man Ching
2011-01-01
By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of flow cytometry (FACS) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1-D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in approximately 6–10 minutes, about 30 times the speed of most current FACS systems. In 1-D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of CCD-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. PMID:21704835
Parallelizing OVERFLOW: Experiences, Lessons, Results
NASA Technical Reports Server (NTRS)
Jespersen, Dennis C.
1999-01-01
The computer code OVERFLOW is widely used in the aerodynamic community for the numerical solution of the Navier-Stokes equations. Current trends in computer systems and architectures are toward multiple processors and parallelism, including distributed memory. This report describes work that has been carried out by the author and others at Ames Research Center with the goal of parallelizing OVERFLOW using a variety of parallel architectures and parallelization strategies. This paper begins with a brief description of the OVERFLOW code. This description includes the basic numerical algorithm and some software engineering considerations. Next comes a description of a parallel version of OVERFLOW, OVERFLOW/PVM, using PVM (Parallel Virtual Machine). This parallel version of OVERFLOW uses the manager/worker style and is part of the standard OVERFLOW distribution. Then comes a description of a parallel version of OVERFLOW, OVERFLOW/MPI, using MPI (Message Passing Interface). This parallel version of OVERFLOW uses the SPMD (Single Program Multiple Data) style. Finally comes a discussion of alternatives to explicit message-passing in the context of parallelizing OVERFLOW.
Partially coherent ultrafast spectrography
NASA Astrophysics Data System (ADS)
Bourassin-Bouchet, C.; Couprie, M.-E.
2015-03-01
Modern ultrafast metrology relies on the postulate that the pulse to be measured is fully coherent, that is, that it can be completely described by its spectrum and spectral phase. However, synthesizing fully coherent pulses is not always possible in practice, especially in the domain of emerging ultrashort X-ray sources where temporal metrology is strongly needed. Here we demonstrate how frequency-resolved optical gating (FROG), the first and one of the most widespread techniques for pulse characterization, can be adapted to measure partially coherent pulses even down to the attosecond timescale. No modification of experimental apparatuses is required; only the processing of the measurement changes. To do so, we take our inspiration from other branches of physics where partial coherence is routinely dealt with, such as quantum optics and coherent diffractive imaging. This will have important and immediate applications, such as enabling the measurement of X-ray free-electron laser pulses despite timing jitter.
Partially integrated exhaust manifold
Hayman, Alan W; Baker, Rodney E
2015-01-20
A partially integrated manifold assembly is disclosed which improves performance, reduces cost and provides efficient packaging of engine components. The partially integrated manifold assembly includes a first leg extending from a first port and terminating at a mounting flange for an exhaust gas control valve. Multiple additional legs (depending on the total number of cylinders) are integrally formed with the cylinder head assembly and extend from the ports of the associated cylinder and terminate at an exit port flange. These additional legs are longer than the first leg such that the exit port flange is spaced apart from the mounting flange. This configuration provides increased packaging space adjacent the first leg for any valving that may be required to control the direction and destination of exhaust flow in recirculation to an EGR valve or downstream to a catalytic converter.
Partially coherent ultrafast spectrography
Bourassin-Bouchet, C.; Couprie, M.-E.
2015-01-01
Modern ultrafast metrology relies on the postulate that the pulse to be measured is fully coherent, that is, that it can be completely described by its spectrum and spectral phase. However, synthesizing fully coherent pulses is not always possible in practice, especially in the domain of emerging ultrashort X-ray sources where temporal metrology is strongly needed. Here we demonstrate how frequency-resolved optical gating (FROG), the first and one of the most widespread techniques for pulse characterization, can be adapted to measure partially coherent pulses even down to the attosecond timescale. No modification of experimental apparatuses is required; only the processing of the measurement changes. To do so, we take our inspiration from other branches of physics where partial coherence is routinely dealt with, such as quantum optics and coherent diffractive imaging. This will have important and immediate applications, such as enabling the measurement of X-ray free-electron laser pulses despite timing jitter. PMID:25744080
Partial quantum logics revisited
NASA Astrophysics Data System (ADS)
Vetterlein, Thomas
2011-01-01
Partial Boolean algebras (PBAs) were introduced by Kochen and Specker as an algebraic model reflecting the mutual relationships among quantum-physical yes-no tests. The fact that not all pairs of tests are compatible was taken into special account. In this paper, we review PBAs from two sides. First, we generalise the concept, taking into account also those yes-no tests which are based on unsharp measurements. Namely, we introduce partial MV-algebras, and we define a corresponding logic. Second, we turn to the representation theory of PBAs. In analogy to the case of orthomodular lattices, we give conditions for a PBA to be isomorphic to the PBA of closed subspaces of a complex Hilbert space. Hereby, we do not restrict ourselves to purely algebraic statements; we rather give preference to conditions involving automorphisms of a PBA. We conclude by outlining a critical view on the logico-algebraic approach to the foundational problem of quantum physics.
PMESH: A parallel mesh generator
Hardin, D.D.
1994-10-21
The Parallel Mesh Generation (PMESH) Project is a joint LDRD effort by A Division and Engineering to develop a unique mesh generation system that can construct large calculational meshes (of up to 10^9 elements) on massively parallel computers. Such a capability will remove a critical roadblock to unleashing the power of massively parallel processors (MPPs) for physical analysis. PMESH will support a variety of LLNL 3-D physics codes in the areas of electromagnetics, structural mechanics, thermal analysis, and hydrodynamics.
Nonlinear GRAPPA: a kernel approach to parallel MRI reconstruction.
Chang, Yuchou; Liang, Dong; Ying, Leslie
2012-09-01
GRAPPA linearly combines the undersampled k-space signals to estimate the missing k-space signals, where the coefficients are obtained by fitting to some auto-calibration signals (ACS) sampled at the Nyquist rate, based on the shift-invariant property. At high acceleration factors, GRAPPA reconstruction can suffer from a high level of noise even with a large number of auto-calibration signals. In this work, we propose a nonlinear method to improve GRAPPA. The method is based on the so-called kernel method which is widely used in machine learning. Specifically, the undersampled k-space signals are mapped through a nonlinear transform to a high-dimensional feature space, and then linearly combined to reconstruct the missing k-space data. The linear combination coefficients are also obtained through fitting to the ACS data but in the new feature space. The procedure is equivalent to adding many virtual channels in reconstruction. A polynomial kernel with explicit mapping functions is investigated in this work. Experimental results using phantom and in vivo data demonstrate that the proposed nonlinear GRAPPA method can significantly improve the reconstruction quality over GRAPPA and its state-of-the-art derivatives. PMID:22161975
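The idea of an explicit polynomial mapping can be made concrete with a small sketch. It shows a second-order feature map for real-valued vectors and checks that inner products in the feature space reproduce the kernel k(x, y) = (1 + x.y)^2. This is a generic textbook construction, not the specific mapping or fitting procedure of the paper; the names `poly2_features` and `dot` are hypothetical, and real k-space data are complex-valued, which this sketch ignores.

```python
import math

def poly2_features(x):
    """Explicit feature map whose inner product equals (1 + x.y)**2."""
    root2 = math.sqrt(2.0)
    features = [1.0]                           # constant term
    features += [root2 * v for v in x]         # scaled linear terms
    features += [v * v for v in x]             # squared terms
    for i in range(len(x)):                    # cross terms with i < j
        for j in range(i + 1, len(x)):
            features.append(root2 * x[i] * x[j])
    return features

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, v))

x, y = [1.0, 2.0], [3.0, 4.0]
kernel_value = (1.0 + dot(x, y)) ** 2          # evaluated directly
mapped_value = dot(poly2_features(x), poly2_features(y))
```

A linear fit performed on the mapped vectors is therefore nonlinear in the original signals, which matches the abstract's description of linearly combining signals in the new feature space.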
Parallel processor engine model program
NASA Technical Reports Server (NTRS)
Mclaughlin, P.
1984-01-01
The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.
Parallel computation with the force
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1985-01-01
A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Lau, Sonie
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Parallel Programming in the Age of Ubiquitous Parallelism
NASA Astrophysics Data System (ADS)
Pingali, Keshav
2014-04-01
Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs
General classification of partially polarized partially coherent beams
NASA Astrophysics Data System (ADS)
Martinez-Herrero, Rosario; Piquero, Gemma; Mejias, Pedro M.
2003-05-01
The behavior of the so-called generalized degree of polarization of partially coherent partially polarized beams upon free propagation is investigated. On the basis of this parameter a general classification scheme of partially polarized beams is proposed. The results are applied to certain classes of fields of special interest.
Experts' Understanding of Partial Derivatives Using the Partial Derivative Machine
ERIC Educational Resources Information Center
Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.
2015-01-01
Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of…
Parallel execution model for Prolog
Fagin, B.S.
1987-01-01
One candidate language for parallel symbolic computing is Prolog. Numerous ways for executing Prolog in parallel have been proposed, but current efforts suffer from several deficiencies. Many cannot support fundamental types of concurrency in Prolog. Other models are of purely theoretical interest, ignoring implementation costs. Detailed simulation studies of execution models are scarce; at present little is known about the costs and benefits of executing Prolog in parallel. In this thesis, a new parallel execution model for Prolog is presented: the PPP model or Parallel Prolog Processor. The PPP supports AND-parallelism, OR-parallelism, and intelligent backtracking. An implementation of the PPP is described, through the extension of an existing Prolog abstract machine architecture. Several examples of PPP execution are presented, and compilation to the PPP abstract instruction set is discussed. The performance effects of this model are reported, based on a simulation of a large benchmark set. The implications of these results for parallel Prolog systems are discussed, and directions for future work are indicated.
Reordering computations for parallel execution
NASA Technical Reports Server (NTRS)
Adams, L.
1985-01-01
The computations in the SOR algorithm are reordered to obtain parallelism at different levels while maintaining the same asymptotic rate of convergence as the rowwise ordering. A parallel program is written to illustrate these ideas, and actual machines for implementation of this program are discussed.
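As an illustration of reordering SOR for parallelism, the sketch below uses the textbook red-black (two-colour) ordering on a 2-D Laplace problem. This is an assumption chosen for clarity, not necessarily the ordering studied in the report; the function name `red_black_sor`, the relaxation factor, and the grid size are hypothetical. Within each colour, every update depends only on points of the other colour, so all points of one colour could be updated concurrently.

```python
def red_black_sor(u, omega=1.5, sweeps=200):
    """Relax the interior of grid u (list of lists) toward the solution of
    Laplace's equation, visiting all 'red' points first, then all 'black'."""
    rows, cols = len(u), len(u[0])
    for _ in range(sweeps):
        for color in (0, 1):                      # 0 = red, 1 = black
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 != color:
                        continue
                    # Gauss-Seidel value from the four neighbours, which all
                    # belong to the opposite colour.
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1])
                    u[i][j] += omega * (gs - u[i][j])
    return u


# Boundary fixed at 1.0, interior started at 0.0: the exact solution is 1.0.
n = 8
grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
         for j in range(n)] for i in range(n)]
red_black_sor(grid)
max_error = max(abs(v - 1.0) for row in grid for v in row)
```

In a real parallel code the two inner loops for one colour would be distributed across processors, with a synchronization point between the red and black half-sweeps.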
Parallelizing Monte Carlo with PMC
Rathkopf, J.A.; Jones, T.R.; Nessett, D.M.; Stanberry, L.C.
1994-11-01
PMC (Parallel Monte Carlo) is a system of generic interface routines that allows easy porting of Monte Carlo packages of large-scale physics simulation codes to Massively Parallel Processor (MPP) computers. By loading various versions of PMC, simulation code developers can configure their codes to run in several modes: serial, Monte Carlo runs on the same processor as the rest of the code; parallel, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on other MPP processor(s); distributed, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on a different machine. This multi-mode approach allows maintenance of a single simulation code source regardless of the target machine. PMC handles passing of messages between nodes on the MPP, passing of messages between a different machine and the MPP, distributing work between nodes, and providing independent, reproducible sequences of random numbers. Several production codes have been parallelized under the PMC system. Excellent parallel efficiency in both the distributed and parallel modes results if sufficient workload is available per processor. Experiences with a Monte Carlo photonics demonstration code and a Monte Carlo neutronics package are described.
The Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Parallel contingency statistics with Titan.
Thompson, David C.; Pebay, Philippe Pierre
2009-09-01
This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09], which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of these new parallel engines is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevents this new engine from exhibiting optimal parallel speed-up as the aforementioned engines do. This report therefore discusses the design trade-offs we made and studies performance with up to 200 processors.
Problem size, parallel architecture and optimal speedup
NASA Technical Reports Server (NTRS)
Nicol, David M.; Willard, Frank H.
1987-01-01
The communication and synchronization overhead inherent in parallel processing can lead to situations where adding processors to the solution method actually increases execution time. Problem type, problem size, and architecture type all affect the optimal number of processors to employ. The numerical solution of an elliptic partial differential equation is examined in order to study the relationship between problem size and architecture. The equation's domain is discretized into n^2 grid points which are divided into partitions and mapped onto the individual processor memories. The relationships between grid size, stencil type, partitioning strategy, processor execution time, and communication network type are analytically quantified. In so doing, the optimal number of processors to assign to the solution was determined, and the following were identified: (1) the smallest grid size which fully benefits from using all available processors, (2) the leverage on performance given by increasing processor speed or communication network speed, and (3) the suitability of various architectures for large numerical problems.
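The trade-off described above can be made concrete with a toy cost model. The sketch below is only an illustrative assumption, not the analytic model of the report: it assumes an n x n grid split into p strips, with boundary exchanges serialized on a shared bus, and the constants `t_calc` and `t_comm` are hypothetical.

```python
def execution_time(n, p, t_calc=1.0, t_comm=5.0):
    """Per-iteration time for an n x n grid on p processors (toy model)."""
    compute = t_calc * n * n / p              # grid points handled per processor
    # Each strip exchanges two boundary rows of n points; the shared bus
    # serializes the transfers of all p strips.
    communicate = t_comm * 2 * n * p if p > 1 else 0.0
    return compute + communicate


def optimal_processors(n, max_p, **costs):
    """Processor count in 1..max_p that minimizes the modelled time."""
    return min(range(1, max_p + 1),
               key=lambda p: execution_time(n, p, **costs))


best = optimal_processors(256, 64)            # 5 under these assumed constants
slower_with_more = execution_time(256, 64) > execution_time(256, 1)
```

Under these assumed constants, using all 64 processors is slower than using one, which is the kind of situation the abstract describes; a different network model (for example a mesh or hypercube) would change the communication term and therefore the optimum.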
Problem size, parallel architecture, and optimal speedup
NASA Technical Reports Server (NTRS)
Nicol, David M.; Willard, Frank H.
1988-01-01
The communication and synchronization overhead inherent in parallel processing can lead to situations where adding processors to the solution method actually increases execution time. Problem type, problem size, and architecture type all affect the optimal number of processors to employ. The numerical solution of an elliptic partial differential equation is examined in order to study the relationship between problem size and architecture. The equation's domain is discretized into n^2 grid points which are divided into partitions and mapped onto the individual processor memories. The relationships between grid size, stencil type, partitioning strategy, processor execution time, and communication network type are analytically quantified. In so doing, the optimal number of processors to assign to the solution was determined, and the following were identified: (1) the smallest grid size which fully benefits from using all available processors, (2) the leverage on performance given by increasing processor speed or communication network speed, and (3) the suitability of various architectures for large numerical problems.
Parallel Vegetation Stripe Formation Through Hydrologic Interactions
NASA Astrophysics Data System (ADS)
Cheng, Y.; Stieglitz, M.; Engel, V.; Turk, G.
2009-12-01
Vegetation in many parts of the world displays intriguing patterns: from the regularly spaced stripes on hillsides to the irregular mosaics. However, it has long been a challenge to describe how these patterns develop. Recently, there have been successes in describing pattern development mathematically. The Klausmeier model (Klausmeier, 1999), which simulates vegetation stripes perpendicular to the flow field, consists of two partial differential equations that describe plant and surface water dynamics on a gently sloping landscape. More recently, Rietkerk et al. (2004) proposed a simple 2D advection-diffusion model which differs from earlier models in that it accounts for hydraulic head interactions. The Rietkerk model simulates plant-water and plant-nutrient dynamics and generates vegetation patterns of reasonable scales: 'maze patterns' on flat ground and stripes perpendicular to flow on slopes. However, to date none of these theoretical studies have been able to simulate the development of regularly spaced vegetation stripes parallel to flow direction. Such vegetation patterns are, for example, characteristic of the ridge and slough system (S&R) in the Everglades. We employ the Rietkerk model to describe, for the first time to our knowledge, the formation of parallel stripes from hydrologic interactions. To simulate the perpendicular stripes, Rietkerk et al. only allowed for the local advection of water and nutrient in one direction. To simulate parallel stripes, we retain the basic equations of the Rietkerk model but allow for constant advection of water and nutrient in one direction to simulate slope conditions, with evapotranspiration driven advection of water and nutrient perpendicular to the downhill flow direction. In this model, the relatively higher rates of evapotranspiration on the vegetation patches compared to the non-vegetated areas create hydraulic gradients, which then drive the convergence of dissolved nutrients from the downhill flow to the growing
Parallel Element Agglomeration Algebraic Multigrid and Upscaling Library
Energy Science and Technology Software Center (ESTSC)
2015-02-19
ParFELAG is a parallel distributed memory C++ library for numerical upscaling of finite element discretizations. It provides optimal complexity algorithms to build multilevel hierarchies and solvers that can be used for solving a wide class of partial differential equations (elliptic, hyperbolic, saddle point problems) on general unstructured meshes (under the assumption that the topology of the agglomerated entities is correct). Additionally, a novel multilevel solver for saddle point problems with divergence constraint is implemented.
Partially segmented deformable mirror
Bliss, Erlan S.; Smith, James R.; Salmon, J. Thaddeus; Monjes, Julio A.
1991-01-01
A partially segmented deformable mirror is formed with a mirror plate having a smooth and continuous front surface and a plurality of actuators to its back surface. The back surface is divided into triangular areas which are mutually separated by grooves. The grooves are deep enough to make the plate deformable and the actuators for displacing the mirror plate in the direction normal to its surface are inserted in the grooves at the vertices of the triangular areas. Each actuator includes a transducer supported by a receptacle with outer shells having outer surfaces. The vertices have inner walls which are approximately perpendicular to the mirror surface and make planar contacts with the outer surfaces of the outer shells. The adhesive which is used on these contact surfaces tends to contract when it dries but the outer shells can bend and serve to minimize the tendency of the mirror to warp.
Partially segmented deformable mirror
Bliss, E.S.; Smith, J.R.; Salmon, J.T.; Monjes, J.A.
1991-05-21
A partially segmented deformable mirror is formed with a mirror plate having a smooth and continuous front surface and a plurality of actuators to its back surface. The back surface is divided into triangular areas which are mutually separated by grooves. The grooves are deep enough to make the plate deformable and the actuators for displacing the mirror plate in the direction normal to its surface are inserted in the grooves at the vertices of the triangular areas. Each actuator includes a transducer supported by a receptacle with outer shells having outer surfaces. The vertices have inner walls which are approximately perpendicular to the mirror surface and make planar contacts with the outer surfaces of the outer shells. The adhesive which is used on these contact surfaces tends to contract when it dries but the outer shells can bend and serve to minimize the tendency of the mirror to warp. 5 figures.
Krumpelt, Michael; Ahmed, Shabbir; Kumar, Romesh; Doshi, Rajiv
2000-01-01
A two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion. The dehydrogenation portion is a group VIII metal and the oxide-ion conducting portion is selected from a ceramic oxide crystallizing in the fluorite or perovskite structure. There is also disclosed a method of forming a hydrogen rich gas from a source of hydrocarbon fuel in which the hydrocarbon fuel contacts a two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion at a temperature not less than about 400 °C for a time sufficient to generate the hydrogen rich gas while maintaining CO content less than about 5 volume percent. There is also disclosed a method of forming partially oxidized hydrocarbons from ethanes in which ethane gas contacts a two-part catalyst comprising a dehydrogenation portion and an oxide-ion conducting portion for a time and at a temperature sufficient to form an oxide.
Electrical conductivity anisotropy of partially molten peridotite under shear deformation
NASA Astrophysics Data System (ADS)
Zhang, B.; Yoshino, T.; Yamazaki, D.; Manthilake, G. M.; Katsura, T.
2013-12-01
Recent ocean bottom magnetotelluric investigations have revealed a high-conductivity layer (HCL) with high anisotropy characterized by higher conductivity values in the direction parallel to the plate motion beneath the southern East Pacific Rise (Evans et al., 2005) and beneath the edge of the Cocos plate at the Middle America trench offshore of Nicaragua (Naif et al., 2013). These geophysical observations have been attributed to either hydration (water) of mantle minerals or the presence of partial melt. Currently, aligned partial melt has been regarded as the most preferable candidate for explaining the conductivity anisotropy because of the implausibility of proton conduction (Yoshino et al., 2006). In this study, we report development of the conductivity anisotropy between parallel and normal to shear direction on the shear plane in partially molten peridotite as a function of time and shear strain. Starting samples were pre-synthesized partially molten peridotite, showing homogeneous melt distribution. The partially molten peridotite samples were deformed in simple shear geometry at 1 GPa and 1723 K in a DIA-type apparatus with uniaxial deformation facility. Conductivity difference between parallel and normal to shear direction reached one order of magnitude, which is equivalent to that observed beneath asthenosphere. In contrast, such anisotropic behavior was not found in the melt-free samples, suggesting that development of the conductivity anisotropy was generated under shear stress. Microstructure of the deformed partially molten peridotite shows partial melt tends to preferentially locate grain boundaries parallel to shear direction, and forms continuously thin melt layer sub-parallel to the shear direction, whereas apparently isolated distribution was observed on the section perpendicular to the shear direction. The resultant melt morphology can be approximated by tube like geometry parallel to the shear direction. This observation suggests that the development of
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Lau, Sonie; Yan, Jerry C.
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Parallel NPARC: Implementation and Performance
NASA Technical Reports Server (NTRS)
Townsend, S. E.
1996-01-01
Version 3 of the NPARC Navier-Stokes code includes support for large-grain (block level) parallelism using explicit message passing between a heterogeneous collection of computers. This capability has the potential for significant performance gains, depending upon the block data distribution. The parallel implementation uses a master/worker arrangement of processes. The master process assigns blocks to workers, controls worker actions, and provides remote file access for the workers. The processes communicate via explicit message passing using an interface library which provides portability to a number of message passing libraries, such as PVM (Parallel Virtual Machine). A Bourne shell script is used to simplify the task of selecting hosts, starting processes, retrieving remote files, and terminating a computation. This script also provides a simple form of fault tolerance. An analysis of the computational performance of NPARC is presented, using data sets from an F/A-18 inlet study and a Rocket Based Combined Cycle Engine analysis. Parallel speedup and overall computational efficiency were obtained for various NPARC run parameters on a cluster of IBM RS6000 workstations. The data show that although NPARC performance compares favorably with the estimated potential parallelism, typical data sets used with previous versions of NPARC will often need to be reblocked for optimum parallel performance. In one of the cases studied, reblocking increased peak parallel speedup from 3.2 to 11.8.
Parallel incremental compilation. Doctoral thesis
Gafter, N.M.
1990-06-01
The time it takes to compile a large program has been a bottleneck in the software development process. When an interactive programming environment with an incremental compiler is used, compilation speed becomes even more important, but existing incremental compilers are very slow for some types of program changes. We describe a set of techniques that enable incremental compilation to exploit fine-grained concurrency in a shared-memory multi-processor and achieve asymptotic improvement over sequential algorithms. Because parallel non-incremental compilation is a special case of parallel incremental compilation, the design of a parallel compiler is a corollary of our result. Instead of running the individual phases concurrently, our design specifies compiler phases that are mutually sequential. However, each phase is designed to exploit fine-grained parallelism. By allowing each phase to present its output as a complete structure rather than as a stream of data, we can apply techniques such as parallel prefix and parallel divide-and-conquer, and we can construct applicative data structures to achieve sublinear execution time. Parallel algorithms for each phase of a compiler are presented to demonstrate that a complete incremental compiler can achieve execution time that is asymptotically less than sequential algorithms.
Efficient scheduling of parallel jobs on massively parallel systems
Petrini, F.; Feng, W.
1999-09-01
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.
Parallel integer sorting with medium and fine-scale parallelism
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens; Inglett, Todd Alan
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
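The comparison of per-node checkpoint data against a template can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: it compares fixed-offset blocks by checksum, whereas the rsync protocol uses rolling checksums, and the block size and the function names `block_checksums`, `delta_against_template`, and `restore` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def block_checksums(data, block_size=BLOCK_SIZE):
    """Checksum of every fixed-size block of the template checkpoint."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def delta_against_template(node_data, template_sums, block_size=BLOCK_SIZE):
    """Return only the blocks of node_data that differ from the template."""
    delta = {}
    for index, start in enumerate(range(0, len(node_data), block_size)):
        block = node_data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(template_sums) or digest != template_sums[index]:
            delta[index] = block
    return delta


def restore(template, delta, length, block_size=BLOCK_SIZE):
    """Rebuild a node's checkpoint from the template plus its stored delta."""
    out = bytearray()
    for index, start in enumerate(range(0, length, block_size)):
        out += delta.get(index, template[start:start + block_size])
    return bytes(out[:length])


template = b"AAAABBBBCCCCDDDD"
node = b"AAAAXXXXCCCCDDDD"
delta = delta_against_template(node, block_checksums(template))
rebuilt = restore(template, delta, len(node))
```

Only the one changed block is kept for this node, which is the source of the reduction in transmitted and stored data that the abstract claims.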
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations
NASA Technical Reports Server (NTRS)
Chrisochoides, Nikos
1995-01-01
We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDE's) on multiprocessors. Multithreading is used as a means of exploring concurrency at the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDE's. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability not an easy task, and increases software complexity.
Solving unstructured grid problems on massively parallel computers
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1990-01-01
A highly parallel graph mapping technique that enables one to efficiently solve unstructured grid problems on massively parallel computers is presented. Many implicit and explicit methods for solving discretized partial differential equations require each point in the discretization to exchange data with its neighboring points every time step or iteration. The cost of this communication can negate the high performance promised by massively parallel computing. To eliminate this bottleneck, the graph of the irregular problem is mapped into the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. It is shown that using the heuristic mapping algorithm significantly reduces the communication time compared to a naive assignment of processes to processors.
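The quantity being minimized, the sum of the distances that messages travel, can be sketched as below. This shows only the objective function, not the heuristic mapping algorithm of the report; the 2-D mesh topology with Manhattan distance, the four-process ring, and the name `mapping_cost` are assumptions made for illustration.

```python
def mapping_cost(edges, placement):
    """Total hop count on a 2-D mesh for one exchange along every edge.

    edges     -- pairs of communicating processes (the problem graph)
    placement -- process id -> (row, col) of the processor it is mapped to
    """
    total = 0
    for a, b in edges:
        (ra, ca), (rb, cb) = placement[a], placement[b]
        total += abs(ra - rb) + abs(ca - cb)   # Manhattan distance in hops
    return total


# Four processes communicating in a ring, mapped onto a 2 x 2 mesh.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
naive = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}    # row-major assignment
better = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}   # follows the ring

naive_cost = mapping_cost(ring, naive)
better_cost = mapping_cost(ring, better)
```

Here the row-major assignment costs 6 hops per exchange round and the ring-following one costs 4; a mapping heuristic searches for placements that lower this sum on much larger irregular graphs.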
Automating the parallel processing of fluid and structural dynamics calculations
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.; Cole, Gary L.
1987-01-01
The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.
Parallel Architecture For Robotics Computation
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1990-01-01
Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.
Multigrid on massively parallel architectures
Falgout, R D; Jones, J E
1999-09-17
The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.
Parallel inverse iteration with reorthogonalization
Fann, G.I.; Littlefield, R.J.
1993-03-01
A parallel method for finding orthogonal eigenvectors of real symmetric tridiagonal matrices is described. The method uses inverse iteration with repeated Modified Gram-Schmidt (MGS) reorthogonalization of the unconverged iterates for clustered eigenvalues. This approach is more parallelizable than reorthogonalizing against fully converged eigenvectors, as is done by LAPACK's current DSTEIN routine. The new method is found to provide accuracy and speed comparable to DSTEIN's and to have good parallel scalability even for matrices with large clusters of eigenvalues. We present results for residual and orthogonality tests, plus timings on IBM RS/6000 (sequential) and Intel Touchstone DELTA (parallel) computers.
Appendix E: Parallel Pascal development system
NASA Technical Reports Server (NTRS)
1985-01-01
The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays which have dimensions larger than the hardware. Programs can be conveniently tested with small arrays on the conventional computer before attempting to run on a parallel system.
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Turbomachinery CFD on parallel computers
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.
1992-01-01
The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.
Predicting performance of parallel computations
NASA Technical Reports Server (NTRS)
Mak, Victor W.; Lundstrom, Stephen F.
1990-01-01
An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described. A parallel computation is modeled as a task system with precedence relationships expressed as a series-parallel directed acyclic graph. Resources in a concurrent system are modeled as service centers in a queuing network model. Using these two models as inputs, the method outputs predictions of expected execution time of the parallel computation and the concurrent system utilization. The method is validated against both detailed simulation and actual execution on a commercial multiprocessor. Using 100 test cases, the average error of the prediction when compared to simulation statistics is 1.7 percent, with a standard deviation of 1.5 percent; the maximum error is about 10 percent.
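To make the series-parallel task model above concrete, here is a minimal sketch that evaluates the completion time of a series-parallel task tree. It assumes fixed task durations and no contention for resources, so it leaves out the queuing network part of the method entirely; the tuple encoding and the name `completion_time` are hypothetical.

```python
def completion_time(node):
    """Completion time of a series-parallel task tree.

    A node is either a number (a task duration) or a tuple
    ("series", [children]) or ("parallel", [children]).
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    times = [completion_time(child) for child in children]
    if kind == "series":
        return sum(times)   # tasks run one after another
    if kind == "parallel":
        return max(times)   # all branches must finish
    raise ValueError(f"unknown node kind: {kind}")

# Two units of setup, three concurrent branches, one unit of cleanup.
job = ("series", [2, ("parallel", [3, ("series", [1, 1]), 4]), 1])
```

With contention, each branch would take longer than its nominal duration, which is the effect the queuing model in the abstract is meant to capture.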
Parallel hierarchical method in networks
NASA Astrophysics Data System (ADS)
Malinochka, Olha; Tymchenko, Leonid
2007-09-01
The method of parallel-hierarchical Q-transformation offers a new approach to creating a computing medium: parallel-hierarchical (PH) networks, investigated here as a model of a neurolike data processing scheme [1-5]. The approach has a number of advantages over other methods of forming neurolike media (for example, known methods of forming artificial neural networks). Its main advantage is the use of multilevel parallel interaction dynamics of information signals at different hierarchy levels of computer networks, which makes it possible to exploit such known natural features of the organization of computation as the topographic nature of mapping, simultaneity (parallelism) of signal operation, the inlaid structure of the cortex, the rough hierarchy of the cortex, and a mechanism of perception and training that is spatially correlated in time [5].
"Feeling" Series and Parallel Resistances.
ERIC Educational Resources Information Center
Morse, Robert A.
1993-01-01
Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)
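For reference, the standard rules for combining resistances can be written as a short sketch. The function names are illustrative and are not taken from the article.

```python
def series(*resistances):
    """Equivalent resistance of resistors in series: the values add."""
    return sum(resistances)

def parallel(*resistances):
    """Equivalent resistance of resistors in parallel: the reciprocals add."""
    return 1.0 / sum(1.0 / r for r in resistances)
```

For example, `series(100, 200)` gives 300 ohms, while two equal resistors in parallel give half the resistance of one.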
Demonstrating Forces between Parallel Wires.
ERIC Educational Resources Information Center
Baker, Blane
2000-01-01
Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)
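The size of the effect in such a demonstration can be estimated from the textbook formula for two long parallel wires, F/L = μ0·I1·I2 / (2πd). The sketch below assumes idealized, infinitely long wires; the function names and the sign convention for current direction are illustrative assumptions.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability in T*m/A (classical value)

def force_per_length(i1, i2, separation):
    """Magnitude of the force per unit length (N/m) between two long parallel wires."""
    return MU_0 * abs(i1 * i2) / (2 * math.pi * separation)

def interaction(i1, i2):
    """Currents in the same direction attract; opposite directions repel."""
    product = i1 * i2
    if product == 0:
        return "none"
    return "attraction" if product > 0 else "repulsion"
```

The signs of `i1` and `i2` encode direction along the wire, so `interaction(5, -5)` reports repulsion.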
Parallel computation using limited resources
Sugla, B.
1985-01-01
This thesis addresses itself to the task of designing and analyzing parallel algorithms when the resources of processors, communication, and time are limited. The two parts of this thesis deal with multiprocessor systems and VLSI - the two important parallel processing environments that are prevalent today. In the first part a time-processor-communication tradeoff analysis is conducted for two kinds of problems - N input, 1 output, and N input, N output computations. In the class of problems of the second kind, the problem of prefix computation, an important problem due to the number of naturally occurring computations it can model, is studied. Finally, a general methodology is given for design of parallel algorithms that can be used to optimize a given design to a wide set of architectural variations. The second part of the thesis considers the design of parallel algorithms for the VLSI model of computation when the resource of time is severely restricted.
Parallel algorithms for message decomposition
Teng, S.H.; Wang, B.
1987-06-01
The authors consider the deterministic and random parallel complexity (time and processor) of message decoding: an essential problem in communications systems and translation systems. They present an optimal parallel algorithm to decompose prefix-coded messages and uniquely decipherable-coded messages in O(n/P) time, using O(P) processors (for all P: 1 ≤ P ≤ n/log n) deterministically as well as randomly on the weakest version of parallel random access machines in which concurrent read and concurrent write to a cell in the common memory are not allowed. This is done by reducing decoding to parallel finite-state automata simulation and the prefix sums.
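The prefix-sums building block mentioned above can be illustrated with the usual three-phase blocked scheme, in which each of P processors handles a block of about n/P elements. This sketch simulates the processors sequentially and is not taken from the paper; `blocked_prefix_sums` and its parameters are hypothetical names.

```python
from itertools import accumulate

def blocked_prefix_sums(values, num_procs):
    """Inclusive prefix sums computed block by block, as P processors would."""
    if not values:
        return []
    block = -(-len(values) // num_procs)  # ceiling of n / P
    chunks = [values[i:i + block] for i in range(0, len(values), block)]
    # Phase 1 (independent per processor): total of each block.
    totals = [sum(chunk) for chunk in chunks]
    # Phase 2: exclusive prefix sums over the block totals.
    offsets = [0] + list(accumulate(totals))[:-1]
    # Phase 3 (independent per processor): local scan shifted by the block offset.
    result = []
    for chunk, offset in zip(chunks, offsets):
        result.extend(offset + s for s in accumulate(chunk))
    return result
```

Phases 1 and 3 touch each element once per processor block, which is where the O(n/P) term comes from; phase 2 works on only P values.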
HEATR project: ATR algorithm parallelization
NASA Astrophysics Data System (ADS)
Deardorf, Catherine E.
1998-09-01
High Performance Computing (HPC) Embedded Application for Target Recognition (HEATR) is a project funded by the High Performance Computing Modernization Office through the Common HPC Software Support Initiative (CHSSI). The goal of CHSSI is to produce portable, parallel, multi-purpose, freely distributable, support software to exploit emerging parallel computing technologies and enable application of scalable HPCs for various critical DoD applications. Specifically, the CHSSI goal for HEATR is to provide portable, parallel versions of several existing ATR detection and classification algorithms to the ATR-user community to achieve near real-time capability. The HEATR project will create parallel versions of existing automatic target recognition (ATR) detection and classification algorithms and generate reusable code that will support porting and software development process for ATR HPC software. The HEATR Team has selected detection/classification algorithms from both the model-based and training-based (template-based) arena in order to consider the parallelization requirements for detection/classification algorithms across ATR technology. This would allow the Team to assess the impact that parallelization would have on detection/classification performance across ATR technology. A field demo is included in this project. Finally, any parallel tools produced to support the project will be refined and returned to the ATR user community along with the parallel ATR algorithms. This paper will review: (1) HPCMP structure as it relates to HEATR, (2) Overall structure of the HEATR project, (3) Preliminary results for the first algorithm Alpha Test, (4) CHSSI requirements for HEATR, and (5) Project management issues and lessons learned.
Nevzorova, Y A; Tolba, R; Trautwein, C; Liedtke, C
2015-04-01
The surgical procedure of two-thirds partial hepatectomy (PH) in rodents was first described more than 80 years ago by Higgins and Anderson. Nevertheless, this technique is still a state-of-the-art method for the community of liver researchers as it allows the in-depth analysis of signalling pathways involved in liver regeneration and hepatocarcinogenesis. The importance of PH as a key method in experimental hepatology has even increased in the last decade due to the increasing availability of genetically-modified mouse strains. Here, we propose a standard operating procedure (SOP) for the implementation of PH in mice, which is based on our experience of more than 10 years. In particular, the SOP offers all relevant background information on the PH model and provides comprehensive guidelines for planning and performing PH experiments. We provide established recommendations regarding optimal age and gender of animals, use of appropriate anaesthesia and biometric calculation of the experiments. We finally present an easy-to-follow step-by-step description of the complete surgical procedure including required materials, critical steps and postoperative management. This SOP especially takes into account the latest changes in animal welfare rules in the European Union but is still in agreement with current international regulations. In summary, this article provides comprehensive information for the legal application, design and implementation of PH experiments. PMID:25835741
Partial covariate adjusted regression
Şentürk, Damla; Nguyen, Danh V.
2008-01-01
Covariate adjusted regression (CAR) is a recently proposed adjustment method for regression analysis where both the response and predictors are not directly observed (Şentürk and Müller, 2005). The available data has been distorted by unknown functions of an observable confounding covariate. CAR provides consistent estimators for the coefficients of the regression between the variables of interest, adjusted for the confounder. We develop a broader class of partial covariate adjusted regression (PCAR) models to accommodate both distorted and undistorted (adjusted/unadjusted) predictors. The PCAR model allows for unadjusted predictors, such as age, gender and demographic variables, which are common in the analysis of biomedical and epidemiological data. The available estimation and inference procedures for CAR are shown to be invalid for the proposed PCAR model. We propose new estimators and develop new inference tools for the more general PCAR setting. In particular, we establish the asymptotic normality of the proposed estimators and propose consistent estimators of their asymptotic variances. Finite sample properties of the proposed estimators are investigated using simulation studies and the method is also illustrated with a Pima Indians diabetes data set. PMID:20126296
Architectures for reasoning in parallel
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.
1989-01-01
The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.
The NICMOS Parallel Observing Program
NASA Astrophysics Data System (ADS)
McCarthy, Patrick
2002-07-01
We propose to manage the default set of pure parallels with NICMOS. Our experience with both our GO NICMOS parallel program and the public parallel NICMOS programs in cycle 7 prepared us to make optimal use of the parallel opportunities. The NICMOS G141 grism remains the most powerful survey tool for Hα emission-line galaxies at cosmologically interesting redshifts. It is particularly well suited to addressing two key uncertainties regarding the global history of star formation: the peak rate of star formation in the relatively unexplored but critical 1 ≤ z ≤ 2 epoch, and the amount of star formation missing from UV continuum-based estimates due to high extinction. Our proposed deep G141 exposures will increase the sample of known Hα emission-line objects at z ~ 1.3 by roughly an order of magnitude. We will also obtain a mix of F110W and F160W images along random sight-lines to examine the space density and morphologies of the reddest galaxies. The nature of the extremely red galaxies remains unclear and our program of imaging and grism spectroscopy provides unique information regarding both the incidence of obscured starbursts and the build up of stellar mass at intermediate redshifts. In addition to carrying out the parallel program we will populate a public database with calibrated spectra and images, and provide limited ground-based optical and near-IR data for the deepest parallel fields.
Partial lipodystrophy in coeliac disease.
O'Mahony, D; O'Mahony, S; Whelton, M J; McKiernan, J
1990-01-01
The association of coeliac disease and partial lipodystrophy is described. The patient also had deficiencies of serum IgA and C3 complement (the latter associated with partial lipodystrophy). In addition, there was subclinical dermatitis herpetiformis confirmed by skin biopsy. The facial wasting of fully developed partial lipodystrophy may be misinterpreted as a sign of malabsorption but the facial, upper limb, and truncal lipodystrophy contrasts with normal pelvic and lower limb appearances. PMID:2379878
Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Trigonometric Integrals via Partial Fractions
ERIC Educational Resources Information Center
Chen, H.; Fulford, M.
2005-01-01
Parametric differentiation is used to derive the partial fractions decompositions of certain rational functions. Those decompositions enable us to integrate some new combinations of trigonometric functions.
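As a small worked illustration of the general idea (not necessarily one of the article's own examples), differentiating a simple decomposition with respect to the parameter a produces the decomposition for a repeated factor without solving for coefficients:

```latex
\begin{align*}
\frac{1}{(x+a)(x+b)} &= \frac{1}{b-a}\left(\frac{1}{x+a} - \frac{1}{x+b}\right), \qquad a \neq b, \\
\frac{\partial}{\partial a}:\quad
-\frac{1}{(x+a)^2 (x+b)} &= \frac{1}{(b-a)^2}\left(\frac{1}{x+a} - \frac{1}{x+b}\right) - \frac{1}{(b-a)(x+a)^2}, \\
\frac{1}{(x+a)^2 (x+b)} &= \frac{1}{(b-a)(x+a)^2} - \frac{1}{(b-a)^2 (x+a)} + \frac{1}{(b-a)^2 (x+b)}.
\end{align*}
```

The last line agrees with the coefficients obtained by the usual cover-up method, which serves as a check on the differentiation.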
Low partial discharge vacuum feedthrough
NASA Technical Reports Server (NTRS)
Benham, J. W.; Peck, S. R.
1979-01-01
Relatively discharge free vacuum feedthrough uses silver-plated copper conductor jacketed by carbon filled silicon semiconductor to reduce concentrated electric fields and minimize occurrence of partial discharge.
Experts' understanding of partial derivatives using the partial derivative machine
NASA Astrophysics Data System (ADS)
Roundy, David; Weber, Eric; Dray, Tevian; Bajracharya, Rabindra R.; Dorko, Allison; Smith, Emily M.; Manogue, Corinne A.
2015-12-01
[This paper is part of the Focused Collection on Upper Division Physics Courses.] Partial derivatives are used in a variety of different ways within physics. Thermodynamics, in particular, uses partial derivatives in ways that students often find especially confusing. We are at the beginning of a study of the teaching of partial derivatives, with a goal of better aligning the teaching of multivariable calculus with the needs of students in STEM disciplines. In this paper, we report on an initial study of expert understanding of partial derivatives across three disciplines: physics, engineering, and mathematics. We report on the central research question of how disciplinary experts understand partial derivatives, and how their concept images of partial derivatives differ, with a focus on experimentally measured quantities. Using the partial derivative machine (PDM), we probed expert understanding of partial derivatives in an experimental context without a known functional form. In particular, we investigated which representations were cued by the experts' interactions with the PDM. Whereas the physicists and engineers were quick to use measurements to find a numeric approximation for a derivative, the mathematicians repeatedly returned to speculation as to the functional form; although they were comfortable drawing qualitative conclusions about the system from measurements, they were reluctant to approximate the derivative through measurement. On a theoretical front, we found ways in which existing frameworks for the concept of derivative could be expanded to include numerical approximation.
The economics of parallel trade.
Danzon, P M
1998-03-01
The potential for parallel trade in the European Union (EU) has grown with the accession of low price countries and the harmonisation of registration requirements. Parallel trade implies a conflict between the principle of autonomy of member states to set their own pharmaceutical prices, the principle of free trade and the industrial policy goal of promoting innovative research and development (R&D). Parallel trade in pharmaceuticals does not yield the normal efficiency gains from trade because countries achieve low pharmaceutical prices by aggressive regulation, not through superior efficiency. In fact, parallel trade reduces economic welfare by undermining price differentials between markets. Pharmaceutical R&D is a global joint cost of serving all consumers worldwide; it accounts for roughly 30% of total costs. Optimal (welfare maximising) pricing to cover joint costs (Ramsey pricing) requires setting different prices in different markets, based on inverse demand elasticities. By contrast, parallel trade and regulation based on international price comparisons tend to force price convergence across markets. In response, manufacturers attempt to set a uniform 'euro' price. The primary losers from 'euro' pricing will be consumers in low income countries who will face higher prices or loss of access to new drugs. In the long run, even higher income countries are likely to be worse off with uniform prices, because fewer drugs will be developed. One policy option to preserve price differentials is to exempt on-patent products from parallel trade. An alternative is confidential contracting between individual manufacturers and governments to provide country-specific ex post discounts from the single 'euro' wholesale price, similar to rebates used by managed care in the US. This would preserve differentials in transactions prices even if parallel trade forces convergence of wholesale prices. PMID:10178655
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)]
NASA Technical Reports Server (NTRS)
Straeter, T. A.; Markos, A. T.
1975-01-01
A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. Besides a description of the algorithm, conditions ensuring the convergence of its iterates and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
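The homogeneity assumption can be made concrete with a small sketch. It is not the parallel algorithm of the report. It only illustrates the usual homogeneous model, gamma * (f(x) - f*) = (x - x*) . grad f(x), which is linear in the unknowns x*, gamma and omega = gamma * f*, so that n + 2 function and gradient evaluations determine them. Those evaluations are independent of one another, which is what makes the idea attractive on parallel hardware. All names and the test function are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; choose different points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homogeneous_step(f, grad, points):
    """Estimate (x*, gamma, f*) from n + 2 sample points.

    Each point x_i contributes one linear equation:
        g_i . x* + gamma * f_i - omega = g_i . x_i,  omega = gamma * f*
    """
    rows, rhs = [], []
    for x in points:
        g = grad(x)  # evaluations are independent of one another
        rows.append(list(g) + [f(x), -1.0])
        rhs.append(sum(gi * xi for gi, xi in zip(g, x)))
    n = len(points[0])
    sol = solve_linear(rows, rhs)
    x_star, gamma, omega = sol[:n], sol[n], sol[n + 1]
    return x_star, gamma, omega / gamma


# Hypothetical test function: homogeneous of degree 2 about (1, -3).
f = lambda p: (p[0] - 1) ** 2 + 2 * (p[1] + 3) ** 2 + 5
grad = lambda p: (2 * (p[0] - 1), 4 * (p[1] + 3))
x_star, gamma, f_star = homogeneous_step(
    f, grad, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)])
```

For a function that is exactly homogeneous about its minimizer, as here, a single solve recovers the minimizer; for general functions the step would be repeated with updated points.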
Bounded Parallel-Batch Scheduling on Unrelated Parallel Machines
NASA Astrophysics Data System (ADS)
Miao, Cuixia; Zhang, Yuzhong; Wang, Chengfei
In this paper, we consider the bounded parallel-batch scheduling problem on unrelated parallel machines. Problems R_m|B|F are NP-hard for any objective function F. For this reason, we discuss the special case with p_ij = p_i for i = 1, 2, ⋯, m and j = 1, 2, ⋯, n. We give optimal algorithms for the general scheduling to minimize total weighted completion time, makespan and the number of tardy jobs. We also design pseudo-polynomial time algorithms for the case with rejection penalty to minimize the makespan and the total weighted completion time plus the total penalty of the rejected jobs, respectively.
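To see why the special case is tractable, consider the makespan objective. The sketch below is an illustration and not the authors' algorithm. It reads p_ij = p_i as "every job takes time p_i on machine i" and assumes the usual parallel-batch model in which a batch holds at most B jobs and runs as long as its longest job, so every batch on machine i takes p_i. It also assumes integer processing times; the function name is hypothetical.

```python
def min_makespan(n_jobs, batch_size, machine_times):
    """Smallest T such that all jobs fit into batches finished by T.

    Within time T, machine i completes floor(T / p_i) batches of up to
    batch_size jobs each, so T is feasible exactly when
        sum_i batch_size * floor(T / p_i) >= n_jobs.
    """
    if n_jobs == 0:
        return 0
    max_batches = -(-n_jobs // batch_size)  # ceil(n_jobs / batch_size)
    # The optimum is a completion time of some batch: k * p_i.
    candidates = sorted({k * p for p in machine_times
                         for k in range(1, max_batches + 1)})
    for t in candidates:
        capacity = sum(batch_size * (t // p) for p in machine_times)
        if capacity >= n_jobs:
            return t
    raise AssertionError("unreachable: one machine alone can finish")


makespan = min_makespan(n_jobs=5, batch_size=2, machine_times=[2, 3])
```

With two machines of speeds 2 and 3 and batches of two jobs, five jobs finish at time 4: two batches on the first machine and one on the second.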
Parallelizing Timed Petri Net simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1993-01-01
The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
Forces and pressures in adsorbing partially directed walks
NASA Astrophysics Data System (ADS)
Janse van Rensburg, E. J.; Prellberg, T.
2016-05-01
Polymers in confined spaces lose conformational entropy. This induces a net repulsive entropic force on the walls of the confining space. A model for this phenomenon is a lattice walk between confining walls, and in this paper a model of an adsorbing partially directed walk is used. The walk is placed in a half square lattice $L_+^2$ with boundary $\partial L_+^2$, and confined between two vertical parallel walls, which are vertical lines in the lattice, a distance w apart. The free energy of the walk is determined, as a function of w, for walks with endpoints in the confining walls and adsorbing in $\partial L_+^2$. This gives the entropic force on the confining walls as a function of w. It is shown that there are zero force points in this model and the locations of these points are determined, in some cases exactly, and in other cases asymptotically.
Are Electron Partial Waves Real?
NASA Astrophysics Data System (ADS)
Yenen, O.; McLaughlin, K. W.
2005-05-01
Experiments determining the partial wave content of electrons are uncommon. The standard approach to partial wave expansion of the wavefunction of electrons often ignores their spin. In this non-relativistic approximation the partial waves are labeled by their orbital angular momentum quantum number, e.g. d-waves. As our previous work has shown, this non-relativistic approximation usually fails for photoelectrons. Partial waves should be further specified by their total angular momentum. With d-waves for example, one would need to distinguish between d3/2 and d5/2 partial waves. Although energetically degenerate, fully relativistic d3/2 and d5/2 partial waves of photoelectrons have fundamentally different angular distributions. Using experimental and theoretical methods we have developed, we obtain partial wave probabilities of photoelectrons from polarization measurements of ionic fluorescence. We found that for selected states of the residual ion, there are energy regions where the photoelectron is in a single partial wave with predictable angular distributions.
PARAVT: Parallel Voronoi Tessellation code
NASA Astrophysics Data System (ADS)
Gonzalez, Roberto E.
2016-01-01
We present a new open source code for massively parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is intended for astrophysical applications, where VT densities and neighbors are widely used. Several serial Voronoi tessellation codes exist; however, no open source parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented with MPI, and the VT is computed using the Qhull library. Domain decomposition takes into account consistent boundary computation between tasks and supports periodic conditions. In addition, the code computes neighbor lists, Voronoi density and Voronoi cell volumes for each particle, and can compute density on a regular grid.
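The quantities listed above (cell volume, Voronoi density, periodic conditions) are easiest to see in one dimension. The following sketch is a 1D analogue only and does not reflect how PARAVT is implemented; it uses neither MPI nor Qhull. It assumes distinct point positions inside a periodic box, and all names are hypothetical.

```python
def voronoi_1d_periodic(positions, box_length):
    """Return (cell_length, density) for each point, in input order.

    In 1D the Voronoi cell of a point runs from the midpoint with its
    left neighbor to the midpoint with its right neighbor; neighbors
    wrap around the periodic box. Positions must be distinct.
    """
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two points")
    order = sorted(range(n), key=lambda i: positions[i])
    result = [None] * n
    for rank, i in enumerate(order):
        left = positions[order[rank - 1]]        # wraps for rank 0
        right = positions[order[(rank + 1) % n]]  # wraps for last rank
        # Modulo maps gaps that cross the boundary back into the box.
        gap_left = (positions[i] - left) % box_length
        gap_right = (right - positions[i]) % box_length
        length = 0.5 * (gap_left + gap_right)
        result[i] = (length, 1.0 / length)  # density = 1 / cell volume
    return result


cells = voronoi_1d_periodic([1.0, 9.0, 4.0], box_length=10.0)
```

The cell lengths tile the box exactly, which is the 1D counterpart of the consistent boundary treatment a periodic tessellation needs.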
Massively parallel MRI detector arrays
NASA Astrophysics Data System (ADS)
Keil, Boris; Wald, Lawrence L.
2013-04-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758
Fast data parallel polygon rendering
Ortega, F.A.; Hansen, C.D.
1993-09-01
This paper describes a parallel method for polygonal rendering on a massively parallel SIMD machine. This method, based on a simple shading model, is targeted at applications which require very fast polygon rendering for extremely large sets of polygons, such as are found in many scientific visualization applications. The algorithms described in this paper are incorporated into a library of 3D graphics routines written for the Connection Machine. The routines are implemented on both the CM-200 and the CM-5. This library enables scientists to display 3D shaded polygons directly from a parallel machine without the need to transmit huge amounts of data to a post-processing rendering system.
Parallel integrated frame synchronizer chip
NASA Technical Reports Server (NTRS)
Ghuman, Parminder Singh (Inventor); Solomon, Jeffrey Michael (Inventor); Bennett, Toby Dennis (Inventor)
2000-01-01
A parallel integrated frame synchronizer implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.
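The role of the correlator stage can be sketched in software. This is a serial illustration of marker correlation in general, not the patented hardware design. It assumes the 32-bit pattern 0x1ACFFC1D, commonly used as the CCSDS attached sync marker, which the abstract itself does not state. The function name and the error-tolerance parameter are hypothetical.

```python
ASM = 0x1ACFFC1D  # assumed 32-bit sync marker (not given in the abstract)
ASM_BITS = [(ASM >> (31 - i)) & 1 for i in range(32)]  # MSB first


def find_sync(bits, marker=ASM_BITS, max_errors=0):
    """Return offsets where marker matches bits within max_errors flips."""
    hits = []
    for offset in range(len(bits) - len(marker) + 1):
        errors = 0
        for m, b in zip(marker, bits[offset:offset + len(marker)]):
            errors += m != b  # count mismatching bit positions
            if errors > max_errors:
                break  # too many mismatches at this offset
        else:
            hits.append(offset)  # loop finished without exceeding limit
    return hits


# Hypothetical stream: three leading bits, the marker, two trailing bits.
stream = [0, 1, 1] + ASM_BITS + [1, 0]
offsets = find_sync(stream)
```

Allowing a small `max_errors` lets the search tolerate bit errors in the marker, at the cost of more false matches on noisy data. Later stages such as windowing and CRC checking exist in part to reject those.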