Sample records for parallel time varying

  1. A Parallel Pipelined Renderer for the Time-Varying Volume Data

    NASA Technical Reports Server (NTRS)

    Chiueh, Tzi-Cker; Ma, Kwan-Liu

    1997-01-01

    This paper presents a strategy for efficiently rendering time-varying volume data sets on a distributed-memory parallel computer. Time-varying volume data occupy a large amount of storage space, and visualizing them requires reading large files continuously or periodically throughout the course of the visualization process. Instead of using all the processors to collectively render one volume at a time, a pipelined rendering process is formed by partitioning the processors into groups that render multiple volumes concurrently. In this way, the overall rendering time may be greatly reduced because the pipelined rendering tasks are overlapped with the I/O required to load each volume into a group of processors; moreover, parallelization overhead may be reduced as a result of partitioning the processors. We modify an existing parallel volume renderer to exploit various levels of rendering parallelism and to study how the partitioning of processors may lead to optimal rendering performance. Two factors which are important to the overall execution time are resource utilization efficiency and pipeline startup latency. The optimal partitioning configuration is the one that balances these two factors. Tests on Intel Paragon computers show that in general optimal partitionings do exist for a given rendering task and result in a 40-50% saving in overall rendering time.
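    The trade-off described above (resource utilization efficiency vs. pipeline startup latency) is easy to see in a toy cost model. The sketch below is our own illustration with assumed constants, not the paper's measurements; it only shows why an interior optimum in the number of groups can exist.

    ```python
    # Toy cost model of the group-partitioning trade-off. All constants
    # (I/O time, sequential render time, overhead coefficient) are
    # illustrative assumptions.

    def total_time(num_procs, num_groups, num_volumes,
                   t_io=20.0, t_seq=100.0, overhead=0.05):
        """Total time when num_procs processors form num_groups pipelined
        groups: fewer/larger groups waste parallel efficiency, while
        more/smaller groups deepen the pipeline startup latency."""
        group_size = num_procs // num_groups
        t_render = t_seq / group_size + overhead * group_size
        startup = (num_groups - 1) * t_io      # pipeline fill latency
        return startup + (num_volumes / num_groups) * (t_io + t_render)

    # Sweep the partitioning; an interior optimum typically appears.
    for g in (1, 2, 4, 8, 16):
        print(g, round(total_time(64, g, 64), 1))
    ```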

  2. Parallel Rendering of Large Time-Varying Volume Data

    NASA Technical Reports Server (NTRS)

    Garbutt, Alexander E.

    2005-01-01

    Interactive visualization of large time-varying 3D volume datasets has been and still is a great challenge to the modern computational world. It stretches the limits of the memory capacity, the disk space, the network bandwidth, and the CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphics hardware. The proposed program combines both parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphics hardware. And last, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism System using a time-varying dataset from selected JPL applications.

  3. Parallel Processing of Broad-Band PPM Signals

    NASA Technical Reports Server (NTRS)

    Gray, Andrew; Kang, Edward; Lay, Norman; Vilnrotter, Victor; Srinivasan, Meera; Lee, Clement

    2010-01-01

    A parallel-processing algorithm and a hardware architecture to implement the algorithm have been devised for time-slot synchronization in the reception of pulse-position-modulated (PPM) optical or radio signals. As in the cases of some prior algorithms and architectures for parallel, discrete-time, digital processing of signals other than PPM, an incoming broadband signal is divided into multiple parallel narrower-band signals by means of sub-sampling and filtering. The number of parallel streams is chosen so that the frequency content of the narrower-band signals is low enough to enable processing by relatively low-speed complementary metal oxide semiconductor (CMOS) electronic circuitry. The algorithm and architecture are intended to satisfy requirements for time-varying time-slot synchronization and post-detection filtering, with correction of timing errors independent of estimation of timing errors. They are also intended to afford flexibility for dynamic reconfiguration and upgrading. The architecture is implemented in a reconfigurable CMOS processor in the form of a field-programmable gate array. The algorithm and its hardware implementation incorporate three separate time-varying filter banks for three distinct functions: correction of sub-sample timing errors, post-detection filtering, and post-detection estimation of timing errors. The design of the filter bank for correction of timing errors, the method of estimating timing errors, and the design of a feedback-loop filter are governed by a host of parameters, the most critical one, with regard to processing very broadband signals with CMOS hardware, being the number of parallel streams (equivalently, the rate-reduction parameter).
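    The rate-reduction step described above (dividing one broadband stream into parallel narrower-band streams by sub-sampling) can be sketched generically. This is a plain polyphase split with assumed sizes, not the reported FPGA design:

    ```python
    import numpy as np

    def polyphase_split(x, m):
        """Split x into m parallel sub-sampled streams: stream k holds
        x[k], x[k+m], x[k+2m], ... Each stream runs at 1/m of the input
        rate, so slower hardware can process it."""
        n = len(x) - len(x) % m           # trim to a multiple of m
        return x[:n].reshape(-1, m).T     # row k is stream k

    # Example: 1024 samples split into 16 streams of 64 samples each.
    x = np.random.randn(1024)
    streams = polyphase_split(x, 16)
    assert streams.shape == (16, 64)
    assert np.array_equal(streams[3], x[3::16][:64])
    ```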

  4. A GPU-Accelerated Approach for Feature Tracking in Time-Varying Imagery Datasets.

    PubMed

    Peng, Chao; Sahani, Sandip; Rushing, John

    2017-10-01

    We propose a novel parallel connected component labeling (CCL) algorithm along with efficient out-of-core data management to detect and track feature regions of large time-varying imagery datasets. Our approach contributes to the big data field with parallel algorithms tailored for GPU architectures. We remove the data dependency between frames and achieve pixel-level parallelism. Due to the large size, the entire dataset cannot fit into cached memory. Frames have to be streamed through the memory hierarchy (disk to CPU main memory and then to GPU memory), partitioned, and processed as batches, where each batch is small enough to fit into the GPU. To reconnect the feature regions that are separated due to data partitioning, we present a novel batch merging algorithm to extract the region connection information across multiple batches in a parallel fashion. The information is organized in a memory-efficient structure and supports fast indexing on the GPU. Our experiment uses a commodity workstation equipped with a single GPU. The results show that our approach can efficiently process a weather dataset composed of terabytes of time-varying radar images. The advantages of our approach are demonstrated by comparison with an efficient CPU cluster implementation currently used by the weather scientists.

  5. Method and apparatus for probing relative volume fractions

    DOEpatents

    Jandrasits, Walter G.; Kikta, Thomas J.

    1998-01-01

    A relative volume fraction probe particularly for use in a multiphase fluid system includes two parallel conductive paths defining therebetween a sample zone within the system. A generating unit generates time varying electrical signals which are inserted into one of the two parallel conductive paths. A time domain reflectometer receives the time varying electrical signals returned by the second of the two parallel conductive paths and, responsive thereto, outputs a curve of impedance versus distance. An analysis unit then calculates the area under the curve, subtracts the calculated area from an area produced when the sample zone consists entirely of material of a first fluid phase, and divides this calculated difference by the difference between an area produced when the sample zone consists entirely of material of the first fluid phase and an area produced when the sample zone consists entirely of material of a second fluid phase. The result is the volume fraction.
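    The analysis step reduces to a single ratio of curve areas. A minimal sketch of that arithmetic, with variable names of our own choosing rather than the patent's:

    ```python
    def volume_fraction(area_measured, area_phase1, area_phase2):
        """Relative volume fraction computed from areas under the
        impedance-vs-distance curves exactly as described above:
        (A_phase1 - A_measured) / (A_phase1 - A_phase2)."""
        return (area_phase1 - area_measured) / (area_phase1 - area_phase2)

    # Pure phase 1 yields 0, pure phase 2 yields 1, and areas in
    # between interpolate (hypothetical numbers).
    assert volume_fraction(10.0, 10.0, 4.0) == 0.0
    assert volume_fraction(4.0, 10.0, 4.0) == 1.0
    assert volume_fraction(7.0, 10.0, 4.0) == 0.5
    ```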

  6. Method and apparatus for probing relative volume fractions

    DOEpatents

    Jandrasits, W.G.; Kikta, T.J.

    1998-03-17

    A relative volume fraction probe particularly for use in a multiphase fluid system includes two parallel conductive paths defining therebetween a sample zone within the system. A generating unit generates time varying electrical signals which are inserted into one of the two parallel conductive paths. A time domain reflectometer receives the time varying electrical signals returned by the second of the two parallel conductive paths and, responsive thereto, outputs a curve of impedance versus distance. An analysis unit then calculates the area under the curve, subtracts the calculated area from an area produced when the sample zone consists entirely of material of a first fluid phase, and divides this calculated difference by the difference between an area produced when the sample zone consists entirely of material of the first fluid phase and an area produced when the sample zone consists entirely of material of a second fluid phase. The result is the volume fraction. 9 figs.

  7. A discrete time-varying internal model-based approach for high precision tracking of a multi-axis servo gantry.

    PubMed

    Zhang, Zhen; Yan, Peng; Jiang, Huan; Ye, Peiqing

    2014-09-01

    In this paper, we consider discrete time-varying internal model-based control design for high-precision tracking of complicated reference trajectories generated by time-varying systems. Based on a novel parallel time-varying internal model structure, asymptotic tracking conditions for the design of internal model units are developed, and a low-order robust time-varying stabilizer is further synthesized. In a discrete-time setting, the high-precision tracking control architecture is deployed on a Voice Coil Motor (VCM) actuated servo gantry system, where numerical simulations and real-time experimental results are provided, achieving tracking errors of around 3.5‰ for frequency-varying signals. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  8. Method and apparatus for probing relative volume fractions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jandrasits, W.G.; Kikta, T.J.

    1996-12-31

    A relative volume fraction probe particularly for use in a multiphase fluid system includes two parallel conductive paths defining therebetween a sample zone within the system. A generating unit generates time varying electrical signals which are inserted into one of the two parallel conductive paths. A time domain reflectometer receives the time varying electrical signals returned by the second of the two parallel conductive paths and, responsive thereto, outputs a curve of impedance versus distance. An analysis unit then calculates the area under the curve, subtracts the calculated area from an area produced when the sample zone consists entirely of material of a first fluid phase, and divides this calculated difference by the difference between an area produced when the sample zone consists entirely of material of the first fluid phase and an area produced when the sample zone consists entirely of material of a second fluid phase. The result is the volume fraction.

  9. Simple reaction time to the onset of time-varying sounds.

    PubMed

    Schlittenlacher, Josef; Ellermeier, Wolfgang

    2015-10-01

    Although auditory simple reaction time (RT) is usually defined as the time elapsing between the onset of a stimulus and a recorded reaction, a sound cannot be specified by a single point in time. Therefore, the present work investigates how the period of time immediately after onset affects RT. By varying the stimulus duration between 10 and 500 msec, this critical duration was determined to fall between 32 and 40 msec for a 1-kHz pure tone at 70 dB SPL. In a second experiment, the role of the buildup was further investigated by varying the rise time and its shape. The increment in RT for extending the rise time by a factor of ten was about 7 to 8 msec. There was no statistically significant difference in RT between a Gaussian and a linear rise shape. A third experiment varied the modulation frequency and point of onset of amplitude-modulated tones, producing onsets at different initial levels with differently rapid increases or decreases immediately afterwards. The results of all three experiments were explained very well by a straightforward extension of the parallel grains model (Miller & Ulrich, Cogn. Psychol., 46, 101-151, 2003), a probabilistic race model employing many parallel channels. The extension of the model to time-varying sounds made the activation of such a grain depend on intensity as a function of time rather than on a constant level. A second approach based on mechanisms known from loudness produced less accurate predictions.
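    To make the model extension concrete: below is a Monte Carlo sketch of a parallel-grains race in which each channel fires as an inhomogeneous Poisson process whose rate tracks the stimulus intensity envelope. Every parameter (grain count, threshold, gain, motor time) is an illustrative assumption, not a value fitted in the study.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def simple_rt(intensity, dt=1e-4, n_grains=100, k=5,
                  gain=2.0, t_motor=0.08):
        """Monte Carlo sketch: each grain fires as an inhomogeneous
        Poisson process whose rate follows intensity(t); the response
        is triggered once k grains have fired, plus a fixed motor time.
        All parameter values are illustrative assumptions."""
        t, fired = 0.0, 0
        while fired < k:
            lam = gain * intensity(t)                 # per-grain rate
            p = 1.0 - np.exp(-lam * dt)               # fire prob in dt
            fired += rng.binomial(n_grains - fired, p)
            t += dt
        return t + t_motor

    # A 10-ms linear rise delays threshold crossing vs. an abrupt onset.
    abrupt = lambda t: 1.0
    ramped = lambda t: min(t / 0.010, 1.0)
    print(simple_rt(abrupt), simple_rt(ramped))
    ```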

  10. Theory of electromagnetic cyclotron wave growth in a time-varying magnetoplasma

    NASA Technical Reports Server (NTRS)

    Gail, William B.

    1990-01-01

    The effect of a time-dependent perturbation in the magnetoplasma on the wave and particle populations is investigated using the Kennel-Petchek (1966) approach. Perturbations in the cold plasma density, energetic particle distribution, and resonance condition are calculated on the basis of the ideal MHD assumption given an arbitrary compressional magnetic field perturbation. An equation is derived describing the time-dependent growth rate for parallel propagating electromagnetic cyclotron waves in a time-varying magnetoplasma with perturbations superimposed on an equilibrium configuration.

  11. SU-F-SPS-09: Parallel MC Kernel Calculations for VMAT Plan Improvement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chamberlain, S; Roswell Park Cancer Institute, Buffalo, NY; French, S

    Purpose: Adding kernels (small perturbations in leaf positions) to the existing apertures of VMAT control points may improve plan quality. We investigate the calculation of kernel doses using a parallelized Monte Carlo (MC) method. Methods: A clinical prostate VMAT DICOM plan was exported from Eclipse. An arbitrary control point and leaf were chosen, and a modified MLC file was created, corresponding to the leaf position offset by 0.5 cm. The additional dose produced by this 0.5 cm × 0.5 cm kernel was calculated using the DOSXYZnrc component module of BEAMnrc. A range of particle history counts were run (varying from 3 × 10⁶ to 3 × 10⁷); each job was split among 1, 10, or 100 parallel processes. A particle count of 3 × 10⁶ was established as the lower end of the range because it provided the minimal accuracy level. Results: As expected, an increase in particle counts linearly increases run time. For the lowest particle count, the time varied from 30 hours for the single-processor run to 0.30 hours for the 100-processor run. Conclusion: Parallel processing of MC calculations in the EGS framework significantly decreases the time necessary for each kernel dose calculation. Particle counts lower than 1 × 10⁶ have too large an error to output accurate dose for a Monte Carlo kernel calculation. Future work will investigate increasing the number of parallel processes and optimizing run times for multiple kernel calculations.
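    The parallelization pattern itself (splitting a fixed particle-history budget across independent processes and summing the partial tallies) is generic; a sketch follows, with a dummy random tally standing in for a DOSXYZnrc batch.

    ```python
    from multiprocessing import Pool
    import numpy as np

    def run_histories(args):
        """Stand-in for one DOSXYZnrc-style batch: 'score' a tally from
        n particle histories (here just a dummy random sum)."""
        n, seed = args
        rng = np.random.default_rng(seed)
        return rng.random(n).sum()

    def kernel_dose(total_histories=3_000_000, n_procs=10):
        """Split the particle budget into independently seeded batches,
        run them in parallel, and sum the partial tallies."""
        per = total_histories // n_procs
        with Pool(n_procs) as pool:
            parts = pool.map(run_histories,
                             [(per, s) for s in range(n_procs)])
        return sum(parts)

    if __name__ == "__main__":
        print(kernel_dose())
    ```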

  12. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
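    Uniformization converts a CTMC into a discrete-time chain subordinated to a Poisson process of fixed rate; because the event-time skeleton no longer depends on the state, parallel replications can agree on it in advance, which is the synchronization property the paper builds on. A minimal single-chain sketch (the example generator matrix is ours):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def uniformized_step(Q, lam):
        """Transition matrix of the uniformized discrete-time chain:
        P = I + Q/lam, valid whenever lam >= max_i |Q[i, i]|."""
        return np.eye(Q.shape[0]) + Q / lam

    def simulate_ctmc(Q, state, t_end):
        """Simulate a CTMC by uniformization: jump epochs come from a
        Poisson process of fixed rate lam (some jumps are self-loops),
        so the epoch schedule is state-independent."""
        lam = -Q.diagonal().min()          # uniformization rate
        P = uniformized_step(Q, lam)
        t = rng.exponential(1.0 / lam)
        while t < t_end:
            state = rng.choice(len(P), p=P[state])
            t += rng.exponential(1.0 / lam)
        return state

    # Two-state example: 0 -> 1 at rate 2.0, 1 -> 0 at rate 1.0.
    Q = np.array([[-2.0, 2.0], [1.0, -1.0]])
    print(simulate_ctmc(Q, 0, t_end=10.0))
    ```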

  13. Evidence against a Central Control Model of Timing in Typing.

    DTIC Science & Technology

    1981-12-01

    timing pattern, or "motor engram ," for each common word and some common letter sequences. These timing patterns may vary from one typist to another...represented within the engram by * using a (functionally) parallel arrangement. (Terzuolo & Viviani, 1960, pp. 1101-1102) The contrast here is between a

  14. Research on parallel load sharing principle of piezoelectric six-dimensional heavy force/torque sensor

    NASA Astrophysics Data System (ADS)

    Liu, Wei; Li, Ying-jun; Jia, Zhen-yuan; Zhang, Jun; Qian, Min

    2011-01-01

    In the working process of heavy-load manipulators, such as free forging machines, hydraulic die-forging presses, forging manipulators, heavy grasping manipulators, and large-displacement manipulators, the measurement of six-dimensional heavy force/torque and real-time force feedback at the operation interface are the basis for realizing coordinated operation control and force compliance control. They are also an effective way to raise control accuracy and achieve highly efficient manufacturing. To solve the dynamic measurement problem of six-dimensional time-varying heavy loads in extreme manufacturing processes, a novel principle of parallel load sharing for six-dimensional heavy force/torque is put forward. The measuring principle of the six-dimensional force sensor is analyzed, and the spatial model is built and decoupled. The load sharing ratios are analyzed and calculated in the vertical and horizontal directions. The mapping relationship between the six-dimensional heavy force/torque to be measured and the output force value is built. The finite element model of the parallel piezoelectric six-dimensional heavy force/torque sensor is set up, and its static characteristics are analyzed with ANSYS software. The main parameters affecting the load sharing ratio are analyzed. Experiments on load sharing with different parallel-axis diameters were designed. The results show that the six-dimensional heavy force/torque sensor has good linearity; non-linearity errors are less than 1%. The parallel axis provides effective load sharing, and the larger its diameter, the better the load sharing effect. The experimental results are in accordance with the FEM analysis. The sensor has the advantages of a large measuring range, good linearity, high natural frequency, and high rigidity. It can be widely used in extreme environments for real-time, accurate measurement of six-dimensional time-varying heavy loads on manipulators.

  15. PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A

    2012-01-01

    At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed in FORTRAN-90 and parallelized using the Message Passing Interface (MPI) library. The Silo library is used to compact and write the data files, and the VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted, allowing large-scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.

  16. Maximal clique enumeration with data-parallel primitives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lessley, Brenton; Perciano, Talita; Mathai, Manish

    The enumeration of all maximal cliques in an undirected graph is a fundamental problem arising in several research areas. We consider maximal clique enumeration on shared-memory, multi-core architectures and introduce an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures. We study the performance of the algorithm via experiments varying over benchmark graphs and architectures. Overall, we observe that our algorithm achieves up to a 33-fold speedup over state-of-the-art distributed algorithms and up to a 9-fold speedup over serial algorithms, for graphs with higher ratios of maximal cliques to total cliques. Further, we attain additional speedups on a GPU architecture, demonstrating the portable performance of our data-parallel design.

  17. Parallel Coding of First- and Second-Order Stimulus Attributes by Midbrain Electrosensory Neurons

    PubMed Central

    McGillivray, Patrick; Vonderschen, Katrin; Fortune, Eric S.; Chacron, Maurice J.

    2015-01-01

    Natural stimuli often have time-varying first-order (i.e., mean) and second-order (i.e., variance) attributes that each carry critical information for perception and can vary independently over orders of magnitude. Experiments have shown that sensory systems continuously adapt their responses based on changes in each of these attributes. This adaptation creates ambiguity in the neural code as multiple stimuli may elicit the same neural response. While parallel processing of first- and second-order attributes by separate neural pathways is sufficient to remove this ambiguity, the existence of such pathways and the neural circuits that mediate their emergence have not been uncovered to date. We recorded the responses of midbrain electrosensory neurons in the weakly electric fish Apteronotus leptorhynchus to stimuli with first- and second-order attributes that varied independently in time. We found three distinct groups of midbrain neurons: the first group responded to both first- and second-order attributes, the second group responded selectively to first-order attributes, and the last group responded selectively to second-order attributes. In contrast, all afferent hindbrain neurons responded to both first- and second-order attributes. Using computational analyses, we show how inputs from a heterogeneous population of ON- and OFF-type afferent neurons are combined to give rise to response selectivity to either first- or second-order stimulus attributes in midbrain neurons. Our study thus uncovers, for the first time, generic and widely applicable mechanisms by which parallel processing of first- and second-order stimulus attributes emerges in the brain. PMID:22514313

  18. Efficient Thread Labeling for Monitoring Programs with Nested Parallelism

    NASA Astrophysics Data System (ADS)

    Ha, Ok-Kyoon; Kim, Sun-Sook; Jun, Yong-Kee

    It is difficult and cumbersome to detect data races occurring in an execution of parallel programs. Any on-the-fly race detection technique using Lamport's happened-before relation needs a thread labeling scheme for generating unique identifiers which maintain logical concurrency information for the parallel threads. NR labeling is an efficient thread labeling scheme for the fork-join program model with nested parallelism, because its efficiency depends only on the nesting depth for every fork and join operation. This paper presents an improved NR labeling, called e-NR labeling, in which every thread generates its label by inheriting the pointer to its ancestor list from the parent threads or by updating the pointer in a constant amount of time and space. This labeling is more efficient than NR labeling, because its efficiency does not depend on the nesting depth for every fork and join operation. Experiments were performed with OpenMP programs having nesting depths of three or four and maximum parallelism varying from 10,000 to 1,000,000. The results show that e-NR labeling is 5 times faster than NR labeling and 4.3 times faster than OS labeling in the average time for creating and maintaining the thread labels. In the average space required for labeling, it is 3.5 times smaller than NR labeling and 3 times smaller than OS labeling.

  19. Plasma Generator Using Spiral Conductors

    NASA Technical Reports Server (NTRS)

    Szatkowski, George N. (Inventor); Dudley, Kenneth L. (Inventor); Ticatch, Larry A. (Inventor); Smith, Laura J. (Inventor); Koppen, Sandra V. (Inventor); Nguyen, Truong X. (Inventor); Ely, Jay J. (Inventor)

    2016-01-01

    A plasma generator includes a pair of identical spiraled electrical conductors separated by dielectric material. Both spiraled conductors have inductance and capacitance wherein, in the presence of a time-varying electromagnetic field, the spiraled conductors resonate to generate a harmonic electromagnetic field response. The spiraled conductors lie in parallel planes and partially overlap one another in a direction perpendicular to the parallel planes. The geometric centers of the spiraled conductors define endpoints of a line that is non-perpendicular with respect to the parallel planes. A voltage source coupled across the spiraled conductors applies a voltage sufficient to generate a plasma in at least a portion of the dielectric material.

  20. Evidence for parallel consolidation of motion direction and orientation into visual short-term memory.

    PubMed

    Rideaux, Reuben; Apthorp, Deborah; Edwards, Mark

    2015-02-12

    Recent findings have indicated that the capacity to consolidate multiple items into visual short-term memory in parallel varies as a function of the type of information. That is, while color can be consolidated in parallel, evidence suggests that orientation cannot. Here we investigated the capacity to consolidate multiple motion directions in parallel and reexamined this capacity using orientation. This was achieved by determining the shortest exposure duration necessary to consolidate a single item, then examining whether two items, presented simultaneously, could be consolidated in that time. The results show that parallel consolidation of direction and orientation information is possible, and that parallel consolidation of direction appears to be limited to two items. Additionally, we demonstrate the importance of adequate separation between the feature intervals used to define items when attempting to consolidate in parallel, suggesting that when multiple items are consolidated in parallel, as opposed to serially, the resolution of the representations suffers. Finally, we used facilitation of spatial attention to show that the deterioration of item resolution occurs during parallel consolidation, as opposed to storage. © 2015 ARVO.

  1. Survival distributions impact the power of randomized placebo-phase design and parallel groups randomized clinical trials.

    PubMed

    Abrahamyan, Lusine; Li, Chuan Silvia; Beyene, Joseph; Willan, Andrew R; Feldman, Brian M

    2011-03-01

    The study evaluated the power of the randomized placebo-phase design (RPPD), a new design for randomized clinical trials (RCTs), compared with the traditional parallel-groups design, assuming various response time distributions. In the RPPD, at some point all subjects receive the experimental therapy, and the exposure to placebo is for only a short fixed period of time. For the study, an object-oriented simulation program was written in R. The power of the simulated trials was evaluated using six scenarios, where the treatment response times followed the exponential, Weibull, or lognormal distributions. The median response time was assumed to be 355 days for the placebo and 42 days for the experimental drug. Based on the simulation results, the sample size requirements to achieve the same level of power differed across response time distributions. The scenario where the response times followed the exponential distribution had the highest sample size requirement. In most scenarios, the parallel-groups RCT had higher power compared with the RPPD. The sample size requirement varies depending on the underlying hazard distribution, and the RPPD requires more subjects than the parallel-groups design to achieve similar power. Copyright © 2011 Elsevier Inc. All rights reserved.
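    The simulation logic is easy to reproduce in outline. The sketch below estimates power for the parallel-groups comparison only, under exponential response times with the medians quoted above; it substitutes a t-test on log response times for the authors' analysis (their program was written in R, so this Python version is only an approximation):

    ```python
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(42)

    def parallel_groups_power(n_per_arm=30, reps=2000, alpha=0.05,
                              med_placebo=355.0, med_drug=42.0):
        """Monte Carlo power of a two-arm parallel-groups trial with
        exponential response times (medians as quoted above), analyzed
        with a t-test on log response times as a stand-in analysis."""
        scale_p = med_placebo / np.log(2)   # exponential scale from median
        scale_d = med_drug / np.log(2)
        hits = 0
        for _ in range(reps):
            p = rng.exponential(scale_p, n_per_arm)
            d = rng.exponential(scale_d, n_per_arm)
            if ttest_ind(np.log(p), np.log(d)).pvalue < alpha:
                hits += 1
        return hits / reps

    print(parallel_groups_power())
    ```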

  2. Brood parasitism of the Abert's Towhee: Timing, frequency, and effects

    Treesearch

    Deborah M. Finch

    1983-01-01

    The effects of brood parasitism by the dwarf race of the Brown-headed Cowbird (Molothrus ater obscurus) on the nesting success of the Abert's Towhee (Pipilo aberti) in the lower Colorado River valley were studied. The frequency of cowbird parasitism varied significantly with time of season. The laying season of cowbirds paralleled that of migratory songbirds, but...

  3. Evaluation of a new parallel numerical parameter optimization algorithm for a dynamical system

    NASA Astrophysics Data System (ADS)

    Duran, Ahmet; Tuncel, Mehmet

    2016-10-01

    It is important to have a scalable parallel numerical parameter optimization algorithm for dynamical systems used in financial applications, where time limitation is crucial. We use Message Passing Interface parallel programming and present such a new parallel algorithm for parameter estimation. For example, we apply the algorithm to the asset flow differential equations that have been developed and analyzed since 1989 (see [3-6] and references contained therein). We achieved speed-up for some time series on runs of up to 512 cores (see [10]). Unlike [10], in this work we consider more extensive financial market situations, for example in the presence of low volatility, high volatility, and stock market prices at a discount/premium to net asset value of varying magnitude. Moreover, we evaluated the convergence of the model parameter vector, the nonlinear least squares error, and the maximum improvement factor to quantify the success of the optimization process depending on the number of initial parameter vectors.

  4. Distributed control system for parallel-connected DC boost converters

    DOEpatents

    Goldsmith, Steven

    2017-08-15

    The disclosed invention is a distributed control system for operating a DC bus fed by disparate DC power sources that service a known or unknown load. The voltage sources vary in v-i characteristics and have time-varying, maximum supply capacities. Each source is connected to the bus via a boost converter, which may have different dynamic characteristics and power transfer capacities, but are controlled through PWM. The invention tracks the time-varying power sources and apportions their power contribution while maintaining the DC bus voltage within the specifications. A central digital controller solves the steady-state system for the optimal duty cycle settings that achieve a desired power supply apportionment scheme for a known or predictable DC load. A distributed networked control system is derived from the central system that utilizes communications among controllers to compute a shared estimate of the unknown time-varying load through shared bus current measurements and bus voltage measurements.

  5. VizieR Online Data Catalog: Solar wind 3D magnetohydrodynamic simulation (Chhiber+, 2017)

    NASA Astrophysics Data System (ADS)

    Chhiber, R.; Subedi, P.; Usmanov, A. V.; Matthaeus, W. H.; Ruffolo, D.; Goldstein, M. L.; Parashar, T. N.

    2017-08-01

    We use a three-dimensional magnetohydrodynamic simulation of the solar wind to calculate cosmic-ray diffusion coefficients throughout the inner heliosphere (2 R⊙-3 au). The simulation resolves large-scale solar wind flow, which is coupled to small-scale fluctuations through a turbulence model. Simulation results specify background solar wind fields and turbulence parameters, which are used to compute diffusion coefficients and study their behavior in the inner heliosphere. The parallel mean free path (mfp) is evaluated using quasi-linear theory, while the perpendicular mfp is determined from nonlinear guiding center theory with the random ballistic interpretation. Several runs examine varying turbulent energy and different solar source dipole tilts. We find that for most of the inner heliosphere, the radial mfp is dominated by diffusion parallel to the mean magnetic field; the parallel mfp remains at least an order of magnitude larger than the perpendicular mfp, except in the heliospheric current sheet, where the perpendicular mfp may be a few times larger than the parallel mfp. In the ecliptic region, the perpendicular mfp may influence the radial mfp at heliocentric distances larger than 1.5 au; our estimates of the parallel mfp in the ecliptic region at 1 au agree well with the Palmer "consensus" range of 0.08-0.3 au. Solar activity increases perpendicular diffusion and reduces parallel diffusion. The parallel mfp mostly varies with rigidity (P) as P^0.33, and the perpendicular mfp is weakly dependent on P. The mfps are weakly influenced by the choice of long-wavelength power spectra. (2 data files).

  6. A Framework to Analyze the Performance of Load Balancing Schemes for Ensembles of Stochastic Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahn, Tae-Hyuk; Sandu, Adrian; Watson, Layne T.

    2015-08-01

    Ensembles of simulations are employed to estimate the statistics of possible future states of a system, and are widely used in important applications such as climate change and biological modeling. Ensembles of runs can naturally be executed in parallel. However, when the CPU times of individual simulations vary considerably, a simple strategy of assigning an equal number of tasks per processor can lead to serious work imbalances and low parallel efficiency. This paper presents a new probabilistic framework to analyze the performance of dynamic load balancing algorithms for ensembles of simulations where many tasks are mapped onto each processor, and where the individual compute times vary considerably among tasks. Four load balancing strategies are discussed: most-dividing, all-redistribution, random-polling, and neighbor-redistribution. Simulation results with a stochastic budding yeast cell cycle model are consistent with the theoretical analysis. It is especially significant that there is a provable global decrease in load imbalance for the local rebalancing algorithms, given the scalability concerns for the global rebalancing algorithms. The overall simulation time is reduced by up to 25%, and the total processor idle time by 85%.
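    The gap between static assignment and dynamic rebalancing that the framework analyzes shows up even in a toy simulation. The sketch below assumes a lognormal task-time distribution and uses a greedy shared-queue scheduler as a stand-in for random-polling:

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    def makespan_static(times, n_procs):
        """Static policy: an equal count of tasks per processor."""
        chunks = np.array_split(times, n_procs)
        return max(c.sum() for c in chunks)

    def makespan_dynamic(times, n_procs):
        """Greedy shared-queue stand-in for dynamic rebalancing: the
        next task always goes to the least-loaded processor."""
        loads = np.zeros(n_procs)
        for t in times:
            loads[loads.argmin()] += t
        return loads.max()

    # Task times vary widely, as with stochastic cell-cycle runs.
    times = rng.lognormal(mean=0.0, sigma=1.5, size=1000)
    print(makespan_static(times, 16), makespan_dynamic(times, 16))
    ```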

  7. High voltage pulse generator. [Patent application

    DOEpatents

    Fasching, G.E.

    1975-06-12

    An improved high-voltage pulse generator is described which is especially useful in ultrasonic testing of rock core samples. N capacitors are charged in parallel to V volts and, at the proper instant, are coupled in series to produce a high-voltage pulse of N times V volts. Rapid switching of the capacitors from the parallel charging configuration to the series discharging configuration is accomplished by using silicon-controlled rectifiers which are chain self-triggered following the initial triggering of the first rectifier connected between the first and second capacitors. A timing and triggering circuit is provided to properly synchronize triggering pulses to the first SCR at a time when the charging voltage is not being applied to the parallel-connected charging capacitors. The output voltage can be readily increased by adding additional charging networks. The circuit allows the peak level of the output to be easily varied over a wide range by using a variable autotransformer in the charging circuit.

  8. Efficient parallel architecture for highly coupled real-time linear system applications

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Homaifar, Abdollah; Barua, Soumavo

    1988-01-01

    A systematic procedure is developed for exploiting the parallel constructs of computation in a highly coupled, linear system application. An overall top-down design approach is adopted. Differential equations governing the application under consideration are partitioned into subtasks on the basis of a data flow analysis. The interconnected task units constitute a task graph which has to be computed in every update interval. Multiprocessing concepts utilizing parallel integration algorithms are then applied for efficient task graph execution. A simple scheduling routine is developed to handle task allocation while in the multiprocessor mode. Results of simulation and scheduling are compared on the basis of standard performance indices. Processor timing diagrams are developed on the basis of program output accruing to an optimal set of processors. Basic architectural attributes for implementing the system are discussed together with suggestions for processing element design. Emphasis is placed on flexible architectures capable of accommodating widely varying application specifics.

  9. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
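    The sequential decision loop that the paper parallelizes looks like the sketch below, sized to the smallest instance quoted above (100 variables, 425 clauses); the temperature schedule is an assumption. Generalized Speculative Computation runs exactly this kind of loop speculatively in parallel while preserving the accept/reject sequence.

    ```python
    import math
    import random

    random.seed(0)

    def random_lsat(n_vars=100, n_clauses=425, k=3):
        """Random L-SAT instance: each clause is k (sign, var) literals."""
        return [[(random.choice((1, -1)), random.randrange(n_vars))
                 for _ in range(k)] for _ in range(n_clauses)]

    def unsat_count(clauses, assign):
        """A clause fails when every literal disagrees with the assignment."""
        return sum(all(assign[v] != (s > 0) for s, v in c) for c in clauses)

    def anneal(clauses, n_vars, t0=2.0, cooling=0.9995, steps=20000):
        assign = [random.random() < 0.5 for _ in range(n_vars)]
        cost, temp = unsat_count(clauses, assign), t0
        for _ in range(steps):
            v = random.randrange(n_vars)
            assign[v] = not assign[v]            # propose one flip
            new = unsat_count(clauses, assign)
            if new <= cost or random.random() < math.exp((cost - new) / temp):
                cost = new                       # accept
            else:
                assign[v] = not assign[v]        # reject, undo the flip
            temp *= cooling
        return cost

    clauses = random_lsat()
    print("unsatisfied clauses:", anneal(clauses, 100))
    ```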

  10. A parallel implementation of a multisensor feature-based range-estimation method

    NASA Technical Reports Server (NTRS)

    Suorsa, Raymond E.; Sridhar, Banavar

    1993-01-01

    There are many proposed vision based methods to perform obstacle detection and avoidance for autonomous or semi-autonomous vehicles. All methods, however, will require very high processing rates to achieve real time performance. A system capable of supporting autonomous helicopter navigation will need to extract obstacle information from imagery at rates varying from ten frames per second to thirty or more frames per second depending on the vehicle speed. Such a system will need to sustain billions of operations per second. To reach such high processing rates using current technology, a parallel implementation of the obstacle detection/ranging method is required. This paper describes an efficient and flexible parallel implementation of a multisensor feature-based range-estimation algorithm, targeted for helicopter flight, realized on both a distributed-memory and shared-memory parallel computer.

  11. Intrinsic suppression of turbulence in linear plasma devices

    NASA Astrophysics Data System (ADS)

    Leddy, J.; Dudson, B.

    2017-12-01

    Plasma turbulence is the dominant transport mechanism for heat and particles in magnetised plasmas in linear devices and tokamaks, so the study of turbulence is important in limiting and controlling this transport. Linear devices provide an axial magnetic field that serves to confine a plasma in cylindrical geometry as it travels along the magnetic field from the source to the strike point. Due to perpendicular transport, the plasma density and temperature have a roughly Gaussian radial profile with gradients that drive instabilities, such as resistive drift-waves and Kelvin-Helmholtz. If unstable, these instabilities cause perturbations to grow, resulting in saturated turbulence, increasing the cross-field transport of heat and particles. When the plasma emerges from the source, there is a time, τ∥, that describes the lifetime of the plasma based on parallel velocity and the length of the device. As the plasma moves down the device, it also moves azimuthally according to the E × B and diamagnetic velocities. There is a balance point in these parallel and perpendicular times that sets the stabilisation threshold. We simulate plasmas with a variety of parallel lengths and magnetic fields to vary the parallel and perpendicular lifetimes, respectively, and find that there is a clear correlation between the saturated RMS density perturbation level and the balance between these lifetimes. The threshold of marginal stability is seen to exist where τ∥ ≈ 11τ⊥. This is also associated with the product τ∥γ*, where γ* is the drift-wave linear growth rate, indicating that the instability must exist for roughly 100 times the growth time in order to enter the nonlinear growth phase. We explore the root of this correlation and the implications for linear device design.

  12. Numerical Modeling of Internal Flow Aerodynamics. Part 2: Unsteady Flows

    DTIC Science & Technology

    2004-01-01

    fluid-structure coupling, ...). Prediction: in this simulation, we want to assess the effect of a change in SRM geometry, propellant...surface reaches the structure). The third characteristic time describes the slow evolution of the internal geometry. The last characteristic time...incorporates a fluid-structure coupling facility, and is parallel. MOPTI® manages exchanges between two principal computational modules: a varying

  13. High-performance parallel approaches for three-dimensional light detection and ranging point clouds gridding

    NASA Astrophysics Data System (ADS)

    Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon

    2017-01-01

    With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the parallel programming model best suited to a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time-demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity, infrastructure, and platform limitations to give readers a better understanding of utilizing those parallel approaches for gridding purposes.

  14. Gyrokinetic Magnetohydrodynamics and the Associated Equilibrium

    NASA Astrophysics Data System (ADS)

    Lee, W. W.; Hudson, S. R.; Ma, C. H.

    2017-10-01

    A proposed scheme for the calculation of gyrokinetic MHD and its associated equilibrium is discussed in relation to a recent paper on the subject. The scheme is based on the time-dependent gyrokinetic vorticity equation and parallel Ohm's law, as well as the associated gyrokinetic Ampere's law. This set of equations, in terms of the electrostatic potential, ϕ, and the vector potential, A, supports both spatially varying perpendicular and parallel pressure gradients and their associated currents. The MHD equilibrium can be reached when ϕ → 0 and A becomes constant in time, which, in turn, gives ∇·(J∥ + J⊥) = 0 and the associated magnetic islands. Examples in simple cylindrical geometry will be given. The present work is partially supported by US DoE Grant DE-AC02-09CH11466.

  15. Parallel programming of gradient-based iterative image reconstruction schemes for optical tomography.

    PubMed

    Hielscher, Andreas H; Bartel, Sebastian

    2004-02-01

    Optical tomography (OT) is a fast-developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computationally intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of the iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performance are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.

  16. An asymptotic-preserving Lagrangian algorithm for the time-dependent anisotropic heat transport equation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chacon, Luis; del-Castillo-Negrete, Diego; Hauck, Cory D.

    2014-09-01

    We propose a Lagrangian numerical algorithm for a time-dependent, anisotropic temperature transport equation in magnetized plasmas in the large guide field regime. The approach is based on an analytical integral formal solution of the parallel (i.e., along the magnetic field) transport equation with sources, and it is able to accommodate both local and non-local parallel heat flux closures. The numerical implementation is based on an operator-split formulation, with two straightforward steps: a perpendicular transport step (including sources), and a Lagrangian (field-line integral) parallel transport step. Algorithmically, the first step is amenable to the use of modern iterative methods, while the second step has a fixed cost per degree of freedom (and is therefore scalable). Accuracy-wise, the approach is free from the numerical pollution introduced by the discrete parallel transport term when the perpendicular-to-parallel transport coefficient ratio χ⊥/χ∥ becomes arbitrarily small, and is shown to capture the correct limiting solution when ε = χ⊥L∥²/(χ∥L⊥²) → 0 (with L∥ and L⊥ the parallel and perpendicular diffusion length scales, respectively). Therefore, the approach is asymptotic-preserving. We demonstrate the capabilities of the scheme with several numerical experiments with varying magnetic field complexity in two dimensions, including the case of transport across a magnetic island.

  17. An Efficient Fuzzy Controller Design for Parallel Connected Induction Motor Drives

    NASA Astrophysics Data System (ADS)

    Usha, S.; Subramani, C.

    2018-04-01

    Generally, induction motors are highly non-linear and have complex time-varying dynamics. This makes the speed control of an induction motor a challenging issue in industry. But due to recent trends in power electronic devices and intelligent controllers, speed control of the induction motor is achieved even with these non-linear characteristics. Conventionally, a single inverter is used to run one induction motor in industry. In traction applications, two or more induction motors are operated in parallel to reduce the size and cost of the induction motors. In this application, the parallel-connected induction motors can be driven by a single inverter unit. Stability problems may arise in parallel operation under low-speed operating conditions. Hence, the speed deviations should be reduced with the help of suitable controllers. The speed control of the parallel-connected system is performed by a PID controller and a fuzzy logic controller. In this paper, the speed responses of an induction motor rated 1 HP, 1440 rpm, and 50 Hz with these controllers are compared in terms of time-domain specifications. The stability analysis of the system is also performed under low speed using the MATLAB platform. The hardware model is developed for speed control using the fuzzy logic controller, which exhibited superior performance over the other controller.

  18. Depth-varying azimuthal anisotropy in the Tohoku subduction channel

    NASA Astrophysics Data System (ADS)

    Liu, Xin; Zhao, Dapeng

    2017-09-01

    We determine a detailed 3-D model of azimuthal anisotropy tomography of the Tohoku subduction zone from the Japan Trench outer-rise to the back-arc near the Japan Sea coast, using a large number of high-quality P and S wave arrival-time data of local earthquakes recorded by the dense seismic network on the Japan Islands. Depth-varying seismic azimuthal anisotropy is revealed in the Tohoku subduction channel. The shallow portion of the Tohoku megathrust zone (<30 km depth) generally exhibits trench-normal fast-velocity directions (FVDs) except for the source area of the 2011 Tohoku-oki earthquake (Mw 9.0) where the FVD is nearly trench-parallel, whereas the deeper portion of the megathrust zone (at depths of ∼30-50 km) mainly exhibits trench-parallel FVDs. Trench-normal FVDs are revealed in the mantle wedge beneath the volcanic front and the back-arc. The Pacific plate mainly exhibits trench-parallel FVDs, except for the top portion of the subducting Pacific slab where visible trench-normal FVDs are revealed. A qualitative tectonic model is proposed to interpret such anisotropic features, suggesting transposition of earlier fabrics in the oceanic lithosphere into subduction-induced new structures in the subduction channel.

  19. Design of high-performance parallelized gene predictors in MATLAB.

    PubMed

    Rivard, Sylvain Robert; Mailloux, Jean-Gabriel; Beguenane, Rachid; Bui, Hung Tien

    2012-04-10

    This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
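    Goertzel's algorithm, one of the two kernels the paper parallelizes, evaluates a single DFT bin with a two-state recurrence, which is why it maps cleanly onto PARFOR-style loops over windows. A minimal sketch; the period-3 test signal and the numeric encoding of the DNA are assumptions for illustration:

    ```python
    import math

    def goertzel_power(x, k):
        """Goertzel's algorithm: squared magnitude of DFT bin k of x,
        using a two-state recurrence (one multiply per sample)."""
        n = len(x)
        coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
        s_prev = s_prev2 = 0.0
        for sample in x:
            s = sample + coeff * s_prev - s_prev2
            s_prev2, s_prev = s_prev, s
        return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2

    # Period-3 structure shows up at bin n/3 -- the spectral cue used
    # by gene predictors (the 0/1 base encoding here is hypothetical).
    x = [1.0 if i % 3 == 0 else 0.0 for i in range(300)]
    print(goertzel_power(x, 100))   # large: on-target bin (300/3)
    print(goertzel_power(x, 17))    # small: off-target bin
    ```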

  20. Parallel algorithm of real-time infrared image restoration based on total variation theory

    NASA Astrophysics Data System (ADS)

    Zhu, Ran; Li, Miao; Long, Yunli; Zeng, Yaoyuan; An, Wei

    2015-10-01

    Image restoration is a necessary preprocessing step for infrared remote sensing applications. Traditional methods remove noise but overly penalize the gradients corresponding to edges. Image restoration techniques based on variational approaches can solve this over-smoothing problem thanks to their well-defined mathematical modeling of the restoration procedure. The total variation (TV) of the infrared image is introduced as an L1 regularization term added to the objective energy functional. This converts the restoration process into an optimization problem for a functional involving a fidelity term to the image data plus a regularization term. Infrared image restoration with the TV-L1 model makes full use of the remote sensing data and preserves edge information caused by clouds. A numerical implementation algorithm is presented in detail. Analysis indicates that the structure of this algorithm can easily be parallelized. Therefore, a parallel implementation of the TV-L1 filter based on a multicore architecture with shared memory is proposed for infrared real-time remote sensing systems. Massive computation on the image data is performed in parallel by cooperating threads running simultaneously on multiple cores. Several groups of synthetic infrared image data are used to validate the feasibility and effectiveness of the proposed parallel algorithm. A quantitative analysis of the restored image quality relative to the input image is presented. Experiment results show that the TV-L1 filter can restore the varying background image reasonably, and that its performance achieves the requirements of real-time image processing.
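    To make the variational formulation concrete, here is a gradient-descent sketch using an L2 fidelity term and a smoothed TV term; this is a simplified stand-in for the paper's TV-L1 model (a true L1 fidelity usually calls for a primal-dual solver). Every stencil operation in the loop is data-parallel, which is what a multicore implementation exploits.

    ```python
    import numpy as np

    def tv_denoise(img, lam=0.1, step=0.2, iters=100, eps=1e-6):
        """Gradient descent on 0.5*||u - img||^2 + lam * TV_eps(u), a
        smoothed-TV, L2-fidelity stand-in for the TV-L1 model above."""
        u = img.copy()
        for _ in range(iters):
            ux = np.diff(u, axis=1, append=u[:, -1:])  # forward diffs
            uy = np.diff(u, axis=0, append=u[-1:, :])
            mag = np.sqrt(ux**2 + uy**2 + eps)         # smoothed |grad u|
            px, py = ux / mag, uy / mag
            # divergence via backward differences (periodic at the
            # borders, which is adequate for a sketch)
            div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
            u -= step * ((u - img) - lam * div)
        return u

    # Noisy step edge: the filter removes noise but keeps the edge.
    rng = np.random.default_rng(3)
    clean = np.zeros((64, 64)); clean[:, 32:] = 1.0
    noisy = clean + 0.2 * rng.standard_normal(clean.shape)
    print(np.abs(tv_denoise(noisy) - clean).mean(),
          np.abs(noisy - clean).mean())
    ```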

  1. Analysis and selection of optimal function implementations in massively parallel computer

    DOEpatents

    Archer, Charles Jens [Rochester, MN]; Peters, Amanda [Rochester, MN]; Ratterman, Joseph D [Rochester, MN]

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.

  2. Parallel Processing Systems for Passive Ranging During Helicopter Flight

    NASA Technical Reports Server (NTRS)

    Sridhar, Bavavar; Suorsa, Raymond E.; Showman, Robert D. (Technical Monitor)

    1994-01-01

    The complexity of rotorcraft missions involving operations close to the ground results in high pilot workload. In order to allow a pilot time to perform mission-oriented tasks, sensor aiding and automation of some of the guidance and control functions are highly desirable. Images from an electro-optical sensor provide a covert way of detecting objects in the flight path of a low-flying helicopter. Passive ranging consists of processing a sequence of images using techniques based on optical flow computation and recursive estimation. The passive ranging algorithm has to extract obstacle information from imagery at rates varying from five to thirty or more frames per second depending on the helicopter speed. We have implemented and tested the passive ranging algorithm off-line using helicopter-collected images. However, the real-time data and computation requirements of the algorithm are beyond the capability of any off-the-shelf microprocessor or digital signal processor. This paper describes the computational requirements of the algorithm and uses parallel processing technology to meet these requirements. Various issues in the selection of a parallel processing architecture are discussed, and four different computer architectures are evaluated regarding their suitability to process the algorithm in real time. Based on this evaluation, we conclude that real-time passive ranging is a realistic goal and can be achieved within a short time.

  3. An Analytical Time–Domain Expression for the Net Ripple Produced by Parallel Interleaved Converters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, Brian B.; Krein, Philip T.

    We apply modular arithmetic and Fourier series to analyze the superposition of N interleaved triangular waveforms with identical amplitudes and duty-ratios. Here, interleaving refers to the condition when a collection of periodic waveforms with identical periods are each uniformly phase-shifted across one period. The main result is a time-domain expression which provides an exact representation of the summed and interleaved triangular waveforms, where the peak amplitude and parameters of the time-periodic component are all specified in closed form. The analysis is general and can be used to study various applications in multi-converter systems. This model is unique not only in that it reveals a simple and intuitive expression for the net ripple, but also in that its derivation via modular arithmetic and Fourier series is distinct from prior approaches. The analytical framework is experimentally validated with a system of three parallel converters under time-varying operating conditions.
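    The qualitative content of the result is easy to check numerically: summing N interleaved triangular waveforms leaves a residual ripple at N times the base frequency with a much smaller peak-to-peak swing. The brute-force sum below is our own check with arbitrary N and duty ratio, not the paper's closed-form expression:

    ```python
    import numpy as np

    def triangle(t, duty=0.5):
        """Unit-amplitude, zero-mean triangular wave with period 1 that
        rises for `duty` of the cycle and falls for the rest."""
        t = np.mod(t, 1.0)
        y = np.where(t < duty, t / duty, (1.0 - t) / (1.0 - duty))
        return y - 0.5

    # Superpose N copies phase-shifted by k/N across one period.
    N = 3
    t = np.linspace(0.0, 2.0, 4000)
    total = sum(triangle(t - k / N) for k in range(N))

    # The net ripple repeats N times per carrier period, and its
    # peak-to-peak swing is far below N times the single-wave swing.
    print(total.max() - total.min())
    ```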

  4. Optical measurement of sound using time-varying laser speckle patterns

    NASA Astrophysics Data System (ADS)

    Leung, Terence S.; Jiang, Shihong; Hebden, Jeremy

    2011-02-01

    In this work, we introduce an optical technique to measure sound. The technique involves pointing a coherent pulsed laser beam at the surface of the measurement site and capturing the time-varying speckle patterns using a CCD camera. Sound manifests itself as vibrations on the surface which induce a periodic translation of the speckle pattern over time. Using a parallel speckle detection scheme, the dynamics of the time-varying speckle patterns can be captured and processed to produce spectral information of the sound. One potential clinical application is to measure pathological sounds from the brain as a screening test. We performed experiments to demonstrate the principle of the detection scheme using head phantoms. The results show that the detection scheme can measure the spectra of single-frequency sounds between 100 and 2000 Hz. The detection scheme worked equally well in both a flat geometry and an anatomical head geometry. However, the current detection scheme is too slow for use in living biological tissues, which have a decorrelation time of a few milliseconds. Further improvements have been suggested.

  5. Visualizing Special Relativity: The Field of An Electric Dipole Moving at Relativistic Speed

    ERIC Educational Resources Information Center

    Smith, Glenn S.

    2011-01-01

    The electromagnetic field is determined for a time-varying electric dipole moving with a constant velocity that is parallel to its moment. Graphics are used to visualize this field in the rest frame of the dipole and in the laboratory frame when the dipole is moving at relativistic speed. Various phenomena from special relativity are clearly…

  6. Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Romano, Paul K.; Siegel, Andrew R.

    The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC were then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than the vector size to achieve vector efficiency greater than 90%. Lastly, when the execution times for events are allowed to vary, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration.
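
    A toy simulation in the spirit of the abstract's efficiency model is sketched below (it is not the paper's actual model): particles are processed in vector-width chunks and die off between events, so partially filled chunks waste lanes, and larger banks amortize that waste. The survival probability and all names are illustrative assumptions.

```python
import math
import random

def simulated_efficiency(bank_size, vector_width, survival=0.95, seed=0):
    """Fraction of vector lanes doing useful work over a batch's lifetime.
    Each event-iteration processes live particles in vector-width chunks;
    a particle survives to the next event with probability `survival`."""
    rng = random.Random(seed)
    live = bank_size
    useful = lanes = 0
    while live > 0:
        useful += live
        lanes += math.ceil(live / vector_width) * vector_width
        live = sum(1 for _ in range(live) if rng.random() < survival)
    return useful / lanes

for B in (16, 64, 320, 1600):
    print(f"bank={B:5d}  lane efficiency={simulated_efficiency(B, 16):.3f}")
# Efficiency climbs toward 1 as the bank grows relative to the vector width,
# qualitatively echoing the ~20x rule of thumb quoted in the abstract.
```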

  7. Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport

    DOE PAGES

    Romano, Paul K.; Siegel, Andrew R.

    2017-07-01

    The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC were then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than the vector size to achieve vector efficiency greater than 90%. Lastly, when the execution times for events are allowed to vary, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration.

  8. Local search to improve coordinate-based task mapping

    DOE PAGES

    Balzuweit, Evan; Bunde, David P.; Leung, Vitus J.; ...

    2015-10-31

    We present a local search strategy to improve the coordinate-based mapping of a parallel job’s tasks to the MPI ranks of its parallel allocation in order to reduce network congestion and the job’s communication time. The goal is to reduce the number of network hops between communicating pairs of ranks. Our target is applications with a nearest-neighbor stencil communication pattern running on mesh systems with non-contiguous processor allocation, such as Cray XE and XK systems. Utilizing the miniGhost mini-app, which models the shock physics application CTH, we demonstrate that our strategy reduces application running time while also reducing the runtime variability. Furthermore, we show that mapping quality can vary based on the selected allocation algorithm, even between allocation algorithms of similar apparent quality.
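
    The sketch below illustrates the general flavor of such a local search, under simplifying assumptions (a 1-D coordinate space and random pairwise swaps); it is not the paper's algorithm. It greedily swaps two ranks' coordinates whenever the swap lowers the total hop count of a nearest-neighbor stencil.

```python
import random

def hops(mapping, edges):
    """Total network hops on a 1-D mesh: |coord(u) - coord(v)| over all edges."""
    return sum(abs(mapping[u] - mapping[v]) for u, v in edges)

def local_search(mapping, edges, iters=10_000, seed=0):
    """Greedy pairwise-swap improvement of a task-to-coordinate mapping."""
    rng = random.Random(seed)
    ranks = list(mapping)
    cost = hops(mapping, edges)
    for _ in range(iters):
        a, b = rng.sample(ranks, 2)
        mapping[a], mapping[b] = mapping[b], mapping[a]
        new_cost = hops(mapping, edges)
        if new_cost < cost:
            cost = new_cost                                  # keep the swap
        else:
            mapping[a], mapping[b] = mapping[b], mapping[a]  # revert it
    return cost

n = 16
edges = [(i, i + 1) for i in range(n - 1)]   # nearest-neighbor stencil
slots = list(range(0, 2 * n, 2))             # non-contiguous allocation
random.Random(1).shuffle(slots)
mapping = dict(zip(range(n), slots))
print("hops before:", hops(mapping, edges))
print("hops after: ", local_search(mapping, edges))
```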

  9. Storm Time Evolution of Outer Radiation Belt Relativistic Electrons by a Nearly Continuous Distribution of Chorus

    NASA Astrophysics Data System (ADS)

    Yang, Chang; Xiao, Fuliang; He, Yihua; Liu, Si; Zhou, Qinghua; Guo, Mingyue; Zhao, Wanli

    2018-03-01

    During the 13-14 November 2012 storm, Van Allen Probe A simultaneously observed a 10 h period of enhanced chorus (including quasi-parallel and oblique propagation components) and relativistic electron fluxes over a broad range of L = 3-6 and magnetic local time = 2-10 within a complete orbit cycle. By adopting a Gaussian fit to the observed wave spectra, we obtain the wave parameters and calculate the bounce-averaged diffusion coefficients. We solve the Fokker-Planck diffusion equation to simulate flux evolutions of relativistic (1.8-4.2 MeV) electrons during two intervals when Probe A passed the location L = 4.3 along its orbit. The simulation results show that chorus with combined quasi-parallel and oblique components can produce a more pronounced flux enhancement in the pitch angle range ˜45°-80°, in good agreement with the observation. The current results provide the first evidence of how relativistic electron fluxes vary under the drive of almost continuously distributed chorus with both quasi-parallel and oblique components within a complete orbit of a Van Allen Probe.

  10. Temperature Control with Two Parallel Small Loop Heat Pipes for GLM Program

    NASA Technical Reports Server (NTRS)

    Khrustalev, Dmitry; Stouffer, Chuck; Ku, Jentung; Hamilton, Jon; Anderson, Mark

    2014-01-01

    The concept of temperature control of an electronic component using a single Loop Heat Pipe (LHP) is well established for aerospace applications. Using two LHPs is often desirable for redundancy/reliability reasons or for increasing the overall heat source-sink thermal conductance. This effort elaborates on the temperature-control operation of a thermal system that includes two small ammonia LHPs thermally coupled together at the evaporator end as well as at the condenser end and operating "in parallel". A transient model of the LHP system was developed on the Thermal Desktop (trademark) platform to understand some fundamental details of such parallel operation of the two LHPs. Extensive thermal-vacuum testing was conducted with two thermally coupled LHPs operating simultaneously as well as with only one LHP operating at a time. This paper outlines the temperature control procedures for two LHPs operating simultaneously with widely varying sink temperatures. The test data obtained during the thermal-vacuum testing, with both LHPs running simultaneously in comparison with only one LHP operating at a time, are presented with detailed explanations.

  11. Automated Long-Term Monitoring of Parallel Microfluidic Operations Applying a Machine Vision-Assisted Positioning Method

    PubMed Central

    Yip, Hon Ming; Li, John C. S.; Cui, Xin; Gao, Qiannan; Leung, Chi Chiu

    2014-01-01

    As microfluidics has been applied extensively in many cell and biochemical applications, monitoring the related processes is an important requirement. In this work, we design and fabricate a high-throughput microfluidic device which contains 32 microchambers to perform automated parallel microfluidic operations and monitoring on an automated stage of a microscope. Images are captured at multiple spots on the device during the operations for monitoring samples in microchambers in parallel; yet the device positions may vary at different time points throughout operations as the device moves back and forth on a motorized microscopic stage. Here, we report an image-based positioning strategy to realign the chamber position before each microscopic image is recorded. We fabricate alignment marks at defined locations next to the chambers in the microfluidic device as reference positions. We also develop image processing algorithms to recognize the chamber positions in real time, followed by realigning the chambers to their preset positions in the captured images. We perform experiments to validate and characterize the device functionality and the automated realignment operation. Together, this microfluidic realignment strategy can be a platform technology to achieve precise positioning of multiple chambers for general microfluidic applications requiring long-term parallel monitoring of cell and biochemical activities. PMID:25133248
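
    A minimal sketch of the realignment measurement is given below, assuming a stored template of an alignment mark and an incoming frame: brute-force normalized cross-correlation locates the mark, and the returned offset would drive the stage correction. Real pipelines would use an optimized matcher (e.g., OpenCV); this version is self-contained.

```python
import numpy as np

def find_offset(frame, template):
    """Return (row, col) of the best normalized cross-correlation match."""
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-9)
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(frame.shape[0] - th + 1):
        for x in range(frame.shape[1] - tw + 1):
            patch = frame[y:y + th, x:x + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-9)
            score = float((p * t).sum())
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos

rng = np.random.default_rng(0)
frame = rng.normal(size=(64, 64))       # stand-in for a captured image
mark = frame[20:28, 30:38].copy()       # pretend this is the alignment mark
print(find_offset(frame, mark))         # -> (20, 30); compare to the preset
                                        #    position to get the stage correction
```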

  12. Parallel machine architecture for production rule systems

    DOEpatents

    Allen, Jr., John D.; Butler, Philip L.

    1989-01-01

    A parallel processing system for production rule programs utilizes a host processor for storing production rule right-hand sides (RHSs) and a plurality of rule processors for storing left-hand sides (LHSs). The rule processors operate in parallel in the Recognize phase of the system's Recognize-Act cycle to match their respective LHSs against a stored list of working memory elements (WMEs) in order to find a self-consistent set of WMEs. The list of WMEs is dynamically varied during the Act phase of the system, in which the host executes or fires rule RHSs for those rules for which a self-consistent set has been found by the rule processors. The host transmits instructions for creating or deleting working memory elements as dictated by the rule firings until the rule processors are unable to find any further self-consistent working memory element sets, at which time the production rule system is halted.

  13. Quantum statistics and squeezing for a microwave-driven interacting magnon system.

    PubMed

    Haghshenasfard, Zahra; Cottam, Michael G

    2017-02-01

    Theoretical studies are reported for the statistical properties of a microwave-driven interacting magnon system. Both the magnetic dipole-dipole and the exchange interactions are included, and the theory is developed for the case of parallel pumping, allowing for the inclusion of the nonlinear processes due to the four-magnon interactions. The method of second quantization is used to transform the total Hamiltonian from spin operators to boson creation and annihilation operators. By using the coherent magnon state representation, we have studied the magnon occupation number and the statistical behavior of the system. In particular, it is shown that the nonlinearities introduced by the parallel pumping field and the four-magnon interactions lead to non-classical quantum statistical properties of the system, such as magnon squeezing. Control of the collapse-and-revival phenomena in the time evolution of the average magnon number is also demonstrated by varying the parallel pumping amplitude and the four-magnon coupling.

  14. Application of multirate digital filter banks to wideband all-digital phase-locked loops design

    NASA Technical Reports Server (NTRS)

    Sadr, Ramin; Shah, Biren; Hinedi, Sami

    1993-01-01

    A new class of architecture for all-digital phase-locked loops (DPLL's) is presented in this article. These architectures, referred to as parallel DPLL (PDPLL), employ multirate digital filter banks (DFB's) to track signals with a lower processing rate than the Nyquist rate, without reducing the input (Nyquist) bandwidth. The PDPLL basically trades complexity for hardware-processing speed by introducing parallel processing in the receiver. It is demonstrated here that the DPLL performance is identical to that of a PDPLL for both steady-state and transient behavior. A test signal with a time-varying Doppler characteristic is used to compare the performance of both the DPLL and the PDPLL.
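
    The rate-reduction front end shared by these PDPLL records can be illustrated with a simple polyphase split, sketched below under assumed names and parameters: the broadband input is decomposed into M parallel branches, each running at 1/M of the input rate. The loop filtering and tracking stages are omitted.

```python
import numpy as np

def polyphase_split(x, M):
    """Split x into M branch signals, each at 1/M of the input rate.
    Branch m holds samples m, m+M, m+2M, ..."""
    n = (len(x) // M) * M
    return x[:n].reshape(-1, M).T

fs, M = 8000, 4
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 440 * t)          # stand-in for the sampled input
branches = polyphase_split(x, M)
print(branches.shape)                    # (4, 512): four branches at fs/M
```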

  15. Application of multirate digital filter banks to wideband all-digital phase-locked loops design

    NASA Astrophysics Data System (ADS)

    Sadr, Ramin; Shah, Biren; Hinedi, Sami

    1993-06-01

    A new class of architecture for all-digital phase-locked loops (DPLL's) is presented in this article. These architectures, referred to as parallel DPLL (PDPLL), employ multirate digital filter banks (DFB's) to track signals with a lower processing rate than the Nyquist rate, without reducing the input (Nyquist) bandwidth. The PDPLL basically trades complexity for hardware-processing speed by introducing parallel processing in the receiver. It is demonstrated here that the DPLL performance is identical to that of a PDPLL for both steady-state and transient behavior. A test signal with a time-varying Doppler characteristic is used to compare the performance of both the DPLL and the PDPLL.

  16. Application of multirate digital filter banks to wideband all-digital phase-locked loops design

    NASA Astrophysics Data System (ADS)

    Sadr, R.; Shah, B.; Hinedi, S.

    1992-11-01

    A new class of architecture for all-digital phase-locked loops (DPLL's) is presented in this article. These architectures, referred to as parallel DPLL (PDPLL), employ multirate digital filter banks (DFB's) to track signals with a lower processing rate than the Nyquist rate, without reducing the input (Nyquist) bandwidth. The PDPLL basically trades complexity for hardware-processing speed by introducing parallel processing in the receiver. It is demonstrated here that the DPLL performance is identical to that of a PDPLL for both steady-state and transient behavior. A test signal with a time-varying Doppler characteristic is used to compare the performance of both the DPLL and the PDPLL.

  17. Exact coherent structures in an asymptotically reduced description of parallel shear flows

    NASA Astrophysics Data System (ADS)

    Beaume, Cédric; Knobloch, Edgar; Chini, Gregory P.; Julien, Keith

    2015-02-01

    A reduced description of shear flows motivated by the Reynolds number scaling of lower-branch exact coherent states in plane Couette flow (Wang J, Gibson J and Waleffe F 2007 Phys. Rev. Lett. 98 204501) is constructed. Exact time-independent nonlinear solutions of the reduced equations corresponding to both lower and upper branch states are found for a sinusoidal, body-forced shear flow. The lower branch solution is characterized by fluctuations that vary slowly along the critical layer while the upper branch solutions display a bimodal structure and are more strongly focused on the critical layer. The reduced equations provide a rational framework for investigations of subcritical spatiotemporal patterns in parallel shear flows.

  18. Application of multirate digital filter banks to wideband all-digital phase-locked loops design

    NASA Technical Reports Server (NTRS)

    Sadr, R.; Shah, B.; Hinedi, S.

    1992-01-01

    A new class of architecture for all-digital phase-locked loops (DPLL's) is presented in this article. These architectures, referred to as parallel DPLL (PDPLL), employ multirate digital filter banks (DFB's) to track signals with a lower processing rate than the Nyquist rate, without reducing the input (Nyquist) bandwidth. The PDPLL basically trades complexity for hardware-processing speed by introducing parallel processing in the receiver. It is demonstrated here that the DPLL performance is identical to that of a PDPLL for both steady-state and transient behavior. A test signal with a time-varying Doppler characteristic is used to compare the performance of both the DPLL and the PDPLL.

  19. A polymorphic reconfigurable emulator for parallel simulation

    NASA Technical Reports Server (NTRS)

    Parrish, E. A., Jr.; Mcvey, E. S.; Cook, G.

    1980-01-01

    Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real-time flight simulation. The system developed consists of a master control system, which performs all man-machine interactions and configures the hardware to emulate a given aircraft, and numerous slave compute modules (SCMs), which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that allow parallel updates are reported. The word length and step size to be used in the SCMs are determined, and the architecture of the hardware and software is described.

  20. Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Dimpsey, Robert Tod

    1992-01-01

    In practice, the performance evaluation of supercomputers is still substantially driven by single-point estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time-varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for the applications. In this type of environment, the most important performance metric is the completion or response time of a given application. However, few evaluation efforts have addressed this issue.

  1. Self-calibrated correlation imaging with k-space variant correlation functions.

    PubMed

    Li, Yu; Edalati, Masoud; Du, Xingfu; Wang, Hui; Cao, Jie J

    2018-03-01

    Correlation imaging is a previously developed high-speed MRI framework that converts parallel imaging reconstruction into the estimate of correlation functions. The presented work aims to demonstrate this framework can provide a speed gain over parallel imaging by estimating k-space variant correlation functions. Because of Fourier encoding with gradients, outer k-space data contain higher spatial-frequency image components arising primarily from tissue boundaries. As a result of tissue-boundary sparsity in the human anatomy, neighboring k-space data correlation varies from the central to the outer k-space. By estimating k-space variant correlation functions with an iterative self-calibration method, correlation imaging can benefit from neighboring k-space data correlation associated with both coil sensitivity encoding and tissue-boundary sparsity, thereby providing a speed gain over parallel imaging that relies only on coil sensitivity encoding. This new approach is investigated in brain imaging and free-breathing neonatal cardiac imaging. Correlation imaging performs better than existing parallel imaging techniques in simulated brain imaging acceleration experiments. The higher speed enables real-time data acquisition for neonatal cardiac imaging in which physiological motion is fast and non-periodic. With k-space variant correlation functions, correlation imaging gives a higher speed than parallel imaging and offers the potential to image physiological motion in real-time. Magn Reson Med 79:1483-1494, 2018. © 2017 International Society for Magnetic Resonance in Medicine.

  2. Graph Partitioning for Parallel Applications in Heterogeneous Grid Environments

    NASA Technical Reports Server (NTRS)

    Bisws, Rupak; Kumar, Shailendra; Das, Sajal K.; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The problem of partitioning irregular graphs and meshes for parallel computations on homogeneous systems has been extensively studied. However, these partitioning schemes fail when the target system architecture exhibits heterogeneity in resource characteristics. With the emergence of technologies such as the Grid, it is imperative to study the partitioning problem taking into consideration the differing capabilities of such distributed heterogeneous systems. In our model, the heterogeneous system consists of processors with varying processing power and an underlying non-uniform communication network. We present in this paper a novel multilevel partitioning scheme for irregular graphs and meshes, that takes into account issues pertinent to Grid computing environments. Our partitioning algorithm, called MiniMax, generates and maps partitions onto a heterogeneous system with the objective of minimizing the maximum execution time of the parallel distributed application. For experimental performance study, we have considered both a realistic mesh problem from NASA as well as synthetic workloads. Simulation results demonstrate that MiniMax generates high quality partitions for various classes of applications targeted for parallel execution in a distributed heterogeneous environment.

  3. Gyrokinetic magnetohydrodynamics and the associated equilibria

    NASA Astrophysics Data System (ADS)

    Lee, W. W.; Hudson, S. R.; Ma, C. H.

    2017-12-01

    The gyrokinetic magnetohydrodynamic (MHD) equations, related to the recent paper by W. W. Lee ["Magnetohydrodynamics for collisionless plasmas from the gyrokinetic perspective," Phys. Plasmas 23, 070705 (2016)], and their associated equilibria properties are discussed. This set of equations consists of the time-dependent gyrokinetic vorticity equation, the gyrokinetic parallel Ohm's law, and the gyrokinetic Ampere's law as well as the equations of state, which are expressed in terms of the electrostatic potential, ϕ, and the vector potential, A , and support both spatially varying perpendicular and parallel pressure gradients and the associated currents. The corresponding gyrokinetic MHD equilibria can be reached when ϕ→0 and A becomes constant in time, which, in turn, gives ∇·(J∥+J⊥)=0 and the associated magnetic islands, if they exist. Examples of simple cylindrical geometry are given. These gyrokinetic MHD equations look quite different from the conventional MHD equations, and their comparisons will be an interesting topic in the future.

  4. Gyrokinetic magnetohydrodynamics and the associated equilibria

    DOE PAGES

    Lee, W. W.; Hudson, S. R.; Ma, C. H.

    2017-12-27

    The gyrokinetic magnetohydrodynamic (MHD) equations, related to the recent paper by W. W. Lee, and their associated equilibria properties are discussed. This set of equations consists of the time-dependent gyrokinetic vorticity equation, the gyrokinetic parallel Ohm's law, and the gyrokinetic Ampere's law as well as the equations of state, which are expressed in terms of the electrostatic potential, Φ, and the vector potential, A, and support both spatially varying perpendicular and parallel pressure gradients and the associated currents. The corresponding gyrokinetic MHD equilibria can be reached when Φ → 0 and A becomes constant in time, which, in turn, gives ∇·(J∥ + J⊥) = 0 and the associated magnetic islands, if they exist. Examples of simple cylindrical geometry are given. In conclusion, these gyrokinetic MHD equations look quite different from the conventional MHD equations, and their comparisons will be an interesting topic in the future.

  5. Gyrokinetic magnetohydrodynamics and the associated equilibria

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, W. W.; Hudson, S. R.; Ma, C. H.

    The gyrokinetic magnetohydrodynamic (MHD) equations, related to the recent paper by W. W. Lee, and their associated equilibria properties are discussed. This set of equations consists of the time-dependent gyrokinetic vorticity equation, the gyrokinetic parallel Ohm's law, and the gyrokinetic Ampere's law as well as the equations of state, which are expressed in terms of the electrostatic potential, Φ, and the vector potential, A, and support both spatially varying perpendicular and parallel pressure gradients and the associated currents. The corresponding gyrokinetic MHD equilibria can be reached when Φ → 0 and A becomes constant in time, which, in turn, gives ∇·(J∥ + J⊥) = 0 and the associated magnetic islands, if they exist. Examples of simple cylindrical geometry are given. In conclusion, these gyrokinetic MHD equations look quite different from the conventional MHD equations, and their comparisons will be an interesting topic in the future.

  6. An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH

    NASA Astrophysics Data System (ADS)

    Lee, D.; Gopal, S.; Mohapatra, P.

    2012-07-01

    We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce a preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
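
    A hedged sketch of the JFNK idea on a toy problem is shown below: SciPy's newton_krylov solves F(u) = 0 without ever forming a Jacobian, approximating Jacobian-vector products by finite differences inside a Krylov iteration. The 1-D reaction-diffusion residual is an illustrative stand-in, not FLASH's equations.

```python
import numpy as np
from scipy.optimize import newton_krylov

n = 100
h = 1.0 / (n + 1)

def residual(u):
    """F(u) = u'' - exp(u) on (0, 1) with zero Dirichlet boundaries."""
    d2 = np.empty_like(u)
    d2[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2
    d2[0] = (u[1] - 2 * u[0]) / h**2          # ghost value u(0) = 0
    d2[-1] = (u[-2] - 2 * u[-1]) / h**2       # ghost value u(1) = 0
    return d2 - np.exp(u)

# No Jacobian matrix is assembled; J*v is approximated by finite differences.
u = newton_krylov(residual, np.zeros(n), f_tol=1e-8)
print(f"solution range: [{u.min():.4f}, {u.max():.4f}]")
```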

  7. Concordant paleolatitudes for Neoproterozoic ophiolitic rocks of the Trinity Complex, Klamath Mountains, California

    USGS Publications Warehouse

    Mankinen, E.A.; Lindsley-Griffin, N.; Griffin, J.R.

    2002-01-01

    New paleomagnetic results from the eastern Klamath Mountains of northern California show that Neoproterozoic rocks of the Trinity ophiolitic complex and overlying Middle Devonian volcanic rocks are latitudinally concordant with cratonal North America. Combining paleomagnetic data with regional geologic and faunal evidence suggests that the Trinity Complex and related terranes of the eastern Klamath plate were linked in some fashion to the North American craton throughout that time, but that distance between them may have varied considerably. A possible model that is consistent with our paleomagnetic results and the geologic evidence is that the Trinity Complex formed and migrated parallel to paleolatitude in the basin between Laurasia and Australia-East Antarctica as the Rodinian supercontinent began to break up. It then continued to move parallel to paleolatitude at least through Middle Devonian time. Although the eastern Klamath plate served as a nucleus against which more western components of the Klamath Mountains province amalgamated, the Klamath superterrane was not accreted to North America until Early Cretaceous time.

  8. Study of resonant processes in plasmonic nanostructures for sensor applications (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Pirunčík, Jiří; Kwiecien, Pavel; Fiala, Jan; Richter, Ivan

    2017-05-01

    This contribution is focused on the numerical studies of resonant processes in individual plasmonic nanostructures, with attention particularly given to rectangular nanoparticles and the concomitant localized surface plasmon resonance processes. Relevant models for the description and analysis of localized surface plasmon resonance are introduced, in particular the quasistatic approximation, Mie theory, and a generalized (quasi)analytical approach for treating rectangularly shaped nanostructures. The parameters influencing the resonant behavior of nanoparticles are analyzed, with special interest in morphology and sensor applications. Results acquired with Lumerical FDTD Solutions software, using the finite-difference time-domain simulation method, are shown and discussed. Simulations were mostly performed for selected nanostructures composed of finite rectangular nanowires with square cross-sections. Systematic analysis is made for single nanowires with varying length, a parallel couple of nanowires with varying gap (cut-wires), and selected dolmen structures with varying gap between one nanowire transversely located with respect to a parallel couple of nanowires (in both in-plane and out-of-plane arrangements). The dependence of resonant peaks of cross-section spectral behavior (absorption, scattering, extinction) and their tunability via suitable structuring and morphology changes are primarily researched. These studies are then followed by an analysis of the effect of periodic arrangements. The results can be useful with respect to possible sensor applications.

  9. The effect of selection environment on the probability of parallel evolution.

    PubMed

    Bailey, Susan F; Rodrigue, Nicolas; Kassen, Rees

    2015-06-01

    Across the great diversity of life, there are many compelling examples of parallel and convergent evolution-similar evolutionary changes arising in independently evolving populations. Parallel evolution is often taken to be strong evidence of adaptation occurring in populations that are highly constrained in their genetic variation. Theoretical models suggest a few potential factors driving the probability of parallel evolution, but experimental tests are needed. In this study, we quantify the degree of parallel evolution in 15 replicate populations of Pseudomonas fluorescens evolved in five different environments that varied in resource type and arrangement. We identified repeat changes across multiple levels of biological organization from phenotype, to gene, to nucleotide, and tested the impact of 1) selection environment, 2) the degree of adaptation, and 3) the degree of heterogeneity in the environment on the degree of parallel evolution at the gene-level. We saw, as expected, that parallel evolution occurred more often between populations evolved in the same environment; however, the extent of parallel evolution varied widely. The degree of adaptation did not significantly explain variation in the extent of parallelism in our system but number of available beneficial mutations correlated negatively with parallel evolution. In addition, degree of parallel evolution was significantly higher in populations evolved in a spatially structured, multiresource environment, suggesting that environmental heterogeneity may be an important factor constraining adaptation. Overall, our results stress the importance of environment in driving parallel evolutionary changes and point to a number of avenues for future work for understanding when evolution is predictable. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. Magnetization Losses in Multiply Connected YBa2Cu3O6+x-Coated Conductors (Postprint)

    DTIC Science & Technology

    2012-02-01

    Magnetization losses are studied in coated-conductor samples exposed to a time-varying magnetic field. In these samples, the superconducting layer is divided into parallel stripes segregated by nonsuperconducting grooves ... superconducting bridges is superimposed on the striated film. We find that the presence of the bridges does not substantially increase the magnetization ... temperature of 10 K and then the field was turned off. As a result, the magnetic flux was trapped in the superconducting stripes (bright bands).

  11. Induced Eddy Currents in Simple Conductive Geometries: Mathematical Formalism Describes the Excitation of Electrical Eddy Currents in a Time-Varying Magnetic Field

    DOE PAGES

    Nagel, James R.

    2017-12-22

    In this paper, a complete mathematical formalism is introduced to describe the excitation of electrical eddy currents due to a time-varying magnetic field. The process works by applying a quasistatic approximation to Ampere's law and then segregating the magnetic field into impressed and induced terms. The result is a nonhomogeneous vector Helmholtz equation that can be analytically solved for many practical geometries. Four demonstration cases are then solved under a constant excitation field over all space—an infinite slab in one dimension, a longitudinal cylinder in two dimensions, a transverse cylinder in two dimensions, and a sphere in three dimensions. Numerical simulations are also performed in parallel with analytic computations, all of which verify the accuracy of the derived expressions.
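
    As a small companion to the slab case, the sketch below evaluates the textbook quasistatic solution for a conductive slab in a uniform time-harmonic field applied parallel to its faces, B(x) = B0 cosh(kx)/cosh(ka) with k = (1+i)/δ and skin depth δ = sqrt(2/(μσω)). The material values are illustrative, and the paper's exact expressions may differ in form.

```python
import numpy as np

mu0 = 4e-7 * np.pi
sigma = 5.8e7                 # copper conductivity, S/m (illustrative)
f = 50.0                      # excitation frequency, Hz
omega = 2 * np.pi * f

delta = np.sqrt(2.0 / (mu0 * sigma * omega))   # skin depth
print(f"skin depth at {f:.0f} Hz: {delta * 1000:.2f} mm")

a = 0.005                     # slab half-thickness, m
k = (1 + 1j) / delta
x = np.linspace(-a, a, 5)
B = np.cosh(k * x) / np.cosh(k * a)   # field normalized to its surface value
print(np.abs(B))              # attenuation of the field toward the slab center
```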

  12. In situ conversion process systems utilizing wellbores in at least two regions of a formation

    DOEpatents

    Vinegar, Harold J [Bellaire, TX; Hsu, Chia-Fu [Granada Hills, CA

    2011-09-27

    A system for heating a subsurface formation is described. The system includes a plurality of elongated heaters located in a plurality of openings in the formation. At least two of the heaters are substantially parallel to each other for at least a portion of the lengths of the heaters. At least two of the heaters have first end portions in a first region of the formation and second end portions in a second region of the formation. A source of time-varying current is configured to apply time-varying current to at least two of the heaters. The first end portions of at least two heaters are configured to have substantially the same voltage applied to them. The second portions of at least two heaters are configured to have substantially the same voltage applied to them.

  13. Induced Eddy Currents in Simple Conductive Geometries: Mathematical Formalism Describes the Excitation of Electrical Eddy Currents in a Time-Varying Magnetic Field

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nagel, James R.

    In this paper, a complete mathematical formalism is introduced to describe the excitation of electrical eddy currents due to a time-varying magnetic field. The process works by applying a quasistatic approximation to Ampere's law and then segregating the magnetic field into impressed and induced terms. The result is a nonhomogeneous vector Helmholtz equation that can be analytically solved for many practical geometries. Four demonstration cases are then solved under a constant excitation field over all space—an infinite slab in one dimension, a longitudinal cylinder in two dimensions, a transverse cylinder in two dimensions, and a sphere in three dimensions. Numerical simulations are also performed in parallel with analytic computations, all of which verify the accuracy of the derived expressions.

  14. A parallel algorithm for the initial screening of space debris collisions prediction using the SGP4/SDP4 models and GPU acceleration

    NASA Astrophysics Data System (ADS)

    Lin, Mingpei; Xu, Ming; Fu, Xiaoyu

    2017-05-01

    Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on the Compute Unified Device Architecture (CUDA) platform of NVIDIA Corporation for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a simulation of collision prediction for space debris of any amount and for any time span can be executed. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on Tesla C2075 of NVIDIA. The simulation results demonstrate that with the same computational accuracy as that of a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction of over 150 Chinese spacecraft for a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of an orbital computation.
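
    A CPU-side sketch of the initial screening step is given below, with random positions standing in for SGP4/SDP4 output and a deliberately coarse distance gate; the batching loop mirrors the block-decomposition idea, though the paper's speedup comes from CUDA kernels rather than NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.uniform(-7000.0, 7000.0, size=(1382, 3))   # km; stand-in for SGP4/SDP4
threshold = 500.0                                    # deliberately coarse gate, km

def screen(pos, threshold, batch=256):
    """Flag object pairs closer than `threshold`, processing rows in batches."""
    hits = []
    for start in range(0, len(pos), batch):
        block = pos[start:start + batch]
        # Distances from this batch of objects to every object.
        d = np.linalg.norm(block[:, None, :] - pos[None, :, :], axis=-1)
        ii, jj = np.nonzero(d < threshold)
        hits.extend((i + start, j) for i, j in zip(ii, jj) if i + start < j)
    return hits

print(len(screen(pos, threshold)), "candidate pairs for fine analysis")
```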

  15. Solution of task related to control of swiss-type automatic lathe to get planes parallel to part axis

    NASA Astrophysics Data System (ADS)

    Tabekina, N. A.; Chepchurov, M. S.; Evtushenko, E. I.; Dmitrievsky, B. S.

    2018-05-01

    This work addresses the automation of a machining process, namely turning, to produce parts with planes parallel to the axis of rotation without using special tools. The results show that equipping the lathe with a high-speed electromechanical drive to control its operative movements makes it possible to machine planes parallel to the part axis. The method is based on a mathematical model expressed as a functional dependency between the conveying velocity of the driven element and time, which describes the operative movements of the lathe over the entire tool path. Using this model of tool movement, it is found that the conveying velocity varies from its maximum to zero, which allows the drive to reverse. A scheme of tool placement relative to the workpiece is proposed for unidirectional movement of the driven element at high conveying velocity. The control method for CNC machines can be used to produce geometrically complex parts on a lathe without special milling tools.

  16. Unique parallel radiations of high-mountainous species of the genus Sedum (Crassulaceae) on the continental island of Taiwan.

    PubMed

    Ito, Takuro; Yu, Chih-Chieh; Nakamura, Koh; Chung, Kuo-Fang; Yang, Qin-Er; Fu, Cheng-Xin; Qi, Zhe-Chen; Kokubugata, Goro

    2017-08-01

    We explored the temporal and spatial diversification of the plant genus Sedum L. (Crassulaceae) in Taiwan based on molecular analysis of nrITS and cpDNA sequences from East Asian Sedum members. Our phylogenetic and ancestral area reconstruction analysis showed that Taiwanese Sedum comprised two lineages that independently migrated from Japan and Eastern China. Furthermore, the genetic distances among species in these two clades were smaller than those of other East Asian Sedum clades, and the Taiwanese members of each clade occupy extremely varied habitats with similar niches in high-mountain regions. These data indicate that species diversification occurred in parallel in the two Taiwanese Sedum lineages, and that these parallel radiations could have occurred within the small continental island of Taiwan. Moreover, the estimated time of divergence for Taiwanese Sedum indicates that the two radiations might have been correlated to the formation of mountains in Taiwan during the early Pleistocene. We suggest that these parallel radiations may be attributable to the geographical dynamics of Taiwan and specific biological features of Sedum that allow them to adapt to new ecological niches. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. A model of the magnetosheath magnetic field during magnetic clouds

    NASA Astrophysics Data System (ADS)

    Turc, L.; Fontaine, D.; Savoini, P.; Kilpua, E. K. J.

    2014-02-01

    Magnetic clouds (MCs) are huge interplanetary structures which originate from the Sun and are of paramount importance in driving magnetospheric storms. Before reaching the magnetosphere, MCs interact with the Earth's bow shock. This may alter their structure and therefore modify their expected geoeffectivity. We develop a simple 3-D model of the magnetosheath adapted to MC conditions. This model is the first to describe the interaction of MCs with the bow shock and their propagation inside the magnetosheath. We find that when the MC encounters the Earth centrally and with its axis perpendicular to the Sun-Earth line, the MC's magnetic structure remains mostly unchanged from the solar wind to the magnetosheath. In this case, the entire dayside magnetosheath is located downstream of a quasi-perpendicular bow shock. When the MC is encountered far from its centre, or when its axis has a large tilt towards the ecliptic plane, the MC's structure downstream of the bow shock differs significantly from that upstream. Moreover, the MC's structure also differs from one region of the magnetosheath to another, and these differences vary with time and space as the MC passes by. In these cases, the bow shock configuration is mainly quasi-parallel. Strong magnetic field asymmetries arise in the magnetosheath; the sign of the magnetic field north-south component may change from the solar wind to some parts of the magnetosheath. We stress the importance of the Bx component. We estimate the regions where the magnetosheath and magnetospheric magnetic fields are anti-parallel at the magnetopause (i.e. favourable to reconnection). We find that the location of anti-parallel fields varies with time as the MCs move past Earth's environment, and that they may be situated near the subsolar region even for an initially northward magnetic field upstream of the bow shock. Our results point out the major role played by the bow shock configuration in modifying or preserving the structure of the MCs. Note that this model is not restricted to MCs; it can be used to describe the magnetosheath magnetic field under an arbitrary slowly varying interplanetary magnetic field.

  18. Possible applications of time domain reflectometry in planetary exploration missions

    NASA Technical Reports Server (NTRS)

    Heckendorn, S.

    1982-01-01

    The use of a time domain reflectometer (TDR) for planetary exploration is considered. Determination of the apparent dielectric constant and, hence, the volumetric water content of frozen and unfrozen soils using the TDR is described. Earth-based tests were performed on a New York state sandy soil and a Wyoming Bentonite. Use of both a cylindrical coaxial transmission line and a parallel transmission line as probes was evaluated. The water content of the soils was varied and the apparent dielectric constant measured in both frozen and unfrozen states. Advantages and disadvantages of the technique are discussed.

  19. Transverse-displacement stabilizer for passive magnetic bearing systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Post, Richard F

    The invention provides a way to re-center a rotor's central longitudinal rotational axis with a desired system longitudinal axis. A pair of planar semicircular permanent magnets are pieced together to form a circle. The fluxes of the two magnets point in opposite directions, both parallel to the rotational axis. A stationary shorted circular winding is positioned with its plane perpendicular to the system longitudinal axis and its center of curvature on the system longitudinal axis. Upon rotation of the rotor, when a transverse displacement of the rotational axis occurs relative to the system longitudinal axis, the winding will experience a time-varying magnetic flux such that an alternating current that is proportional to the displacement will flow in the winding. Such time-varying magnetic flux will provide a force that will bring the rotor back to its centered position about the desired axis.

  20. Coeval emplacement and orogen-parallel transport of gold in oblique convergent orogens

    NASA Astrophysics Data System (ADS)

    Upton, Phaedra; Craw, Dave

    2016-12-01

    Gold mineralisation of varying extent is occurring in all young and active collisional mountain belts. Concurrently, these syn-orogenic hydrothermal deposits are being eroded and transported to form placer deposits. Local extension occurs in convergent orogens, especially oblique orogens, and facilitates emplacement of syn-orogenic gold-bearing deposits with or without associated magmatism. Numerical modelling has shown that extension results from directional variations in movement rates along the rock transport trajectory during convergence, and is most pronounced for highly oblique convergence with strong crustal rheology. On-going uplift during orogenesis exposes gold deposits to erosion, transport, and localised placer concentration. Drainage patterns in variably oblique convergent orogenic belts typically have an orogen-parallel or sub-parallel component, the details of which vary with convergence obliquity and the vagaries of underlying geological controls. This leads to lateral transport of eroded syn-orogenic gold on a range of scales, up to > 100 km. The presence of inherited crustal blocks with contrasting rheology in oblique orogenic collision zones can cause perturbations in drainage patterns, but numerical modelling suggests that orogen-parallel drainage is still a persistent and robust feature. The presence of an inherited block of weak crust enhances the orogen-parallel drainage by imposition of localised subsidence zones elongated along a plate boundary. Evolution and reorientation of orogen-parallel drainage can sever links between gold placer deposits and their syn-orogenic sources. Many of these modelled features of syn-orogenic gold emplacement and varying amounts of orogen-parallel detrital gold transport can be recognised in the Miocene to Recent New Zealand oblique convergent orogen. These processes contribute little gold to major placer goldfields, which require more long-term recycling and placer gold concentration. Most eroded syn-orogenic gold becomes diluted by abundant lithic debris in rivers and sedimentary basins except where localised concentration occurs, especially on beaches.

  1. Dynamic Cerebral Autoregulation Changes during Sub-Maximal Handgrip Maneuver

    PubMed Central

    Nogueira, Ricardo C.; Bor-Seng-Shu, Edson; Santos, Marcelo R.; Negrão, Carlos E.; Teixeira, Manoel J.; Panerai, Ronney B.

    2013-01-01

    Purpose: We investigated the effect of handgrip (HG) maneuver on time-varying estimates of dynamic cerebral autoregulation (CA) using the autoregressive moving average technique. Methods: Twelve healthy subjects were recruited to perform HG maneuver during 3 minutes with 30% of maximum contraction force. Cerebral blood flow velocity, end-tidal CO2 pressure (PETCO2), and noninvasive arterial blood pressure (ABP) were continuously recorded during baseline, HG and recovery. Critical closing pressure (CrCP), resistance area-product (RAP), and time-varying autoregulation index (ARI) were obtained. Results: PETCO2 did not show significant changes during HG maneuver. Whilst ABP increased continuously during the maneuver, to 27% above its baseline value, CBFV rose to a plateau approximately 15% above baseline. This was sustained by a parallel increase in RAP, suggestive of myogenic vasoconstriction, and a reduction in CrCP that could be associated with metabolic vasodilation. The time-varying ARI index dropped at the beginning and end of the maneuver (p<0.005), which could be related to corresponding alert reactions or to different time constants of the myogenic, metabolic and/or neurogenic mechanisms. Conclusion: Changes in dynamic CA during HG suggest a complex interplay of regulatory mechanisms during static exercise that should be considered when assessing the determinants of cerebral blood flow and metabolism. PMID:23967113

  2. Three is much more than two in coarsening dynamics of cyclic competitions

    NASA Astrophysics Data System (ADS)

    Mitarai, Namiko; Gunnarson, Ivar; Pedersen, Buster Niels; Rosiek, Christian Anker; Sneppen, Kim

    2016-04-01

    The classical game of rock-paper-scissors has inspired experiments and spatial model systems that address the robustness of biological diversity. In particular, the game nicely illustrates that cyclic interactions allow multiple strategies to coexist for long-time intervals. When formulated in terms of a one-dimensional cellular automaton, the spatial distribution of strategies exhibits coarsening with algebraically growing domain size over time, while the two-dimensional version allows domains to break and thereby opens the possibility for long-time coexistence. We consider a quasi-one-dimensional implementation of the cyclic competition, and study the long-term dynamics as a function of rare invasions between parallel linear ecosystems. We find that increasing the complexity from two to three parallel subsystems allows a transition from complete coarsening to an active steady state where the domain size stays finite. We further find that this transition happens irrespective of whether the update is done in parallel for all sites simultaneously or done randomly in sequential order. In both cases, the active state is characterized by localized bursts of dislocations, followed by longer periods of coarsening. In the case of the parallel dynamics, we find that there is another phase transition between the active steady state and the coarsening state within the three-line system when the invasion rate between the subsystems is varied. We identify the critical parameter for this transition and show that the density of active boundaries has critical exponents that are consistent with the directed percolation universality class. On the other hand, numerical simulations with the random sequential dynamics suggest that the system may exhibit an active steady state as long as the invasion rate is finite.
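
    The one-dimensional coarsening is easy to reproduce in a few lines. The sketch below runs a random-sequential cyclic cellular automaton and counts domain walls over time; the update rule and parameters are illustrative, not the paper's exact model.

```python
import random

def step(state, rng):
    """One random-sequential sweep: each update picks a random site and lets it
    invade a random neighbor if it beats it cyclically (0>1, 1>2, 2>0)."""
    n = len(state)
    for _ in range(n):
        i = rng.randrange(n)
        j = (i + rng.choice((-1, 1))) % n
        if state[j] == (state[i] + 1) % 3:
            state[j] = state[i]

def walls(state):
    """Number of domain boundaries (periodic), a proxy for inverse domain size."""
    return sum(state[i] != state[i - 1] for i in range(len(state)))

rng = random.Random(0)
state = [rng.randrange(3) for _ in range(300)]
done = 0
for t in (0, 10, 100, 1000, 5000):
    while done < t:
        step(state, rng)
        done += 1
    print(f"sweep {t:5d}: {walls(state)} domain walls")
# The wall count decays with time: the coarsening the abstract describes.
```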

  3. Cerebral blood flow variations in CNS lupus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kushner, M.J.; Tobin, M.; Fazekas, F.

    1990-01-01

    We studied the patterns of cerebral blood flow (CBF), over time, in patients with systemic lupus erythematosus and varying neurologic manifestations including headache, stroke, psychosis, and encephalopathy. For 20 paired xenon-133 CBF measurements, CBF was normal during CNS remissions, regardless of the symptoms. CBF was significantly depressed during CNS exacerbations. The magnitude of change in CBF varied with the neurologic syndrome. CBF was least affected in patients with nonspecific symptoms such as headache or malaise, whereas patients with encephalopathy or psychosis exhibited the greatest reductions in CBF. In 1 patient with affective psychosis, without clinical or CT evidence of cerebral ischemia, serial SPECT studies showed resolution of multifocal cerebral perfusion defects which paralleled clinical recovery.

  4. TOMS and SBUV Data: Comparison to 3D Chemical-Transport Model Results

    NASA Technical Reports Server (NTRS)

    Stolarski, Richard S.; Douglass, Anne R.; Steenrod, Steve; Frith, Stacey

    2003-01-01

    We have updated our merged ozone data (MOD) set using the TOMS data from the new version 8 algorithm. We then analyzed these data for contributions from the solar cycle, volcanoes, the QBO, and halogens using a standard statistical time series model. We have recently completed a hindcast run of our 3D chemical-transport model for the same years. This model uses off-line winds from the finite-volume GCM, a full stratospheric photochemistry package, and time-varying forcing due to halogens, solar UV, and volcanic aerosols. We will report on a parallel analysis of these model results using the same statistical time series technique as used for the MOD data.
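
    A bare-bones version of such a statistical time-series fit is sketched below: an ozone series is regressed on a linear trend plus solar-cycle and QBO proxies by ordinary least squares. The proxies and coefficients are synthetic stand-ins, not the MOD data or the authors' regression terms.

```python
import numpy as np

months = np.arange(360)                        # a 30-year synthetic record
solar = np.sin(2 * np.pi * months / 132)       # ~11-year solar-cycle proxy
qbo = np.sin(2 * np.pi * months / 28)          # ~28-month QBO proxy
trend = months / 120.0                         # stand-in for halogen loading

rng = np.random.default_rng(0)
ozone = 300 - 2.0 * trend + 1.5 * solar + 0.8 * qbo + rng.normal(0, 0.5, 360)

# Ordinary least squares attributes the variability to each proxy term.
X = np.column_stack([np.ones_like(trend), trend, solar, qbo])
coef, *_ = np.linalg.lstsq(X, ozone, rcond=None)
print(coef)   # recovers approximately [300, -2.0, 1.5, 0.8]
```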

  5. The Nature of Phonological Encoding During Spoken Word Retrieval.

    ERIC Educational Resources Information Center

    Sullivan, Michael P.; Riffel, Brian

    1999-01-01

    Examined whether phonological selection occurs sequentially or in parallel. College students named picture primes and targets, with varied response stimulus intervals between primes and targets. Results were consistent with Dell's (1988) two-stage sequential model of encoding, which shows an initial parallel activation within a lexical network…

  6. A Data Type for Efficient Representation of Other Data Types

    NASA Technical Reports Server (NTRS)

    James, Mark

    2008-01-01

    A self-organizing, monomorphic data type denoted a sequence has been conceived to address certain concerns that arise in programming parallel computers. A sequence in the present sense can be regarded abstractly as a vector, set, bag, queue, or other construct. Heretofore, in programming a parallel computer, it has been necessary for the programmer to state explicitly, at the outset, what parts of the program and the underlying data structures must be represented in parallel form. Not only is this requirement not optimal from the perspective of implementation; it entails an additional requirement that the programmer have intimate understanding of the underlying parallel structure. The present sequence data type overcomes both the implementation and parallel structure obstacles. In so doing, the sequence data type provides unified means by which the programmer can represent a data structure for natural and automatic decomposition to a parallel computing architecture. Sequences exhibit the behavioral and structural characteristics of vectors, but the underlying representations are automatically synthesized from combinations of programmer's advice and execution use metrics. Sequences can vary bidirectionally between sparseness and density, making them excellent choices for many kinds of algorithms. The novelty and benefit of this behavior lie in the fact that it can relieve programmers of the details of implementations. The creation of a sequence enables decoupling of a conceptual representation from an implementation. The underlying representation of a sequence is a hybrid of representations composed of vectors, linked lists, connected blocks, and hash tables. The internal structure of a sequence can automatically change from time to time on the basis of how it is being used. Those portions of a sequence where elements have not been added or removed can be as efficient as vectors. As elements are inserted and removed in a given portion, then different methods are utilized to provide both an access and memory strategy that is optimized for that portion and the use to which it is put.
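
    A much-simplified sketch of the adaptive-representation idea follows: a container that starts dense and switches to a hash-based representation once occupancy falls below a threshold, behind one unchanged interface. The threshold, the two-representation choice, and the one-way switch are assumptions for brevity; the actual type also uses linked lists, connected blocks, and usage metrics.

```python
class Sequence:
    """Index->value container that adapts its internal representation."""

    SPARSE_THRESHOLD = 0.25     # switch to hashing below 25% occupancy

    def __init__(self, length):
        self._length = length
        self._dense = [None] * length   # vector-like representation
        self._sparse = None             # dict representation when sparse

    def __setitem__(self, i, value):
        if self._sparse is not None:
            self._sparse[i] = value
        else:
            self._dense[i] = value
            self._maybe_adapt()

    def __getitem__(self, i):
        if self._sparse is not None:
            return self._sparse.get(i)
        return self._dense[i]

    def _maybe_adapt(self):
        # Occupancy scan on each write keeps the sketch short; a real
        # implementation would use incremental use metrics instead.
        used = sum(v is not None for v in self._dense)
        if used / self._length < self.SPARSE_THRESHOLD:
            self._sparse = {i: v for i, v in enumerate(self._dense)
                            if v is not None}
            self._dense = None

s = Sequence(1000)
s[3] = "x"
print(s[3], s[4])    # -> x None, regardless of the active representation
```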

  7. Multiphase complete exchange on Paragon, SP2 and CS-2

    NASA Technical Reports Server (NTRS)

    Bokhari, Shahid H.

    1995-01-01

    The overhead of interprocessor communication is a major factor in limiting the performance of parallel computer systems. The complete exchange is the severest communication pattern in that it requires each processor to send a distinct message to every other processor. This pattern is at the heart of many important parallel applications. On hypercubes, multiphase complete exchange has been developed and shown to provide optimal performance over varying message sizes. Most commercial multicomputer systems do not have a hypercube interconnect. However, they use special purpose hardware and dedicated communication processors to achieve very high performance communication and can be made to emulate the hypercube quite well. Multiphase complete exchange has been implemented on three contemporary parallel architectures: the Intel Paragon, IBM SP2 and Meiko CS-2. The essential features of these machines are described and their basic interprocessor communication overheads are discussed. The performance of multiphase complete exchange is evaluated on each machine. It is shown that the theoretical ideas developed for hypercubes are also applicable in practice to these machines and that multiphase complete exchange can lead to major savings in execution time over traditional solutions.
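
    The dimension-by-dimension hypercube exchange that the multiphase method builds on can be simulated in a few lines. A toy sketch, not the paper's implementation: in each of the d phases, every node combines and forwards across one cube dimension all messages whose destination still differs in that bit.

      def hypercube_complete_exchange(d):
          # Simulate complete exchange on a d-dimensional hypercube
          # (2**d nodes). Each node starts with one message per peer;
          # phase i forwards, across dimension i, every held message
          # whose destination differs from the holder in bit i.
          n = 1 << d
          held = [[(p, t) for t in range(n)] for p in range(n)]
          for i in range(d):
              bit = 1 << i
              moved = [[] for _ in range(n)]
              for p in range(n):
                  keep = []
                  for src, dst in held[p]:
                      if (p ^ dst) & bit:                 # bit i still wrong:
                          moved[p ^ bit].append((src, dst))   # cross dim i
                      else:
                          keep.append((src, dst))
                  held[p] = keep
              for p in range(n):
                  held[p].extend(moved[p])
          # After d phases every message sits at its destination.
          assert all(all(dst == p for _, dst in held[p]) for p in range(n))
          return held

      hypercube_complete_exchange(4)   # 16-node cube, 4 phases

    The multiphase variant interpolates between this d-phase schedule and a direct pairwise exchange, trading message size against the number of start-ups.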

  8. Optimizing SIEM Throughput on the Cloud Using Parallelization.

    PubMed

    Alam, Masoom; Ihsan, Asif; Khan, Muazzam A; Javaid, Qaisar; Khan, Abid; Manzoor, Jawad; Akhundzada, Adnan; Khan, Muhammad Khurram; Farooq, Sajid

    2016-01-01

    Processing large amounts of data in real time for identifying security issues poses several performance challenges, especially when hardware infrastructure is limited. Managed Security Service Providers (MSSP), mostly hosting their applications on the Cloud, receive events at a very high rate that varies from a few hundred to a couple of thousand events per second (EPS). It is critical to process this data efficiently, so that attacks can be identified quickly and the necessary response initiated. This paper evaluates the performance of a security framework OSTROM built on the Esper complex event processing (CEP) engine under parallel and non-parallel computational frameworks. We explain three architectures under which Esper can be used to process events. We investigated the effect on throughput, memory and CPU usage in each configuration setting. The results indicate that the performance of the engine is limited by the number of events coming in rather than the queries being processed. The architecture where 1/4th of the total events are submitted to each instance and all the queries are processed by all the units shows the best results in terms of throughput, memory and CPU usage.
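
    The best-performing configuration above (events sharded across instances, every instance running the full query set) maps naturally onto a data-parallel worker pool. A hedged sketch with stand-in queries; OSTROM and the Esper API are not shown, and all names here are illustrative:

      from concurrent.futures import ProcessPoolExecutor

      # Stand-ins for CEP queries: each one scans a batch of event dicts.
      def count_failed_logins(events):
          return sum(1 for e in events if e.get("type") == "login_failure")

      def count_port_scans(events):
          return sum(1 for e in events if e.get("type") == "port_scan")

      QUERIES = [count_failed_logins, count_port_scans]

      def run_all_queries(batch):
          # Every instance runs the full query set on its share of events.
          return [q(batch) for q in QUERIES]

      def process(events, workers=4):
          # Shard the stream so each instance receives 1/workers of it.
          shards = [events[i::workers] for i in range(workers)]
          with ProcessPoolExecutor(max_workers=workers) as pool:
              partials = list(pool.map(run_all_queries, shards))
          # Merge per-shard partial results (simple sums for counters).
          return [sum(col) for col in zip(*partials)]

      if __name__ == "__main__":
          demo = [{"type": "login_failure"}] * 10 + [{"type": "port_scan"}] * 3
          print(process(demo))   # -> [10, 3]

    In a real deployment each shard would feed a separate Esper engine instance subscribed to a quarter of the event stream.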

  9. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by semi-randomly varying routing policies for different packets

    DOEpatents

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-11-23

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Nodes vary a choice of routing policy for routing data in the network in a semi-random manner, so that similarly situated packets are not always routed along the same path. Semi-random variation of the routing policy tends to avoid certain local hot spots of network activity, which might otherwise arise using more consistent routing determinations. Preferably, the originating node chooses a routing policy for a packet, and all intermediate nodes in the path route the packet according to that policy. Policies may be rotated on a round-robin basis, selected by generating a random number, or otherwise varied.
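
    A minimal sketch of the per-packet policy variation the patent describes, with hypothetical policy names; the chosen policy rides in the packet so intermediate nodes honor the originating node's choice:

      import random

      POLICIES = ["xyz_order", "zyx_order", "adaptive_min_queue"]  # hypothetical

      class OriginNode:
          # Per-packet routing-policy variation at the source node.
          # Intermediate nodes read packet["policy"] and route accordingly,
          # so one packet follows a single policy end to end.
          def __init__(self, mode="semi_random", seed=None):
              self.mode = mode
              self._next = 0
              self._rng = random.Random(seed)

          def choose_policy(self):
              if self.mode == "round_robin":
                  policy = POLICIES[self._next % len(POLICIES)]
                  self._next += 1
                  return policy
              return self._rng.choice(POLICIES)   # semi-random selection

          def send(self, payload, dest):
              # The chosen policy travels with the packet header.
              return {"dest": dest, "policy": self.choose_policy(), "data": payload}

      node = OriginNode(mode="round_robin")
      print([node.send(b"", dest=7)["policy"] for _ in range(4)])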

  10. Minimum envelope roughness pulse design for reduced amplifier distortion in parallel excitation.

    PubMed

    Grissom, William A; Kerr, Adam B; Stang, Pascal; Scott, Greig C; Pauly, John M

    2010-11-01

    Parallel excitation uses multiple transmit channels and coils, each driven by independent waveforms, to afford the pulse designer an additional spatial encoding mechanism that complements gradient encoding. In contrast to parallel reception, parallel excitation requires individual power amplifiers for each transmit channel, which can be cost prohibitive. Several groups have explored the use of low-cost power amplifiers for parallel excitation; however, such amplifiers commonly exhibit nonlinear memory effects that distort radio frequency pulses. This is especially true for pulses with rapidly varying envelopes, which are common in parallel excitation. To overcome this problem, we introduce a technique for parallel excitation pulse design that yields pulses with smoother envelopes. We demonstrate experimentally that pulses designed with the new technique suffer less amplifier distortion than unregularized pulses and pulses designed with conventional regularization.

  11. Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Romano, Paul K.; Siegel, Andrew R.

    The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC were then used in conjunction with the models to calculate the speedup due to vectorization as a function of two parameters: the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than the vector size in order to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, however, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration. For some problems, this implies that vector efficiencies over 50% may not be attainable. While there are many factors impacting the performance of an event-based algorithm that are not captured by our model, it nevertheless provides insights into factors that may be limiting in a real implementation.
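
    A toy occupancy model in the spirit of the constant-event-time regime can illustrate why the bank must be much larger than the vector width: lanes only go idle while the bank drains. This lumps all event types together, so the numbers are illustrative only and this is not the paper's model:

      import numpy as np

      def vector_efficiency(bank_size, vector_width, mean_events=10.0,
                            trials=20, seed=1):
          # Each particle needs a geometric number of events; every
          # iteration fills up to vector_width lanes from live particles
          # and advances them one event. Efficiency = busy / offered lanes.
          rng = np.random.default_rng(seed)
          effs = []
          for _ in range(trials):
              remaining = rng.geometric(1.0 / mean_events, size=bank_size)
              busy = total = 0
              while (alive := int((remaining > 0).sum())) > 0:
                  lanes = min(alive, vector_width)
                  busy += lanes
                  total += vector_width
                  idx = np.flatnonzero(remaining > 0)[:lanes]
                  remaining[idx] -= 1
              effs.append(busy / total)
          return float(np.mean(effs))

      for ratio in (1, 5, 20):     # bank size as a multiple of vector width
          w = 8
          print(ratio, vector_efficiency(bank_size=ratio * w, vector_width=w))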

  12. An in-line spectrophotometer on a centrifugal microfluidic platform for real-time protein determination and calibration.

    PubMed

    Ding, Zhaoxiong; Zhang, Dongying; Wang, Guanghui; Tang, Minghui; Dong, Yumin; Zhang, Yixin; Ho, Ho-Pui; Zhang, Xuping

    2016-09-21

    In this paper, an in-line, low-cost, miniature and portable spectrophotometric detection system is presented and used for fast protein determination and calibration in centrifugal microfluidics. Our portable detection system is configured with paired emitter and detector diodes (PEDD), where the light beam between both LEDs is collimated with enhanced system tolerance. It is the first time that a physical model of PEDD is clearly presented, which could be modelled as a photosensitive RC oscillator. A portable centrifugal microfluidic system that contains a wireless port in real-time communication with a smartphone has been built to show that PEDD is an effective strategy for conducting rapid protein bioassays with detection performance comparable to that of a UV-vis spectrophotometer. The choice of centrifugal microfluidics offers the unique benefits of highly parallel fluidic actuation at high accuracy while there is no need for a pump, as inertial forces are present within the entire spinning disc and accurately controlled by varying the spinning speed. As a demonstration experiment, we have conducted the Bradford assay for bovine serum albumin (BSA) concentration calibration from 0 to 2 mg mL(-1). Moreover, a novel centrifugal disc with a spiral microchannel is proposed for automatic distribution and metering of the sample to all the parallel reactions at one time. The reported lab-on-a-disc scheme with PEDD detection may offer a solution for high-throughput assays, such as protein density calibration, drug screening and drug solubility measurement that require the handling of a large number of reactions in parallel.

  13. MOST: A Powerful Tool to Reveal the True Nature of the Mysterious Dust-Forming Wolf-Rayet Binary CV Ser

    NASA Astrophysics Data System (ADS)

    David-Uraz, A.; Moffat, A. F. J.; Chené, A.-N.; MOST Collaboration

    2012-12-01

    The WR + O binary CV Ser has been a source of mystery since it was shown that its atmospheric eclipses change with time over decades, in addition to its sporadic dust production. However, the first high-precision time-dependent photometric observations obtained with the MOST space telescope in 2009 show two consecutive eclipses over the 29 day orbit, with varying depths. A subsequent MOST run in 2010 showed a somewhat asymmetric eclipse profile. Parallel optical spectroscopy was obtained from the Observatoire du Mont-Mégantic (2009 and 2010) and from the Dominion Astrophysical Observatory (2009).

  14. Correlation of generation interval and scale of large-scale submarine landslides using 3D seismic data off Shimokita Peninsula, Northeast Japan

    NASA Astrophysics Data System (ADS)

    Nakamura, Yuki; Ashi, Juichiro; Morita, Sumito

    2016-04-01

    Clarifying the timing and scale of past submarine landslides is important for understanding their formation processes. The study area is part of the continental slope of the Japan Trench, where a number of large-scale submarine landslide (slump) deposits have been identified in Pliocene and Quaternary formations by analysing METI's 3D seismic data "Sanrikuoki 3D" off Shimokita Peninsula (Morita et al., 2011). Among the structural features are swarms of parallel dikes, which are likely dewatering paths formed during the slumping deformation; slip directions are basically perpendicular to these dikes, making the parallel dikes a good indicator for estimating slip directions. The slip direction of each slide was determined on a one-kilometre grid over the 40 km x 20 km survey area, and the dominant slip direction varies from the Pliocene to the Quaternary. Parallel dike structure is also useful for distinguishing slump deposits from normal deposits on time-slice images. By tracing the outline of slump deposits at each depth, we identified the general morphology of the overall slump deposits and calculated the volume of the extracted slump deposits so as to estimate the scale of each event. We investigated temporal and spatial variation in the depositional pattern of the slump deposits. Calculating the generation interval of the slumps, some periodicity is recognized; in particular, large slumps do not occur in succession. Additionally, examining the relationship between cumulative volume and generation interval, a certain correlation is observed in the Pliocene and Quaternary. Key words: submarine landslides, 3D seismic data, Shimokita Peninsula

  15. Laboratory glassware rack for seismic safety

    NASA Technical Reports Server (NTRS)

    Cohen, M. M. (Inventor)

    1985-01-01

    A rack for laboratory bottles and jars for chemicals and medicines has been designed to provide the maximum strength and security to the glassware in the event of a significant earthquake. The rack preferably is rectangular and may be made of a variety of chemically resistant materials including polypropylene, polycarbonate, and stainless steel. It comprises a first plurality of parallel vertical walls, and a second plurality of parallel vertical walls, perpendicular to the first. These intersecting vertical walls comprise a self-supporting structure without a bottom which sits on four legs. The top surface of the rack is formed by the top edges of all the vertical walls, which are not parallel but are skewed in three dimensions. These top edges form a grid matrix having a number of intersections of the vertical walls which define a number of rectangular compartments having varying widths and lengths and varying heights.

  16. Interaction of a finite-length ion beam with a background plasma - Reflected ions at the quasi-parallel bow shock

    NASA Technical Reports Server (NTRS)

    Onsager, T. G.; Winske, D.; Thomsen, M. F.

    1991-01-01

    The coupling of a finite-length, field-aligned, ion beam with a uniform background plasma is investigated using one-dimensional hybrid computer simulations. The finite-length beam is used to study the interaction between the incident solar wind and ions reflected from the earth's quasi-parallel bow shock, where the reflection process may vary with time. The coupling between the reflected ions and the solar wind is relevant to ion heating at the bow shock and possibly to the formation of hot flow anomalies and re-formation of the shock itself. Consistent with linear theory, the waves which dominate the interaction are the electromagnetic right-hand polarized resonant and nonresonant modes. However, in addition to the instability growth rates, the length of time that the waves are in contact with the beam is also an important factor in determining which wave mode will dominate the interaction. It is found that the interaction will result in strong coupling, where a significant fraction of the available free energy is converted into thermal energy in a short time, provided the beam is sufficiently dense or sufficiently long.

  17. High accuracy mantle convection simulation through modern numerical methods - II: realistic models and problems

    NASA Astrophysics Data System (ADS)

    Heister, Timo; Dannberg, Juliane; Gassmöller, Rene; Bangerth, Wolfgang

    2017-08-01

    Computations have helped elucidate the dynamics of Earth's mantle for several decades already. The numerical methods that underlie these simulations have greatly evolved within this time span, and today include dynamically changing and adaptively refined meshes, sophisticated and efficient solvers, and parallelization to large clusters of computers. At the same time, many of the methods - discussed in detail in a previous paper in this series - were developed and tested primarily using model problems that lack many of the complexities that are common to the realistic models our community wants to solve today. With several years of experience solving complex and realistic models, we here revisit some of the algorithm designs of the earlier paper and discuss the incorporation of more complex physics. In particular, we re-consider time stepping and mesh refinement algorithms, evaluate approaches to incorporate compressibility, and discuss dealing with strongly varying material coefficients, latent heat, and how to track chemical compositions and heterogeneities. Taken together and implemented in a high-performance, massively parallel code, the techniques discussed in this paper then allow for high resolution, 3-D, compressible, global mantle convection simulations with phase transitions, strongly temperature dependent viscosity and realistic material properties based on mineral physics data.

  18. Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations

    PubMed Central

    Hallock, Michael J.; Stone, John E.; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida

    2014-01-01

    Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems. PMID:24882911
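
    The load-balancing idea, giving each GPU a share of lattice planes proportional to its measured throughput, can be sketched independently of CUDA. The inputs below (plane counts, throughputs) are illustrative assumptions:

      def balance_slabs(nz, throughputs):
          # Assign each GPU a contiguous block of lattice z-planes sized
          # proportionally to its measured throughput (planes/second).
          total = sum(throughputs)
          sizes = [int(nz * t / total) for t in throughputs]
          leftover = nz - sum(sizes)        # hand remaining planes to the
          order = sorted(range(len(sizes)), key=lambda i: -throughputs[i])
          for i in order[:leftover]:        # fastest devices
              sizes[i] += 1
          bounds, z = [], 0
          for s in sizes:
              bounds.append((z, z + s))     # half-open plane range per GPU
              z += s
          return bounds

      # e.g. two fast GPUs and one with half the throughput:
      print(balance_slabs(128, [1.0, 1.0, 0.5]))   # [(0, 52), (52, 103), (103, 128)]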

  19. Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations.

    PubMed

    Hallock, Michael J; Stone, John E; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida

    2014-05-01

    Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems.

  20. Nonlinear Dynamics of a Multistage Gear Transmission System with Multi-Clearance

    NASA Astrophysics Data System (ADS)

    Xiang, Ling; Zhang, Yue; Gao, Nan; Hu, Aijun; Xing, Jingtang

    The nonlinear torsional model of a multistage gear transmission system consisting of a planetary gear stage and two parallel gear stages is established with time-varying meshing stiffness, comprehensive gear error and multiple clearances. The nonlinear dynamic responses are analyzed with backlash taken as the bifurcation parameter. The motions of the system as backlash changes are identified through the global bifurcation diagram, largest Lyapunov exponent (LLE), FFT spectra, Poincaré maps, phase diagrams and time series. The numerical results demonstrate that the system exhibits rich nonlinear dynamics, including periodic motion, nonperiodic states and chaotic states. It is found that the sun-planet backlash has a more complex effect on the system than the ring-planet backlash. The motions of the system with backlash in the parallel gear stages are diverse, including several different multi-periodic motions. Furthermore, the state of the system can change from chaos into quasi-periodic behavior, which means that the dynamic behavior of the system is composed of more stable components as the backlash increases. Correspondingly, the parameters of the system should be designed properly and controlled in a timely manner for better operation and a longer system life.
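
    The key nonlinearities named above are the dead-zone backlash function and a periodically varying mesh stiffness. A single-mesh, dimensionless sketch (not the paper's full planetary-plus-two-stage model; all parameter values are illustrative):

      import numpy as np
      from scipy.integrate import solve_ivp

      def backlash(x, b):
          # Dead-zone restoring displacement for half-backlash b.
          return np.where(x > b, x - b, np.where(x < -b, x + b, 0.0))

      def gear_mesh(t, y, b, zeta, k1, omega_m, force):
          # Dimensionless single-mesh torsional model: time-varying mesh
          # stiffness k(t) = 1 + k1*cos(omega_m*t) acting through backlash.
          x, v = y
          k = 1.0 + k1 * np.cos(omega_m * t)
          return [v, force - 2.0 * zeta * v - k * backlash(x, b)]

      sol = solve_ivp(gear_mesh, (0.0, 200.0), [0.0, 0.0],
                      args=(1.0, 0.05, 0.2, 1.5, 0.1), max_step=0.05)
      x_steady = sol.y[0][sol.t > 150.0]    # inspect after transients decay
      print(x_steady.min(), x_steady.max())

    Sweeping the half-backlash b and recording the steady-state samples is one way to build the kind of bifurcation diagram the study examines.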

  1. Anatomically constrained neural network models for the categorization of facial expression

    NASA Astrophysics Data System (ADS)

    McMenamin, Brenton W.; Assadi, Amir H.

    2004-12-01

    The ability to recognize facial expression in humans is performed with the amygdala which uses parallel processing streams to identify the expressions quickly and accurately. Additionally, it is possible that a feedback mechanism may play a role in this process as well. Implementing a model with similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms for which varied expressions are a source for error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.

  2. Anatomically constrained neural network models for the categorization of facial expression

    NASA Astrophysics Data System (ADS)

    McMenamin, Brenton W.; Assadi, Amir H.

    2005-01-01

    The ability to recognize facial expression in humans is performed with the amygdala which uses parallel processing streams to identify the expressions quickly and accurately. Additionally, it is possible that a feedback mechanism may play a role in this process as well. Implementing a model with similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms for which varied expressions are a source for error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.

  3. Test Reliability at the Individual Level

    PubMed Central

    Hu, Yueqin; Nesselroade, John R.; Erbacher, Monica K.; Boker, Steven M.; Burt, S. Alexandra; Keel, Pamela K.; Neale, Michael C.; Sisk, Cheryl L.; Klump, Kelly

    2016-01-01

    Reliability has a long history as one of the key psychometric properties of a test. However, a given test might not measure people equally reliably. Test scores from some individuals may have considerably greater error than others. This study proposed two approaches using intraindividual variation to estimate test reliability for each person. A simulation study suggested that the parallel tests approach and the structural equation modeling approach recovered the simulated reliability coefficients. Then in an empirical study, where forty-five females were measured daily on the Positive and Negative Affect Schedule (PANAS) for 45 consecutive days, separate estimates of reliability were generated for each person. Results showed that reliability estimates of the PANAS varied substantially from person to person. The methods provided in this article apply to tests measuring changeable attributes and require repeated measures across time on each individual. This article also provides a set of parallel forms of PANAS. PMID:28936107
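
    The parallel-tests approach can be sketched directly: for one person, correlate two parallel half-scores across the repeated daily measurements, then step the correlation up with the Spearman-Brown formula. The data below are a toy illustration, not the study's PANAS scores:

      import numpy as np

      def person_reliability(form_a, form_b):
          # Correlate two parallel half-scores across the repeated daily
          # administrations for one person, then apply the Spearman-Brown
          # step-up to estimate that person's full-test reliability.
          r = np.corrcoef(form_a, form_b)[0, 1]
          return 2 * r / (1 + r)

      # Toy data: one person's positive-affect half-scores over 45 days.
      rng = np.random.default_rng(3)
      true_state = rng.normal(30, 5, size=45)       # day-to-day true affect
      a = true_state + rng.normal(0, 2, size=45)    # parallel form A + error
      b = true_state + rng.normal(0, 2, size=45)    # parallel form B + error
      print(person_reliability(a, b))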

  4. Parallel Polarization State Generation

    NASA Astrophysics Data System (ADS)

    She, Alan; Capasso, Federico

    2016-05-01

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by using a digital micromirror device to modulate spatially separated polarization components of a laser, which are subsequently beam-combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
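
    The contrast between the two architectures can be made concrete numerically: instead of composing transformations, one solves for nonnegative channel intensities whose coherent sum approximates the target state. A sketch with an assumed six-state basis (H, V, D, A, R, L); this is an illustration of the sum-of-components picture, not the paper's device model:

      import numpy as np
      from scipy.optimize import nnls

      # Fixed component SOPs as Jones vectors: H, V, D, A, R, L.
      s2 = 1 / np.sqrt(2)
      BASIS = np.array([[1, 0], [0, 1], [s2, s2], [s2, -s2],
                        [s2, -1j * s2], [s2, 1j * s2]]).T   # shape (2, 6)

      def psg_weights(target):
          # Find nonnegative channel amplitudes w so the coherent sum
          # BASIS @ w approximates the target Jones vector (up to a
          # global phase, which is ignored here for simplicity).
          A = np.vstack([BASIS.real, BASIS.imag])   # (4, 6) real system
          b = np.concatenate([target.real, target.imag])
          w, residual = nnls(A, b)                  # intensities are >= 0
          return w, residual

      target = np.array([np.cos(0.3), np.sin(0.3) * np.exp(1j * 0.8)])
      w, res = psg_weights(target)
      print(w, res)   # per-channel amplitudes and fit residual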

  5. AFFINE-CORRECTED PARADISE: FREE-BREATHING PATIENT-ADAPTIVE CARDIAC MRI WITH SENSITIVITY ENCODING

    PubMed Central

    Sharif, Behzad; Bresler, Yoram

    2013-01-01

    We propose a real-time cardiac imaging method with parallel MRI that allows for free breathing during imaging and does not require cardiac or respiratory gating. The method is based on the recently proposed PARADISE (Patient-Adaptive Reconstruction and Acquisition Dynamic Imaging with Sensitivity Encoding) scheme. The new acquisition method adapts the PARADISE k-t space sampling pattern according to an affine model of the respiratory motion. The reconstruction scheme involves multi-channel time-sequential imaging with time-varying channels. All model parameters are adapted to the imaged patient as part of the experiment and drive both data acquisition and cine reconstruction. Simulated cardiac MRI experiments using the realistic NCAT phantom show high quality cine reconstructions and robustness to modeling inaccuracies. PMID:24390159

  6. Learning, memory, and the role of neural network architecture.

    PubMed

    Hermundstad, Ann M; Brown, Kevin S; Bassett, Danielle S; Carlson, Jean M

    2011-06-01

    The performance of information processing systems, from artificial neural networks to natural neuronal ensembles, depends heavily on the underlying system architecture. In this study, we compare the performance of parallel and layered network architectures during sequential tasks that require both acquisition and retention of information, thereby identifying tradeoffs between learning and memory processes. During the task of supervised, sequential function approximation, networks produce and adapt representations of external information. Performance is evaluated by statistically analyzing the error in these representations while varying the initial network state, the structure of the external information, and the time given to learn the information. We link performance to complexity in network architecture by characterizing local error landscape curvature. We find that variations in error landscape structure give rise to tradeoffs in performance; these include the ability of the network to maximize accuracy versus minimize inaccuracy and produce specific versus generalizable representations of information. Parallel networks generate smooth error landscapes with deep, narrow minima, enabling them to find highly specific representations given sufficient time. While accurate, however, these representations are difficult to generalize. In contrast, layered networks generate rough error landscapes with a variety of local minima, allowing them to quickly find coarse representations. Although less accurate, these representations are easily adaptable. The presence of measurable performance tradeoffs in both layered and parallel networks has implications for understanding the behavior of a wide variety of natural and artificial learning systems.

  7. Evaluation of parallel milliliter-scale stirred-tank bioreactors for the study of biphasic whole-cell biocatalysis with ionic liquids.

    PubMed

    Dennewald, Danielle; Hortsch, Ralf; Weuster-Botz, Dirk

    2012-01-01

    As clear structure-activity relationships are still rare for ionic liquids, preliminary experiments are necessary for the process development of biphasic whole-cell processes involving these solvents. To reduce the time investment and the material costs, the process development of such biphasic reaction systems would profit from a small-scale high-throughput platform. As an example, the reduction of 2-octanone to (R)-2-octanol by a recombinant Escherichia coli in a biphasic ionic liquid/water system was studied in a miniaturized stirred-tank bioreactor system allowing the parallel operation of up to 48 reactors at the mL-scale. The results were compared to those obtained in a 20-fold larger stirred-tank reactor. The maximum local energy dissipation was evaluated at the larger scale and compared to the data available for the small-scale reactors, to verify whether similar mass transfer could be obtained at both scales. Thereafter, the reaction kinetics and final conversions reached in different reaction setups were analysed. The results were in good agreement between both scales for varying ionic liquids and for ionic liquid volume fractions up to 40%. The parallel bioreactor system can thus be used for the process development of the majority of biphasic reaction systems involving ionic liquids, reducing the time and resource investment during the process development of this type of application. Copyright © 2011. Published by Elsevier B.V.

  8. Relation Between Roughness of Interface and Adherence of Porcelain Enamel to Steel

    NASA Technical Reports Server (NTRS)

    Richmond, J C; Moore, D G; Kirkpatrick, H B; Harrison, W N

    1954-01-01

    Porcelain-enamel ground coats were prepared and applied under conditions that gave various degrees of adherence between enamel and a low-carbon steel (enameling iron). The variations in adherence were produced by (a) varying the amount of cobalt-oxide addition in the frit, (b) varying the type of metallic-oxide addition in the frit, keeping the amount constant at 0.8 weight percent, (c) varying the surface treatment of the metal before application of the enamel, by pickling, sandblasting, and polishing, and (d) varying the time of firing of the enamel containing 0.8 percent of cobalt oxide. Specimens of each enamel were given the standard adherence test of the Porcelain Enamel Institute. Metallographic sections were made on which the roughness of interface was evaluated by counting the number of anchor points (undercuts) per centimeter of specimen length and also by measuring the length of the interface and expressing results as the ratio of this length to the length of a straight line parallel to the over-all direction of the interface.

  9. Solid-phase proximity ligation assays for individual or parallel protein analyses with readout via real-time PCR or sequencing.

    PubMed

    Nong, Rachel Yuan; Wu, Di; Yan, Junhong; Hammond, Maria; Gu, Gucci Jijuan; Kamali-Moghaddam, Masood; Landegren, Ulf; Darmanis, Spyros

    2013-06-01

    Solid-phase proximity ligation assays share properties with the classical sandwich immunoassays for protein detection. The proteins captured via antibodies on solid supports are, however, detected not by single antibodies with detectable functions, but by pairs of antibodies with attached DNA strands. Upon recognition by these sets of three antibodies, pairs of DNA strands brought in proximity are joined by ligation. The ligated reporter DNA strands are then detected via methods such as real-time PCR or next-generation sequencing (NGS). We describe how to construct assays that can offer improved detection specificity by virtue of recognition by three antibodies, as well as enhanced sensitivity owing to reduced background and amplified detection. Finally, we also illustrate how the assays can be applied for parallel detection of proteins, taking advantage of the oligonucleotide ligation step to avoid background problems that might arise with multiplexing. The protocol for the singleplex solid-phase proximity ligation assay takes ~5 h. The multiplex version of the assay takes 7-8 h depending on whether quantitative PCR (qPCR) or sequencing is used as the readout. The time for the sequencing-based protocol includes the library preparation but not the actual sequencing, as times may vary based on the choice of sequencing platform.

  10. Controller Strategies for Automation Tool Use under Varying Levels of Trajectory Prediction Uncertainty

    NASA Technical Reports Server (NTRS)

    Morey, Susan; Prevot, Thomas; Mercer, Joey; Martin, Lynne; Bienert, Nancy; Cabrall, Christopher; Hunt, Sarah; Homola, Jeffrey; Kraut, Joshua

    2013-01-01

    A human-in-the-loop simulation was conducted to examine the effects of varying levels of trajectory prediction uncertainty on air traffic controller workload and performance, as well as how strategies and the use of decision support tools change in response. This paper focuses on the strategies employed by two controllers from separate teams who worked in parallel but independently under identical conditions (airspace, arrival traffic, tools) with the goal of ensuring schedule conformance and safe separation for a dense arrival flow in en route airspace. Despite differences in strategy and methods, both controllers achieved high levels of schedule conformance and safe separation. Overall, results show that trajectory uncertainties introduced by wind and aircraft performance prediction errors do not affect the controllers' ability to manage traffic. Controller strategies were fairly robust to changes in error, though strategies were affected by the amount of delay to absorb (scheduled time of arrival minus estimated time of arrival). Using the results and observations, this paper proposes an ability to dynamically customize the display of information including delay time based on observed error to better accommodate different strategies and objectives.

  11. Applications of New Surrogate Global Optimization Algorithms including Efficient Synchronous and Asynchronous Parallelism for Calibration of Expensive Nonlinear Geophysical Simulation Models.

    NASA Astrophysics Data System (ADS)

    Shoemaker, C. A.; Pang, M.; Akhtar, T.; Bindel, D.

    2016-12-01

    New parallel surrogate global optimization algorithms are developed and applied to objective functions that are expensive simulations (possibly with multiple local minima). The algorithms can be applied to most geophysical simulations, including those with nonlinear partial differential equations. The optimization does not require that the simulations be parallelized. Asynchronous (and synchronous) parallel execution is available in the optimization toolbox "pySOT". The parallel algorithms are modified from serial to eliminate fine-grained parallelism. The optimization is computed with the open-source software pySOT, a Surrogate Global Optimization Toolbox that allows the user to pick the type of surrogate (or ensembles), the search procedure on the surrogate, and the type of parallelism (synchronous or asynchronous). pySOT also allows the user to develop new algorithms by modifying parts of the code. In the applications here, the objective function takes up to 30 minutes for one simulation, and serial optimization can take over 200 hours. Results from the Yellowstone (NSF) and NCSS (Singapore) supercomputers are given for groundwater contaminant hydrology simulations with applications to model parameter estimation and decontamination management. All results are compared with alternatives. The first results are for optimization of pumping at many wells to reduce the cost of decontaminating groundwater at a superfund site. The optimization runs with up to 128 processors. Superlinear speedup is obtained for up to 16 processors, and efficiency with 64 processors is over 80%. Each evaluation of the objective function requires the solution of nonlinear partial differential equations to describe the impact of spatially distributed pumping and model parameters on model predictions for the spatial and temporal distribution of groundwater contaminants. The second application uses asynchronous parallel global optimization for groundwater quality model calibration. The time for a single objective function evaluation varies unpredictably, so efficiency is improved with asynchronous parallel calculations to improve load balancing. The third application (done at NCSS) incorporates new global surrogate multi-objective parallel search algorithms into pySOT and applies them to a large watershed calibration problem.
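
    The asynchronous pattern described, refitting the surrogate and launching a new point whenever any evaluation finishes rather than waiting for a whole batch, can be sketched with a thread pool standing in for distributed workers. This is illustrative only, not the pySOT implementation; the objective, surrogate and candidate scheme are assumptions:

      import numpy as np
      from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

      def expensive_sim(x):
          # Stand-in for a long-running geophysical simulation.
          return float(np.sum((x - 0.3) ** 2) + 0.1 * np.sin(20 * x).sum())

      def rbf_predict(X, y, Xq, eps=1.0):
          # Gaussian radial-basis-function interpolant fitted by a solve.
          K = np.exp(-eps * np.linalg.norm(X[:, None] - X[None], axis=-1) ** 2)
          w = np.linalg.solve(K + 1e-8 * np.eye(len(X)), y)
          Kq = np.exp(-eps * np.linalg.norm(Xq[:, None] - X[None], axis=-1) ** 2)
          return Kq @ w

      def async_optimize(dim=2, budget=40, workers=4, seed=0):
          rng = np.random.default_rng(seed)
          X, y = [], []
          with ThreadPoolExecutor(max_workers=workers) as pool:
              pending = {}
              for _ in range(workers):              # seed the worker pool
                  x = rng.random(dim)
                  pending[pool.submit(expensive_sim, x)] = x
              launched = workers
              while pending:
                  done, _ = wait(pending, return_when=FIRST_COMPLETED)
                  for fut in done:
                      X.append(pending.pop(fut)); y.append(fut.result())
                      if launched < budget:
                          # Refit immediately and launch the best candidate:
                          # no worker ever waits for a synchronous batch.
                          cand = rng.random((200, dim))
                          if len(X) > dim:
                              pred = rbf_predict(np.array(X), np.array(y), cand)
                              x = cand[np.argmin(pred)]
                          else:
                              x = cand[0]
                          pending[pool.submit(expensive_sim, x)] = x
                          launched += 1
          best = int(np.argmin(y))
          return np.array(X)[best], y[best]

      print(async_optimize())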

  12. 222Rn transport in a fractured crystalline rock aquifer: Results from numerical simulations

    USGS Publications Warehouse

    Folger, P.F.; Poeter, E.; Wanty, R.B.; Day, W.; Frishman, D.

    1997-01-01

    Dissolved 222Rn concentrations in ground water from a small wellfield underlain by fractured Middle Proterozoic Pikes Peak Granite southwest of Denver, Colorado range from 124 to 840 kBq m-3 (3360-22700 pCi L-1). Numerical simulations of flow and transport between two wells show that differences in equivalent hydraulic aperture of transmissive fractures, assuming a simplified two-fracture system and the parallel-plate model, can account for the different 222Rn concentrations in each well under steady-state conditions. Transient flow and transport simulations show that 222Rn concentrations along the fracture profile are influenced by 222Rn concentrations in the adjoining fracture and depend on boundary conditions, proximity of the pumping well to the fracture intersection, transmissivity of the conductive fractures, and pumping rate. Non-homogeneous distribution (point sources) of 222Rn parent radionuclides, uranium and 226Ra, can strongly perturb the dissolved 222Rn concentrations in a fracture system. Without detailed information on the geometry and hydraulic properties of the connected fracture system, it may be impossible to distinguish the influence of factors controlling 222Rn distribution or to determine location of 222Rn point sources in the field in areas where ground water exhibits moderate 222Rn concentrations. Flow and transport simulations of a hypothetical multifracture system consisting of ten connected fractures, each 10 m in length with fracture apertures ranging from 0.1 to 1.0 mm, show that 222Rn concentrations at the pumping well can vary significantly over time. Assuming parallel-plate flow, transmissivities of the hypothetical system vary over four orders of magnitude because transmissivity varies with the cube of fracture aperture. The extreme hydraulic heterogeneity of the simple hypothetical system leads to widely ranging 222Rn values, even assuming homogeneous distribution of uranium and 226Ra along fracture walls. Consequently, it is concluded that 222Rn concentrations vary, not only with the geometric and stress factors noted above, but also according to local fracture aperture distribution, local groundwater residence time, and flux of 222Rn from parent radionuclides along fracture walls.
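
    The parallel-plate (cubic law) relation invoked above makes the aperture sensitivity explicit: transmissivity per unit fracture width is T = g*b^3 / (12*nu), so a tenfold aperture range alone spans a thousandfold range in T. A one-function sketch with standard property values:

      def parallel_plate_transmissivity(b, nu=1.0e-6, g=9.81):
          # Cubic law for laminar flow between parallel plates:
          # T = g * b**3 / (12 * nu), per metre of fracture width,
          # with aperture b in metres and kinematic viscosity nu in m**2/s.
          return g * b ** 3 / (12.0 * nu)

      # Aperture enters as a cube: a 10x aperture range -> a 1000x T range.
      for b_mm in (0.1, 0.5, 1.0):
          print(b_mm, parallel_plate_transmissivity(b_mm * 1e-3))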

  13. TU-AB-BRC-12: Optimized Parallel Monte Carlo Dose Calculations for Secondary MU Checks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    French, S; Nazareth, D; Bellor, M

    Purpose: Secondary MU checks are an important tool used during a physics review of a treatment plan. Commercial software packages offer varying degrees of theoretical dose calculation accuracy, depending on the modality involved. Dose calculations of VMAT plans are especially prone to error due to the large approximations involved. Monte Carlo (MC) methods are not commonly used due to their long run times. We investigated two methods to increase the computational efficiency of MC dose simulations with the BEAMnrc code. Distributed computing resources, along with optimized code compilation, will allow for accurate and efficient VMAT dose calculations. Methods: The BEAMnrc package was installed on a high performance computing cluster accessible to our clinic. MATLAB and PYTHON scripts were developed to convert a clinical VMAT DICOM plan into BEAMnrc input files. The BEAMnrc installation was optimized by running the VMAT simulations through profiling tools which indicated the behavior of the constituent routines in the code, e.g. the bremsstrahlung splitting routine and the specified random number generator. This information aided in determining the most efficient parallel compilation configuration for the specific CPUs available on our cluster, resulting in the fastest VMAT simulation times. Our method was evaluated with calculations involving 10^8-10^9 particle histories, which are sufficient to verify patient dose using VMAT. Results: Parallelization allowed the calculation of patient dose on the order of 10-15 hours with 100 parallel jobs. Due to the compiler optimization process, further speed increases of 23% were achieved when compared with the open-source compiler BEAMnrc packages. Conclusion: Analysis of the BEAMnrc code allowed us to optimize the compiler configuration for VMAT dose calculations. In future work, the optimized MC code, in conjunction with the parallel processing capabilities of BEAMnrc, will be applied to provide accurate and efficient secondary MU checks.

  14. Transport of cosmic-ray protons in intermittent heliospheric turbulence: Model and simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alouani-Bibi, Fathallah; Le Roux, Jakobus A., E-mail: fb0006@uah.edu

    The transport of charged energetic particles in the presence of strong intermittent heliospheric turbulence is computationally analyzed based on known properties of the interplanetary magnetic field and solar wind plasma at 1 astronomical unit. The turbulence is assumed to be static, composite, and quasi-three-dimensional with a varying energy distribution between a one-dimensional Alfvénic (slab) and a structured two-dimensional component. The spatial fluctuations of the turbulent magnetic field are modeled either as homogeneous with a Gaussian probability distribution function (PDF), or as intermittent on large and small scales with a q-Gaussian PDF. Simulations showed that energetic particle diffusion coefficients both parallel and perpendicular to the background magnetic field are significantly affected by intermittency in the turbulence. This effect is especially strong for parallel transport where for large-scale intermittency results show an extended phase of subdiffusive parallel transport during which cross-field transport diffusion dominates. The effects of intermittency are found to depend on particle rigidity and the fraction of slab energy in the turbulence, yielding a perpendicular to parallel mean free path ratio close to 1 for large-scale intermittency. Investigation of higher order transport moments (kurtosis) indicates that non-Gaussian statistical properties of the intermittent turbulent magnetic field are present in the parallel transport, especially for low rigidity particles at all times.

  15. Analysis of biases from parallel observations of co-located manual and automatic weather stations in Indonesia

    NASA Astrophysics Data System (ADS)

    Sopaheluwakan, Ardhasena; Fajariana, Yuaning; Satyaningsih, Ratna; Aprilina, Kharisma; Astuti Nuraini, Tri; Ummiyatul Badriyah, Imelda; Lukita Sari, Dyah; Haryoko, Urip

    2017-04-01

    Inhomogeneities are often found in long records of climate data. These can occur for various reasons, such as relocation of the observation site, changes in observation method, and the transition to automated instruments. Changes to automated systems are inevitable and are taking place worldwide in many National Meteorological Services. However, this shift in observational practice must be done cautiously, and a sufficient period of parallel observation of co-located manual and automated systems should take place, as suggested by the World Meteorological Organization. With a sufficient parallel observation period, biases between the two systems can be analyzed. In this study we analyze the biases from a yearlong parallel observation of manual and automatic weather stations at 30 locations in Indonesia. The sites span approximately 45 degrees of longitude from east to west, covering different climate characteristics and geographical settings. We study measurements of temperature and rainfall taken by both systems. We found that the biases between the two systems vary from place to place and depend more on the setup of the instruments than on climatic or geographical factors. For instance, daytime observations from the automatic weather stations are consistently higher than the manual observations, whereas night-time observations from the automatic weather stations are lower than the manual observations.

  16. Observations of Magnetosphere-Ionosphere Coupling Processes in Jupiter's Downward Auroral Current Region

    NASA Astrophysics Data System (ADS)

    Clark, G. B.; Mauk, B.; Allegrini, F.; Bagenal, F.; Bolton, S. J.; Bunce, E. J.; Connerney, J. E. P.; Ebert, R. W.; Gershman, D. J.; Gladstone, R.; Haggerty, D. K.; Hospodarsky, G. B.; Kotsiaros, S.; Kollmann, P.; Kurth, W. S.; Levin, S.; McComas, D. J.; Paranicas, C.; Rymer, A. M.; Saur, J.; Szalay, J. R.; Tetrick, S.; Valek, P. W.

    2017-12-01

    Our view and understanding of Jupiter's auroral regions are ever-changing as Juno continues to map out this region with every auroral pass. For example, since last year's Fall AGU and the release of publications regarding the first perijove orbit, the Juno particles and fields teams have found direct evidence of parallel potential drops in addition to the stochastic broad energy distributions associated with the downward current auroral acceleration region. In this region, which appears to exist in an altitude range of 1.5-3 Jovian radii, the potential drops can reach as high as several megavolts. Associated with these potentials are anti-planetward electron angle beams, energetic ion conics and precipitating protons, oxygen and sulfur. Sometimes the potentials within the downward current region are structured such that they look like the inverted-V type distributions typically found in Earth's upward current region. This is true for both the ion and electron energy distributions. Other times, the parallel potentials appear to be intermittent or spatially structured in a way such that they do not look like the canonical diverging electrostatic potential structure. Furthermore, the parallel potentials vary grossly in spatial/temporal scale, peak voltage and associated parallel current density. Here, we present a comprehensive study of these structures in Jupiter's downward current region focusing on energetic particle measurements from Juno-JEDI.

  17. A nonvoxel-based dose convolution/superposition algorithm optimized for scalable GPU architectures.

    PubMed

    Neylon, J; Sheng, K; Yu, V; Chen, Q; Low, D A; Kupelian, P; Santhanam, A

    2014-10-01

    Real-time adaptive planning and treatment has been infeasible due in part to its high computational complexity. There have been many recent efforts to utilize graphics processing units (GPUs) to accelerate the computational performance and dose accuracy in radiation therapy. Data structure and memory access patterns are the key GPU factors that determine the computational performance and accuracy. In this paper, the authors present a nonvoxel-based (NVB) approach to maximize computational and memory access efficiency and throughput on the GPU. The proposed algorithm employs a ray-tracing mechanism to restructure the 3D data sets computed from the CT anatomy into a nonvoxel-based framework. In a process that takes only a few milliseconds of computing time, the algorithm restructured the data sets by ray-tracing through precalculated CT volumes to realign the coordinate system along the convolution direction, as defined by zenithal and azimuthal angles. During the ray-tracing step, the data were resampled according to radial sampling and parallel ray-spacing parameters making the algorithm independent of the original CT resolution. The nonvoxel-based algorithm presented in this paper also demonstrated a trade-off in computational performance and dose accuracy for different coordinate system configurations. In order to find the best balance between the computed speedup and the accuracy, the authors employed an exhaustive parameter search on all sampling parameters that defined the coordinate system configuration: zenithal, azimuthal, and radial sampling of the convolution algorithm, as well as the parallel ray spacing during ray tracing. The angular sampling parameters were varied between 4 and 48 discrete angles, while both radial sampling and parallel ray spacing were varied from 0.5 to 10 mm. The gamma distribution analysis method (γ) was used to compare the dose distributions using 2% and 2 mm dose difference and distance-to-agreement criteria, respectively. Accuracy was investigated using three distinct phantoms with varied geometries and heterogeneities and on a series of 14 segmented lung CT data sets. Performance gains were calculated using three 256 mm cube homogenous water phantoms, with isotropic voxel dimensions of 1, 2, and 4 mm. The nonvoxel-based GPU algorithm was independent of the data size and provided significant computational gains over the CPU algorithm for large CT data sizes. The parameter search analysis also showed that the ray combination of 8 zenithal and 8 azimuthal angles along with 1 mm radial sampling and 2 mm parallel ray spacing maintained dose accuracy with greater than 99% of voxels passing the γ test. Combining the acceleration obtained from GPU parallelization with the sampling optimization, the authors achieved a total performance improvement factor of >175 000 when compared to our voxel-based ground truth CPU benchmark and a factor of 20 compared with a voxel-based GPU dose convolution method. The nonvoxel-based convolution method yielded substantial performance improvements over a generic GPU implementation, while maintaining accuracy as compared to a CPU computed ground truth dose distribution. Such an algorithm can be a key contribution toward developing tools for adaptive radiation therapy systems.
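
    The 2%/2 mm gamma criterion used for the comparisons above has a compact definition: at each reference point, take the minimum over evaluated points of the combined, criterion-normalized distance and dose difference. A 1-D sketch (the authors' analysis is 3-D; the profile and spacing here are illustrative):

      import numpy as np

      def gamma_index(dose_ref, dose_eval, spacing_mm, dd=0.02, dta_mm=2.0):
          # Global gamma on a uniform 1-D grid: for each reference point,
          # minimize sqrt((dx/DTA)^2 + (dD/(dd*Dmax))^2) over eval points.
          x = np.arange(len(dose_ref)) * spacing_mm
          dmax = dose_ref.max()
          dist2 = ((x[:, None] - x[None, :]) / dta_mm) ** 2
          dose2 = ((dose_eval[None, :] - dose_ref[:, None]) / (dd * dmax)) ** 2
          return np.sqrt((dist2 + dose2).min(axis=1))   # gamma per ref point

      grid = np.linspace(-3, 3, 121)
      ref = np.exp(-grid ** 2)                       # toy Gaussian profile
      ev = np.exp(-(grid - 0.02) ** 2) * 1.01        # shifted, rescaled copy
      g = gamma_index(ref, ev, spacing_mm=0.5)
      print((g <= 1).mean())                         # pass rate, cf. >99%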

  18. A nonvoxel-based dose convolution/superposition algorithm optimized for scalable GPU architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neylon, J., E-mail: jneylon@mednet.ucla.edu; Sheng, K.; Yu, V.

    Purpose: Real-time adaptive planning and treatment has been infeasible due in part to its high computational complexity. There have been many recent efforts to utilize graphics processing units (GPUs) to accelerate the computational performance and dose accuracy in radiation therapy. Data structure and memory access patterns are the key GPU factors that determine the computational performance and accuracy. In this paper, the authors present a nonvoxel-based (NVB) approach to maximize computational and memory access efficiency and throughput on the GPU. Methods: The proposed algorithm employs a ray-tracing mechanism to restructure the 3D data sets computed from the CT anatomy into a nonvoxel-based framework. In a process that takes only a few milliseconds of computing time, the algorithm restructured the data sets by ray-tracing through precalculated CT volumes to realign the coordinate system along the convolution direction, as defined by zenithal and azimuthal angles. During the ray-tracing step, the data were resampled according to radial sampling and parallel ray-spacing parameters making the algorithm independent of the original CT resolution. The nonvoxel-based algorithm presented in this paper also demonstrated a trade-off in computational performance and dose accuracy for different coordinate system configurations. In order to find the best balance between the computed speedup and the accuracy, the authors employed an exhaustive parameter search on all sampling parameters that defined the coordinate system configuration: zenithal, azimuthal, and radial sampling of the convolution algorithm, as well as the parallel ray spacing during ray tracing. The angular sampling parameters were varied between 4 and 48 discrete angles, while both radial sampling and parallel ray spacing were varied from 0.5 to 10 mm. The gamma distribution analysis method (γ) was used to compare the dose distributions using 2% and 2 mm dose difference and distance-to-agreement criteria, respectively. Accuracy was investigated using three distinct phantoms with varied geometries and heterogeneities and on a series of 14 segmented lung CT data sets. Performance gains were calculated using three 256 mm cube homogenous water phantoms, with isotropic voxel dimensions of 1, 2, and 4 mm. Results: The nonvoxel-based GPU algorithm was independent of the data size and provided significant computational gains over the CPU algorithm for large CT data sizes. The parameter search analysis also showed that the ray combination of 8 zenithal and 8 azimuthal angles along with 1 mm radial sampling and 2 mm parallel ray spacing maintained dose accuracy with greater than 99% of voxels passing the γ test. Combining the acceleration obtained from GPU parallelization with the sampling optimization, the authors achieved a total performance improvement factor of >175 000 when compared to our voxel-based ground truth CPU benchmark and a factor of 20 compared with a voxel-based GPU dose convolution method. Conclusions: The nonvoxel-based convolution method yielded substantial performance improvements over a generic GPU implementation, while maintaining accuracy as compared to a CPU computed ground truth dose distribution. Such an algorithm can be a key contribution toward developing tools for adaptive radiation therapy systems.

  19. Simultaneous fluoroscopic and nuclear imaging: impact of collimator choice on nuclear image quality.

    PubMed

    van der Velden, Sandra; Beijst, Casper; Viergever, Max A; de Jong, Hugo W A M

    2017-01-01

    X-ray-guided oncological interventions could benefit from the availability of simultaneously acquired nuclear images during the procedure. To this end, a real-time, hybrid fluoroscopic and nuclear imaging device, consisting of an X-ray c-arm combined with gamma imaging capability, is currently being developed (Beijst C, Elschot M, Viergever MA, de Jong HW. Radiol. 2015;278:232-238). The setup comprises four gamma cameras placed adjacent to the X-ray tube. The four camera views are used to reconstruct an intermediate three-dimensional image, which is subsequently converted to a virtual nuclear projection image that overlaps with the X-ray image. The purpose of the present simulation study is to evaluate the impact of gamma camera collimator choice (parallel hole versus pinhole) on the quality of the virtual nuclear image. Simulation studies were performed with a digital image quality phantom including realistic noise and resolution effects, with a dynamic frame acquisition time of 1 s and a total activity of 150 MBq. Projections were simulated for 3, 5, and 7 mm pinholes and for three parallel hole collimators (low-energy all-purpose (LEAP), low-energy high-resolution (LEHR) and low-energy ultra-high-resolution (LEUHR)). Intermediate reconstruction was performed with maximum likelihood expectation-maximization (MLEM) with point spread function (PSF) modeling. In the virtual projection derived therefrom, contrast, noise level, and detectability were determined and compared with the ideal projection, that is, as if a gamma camera were located at the position of the X-ray detector. Furthermore, image deformations and spatial resolution were quantified. Additionally, simultaneous fluoroscopic and nuclear images of a sphere phantom were acquired with a physical prototype system and compared with the simulations. For small hot spots, contrast is comparable for all simulated collimators. Noise levels are, however, 3 to 8 times higher in pinhole geometries than in parallel hole geometries. This results in higher contrast-to-noise ratios for parallel hole geometries. Smaller spheres can thus be detected with parallel hole collimators than with pinhole collimators (17 mm vs 28 mm). Pinhole geometries show larger image deformations than parallel hole geometries. Spatial resolution varied between 1.25 cm for the 3 mm pinhole and 4 cm for the LEAP collimator. The simulation method was successfully validated by the experiments with the physical prototype. A real-time hybrid fluoroscopic and nuclear imaging device is currently being developed. Image quality of nuclear images obtained with different collimators was compared in terms of contrast, noise, and detectability. Parallel hole collimators showed lower noise and better detectability than pinhole collimators. © 2016 American Association of Physicists in Medicine.

  20. Parallel Calculation of Sensitivity Derivatives for Aircraft Design using Automatic Differentiation

    NASA Technical Reports Server (NTRS)

    Bischof, C. H.; Green, L. L.; Haigler, K. J.; Knauff, T. L., Jr.

    1994-01-01

    Sensitivity derivative (SD) calculation via automatic differentiation (AD) typical of that required for the aerodynamic design of a transport-type aircraft is considered. Two ways of computing SD via code generated by the ADIFOR automatic differentiation tool are compared for efficiency and applicability to problems involving large numbers of design variables. A vector implementation on a Cray Y-MP computer is compared with a coarse-grained parallel implementation on an IBM SP1 computer, employing a Fortran M wrapper. The SD are computed for a swept transport wing in turbulent, transonic flow; the number of geometric design variables varies from 1 to 60 with coupling between a wing grid generation program and a state-of-the-art, 3-D computational fluid dynamics program, both augmented for derivative computation via AD. For a small number of design variables, the Cray Y-MP implementation is much faster. As the number of design variables grows, however, the IBM SP1 becomes an attractive alternative in terms of compute speed, job turnaround time, and total memory available for solutions with large numbers of design variables. The coarse-grained parallel implementation also can be moved easily to a network of workstations.

  1. Optimizing SIEM Throughput on the Cloud Using Parallelization

    PubMed Central

    Alam, Masoom; Ihsan, Asif; Javaid, Qaisar; Khan, Abid; Manzoor, Jawad; Akhundzada, Adnan; Khan, M Khurram; Farooq, Sajid

    2016-01-01

    Processing large amounts of data in real time for identifying security issues poses several performance challenges, especially when hardware infrastructure is limited. Managed Security Service Providers (MSSP), mostly hosting their applications on the Cloud, receive events at a very high rate that varies from a few hundred to a couple of thousand events per second (EPS). It is critical to process this data efficiently, so that attacks could be identified quickly and necessary response could be initiated. This paper evaluates the performance of a security framework OSTROM built on the Esper complex event processing (CEP) engine under a parallel and non-parallel computational framework. We explain three architectures under which Esper can be used to process events. We investigated the effect on throughput, memory and CPU usage in each configuration setting. The results indicate that the performance of the engine is limited by the number of events coming in rather than the queries being processed. The architecture where 1/4th of the total events are submitted to each instance and all the queries are processed by all the units shows the best results in terms of throughput, memory and CPU usage. PMID:27851762
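
    A minimal sketch of the best-performing configuration (each of four instances receives a quarter of the event stream and evaluates every query), using Python multiprocessing in place of Esper; the event fields and query predicates are invented purely for illustration:

    ```python
    import multiprocessing as mp

    # Hypothetical "queries": every unit evaluates all rules on its share of events.
    QUERIES = [
        ("failed_login_burst", lambda e: e["type"] == "login" and e["ok"] is False),
        ("large_transfer",     lambda e: e["type"] == "transfer" and e["bytes"] > 1e6),
    ]

    def process_shard(events):
        """One CEP instance: run all queries over its quarter of the stream."""
        hits = {name: 0 for name, _ in QUERIES}
        for e in events:
            for name, predicate in QUERIES:
                if predicate(e):
                    hits[name] += 1
        return hits

    if __name__ == "__main__":
        # Toy event stream standing in for the MSSP feed (hundreds to
        # thousands of events per second in the paper's setting).
        events = [{"type": "login", "ok": i % 7 == 0, "bytes": 0} for i in range(4000)] \
               + [{"type": "transfer", "ok": True, "bytes": i * 600} for i in range(4000)]
        n = 4  # the configuration the paper found best: 1/4 of events per instance
        shards = [events[i::n] for i in range(n)]
        with mp.Pool(n) as pool:
            partial = pool.map(process_shard, shards)
        totals = {k: sum(p[k] for p in partial) for k in partial[0]}
        print(totals)
    ```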

  2. Experimental Study of the Roles of Mechanical and Hydrologic Properties in the Initiation of Natural Hydraulic Fractures

    NASA Astrophysics Data System (ADS)

    French, M. E.; Goodwin, L. B.; Boutt, D. F.; Lilydahl, H.

    2008-12-01

    Natural hydraulic fractures (NHFs) are inferred to form where pore fluid pressure exceeds the least compressive stress; i.e., where the hydraulic fracture criterion is met. Although it has been shown that mechanical heterogeneities serve as nuclei for NHFs, the relative roles of mechanical anisotropy and hydrologic properties in initiating NHFs in porous granular media have not been fully explored. We designed an experimental protocol that produces a pore fluid pressure high enough to exceed the hydraulic fracture criterion, allowing us to initiate NHFs in the laboratory. Initially, cylindrical samples 13 cm long and 5 cm in diameter are saturated, σ1 is radial, and σ3 is axial. By dropping the end load (σ3) and pore fluid pressure simultaneously at the end caps, we produce a large pore fluid pressure gradient parallel to the long axis of the sample. This allows us to meet the hydraulic fracture criterion without separating the sample from its end caps. The time over which the pore fluid pressure remains elevated is a function of hydraulic diffusivity. An initial test with a low-diffusivity sandstone produced NHFs parallel to bedding laminae that were optimally oriented for failure. To evaluate the relative importance of mechanical heterogeneities such as bedding versus hydraulic properties, we are currently investigating variably cemented St. Peter sandstone. This quartz arenite exhibits a wide range of primary structures, from well-developed bedding laminae to locally massive sandstone. Diagenesis has locally accentuated these structures, causing degree of cementation to vary with bedding, and the sandstone locally exhibits concretions that form elliptical rather than tabular heterogeneities. Bulk permeability varies from k = 10^-12 m^2 to k = 10^-15 m^2 and porosity varies from 5% to 28% in this suite of samples. Variations in a single sample are smaller, with permeability varying no more than an order of magnitude within a single core. Air minipermeameter and tracer tests document this variability at the cm scale. Experiments will be performed with σ3 and the pore pressure gradient both perpendicular and parallel to sub-cm scale bedding. The results of these tests will be compared to those of structurally homogeneous samples and samples with elliptical heterogeneities.

  3. Automated problem scheduling and reduction of synchronization delay effects

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.

    1987-01-01

    It is anticipated that in order to make effective use of many future high performance architectures, programs will have to exhibit at least a medium grained parallelism. A framework is presented for partitioning very sparse triangular systems of linear equations that is designed to produce favorable performance results in a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because: (1) they provide a useful model problem for use in exploring heuristics for the aggregation, mapping and scheduling of relatively fine grained computations whose data dependencies are specified by directed acyclic graphs, and (2) such efficient methods can find direct application in the development of parallel algorithms for scientific computation. Simple expressions are derived that describe how to schedule computational work with varying degrees of granularity. The Encore Multimax was used as a hardware simulator to investigate the performance effects of using the partitioning techniques presented in shared memory architectures with varying relative synchronization costs.
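
    Level scheduling is the classic way to expose this kind of medium-grained parallelism in a sparse triangular solve: rows whose dependencies all lie in earlier levels can be processed concurrently. A small sketch under that standard formulation (not the paper's specific partitioning heuristics):

    ```python
    import numpy as np
    from scipy.sparse import csr_matrix

    def level_schedule(L):
        """Assign each row of a sparse lower-triangular matrix to a level such
        that a row depends only on rows in strictly earlier levels; rows within
        one level are independent and could be solved in parallel."""
        n = L.shape[0]
        level = np.zeros(n, dtype=int)
        for i in range(n):
            cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
            deps = cols[cols < i]
            level[i] = (1 + level[deps].max()) if deps.size else 0
        return [np.flatnonzero(level == k) for k in range(level.max() + 1)]

    def solve_by_levels(L, b):
        """Forward substitution swept level by level (parallelism left implicit)."""
        x = np.zeros(len(b))
        for rows in level_schedule(L):
            for i in rows:                      # independent within the level
                s, e = L.indptr[i], L.indptr[i + 1]
                cols, vals = L.indices[s:e], L.data[s:e]
                off = cols < i
                x[i] = (b[i] - vals[off] @ x[cols[off]]) / vals[~off][0]
        return x

    dense = np.array([[2., 0, 0, 0], [1, 3, 0, 0], [0, 0, 4, 0], [1, 0, 2, 5]])
    L = csr_matrix(dense)
    b = np.array([2., 5, 8, 14])
    print(solve_by_levels(L, b), "vs", np.linalg.solve(dense, b))
    ```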

  4. Software Design for Real-Time Systems on Parallel Computers: Formal Specifications.

    DTIC Science & Technology

    1996-04-01

    This research investigated the important issues related to the analysis and design of real-time systems targeted to parallel architectures. In particular, the software specification models for real-time systems on parallel architectures were evaluated. A survey of current formal methods for uniprocessor real-time systems specifications was conducted to determine their extensibility in specifying real-time systems on parallel architectures.

  5. Monte Carlo simulation on the effect of different approaches to thalassaemia on gene frequency.

    PubMed

    Habibzadeh, F; Yadollahie, M

    2006-01-01

    We used computer simulation to determine variation in gene, heterozygous and homozygous frequencies induced by 4 different approaches to thalassaemia. These were: supportive therapy only; treat homozygous patients with a hypothetical modality phenotypically only; abort all homozygous fetuses; and prevent marriage between gene carriers. Gene frequency becomes constant with the second or the fourth strategy, and falls over time with the first or the third strategy. Heterozygous frequency varies in parallel with gene frequency. Using the first strategy, homozygous frequency falls over time; with the second strategy it becomes constant; and with the third and fourth strategies it falls to zero after the first generation. No matter which strategy is used, the population gene frequency, in the worst case, will remain constant over time.
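
    A minimal sketch of this kind of simulation for one strategy, with toy parameters; it reproduces the qualitative result that the gene frequency falls over generations when homozygotes are removed (by abortion, or by homozygotes not reproducing under supportive therapy only):

    ```python
    import numpy as np

    def next_gen_frequency(q, rng, n_births=100_000):
        """One generation of random mating at gene frequency q, removing
        homozygous (aa) offspring before reproduction. This models strategy 3
        (abort homozygous fetuses); strategy 1 behaves the same here under the
        toy assumption that untreated homozygotes do not reproduce."""
        p = 1.0 - q
        g = rng.choice(["AA", "Aa", "aa"], size=n_births, p=[p*p, 2*p*q, q*q])
        g = g[g != "aa"]                          # homozygotes removed
        return np.sum(g == "Aa") / (2 * g.size)   # 'a' alleles among survivors

    rng = np.random.default_rng(1)
    q = 0.05                                      # toy initial gene frequency
    for gen in range(10):
        q = next_gen_frequency(q, rng)
    print(f"gene frequency after 10 generations: {q:.4f}")  # ~ q0/(1 + 10*q0)
    ```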

  6. Sequential Transition Patterns of Preschoolers' Social Interactions during Child-Initiated Play: Is Parallel-Aware Play a Bidirectional Bridge to Other Play States?

    ERIC Educational Resources Information Center

    Robinson, Clyde C.; Anderson, Genan T.; Porter, Christin L.; Hart, Craig, H.; Wouden-Miller, Melissa

    2003-01-01

    Explored the simultaneous sequential transition patterns of preschoolers' social play within classroom settings. Found that the proportion of social-play states did not vary during play episodes even when accounting for type of activity center, gender, and SES. Found a reciprocal relationship between parallel-aware and other social-play states…

  7. Deformation, crystal preferred orientations, and seismic anisotropy in the Earth's D″ layer

    NASA Astrophysics Data System (ADS)

    Tommasi, Andréa; Goryaeva, Alexandra; Carrez, Philippe; Cordier, Patrick; Mainprice, David

    2018-06-01

    We use a forward multiscale model that couples atomistic modeling of intracrystalline plasticity mechanisms (dislocation glide ± twinning) in MgSiO3 post-perovskite (PPv) and periclase (MgO) at lower mantle pressures and temperatures to polycrystal plasticity simulations to predict crystal preferred orientations (CPO) development and seismic anisotropy in D″. We model the CPO evolution in aggregates of 70% PPv and 30% MgO submitted to simple shear, axial shortening, and along corner-flow streamlines, which simulate changes in flow orientation similar to those expected at the transition between a downwelling and flow parallel to the core-mantle boundary (CMB) within D″ or between CMB-parallel flow and upwelling at the borders of the large low shear wave velocity provinces (LLSVP) in the lowermost mantle. Axial shortening results in alignment of PPv [010] axes with the shortening direction. Simple shear produces PPv CPO with a monoclinic symmetry that rapidly rotates towards parallelism between the dominant [100](010) slip system and the macroscopic shear. These predictions differ from MgSiO3 post-perovskite textures formed in diamond-anvil cell experiments, but agree with those obtained in simple shear and compression experiments using CaIrO3 post-perovskite. Development of CPO in PPv and MgO results in seismic anisotropy in D″. For shear parallel to the CMB, at low strain, the inclination of ScS, Sdiff, and SKKS fast polarizations and delay times vary depending on the propagation direction. At moderate and high shear strains, all S-waves are polarized nearly horizontally. Downwelling flow produces Sdiff, ScS, and SKKS fast polarization directions and birefringence that vary gradually as a function of the back-azimuth from nearly parallel to inclined by up to 70° to CMB and from null to ∼5%. Change in the flow to shear parallel to the CMB results in dispersion of the CPO, weakening of the anisotropy, and strong azimuthal variation of the S-wave splitting up to 250 km from the corner. Transition from horizontal shear to upwelling also produces weakening of the CPO and complex seismic anisotropy patterns, with dominantly inclined fast ScS and SKKS polarizations, over most of the upwelling path. Models that take into account twinning in PPv explain most observations of seismic anisotropy in D″, but heterogeneity of the flow at scales <1000 km is needed to comply with the seismological evidence for low apparent birefringence in D″.

  8. Quasi-parallel whistler mode waves observed by THEMIS during near-earth dipolarizations

    NASA Astrophysics Data System (ADS)

    Le Contel, O.; Roux, A.; Jacquey, C.; Robert, P.; Berthomier, M.; Chust, T.; Grison, B.; Angelopoulos, V.; Sibeck, D.; Chaston, C. C.; Cully, C. M.; Ergun, B.; Glassmeier, K.-H.; Auster, U.; McFadden, J.; Carlson, C.; Larson, D.; Bonnell, J. W.; Mende, S.; Russell, C. T.; Donovan, E.; Mann, I.; Singer, H.

    2009-06-01

    We report on quasi-parallel whistler emissions detected by the near-earth satellites of the THEMIS mission before, during, and after local dipolarization. These emissions are associated with an electron temperature anisotropy α=T⊥e/T||e>1 consistent with the linear theory of whistler mode anisotropy instability. When the whistler mode emissions are observed, the measured electron anisotropy varies inversely with β||e (the ratio of the electron parallel pressure to the magnetic pressure) as predicted by Gary and Wang (1996). Narrow band whistler emissions correspond to the small α existing before dipolarization whereas the broad band emissions correspond to large α observed during and after dipolarization. The energy in the whistler mode is leaving the current sheet and is propagating along the background magnetic field, towards the Earth. A simple time-independent description based on Liouville's theorem indicates that the electron temperature anisotropy decreases with the distance along the magnetic field from the equator. Once this variation of α is taken into account, the linear theory predicts an equatorial origin for the whistler mode. The linear theory is also consistent with the observed bandwidth of wave emissions. Yet, the anisotropy required to be fully consistent with the observations is somewhat larger than the measured one. Although the discrepancy remains within the instrumental error bars, this could be due to time-dependent effects which have been neglected. The possible role of the whistler waves in the substorm process is discussed.

  9. Parallel CE/SE Computations via Domain Decomposition

    NASA Technical Reports Server (NTRS)

    Himansu, Ananda; Jorgenson, Philip C. E.; Wang, Xiao-Yen; Chang, Sin-Chung

    2000-01-01

    This paper describes the parallelization strategy and achieved parallel efficiency of an explicit time-marching algorithm for solving conservation laws. The Space-Time Conservation Element and Solution Element (CE/SE) algorithm for solving the 2D and 3D Euler equations is parallelized with the aid of domain decomposition. The parallel efficiency of the resultant algorithm on a Silicon Graphics Origin 2000 parallel computer is checked.

  10. Ethnicity and Changing Functional Health in Middle and Late Life: A Person-Centered Approach

    PubMed Central

    Xu, Xiao; Bennett, Joan M.; Ye, Wen; Quiñones, Ana R.

    2010-01-01

    Objectives. Following a person-centered approach, this research aims to depict distinct courses of disability and to ascertain how the probabilities of experiencing these trajectories vary across Black, Hispanic, and White middle-aged and older Americans. Methods. Data came from the 1995–2006 Health and Retirement Study, which involved a national sample of 18,486 Americans older than 50 years of age. Group-based semiparametric mixture models (Proc Traj) were used for data analysis. Results. Five trajectories were identified: (a) excellent functional health (61%), (b) good functional health with small increasing disability (25%), (c) accelerated increase in disability (7%), (d) high but stable disability (4%), and (e) persistent severe impairment (3%). However, when time-varying covariates (e.g., marital status and health conditions) were controlled, only 3 trajectories emerged: (a) healthy functioning (53%), (b) moderate functional decrement (40%), and (c) large functional decrement (8%). Black and Hispanic Americans had significantly higher probabilities than White Americans in experiencing poor functional health trajectories, with Blacks at greater risks than Hispanics. Conclusions. Parallel to the concepts of successful aging, usual aging, and pathological aging, there exist distinct courses of changing functional health over time. The mechanisms underlying changes in disability may vary between Black and Hispanic Americans. PMID:20008483

  11. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE PAGES

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  12. Scaling predictive modeling in drug development with cloud computing.

    PubMed

    Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola

    2015-01-26

    Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compare with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
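
    Model building of this sort is embarrassingly parallel across data sets, which is what makes it map well onto cloud instances. A hedged sketch using joblib and a random-forest stand-in for the ligand-based models; the data sizes, descriptors, and end points below are synthetic, not the paper's:

    ```python
    import numpy as np
    from joblib import Parallel, delayed
    from sklearn.ensemble import RandomForestRegressor

    def train_model(n_samples, seed):
        """Train one logP-style regression model on synthetic 'descriptors';
        stands in for one cloud instance building one model."""
        rng = np.random.default_rng(seed)
        X = rng.normal(size=(n_samples, 50))        # 50 toy molecular descriptors
        y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=n_samples)
        model = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y)
        return n_samples, model.score(X, y)

    # Data sets of varying sizes trained in parallel, one per worker, mirroring
    # the paper's comparison across data-set sizes (sizes here are arbitrary).
    sizes = [1_000, 2_000, 4_000, 8_000]
    results = Parallel(n_jobs=4)(delayed(train_model)(n, i) for i, n in enumerate(sizes))
    for n, r2 in results:
        print(f"n={n}: training R^2 = {r2:.3f}")
    ```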

  13. Post-extension shortening strains preserved in calcites of the Keweenawan rift

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Donnelly, K.; Craddock, J.; McGovern, M.

    1993-02-01

    The Keweenawan rift is part of a failed triple junction system that underlies Lake Superior and the Michigan Basin. The rift experienced extensional stresses dating to about 1.1 Ga, which were followed by compressional stresses from about 1,060 Ma to < 350 Ma. Associated with the rift are two thrust faults: the Douglas (dipping southeast) and the Keweenawan-Lake Owen (dipping northwest). To determine the direction of rifting, calcite twins were used to calculate strain ellipsoids (Groshong method), which are indicative of the intensity and direction of the stress applied to rocks in a region at a given time. Rock samples which contain significant calcite within the zone of rifting were collected, slabbed, and made into thin sections. Calcite appears as amygdule, vein, and cement fillings, as well as limestones. Analyses show that different calcite types show different strain orientations. Two principal directions of sub-horizontal shortening are present: one parallel to the rift, and one normal to the rift, indicating that rifting motion varied over the time in which different calcite types were deposited. Shortening parallel to the rift is seen predominantly on the western margin, while shortening normal to the rift is seen predominantly on the eastern margin.

  14. Trend Switching Processes in Financial Markets

    NASA Astrophysics Data System (ADS)

    Preis, Tobias; Stanley, H. Eugene

    For an intriguing variety of switching processes in nature, the underlying complex system abruptly changes at a specific point from one state to another in a highly discontinuous fashion. Financial market fluctuations are characterized by many abrupt switchings creating increasing trends ("bubble formation") and decreasing trends ("bubble collapse"), on time scales ranging from macroscopic bubbles persisting for hundreds of days to microscopic bubbles persisting only for very short time scales. Our analysis is based on a German DAX Future data base containing 13,991,275 transactions recorded with a time resolution of 10^-2 s. For a parallel analysis, we use a data base of all S&P500 stocks providing 2,592,531 daily closing prices. We ask whether these ubiquitous switching processes have quantifiable features independent of the time horizon studied. We find striking scale-free behavior of the volatility after each switching occurs. We interpret our findings as being consistent with time-dependent collective behavior of financial market participants. We test the possible universality of our result by performing a parallel analysis of fluctuations in transaction volume and time intervals between trades. We show that these financial market switching processes have features similar to those present in phase transitions. We find that the well-known catastrophic bubbles that occur on large time scales - such as the most recent financial crisis - are no outliers but in fact single dramatic representatives caused by the formation of upward and downward trends on time scales varying over nine orders of magnitude from the very large down to the very small.

  15. Variable Anisotropic Brain Electrical Conductivities in Epileptogenic Foci

    PubMed Central

    Mandelkern, M.; Bui, D.; Salamon, N.; Vinters, H. V.; Mathern, G. W.

    2010-01-01

    Source localization models assume brain electrical conductivities are isotropic at about 0.33 S/m. These assumptions have not been confirmed ex vivo in humans. This study determined bidirectional electrical conductivities from pediatric epilepsy surgery patients. Electrical conductivities perpendicular and parallel to the pial surface of neocortex and subcortical white matter (n = 15) were measured using the 4-electrode technique and compared with clinical variables. Mean (±SD) electrical conductivities were 0.10 ± 0.01 S/m, and varied by 243% from patient to patient. Perpendicular and parallel conductivities differed by 45%, and the larger values were perpendicular to the pial surface in 47% and parallel in 40% of patients. A perpendicular principal axis was associated with normal, while isotropy and parallel principal axes were linked with epileptogenic lesions by MRI. Electrical conductivities were decreased in patients with cortical dysplasia compared with non-dysplasia etiologies. The electrical conductivity values of freshly excised human brain tissues were approximately 30% of assumed values, varied by over 200% from patient to patient, and had erratic anisotropic and isotropic shapes if the MRI showed a lesion. Understanding brain electrical conductivity and ways to non-invasively measure them are probably necessary to enhance the ability to localize EEG sources from epilepsy surgery patients. PMID:20440549

  16. A hybrid computational approach for efficient Alzheimer's disease classification based on heterogeneous data.

    PubMed

    Ding, Xuemei; Bucholc, Magda; Wang, Haiying; Glass, David H; Wang, Hui; Clarke, Dave H; Bjourson, Anthony John; Dowey, Le Roy C; O'Kane, Maurice; Prasad, Girijesh; Maguire, Liam; Wong-Lin, KongFatt

    2018-06-27

    There is currently a lack of an efficient, objective and systemic approach towards the classification of Alzheimer's disease (AD), due to its complex etiology and pathogenesis. As AD is inherently dynamic, it is also not clear how the relationships among AD indicators vary over time. To address these issues, we propose a hybrid computational approach for AD classification and evaluate it on the heterogeneous longitudinal AIBL dataset. Specifically, using clinical dementia rating as an index of AD severity, the most important indicators (mini-mental state examination, logical memory recall, grey matter and cerebrospinal volumes from MRI and active voxels from PiB-PET brain scans, ApoE, and age) can be automatically identified from parallel data mining algorithms. In this work, Bayesian network modelling across different time points is used to identify and visualize time-varying relationships among the significant features, and importantly, in an efficient way using only coarse-grained data. Crucially, our approach suggests key data features and their appropriate combinations that are relevant for AD severity classification with high accuracy. Overall, our study provides insights into AD developments and demonstrates the potential of our approach in supporting efficient AD diagnosis.

  17. Rail-type gas switch with preionization by an additional corona discharge

    NASA Astrophysics Data System (ADS)

    Belozerov, O. S.; Krastelev, E. G.

    2017-05-01

    Results of an experimental study of a rail-type gas switch with preionization by an additional negative corona discharge are presented. Most of the measurements were performed for an air-insulated two-electrode switch assembled from cylindrical electrodes of 22 mm diameter and 100 mm length, arranged parallel to each other, with a spark gap between them varying from 6 to 15 mm. A set of 1 to 5 needles connected to the negative cylindrical electrode and located beside it was used for corona discharges. Needle positions were found that allow efficient stabilization of the pulsed breakdown voltage and prevent a transition of the corona discharge into a spark form. It was shown that gas preionization by the UV radiation of the parallel corona discharge provides stable operation of the switch with low variations of the pulsed breakdown voltage, not exceeding 1% for a given voltage rise time tested within the range from 40 ns to 5 µs.

  18. Magnetic orientation of nontronite clay in aqueous dispersions and its effect on water diffusion.

    PubMed

    Abrahamsson, Christoffer; Nordstierna, Lars; Nordin, Matias; Dvinskikh, Sergey V; Nydén, Magnus

    2015-01-01

    The diffusion rate of water in dilute clay dispersions depends on particle concentration, size, shape, aggregation and water-particle interactions. As nontronite clay particles magnetically align parallel to the magnetic field, directional self-diffusion anisotropy can be created within such dispersion. Here we study water diffusion in exfoliated nontronite clay dispersions by diffusion NMR and time-dependent 1H-NMR-imaging profiles. The dispersion clay concentration was varied between 0.3 and 0.7 vol%. After magnetic alignment of the clay particles in these dispersions a maximum difference of 20% was measured between the parallel and perpendicular self-diffusion coefficients in the dispersion with 0.7 vol% clay. A method was developed to measure water diffusion within the dispersion in the absence of a magnetic field (random clay orientation) as this is not possible with standard diffusion NMR. However, no significant difference in self-diffusion coefficient between random and aligned dispersions could be observed. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. Divergence across diet, time and populations rules out parallel evolution in the gut microbiomes of Trinidadian guppies.

    PubMed

    Sullam, Karen E; Rubin, Benjamin E R; Dalton, Christopher M; Kilham, Susan S; Flecker, Alexander S; Russell, Jacob A

    2015-07-01

    Diverse microbial consortia profoundly influence animal biology, necessitating an understanding of microbiome variation in studies of animal adaptation. Yet, little is known about such variability among fish, in spite of their importance in aquatic ecosystems. The Trinidadian guppy, Poecilia reticulata, is an intriguing candidate to test microbiome-related hypotheses on the drivers and consequences of animal adaptation, given the recent parallel origins of a similar ecotype across streams. To assess the relationships between the microbiome and host adaptation, we used 16S rRNA amplicon sequencing to characterize gut bacteria of two guppy ecotypes with known divergence in diet, life history, physiology and morphology collected from low-predation (LP) and high-predation (HP) habitats in four Trinidadian streams. Guts were populated by several recurring, core bacteria that are related to other fish associates and rarely detected in the environment. Although gut communities of lab-reared guppies differed from those in the wild, microbiome divergence between ecotypes from the same stream was evident under identical rearing conditions, suggesting host genetic divergence can affect associations with gut bacteria. In the field, gut communities varied over time, across streams and between ecotypes in a stream-specific manner. This latter finding, along with PICRUSt predictions of metagenome function, argues against strong parallelism of the gut microbiome in association with LP ecotype evolution. Thus, bacteria cannot be invoked in facilitating the heightened reliance of LP guppies on lower-quality diets. We argue that the macroevolutionary microbiome convergence seen across animals with similar diets may be a signature of secondary microbial shifts arising some time after host-driven adaptation.

  20. Divergence across diet, time and populations rules out parallel evolution in the gut microbiomes of Trinidadian guppies

    PubMed Central

    Sullam, Karen E; Rubin, Benjamin ER; Dalton, Christopher M; Kilham, Susan S; Flecker, Alexander S; Russell, Jacob A

    2015-01-01

    Diverse microbial consortia profoundly influence animal biology, necessitating an understanding of microbiome variation in studies of animal adaptation. Yet, little is known about such variability among fish, in spite of their importance in aquatic ecosystems. The Trinidadian guppy, Poecilia reticulata, is an intriguing candidate to test microbiome-related hypotheses on the drivers and consequences of animal adaptation, given the recent parallel origins of a similar ecotype across streams. To assess the relationships between the microbiome and host adaptation, we used 16S rRNA amplicon sequencing to characterize gut bacteria of two guppy ecotypes with known divergence in diet, life history, physiology and morphology collected from low-predation (LP) and high-predation (HP) habitats in four Trinidadian streams. Guts were populated by several recurring, core bacteria that are related to other fish associates and rarely detected in the environment. Although gut communities of lab-reared guppies differed from those in the wild, microbiome divergence between ecotypes from the same stream was evident under identical rearing conditions, suggesting host genetic divergence can affect associations with gut bacteria. In the field, gut communities varied over time, across streams and between ecotypes in a stream-specific manner. This latter finding, along with PICRUSt predictions of metagenome function, argues against strong parallelism of the gut microbiome in association with LP ecotype evolution. Thus, bacteria cannot be invoked in facilitating the heightened reliance of LP guppies on lower-quality diets. We argue that the macroevolutionary microbiome convergence seen across animals with similar diets may be a signature of secondary microbial shifts arising some time after host-driven adaptation. PMID:25575311

  1. Sensitivity analysis for mistakenly adjusting for mediators in estimating total effect in observational studies.

    PubMed

    Wang, Tingting; Li, Hongkai; Su, Ping; Yu, Yuanyuan; Sun, Xiaoru; Liu, Yi; Yuan, Zhongshang; Xue, Fuzhong

    2017-11-20

    In observational studies, epidemiologists often attempt to estimate the total effect of an exposure on an outcome of interest. However, when the underlying diagram is unknown and limited knowledge is available, dissecting bias performances is essential to estimating the total effect of an exposure on an outcome when mistakenly adjusting for mediators under logistic regression. Through simulation, we focused on six causal diagrams concerning different roles of mediators. Sensitivity analysis was conducted to assess the bias performances of varying across exposure-mediator effects and mediator-outcome effects when adjusting for the mediator. Based on the causal relationships in the real world, we compared the biases of varying across the effects of exposure-mediator with those of varying across the effects of mediator-outcome when adjusting for the mediator. The magnitude of the bias was defined by the difference between the estimated effect (using logistic regression) and the total effect of the exposure on the outcome. In four scenarios (a single mediator, two series mediators, two independent parallel mediators or two correlated parallel mediators), the biases of varying across the effects of exposure-mediator were greater than those of varying across the effects of mediator-outcome when adjusting for the mediator. In contrast, in two other scenarios (a single mediator or two independent parallel mediators in the presence of unobserved confounders), the biases of varying across the effects of exposure-mediator were less than those of varying across the effects of mediator-outcome when adjusting for the mediator. The biases were more sensitive to the variation of effects of exposure-mediator than the effects of mediator-outcome when adjusting for the mediator in the absence of unobserved confounders, while the biases were more sensitive to the variation of effects of mediator-outcome than those of exposure-mediator in the presence of an unobserved confounder. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
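
    The core phenomenon is easy to reproduce in a small simulation: conditioning on a mediator in a logistic regression shifts the coefficient on the exposure away from the total effect. A sketch under an assumed single-mediator diagram with arbitrary coefficients (this is a generic illustration, not the paper's six diagrams):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 50_000
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Assumed diagram: X -> M -> Y plus a direct X -> Y path (toy coefficients).
    x = rng.binomial(1, 0.5, n)
    m = rng.binomial(1, sigmoid(-0.5 + 1.5 * x))            # exposure-mediator effect
    y = rng.binomial(1, sigmoid(-1.0 + 0.5 * x + 1.0 * m))  # direct + mediator-outcome

    def coef_on_x(*covariates):
        X = sm.add_constant(np.column_stack(covariates))
        return sm.Logit(y, X).fit(disp=0).params[1]         # log-odds ratio for x

    print("unadjusted (targets total effect):", round(coef_on_x(x), 3))
    print("mediator-adjusted (biased)       :", round(coef_on_x(x, m), 3))
    ```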

  2. A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations

    NASA Technical Reports Server (NTRS)

    Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos

    2009-01-01

    A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.

  3. Development and parallelization of a direct numerical simulation to study the formation and transport of nanoparticle clusters in a viscous fluid

    NASA Astrophysics Data System (ADS)

    Sloan, Gregory James

    The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), as well as other fields where experiments are necessary, but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality of the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a master-worker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.
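
    For reference, a single D2Q9 lattice Boltzmann collision-and-streaming step looks like the textbook kernel below; in the parallel version each MPI rank would own a block of the domain and exchange halo layers after streaming. This is a generic sketch, not the author's code:

    ```python
    import numpy as np

    # Standard D2Q9 lattice velocities and weights.
    C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
                  [1, 1], [-1, 1], [-1, -1], [1, -1]])
    W = np.array([4/9] + [1/9]*4 + [1/36]*4)

    def equilibrium(rho, ux, uy):
        """BGK equilibrium distribution for each of the nine directions."""
        cu = C[:, 0, None, None] * ux + C[:, 1, None, None] * uy
        usq = ux**2 + uy**2
        return rho * W[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

    def lbm_step(f, tau=0.6):
        """One BGK collision + streaming step on a periodic domain. In the
        parallel DNS each MPI rank owns one block and exchanges halo layers."""
        rho = f.sum(axis=0)
        ux = (f * C[:, 0, None, None]).sum(axis=0) / rho
        uy = (f * C[:, 1, None, None]).sum(axis=0) / rho
        f = f - (f - equilibrium(rho, ux, uy)) / tau        # collision
        for i, (cx, cy) in enumerate(C):                    # streaming
            f[i] = np.roll(np.roll(f[i], cx, axis=0), cy, axis=1)
        return f

    nx = ny = 64
    f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny)), np.zeros((nx, ny)))
    f += 0.01 * np.random.default_rng(0).random((9, nx, ny))  # small perturbation
    mass0 = f.sum()
    for _ in range(100):
        f = lbm_step(f)
    print("mass conserved:", np.isclose(f.sum(), mass0))
    ```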

  4. Parallel steady state studies on a milliliter scale accelerate fed-batch bioprocess design for recombinant protein production with Escherichia coli.

    PubMed

    Schmideder, Andreas; Cremer, Johannes H; Weuster-Botz, Dirk

    2016-11-01

    In general, fed-batch processes are applied for recombinant protein production with Escherichia coli (E. coli). However, state of the art methods for identifying suitable reaction conditions suffer from severe drawbacks, i.e. direct transfer of process information from parallel batch studies is often defective and sequential fed-batch studies are time-consuming and cost-intensive. In this study, continuously operated stirred-tank reactors on a milliliter scale were applied to identify suitable reaction conditions for fed-batch processes. Isopropyl β-d-1-thiogalactopyranoside (IPTG) induction strategies were varied in parallel-operated stirred-tank bioreactors to study the effects on the continuous production of the recombinant protein photoactivatable mCherry (PAmCherry) with E. coli. Best-performing induction strategies were transferred from the continuous processes on a milliliter scale to liter scale fed-batch processes. Inducing recombinant protein expression by dynamically increasing the IPTG concentration to 100 µM led to an increase in the product concentration of 21% (8.4 g L^-1) compared to an implemented high-performance production process with the most frequently applied induction strategy by a single addition of 1000 µM IPTG. Thus, identifying feasible reaction conditions for fed-batch processes in parallel continuous studies on a milliliter scale was shown to be a powerful, novel method to accelerate bioprocess design in a cost-reducing manner. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:1426-1435, 2016. © 2016 American Institute of Chemical Engineers.

  5. A microfluidic needle for sampling and delivery of chemical signals by segmented flows

    NASA Astrophysics Data System (ADS)

    Feng, Shilun; Liu, Guozhen; Jiang, Lianmei; Zhu, Yonggang; Goldys, Ewa M.; Inglis, David W.

    2017-10-01

    We have developed a microfluidic needle-like device that can extract and deliver nanoliter samples. The device consists of a T-junction to form segmented flows, parallel channels to and from the needle tip, and seven hydrophilic capillaries at the tip that form a phase-extraction region. The main microchannel is hydrophobic and carries segmented flows of water-in-oil. The hydrophilic capillaries transport the aqueous phase with a nearly zero pressure gradient but require a pressure gradient of 19 kPa for mineral oil to invade and flow through. Using this device, we demonstrate the delivery of nanoliter droplets and demonstrate sampling through the formation of droplets at the tip of our device. During sampling, we recorded the fluorescence intensities of the droplets formed at the tip while varying the concentration of dye outside the tip. We measured a chemical signal response time of approximately 3 s. The linear relationship between the recorded fluorescence intensity of samples and the external dye concentration (10-40 μg/ml) indicates that this device is capable of performing quantitative, real-time measurements of rapidly varying chemical signals.

  6. Parallel-In-Time For Moving Meshes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Falgout, R. D.; Manteuffel, T. A.; Southworth, B.

    2016-02-04

    With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High-performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
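
    XBraid itself is a C library; the flavor of parallel-in-time integration can be conveyed by a minimal Parareal-style iteration, in which the expensive fine propagations are independent per time slice. The sketch below illustrates the concept only and is unrelated to XBraid's actual API:

    ```python
    import numpy as np

    def parareal(f, y0, t0, t1, n_slices=8, n_iter=4, fine_sub=20):
        """Minimal Parareal iteration for y' = f(t, y): a cheap serial coarse
        propagator corrects fine propagations that are independent per slice
        (and hence parallelizable across processors)."""
        ts = np.linspace(t0, t1, n_slices + 1)

        def euler(y, ta, tb, nsteps):           # simple explicit Euler propagator
            h = (tb - ta) / nsteps
            for k in range(nsteps):
                y = y + h * f(ta + k * h, y)
            return y

        coarse = lambda y, ta, tb: euler(y, ta, tb, 1)
        fine = lambda y, ta, tb: euler(y, ta, tb, fine_sub)

        U = [y0]                                # initial guess: serial coarse sweep
        for i in range(n_slices):
            U.append(coarse(U[-1], ts[i], ts[i + 1]))

        for _ in range(n_iter):
            # Fine solves are independent per slice: the parallel-in-time step.
            F = [fine(U[i], ts[i], ts[i + 1]) for i in range(n_slices)]
            Unew = [y0]
            for i in range(n_slices):           # serial coarse correction sweep
                Unew.append(coarse(Unew[i], ts[i], ts[i + 1])
                            + F[i] - coarse(U[i], ts[i], ts[i + 1]))
            U = Unew
        return ts, np.array(U)

    ts, U = parareal(lambda t, y: -y, y0=1.0, t0=0.0, t1=2.0)
    print("max error vs exp(-t):", np.max(np.abs(U - np.exp(-ts))))
    ```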

  7. Method of and apparatus for collecting solar radiation utilizing variable curvature cylindrical reflectors

    DOEpatents

    Treytl, William J.; Slemmons, Arthur J.; Andeen, Gerry B.

    1979-01-01

    A heliostat apparatus includes a frame which is rotatable about an axis which is parallel to the aperture plane of an elongate receiver. A plurality of flat flexible mirror elements are mounted to the frame between several parallel, uniformly spaced resilient beams which are pivotally connected at their ends to the frame. Channels are mounted to the sides of the beams for supporting the edges of the mirror elements. Each of the beams has a longitudinally varying configuration designed to bow into predetermined, generally circular curvatures of varying radii when the center of the beam is deflected relative to the pivotally connected ends of the beams. All of the parallel resilient beams are simultaneously deflected by a cam shaft assembly extending through openings in the centers of the beams, whereby the mirror elements together form an upwardly concave, cylindrical reflecting surface. The heliostat is rotated about its axis to track the apparent diurnal movement of the sun, while the reflecting surface is substantially simultaneously bowed into a cylindrical trough having a radius adapted to focus incident light at the plane of the receiver aperture.

  8. A GPU Parallelization of the Absolute Nodal Coordinate Formulation for Applications in Flexible Multibody Dynamics

    DTIC Science & Technology

    2012-02-17

    …to be solved… data processing rather than data caching and control flow. To make use of this computational power, NVIDIA introduced a general purpose parallel… GPU implementations were run on an Intel Nehalem Xeon E5520 2.26 GHz processor with an NVIDIA Tesla C2070 graphics card for varying numbers of…

  9. Electron parallel closures for various ion charge numbers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ji, Jeong-Young, E-mail: j.ji@usu.edu; Held, Eric D.; Kim, Sang-Kyeun

    2016-03-15

    Electron parallel closures for the ion charge number Z = 1 [J.-Y. Ji and E. D. Held, Phys. Plasmas 21, 122116 (2014)] are extended for 1 ≤ Z ≤ 10. Parameters are computed for various Z with the same form of the Z = 1 kernels adopted. The parameters are smoothly varying in Z and hence can be used to interpolate parameters and closures for noninteger, effective ion charge numbers.
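
    Because the fitted parameters vary smoothly in Z, closures for noninteger effective charge numbers can be obtained by simple interpolation, as the abstract notes. A sketch with placeholder parameter values; the real parameter table comes from the paper's kinetic computations and is not reproduced here:

    ```python
    import numpy as np

    # Hypothetical closure-parameter table indexed by ion charge number Z.
    # These numbers are placeholders purely to illustrate the interpolation step.
    Z_TAB = np.arange(1, 11)                   # Z = 1..10
    ALPHA_TAB = np.array([1.00, 0.93, 0.88, 0.85, 0.82,
                          0.80, 0.78, 0.77, 0.76, 0.75])

    def closure_parameter(z_eff):
        """Interpolate a kernel parameter to a noninteger effective charge,
        exploiting the smooth Z-dependence reported in the abstract."""
        return np.interp(z_eff, Z_TAB, ALPHA_TAB)

    print(closure_parameter(1.7))   # e.g. a hydrogen plasma with impurities
    ```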

  10. Is parallel trade in medicines compatible with the single European market?

    PubMed

    Senior, I

    1992-01-01

    For many years the varying methods of price control of medicines by national governments in the European Community (and elsewhere) have resulted in wide variations in prices. Parallel traders buy products in low pricing Community countries and sell them, generally relabelled or repackaged, in high pricing Community countries. This practice diverts sales revenue and profits from the manufacturers to the traders, distributors, pharmacists and, in some measure, to the sickness funds and to some patients. While parallel trade appeals to those who gain financially, its basis is a market distortion that poses a significant threat to the future of the research-based pharmaceutical industry.

  11. Discontinuous Galerkin Finite Element Method for Parabolic Problems

    NASA Technical Reports Server (NTRS)

    Kaneko, Hideaki; Bey, Kim S.; Hou, Gene J. W.

    2004-01-01

    In this paper, we develop a time discretization scheme and its corresponding spatial discretization, based upon the assumption of a certain weak singularity of $\|u_t(t)\|_{L_2(\Omega)} = \|u_t\|_2$, for the discontinuous Galerkin finite element method for one-dimensional parabolic problems. Optimal convergence rates in both time and spatial variables are obtained. A discussion of the automatic time-step control method is also included.

  12. Enhancing PC Cluster-Based Parallel Branch-and-Bound Algorithms for the Graph Coloring Problem

    NASA Astrophysics Data System (ADS)

    Taoka, Satoshi; Takafuji, Daisuke; Watanabe, Toshimasa

    A branch-and-bound algorithm (BB for short) is the most general technique to deal with various combinatorial optimization problems. Even if it is used, computation time is likely to increase exponentially. So we consider its parallelization to reduce it. It has been reported that the computation time of a parallel BB heavily depends upon node-variable selection strategies. And, in case of a parallel BB, it is also necessary to prevent increase in communication time. So, it is important to pay attention to how many and what kind of nodes are to be transferred (called sending-node selection strategy). In this paper, for the graph coloring problem, we propose some sending-node selection strategies for a parallel BB algorithm by adopting MPI for parallelization and experimentally evaluate how these strategies affect computation time of a parallel BB on a PC cluster network.
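
    The role of a node-selection strategy is easiest to see in a skeleton: the priority used to order open subproblems is exactly what a sending-node selection strategy would use to decide which nodes to ship to remote workers. Below is a toy sequential branch-and-bound for graph coloring with two pluggable strategies; it is illustrative only and not the authors' MPI implementation:

    ```python
    import heapq

    def color_bb(adj, select="best_first"):
        """Tiny branch-and-bound for graph coloring. The priority used to pop
        nodes is the node-selection strategy; in a PC-cluster version the same
        ranking would decide which open nodes to transfer to other processors."""
        n = len(adj)
        best = n + 1                 # incumbent upper bound on chromatic number
        heap = [(0, 0, ())]          # (priority, colors_used, partial assignment)
        generated = 0
        while heap:
            _, used, assign = heapq.heappop(heap)
            if used >= best:
                continue             # pruned: bound no better than incumbent
            v = len(assign)          # next vertex to color
            if v == n:
                best = used          # complete coloring improves the incumbent
                continue
            ok = [c for c in range(used)
                  if all(assign[u] != c for u in adj[v] if u < v)]
            for c in ok + [used]:    # reuse a feasible color or open a new one
                new_used = max(used, c + 1)
                if new_used < best:
                    generated += 1
                    prio = new_used if select == "best_first" else -(v + 1)
                    heapq.heappush(heap, (prio, new_used, assign + (c,)))
        return best, generated

    adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}  # 5-cycle
    for strategy in ("best_first", "depth_first"):
        chi, nodes = color_bb(adj, strategy)
        print(strategy, "-> chromatic number:", chi, "| nodes generated:", nodes)
    ```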

  13. An intelligent allocation algorithm for parallel processing

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Homaifar, Abdollah; Ananthram, Kishan G.

    1988-01-01

    The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of the interprocessor communication network, influence the allocation. To achieve realistic estimations of the execution durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication duration, varying from zero up to values comparable to the execution durations of individual nodes. The effect on allocation of communication network structures is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and characteristics of the interprocessor communication network.
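
    A small sketch of the two ingredients the abstract describes: critical-path (bottom-level) analysis over a program graph whose edges carry token-communication costs, followed by a greedy list-scheduling allocation with unbounded processors. The graph, durations, and token counts are invented for illustration, and a crossbar (non-blocking) network is assumed so communication never contends:

    ```python
    # Toy program graph: node -> (execution duration, {successor: tokens}).
    GRAPH = {
        "a": (2.0, {"b": 1, "c": 2}),
        "b": (3.0, {"d": 1}),
        "c": (1.0, {"d": 3}),
        "d": (2.0, {}),
    }
    TOKEN_TIME = 0.5      # duration to communicate one token between processors

    def bottom_level(node):
        """Critical-path length from node to a sink, counting execution time
        and per-token interprocessor communication on every edge."""
        exec_t, succs = GRAPH[node]
        if not succs:
            return exec_t
        return exec_t + max(tokens * TOKEN_TIME + bottom_level(s)
                            for s, tokens in succs.items())

    # Allocation heuristic: visit nodes in decreasing bottom level (a valid
    # topological order); with unbounded processors each node starts as soon
    # as every predecessor has finished and its tokens have arrived.
    finish = {}
    for node in sorted(GRAPH, key=bottom_level, reverse=True):
        preds = [(p, info[1][node]) for p, info in GRAPH.items() if node in info[1]]
        ready = max((finish[p] + tok * TOKEN_TIME for p, tok in preds), default=0.0)
        finish[node] = ready + GRAPH[node][0]
    print("schedule length:", max(finish.values()))   # equals the critical path: 8.0
    ```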

  14. Turbine airfoil to shroud attachment method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campbell, Christian X; Kulkarni, Anand A; James, Allister W

    2014-12-23

    Bi-casting a platform (50) onto an end portion (42) of a turbine airfoil (31) after forming a coating of a fugitive material (56) on the end portion. After bi-casting the platform, the coating is dissolved and removed to relieve differential thermal shrinkage stress between the airfoil and platform. The thickness of the coating is varied around the end portion in proportion to varying amounts of local differential process shrinkage. The coating may be sprayed (76A, 76B) onto the end portion in opposite directions parallel to a chord line (41) of the airfoil or parallel to a mid-platform length (80) of the platform to form respective layers tapering in thickness from the leading (32) and trailing (34) edges along the suction side (36) of the airfoil.

  15. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  16. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  17. Phonological coding during reading.

    PubMed

    Leinenger, Mallorie

    2014-11-01

    The exact role that phonological coding (the recoding of written, orthographic information into a sound based code) plays during silent reading has been extensively studied for more than a century. Despite the large body of research surrounding the topic, varying theories as to the time course and function of this recoding still exist. The present review synthesizes this body of research, addressing the topics of time course and function in tandem. The varying theories surrounding the function of phonological coding (e.g., that phonological codes aid lexical access, that phonological codes aid comprehension and bolster short-term memory, or that phonological codes are largely epiphenomenal in skilled readers) are first outlined, and the time courses that each maps onto (e.g., that phonological codes come online early [prelexical] or that phonological codes come online late [postlexical]) are discussed. Next the research relevant to each of these proposed functions is reviewed, discussing the varying methodologies that have been used to investigate phonological coding (e.g., response time methods, reading while eye-tracking or recording EEG and MEG, concurrent articulation) and highlighting the advantages and limitations of each with respect to the study of phonological coding. In response to the view that phonological coding is largely epiphenomenal in skilled readers, research on the use of phonological codes in prelingually, profoundly deaf readers is reviewed. Finally, implications for current models of word identification (activation-verification model, Van Orden, 1987; dual-route model, e.g., M. Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; parallel distributed processing model, Seidenberg & McClelland, 1989) are discussed. (PsycINFO Database Record (c) 2014 APA, all rights reserved).

  18. Phonological coding during reading

    PubMed Central

    Leinenger, Mallorie

    2014-01-01

    The exact role that phonological coding (the recoding of written, orthographic information into a sound based code) plays during silent reading has been extensively studied for more than a century. Despite the large body of research surrounding the topic, varying theories as to the time course and function of this recoding still exist. The present review synthesizes this body of research, addressing the topics of time course and function in tandem. The varying theories surrounding the function of phonological coding (e.g., that phonological codes aid lexical access, that phonological codes aid comprehension and bolster short-term memory, or that phonological codes are largely epiphenomenal in skilled readers) are first outlined, and the time courses that each maps onto (e.g., that phonological codes come online early (pre-lexical) or that phonological codes come online late (post-lexical)) are discussed. Next the research relevant to each of these proposed functions is reviewed, discussing the varying methodologies that have been used to investigate phonological coding (e.g., response time methods, reading while eyetracking or recording EEG and MEG, concurrent articulation) and highlighting the advantages and limitations of each with respect to the study of phonological coding. In response to the view that phonological coding is largely epiphenomenal in skilled readers, research on the use of phonological codes in prelingually, profoundly deaf readers is reviewed. Finally, implications for current models of word identification (activation-verification model (Van Orden, 1987), dual-route model (e.g., Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001), parallel distributed processing model (Seidenberg & McClelland, 1989)) are discussed. PMID:25150679

  19. Emergency Medicine Resident Physicians’ Perceptions of Electronic Documentation and Workflow

    PubMed Central

    Neri, P.M.; Redden, L.; Poole, S.; Pozner, C.N.; Horsky, J.; Raja, A.S.; Poon, E.; Schiff, G.

    2015-01-01

    Objective: To understand emergency department (ED) physicians’ use of electronic documentation in order to identify usability and workflow considerations for the design of future ED information system (EDIS) physician documentation modules. Methods: We invited emergency medicine resident physicians to participate in a mixed methods study using task analysis and qualitative interviews. Participants completed a simulated, standardized patient encounter in a medical simulation center while documenting in the test environment of a currently used EDIS. We recorded the time on task, type and sequence of tasks performed by the participants (including tasks performed in parallel). We then conducted semi-structured interviews with each participant. We analyzed these qualitative data using the constant comparative method to generate themes. Results: Eight resident physicians participated. The simulation session averaged 17 minutes and participants spent 11 minutes on average on tasks that included electronic documentation. Participants performed tasks in parallel, such as history taking and electronic documentation. Five of the 8 participants performed a similar workflow sequence during the first part of the session while the remaining three used different workflows. Three themes characterize electronic documentation: (1) physicians report that location and timing of documentation varies based on patient acuity and workload, (2) physicians report a need for features that support improved efficiency; and (3) physicians like viewing available patient data but struggle with integration of the EDIS with other information sources. Conclusion: We confirmed that physicians spend much of their time on documentation (65%) during an ED patient visit. Further, we found that resident physicians did not all use the same workflow and approach even when presented with an identical standardized patient scenario. Future EHR design should consider these varied workflows while trying to optimize efficiency, such as improving integration of clinical data. These findings should be tested quantitatively in a larger, representative study. PMID:25848411

  20. Linear Parameter Varying Identification of Dynamic Joint Stiffness during Time-Varying Voluntary Contractions

    PubMed Central

    Golkar, Mahsa A.; Sobhani Tehrani, Ehsan; Kearney, Robert E.

    2017-01-01

    Dynamic joint stiffness is a dynamic, nonlinear relationship between the position of a joint and the torque acting about it, which can be used to describe the biomechanics of the joint and associated limb(s). This paper models and quantifies changes in ankle dynamic stiffness and its individual elements, intrinsic and reflex stiffness, in healthy human subjects during isometric, time-varying (TV) contractions of the ankle plantarflexor muscles. A subspace, linear parameter varying, parallel-cascade (LPV-PC) algorithm was used to identify the model from measured input position perturbations and output torque data, using voluntary torque as the LPV scheduling variable (SV). Monte-Carlo simulations demonstrated that the algorithm is accurate, precise, and robust to colored measurement noise. The algorithm was then used to examine stiffness changes associated with TV isometric contractions. The SV was estimated from the soleus EMG using a Hammerstein model of EMG-torque dynamics identified from unperturbed trials. The LPV-PC algorithm identified (i) a non-parametric LPV impulse response function (LPV IRF) for intrinsic stiffness and (ii) an LPV-Hammerstein model for reflex stiffness consisting of an LPV static nonlinearity followed by a time-invariant state-space model of reflex dynamics. The results demonstrated that: (a) intrinsic stiffness, in particular ankle elasticity, increased significantly and monotonically with activation level; (b) the gain of the reflex pathway increased from rest to around 10-20% of the subject's maximum voluntary contraction (MVC) and then declined; and (c) the reflex dynamics were second order. These findings suggest that in the healthy human ankle, reflex stiffness contributes most at low muscle contraction levels, whereas intrinsic contributions increase monotonically with activation level. PMID:28579954

  1. Parallel Computation and Visualization of Three-dimensional, Time-dependent, Thermal Convective Flows

    NASA Technical Reports Server (NTRS)

    Wang, P.; Li, P.

    1998-01-01

    A high-resolution numerical study of three-dimensional, time-dependent, thermal convective flows on parallel systems is reported. A parallel implementation of the finite volume method with a multigrid scheme is discussed, and a parallel visualization system is developed on distributed systems for visualizing the flow.

  2. Robust time-shifted spoke pulse design in the presence of large B0 variations with simultaneous reduction of through-plane dephasing, B1+ effects, and the specific absorption rate using parallel transmission.

    PubMed

    Guérin, Bastien; Stockmann, Jason P; Baboli, Mehran; Torrado-Carvajal, Angel; Stenger, Andrew V; Wald, Lawrence L

    2016-08-01

    To design parallel transmission spokes pulses with time-shifted profiles for joint mitigation of intensity variations due to B1+ effects, signal loss due to through-plane dephasing, and the specific absorption rate (SAR) at 7T. We derived a slice-averaged small tip angle (SA-STA) approximation of the magnetization signal at echo time that depends on the B1+ transmit profiles, the through-slice B0 gradient and the amplitude and time-shifts of the spoke waveforms. We minimize a magnitude least-squares objective based on this signal equation using a fast interior-point approach with analytical expressions of the Jacobian and Hessian. Our algorithm runs in less than three minutes for the design of two-spoke pulses subject to hundreds of local SAR constraints. On a B0/B1+ head phantom, joint optimization of the channel-dependent time-shifts and spoke amplitudes allowed signal recovery in high-B0 regions at no increase of SAR. Although the method creates uniform magnetization profiles (i.e., uniform intensity), the flip angle varies across the image, which makes it ill-suited to T1-weighted applications. The SA-STA approach presented in this study is best suited to T2*-weighted applications with long echo times that require signal recovery around high-B0 regions. Magn Reson Med 76:540-554, 2016. © 2015 Wiley Periodicals, Inc.

  3. Impact of automatization in temperature series in Spain and comparison with the POST-AWS dataset

    NASA Astrophysics Data System (ADS)

    Aguilar, Enric; López-Díaz, José Antonio; Prohom Duran, Marc; Gilabert, Alba; Luna Rico, Yolanda; Venema, Victor; Auchmann, Renate; Stepanek, Petr; Brandsma, Theo

    2016-04-01

    Climate data records are most of the time affected by inhomogeneities. Inhomogeneities introducing network-wide biases, in particular, are sometimes related to changes happening almost simultaneously in an entire network. Relative homogenization is difficult in these cases, especially at the daily scale. A good example of this is the substitution of manual observations (MAN) by automatic weather stations (AWS). Parallel measurements (i.e., records taken at the same time with the old (MAN) and new (AWS) sensors) can provide an idea of the bias introduced and help to evaluate the suitability of different correction approaches. We present here a quality-controlled dataset compiled under the DAAMEC Project, comprising 46 stations across Spain and over 85,000 parallel measurements (AWS-MAN) of daily maximum and minimum temperature. We study the differences between both sensors and compare them with the available metadata to account for internal inhomogeneities. The differences between both systems vary considerably across stations, with patterns more related to their particular settings than to climatic/geographical reasons. The typical median biases (AWS-MAN) by station (the interquartile range across stations) lie between -0.2 °C and 0.4 °C in daily maximum temperature and between -0.4 °C and 0.2 °C in daily minimum temperature. These and other results are compared with a larger network, the Parallel Observations Scientific Team dataset of the International Surface Temperatures Initiative (ISTI-POST), which comprises our stations as well as others from different countries in America, Asia and Europe.
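
    Paired parallel series like these are typically summarized through per-station medians of the daily AWS-MAN differences and their spread across stations. A minimal sketch of that summary (the DataFrame layout and column names are illustrative assumptions, not the project's actual processing chain):

        import pandas as pd

        def station_bias(df, aws="tmax_aws", man="tmax_man"):
            # df: one row per parallel measurement, with a 'station' column
            # and daily temperatures (deg C) from both systems
            diff = (df[aws] - df[man]).rename("bias")      # AWS - MAN difference
            per_station = diff.groupby(df["station"]).median()
            q1, q3 = per_station.quantile([0.25, 0.75])    # spread across stations
            return per_station, (q1, q3)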

  4. ONR Far East Scientific Bulletin. Volume 12, Number 3, July-September 1987,

    DTIC Science & Technology

    1987-09-01

    ...populated, sometimes right down to the beaches. Sea walls have been built for the last 30 or ... The average wave power varies from 15.5 kW/m (i.e., 1.34 hp/m) parallel to ... sediment transport. Wave data for a 5-year period ... season that the sea walls just do not work. In order to combat the problems, ... to 1980. Dr. Burt's current interest is in air-sea interaction.

  5. Structural signature of a brittle-to-ductile transition in self-assembled networks.

    PubMed

    Ramos, Laurence; Laperrousaz, Arnaud; Dieudonné, Philippe; Ligoure, Christian

    2011-09-30

    We study the nonlinear rheology of a novel class of transient networks, made of surfactant micelles of tunable morphology reversibly linked by block copolymers. We couple rheology and time-resolved structural measurements, using synchrotron radiation, to characterize the highly nonlinear viscoelastic regime. We propose the fluctuations of the degree of alignment of the micelles under shear as a probe to identify a fracture process. We show a clear signature of a brittle-to-ductile transition in transient gels, as the morphology of the micelles varies, and provide a parallel between the fracture of solids and the fracture under shear of viscoelastic fluids.

  6. Viscous-enstrophy scaling law for Navier-Stokes reconnection

    NASA Astrophysics Data System (ADS)

    Kerr, Robert M.

    2017-11-01

    Simulations of perturbed, helical trefoil vortex knots and anti-parallel vortices find ν-independent collapse of the temporally scaled quantity (√ν Z)^(-1/2), where Z is the enstrophy, between when the loops first touch at t_Γ and when reconnection ends at t_x, for viscosities ν varying by a factor of 256. Due to mathematical bounds upon higher-order norms, this collapse requires that the domain increase as ν decreases, possibly to allow large-scale negative helicity to grow as compensation for the growth of small-scale positive helicity and enstrophy. This mechanism could be a step towards explaining how smooth solutions of the Navier-Stokes equations can generate finite-energy dissipation in a finite time as ν → 0.

  7. Ultrasonic Doppler measurement of renal artery blood flow

    NASA Technical Reports Server (NTRS)

    Freund, W. R.; Beaver, W. L.; Meindl, J. D.

    1976-01-01

    Studies were made of (1) blood flow redistribution during lower body negative pressure (LBNP), (2) the profile of blood flow across the mitral annulus of the heart (both perpendicular and parallel to the commissures), (3) testing and evaluation of a number of pulsed Doppler systems, (4) acute calibration of perivascular Doppler transducers, (5) redesign of the mitral flow transducers to improve reliability and ease of construction, and (6) a frequency offset generator designed for use in distinguishing forward and reverse components of blood flow by producing frequencies above and below the offset frequency. Finally, methodology was developed and initial results were obtained from a computer analysis of time-varying Doppler spectra.

  8. Parallel Evolution of Copy-Number Variation across Continents in Drosophila melanogaster

    PubMed Central

    Schrider, Daniel R.; Hahn, Matthew W.; Begun, David J.

    2016-01-01

    Genetic differentiation across populations that is maintained in the presence of gene flow is a hallmark of spatially varying selection. In Drosophila melanogaster, the latitudinal clines across the eastern coasts of Australia and North America appear to be examples of this type of selection, with recent studies showing that a substantial portion of the D. melanogaster genome exhibits allele frequency differentiation with respect to latitude on both continents. As of yet there has been no genome-wide examination of differentiated copy-number variants (CNVs) in these geographic regions, despite their potential importance for phenotypic variation in Drosophila and other taxa. Here, we present an analysis of geographic variation in CNVs in D. melanogaster. We also present the first genomic analysis of geographic variation for copy-number variation in the sister species, D. simulans, in order to investigate patterns of parallel evolution in these close relatives. In D. melanogaster we find hundreds of CNVs, many of which show parallel patterns of geographic variation on both continents, lending support to the idea that they are influenced by spatially varying selection. These findings support the idea that polymorphic CNVs contribute to local adaptation in D. melanogaster. In contrast, we find very few CNVs in D. simulans that are geographically differentiated in parallel on both continents, consistent with earlier work suggesting that clinal patterns are weaker in this species. PMID:26809315

  9. Parallel Gene Expression Differences between Low and High Latitude Populations of Drosophila melanogaster and D. simulans

    PubMed Central

    Zhao, Li; Wit, Janneke; Svetec, Nicolas; Begun, David J.

    2015-01-01

    Gene expression variation within species is relatively common, however, the role of natural selection in the maintenance of this variation is poorly understood. Here we investigate low and high latitude populations of Drosophila melanogaster and its sister species, D. simulans, to determine whether the two species show similar patterns of population differentiation, consistent with a role for spatially varying selection in maintaining gene expression variation. We compared at two temperatures the whole male transcriptome of D. melanogaster and D. simulans sampled from Panama City (Panama) and Maine (USA). We observed a significant excess of genes exhibiting differential expression in both species, consistent with parallel adaptation to heterogeneous environments. Moreover, the majority of genes showing parallel expression differentiation showed the same direction of differential expression in the two species and the magnitudes of expression differences between high and low latitude populations were correlated across species, further bolstering the conclusion that parallelism for expression phenotypes results from spatially varying selection. However, the species also exhibited important differences in expression phenotypes. For example, the genomic extent of genotype × environment interaction was much more common in D. melanogaster. Highly differentiated SNPs between low and high latitudes were enriched in the 3’ UTRs and CDS of the geographically differently expressed genes in both species, consistent with an important role for cis-acting variants in driving local adaptation for expression-related phenotypes. PMID:25950438

  11. Estimation of liquid volume fraction using ultrasound transit time spectroscopy

    NASA Astrophysics Data System (ADS)

    Al-Qahtani, Saeed M.; Langton, Christian M.

    2016-12-01

    It has recently been proposed that the propagation of an ultrasound wave through complex structures consisting of two materials of differing ultrasound velocity may be considered as an array of parallel ‘sonic rays’, the transit time of each determined by the relative proportion of the two materials traversed; being a minimum (t_min) for propagation entirely in the higher-velocity material, and a maximum (t_max) for propagation entirely in the lower-velocity material. An ultrasound transit time spectrum (UTTS) describes the proportion of sonic rays at each individual transit time. It has previously been demonstrated that the solid volume fraction of a solid:liquid composite, specifically acrylic step-wedges immersed in water, may be reliably estimated from the UTTS. The aim of this research was to investigate the hypothesis that the volume fraction of a two-component liquid mixture, of unequal ultrasound velocity, may also be estimated by UTTS. A through-transmission technique incorporating two 1 MHz ultrasound transducers within a horizontally-aligned cylindrical tube-housing was utilised, the proportion of silicone oil to water being varied from 0% to 100%. The liquid volume fraction was estimated from the UTTS at each composition, the coefficient of determination (R²) being 98.9 ± 0.7%. The analysis incorporated a novel signal amplitude normalisation technique to compensate for absorption within the silicone oil. It is therefore envisaged that the parallel sonic ray concept and the derived UTTS may be further applied to the quantification of liquid mixture composition.
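
    Under the parallel sonic ray model, a ray spending fraction p of its path in the slower liquid arrives at t = t_min + p(t_max - t_min), so the mixture's volume fraction maps linearly onto the amplitude-weighted mean transit time of the spectrum. A minimal numerical sketch of that inversion (the spectrum arrays and the endpoint transit times are assumed inputs, not the authors' processing chain):

        import numpy as np

        def liquid_fraction_from_utts(times, amplitudes, t_min, t_max):
            # times: transit-time axis of the UTTS; amplitudes: spectrum values
            w = np.clip(amplitudes, 0.0, None)
            w = w / w.sum()                    # normalise the spectrum to unit area
            t_mean = np.sum(w * times)         # amplitude-weighted mean transit time
            return (t_mean - t_min) / (t_max - t_min)   # fraction of slower liquid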

  12. [CMACPAR: a modified parallel neuro-controller for control processes].

    PubMed

    Ramos, E; Surós, R

    1999-01-01

    CMACPAR is a parallel neurocontroller oriented to real-time systems such as process control. Its main characteristics are a fast learning algorithm, a reduced number of calculations, great generalization capacity, local learning and intrinsic parallelism. This type of neurocontroller is used in real-time applications required by refineries, hydroelectric plants, factories, etc. In this work we present the analysis and the parallel implementation of a modified scheme of the cerebellar model CMAC for the n-dimensional space projection using a medium-granularity parallel neurocontroller. The proposed memory management allows for a significant reduction in training time and required memory size.
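
    For context, a CMAC maps each input to one active cell in each of several offset tilings and sums the corresponding weights; training touches only those few weights, which is what makes learning fast and local. A minimal single-process sketch of the idea (the tiling counts, the [0, 1]^dim input domain and the learning rate are illustrative assumptions, not the paper's parallel scheme):

        import numpy as np

        class CMAC:
            def __init__(self, n_tilings=8, tiles=16, dim=2, lr=0.5, seed=0):
                rng = np.random.default_rng(seed)
                self.n_tilings, self.tiles, self.lr = n_tilings, tiles, lr
                self.offsets = rng.random((n_tilings, dim)) / tiles   # shifted tilings
                self.w = np.zeros((n_tilings,) + (tiles + 2,) * dim)  # weight table

            def _active(self, x):
                # one active cell per tiling; x is assumed to lie in [0, 1]^dim
                for t in range(self.n_tilings):
                    idx = np.floor((x + self.offsets[t]) * self.tiles).astype(int)
                    yield (t, *idx)

            def predict(self, x):
                return sum(self.w[c] for c in self._active(x))

            def train(self, x, target):
                err = target - self.predict(x)         # local, error-driven update
                for c in self._active(x):
                    self.w[c] += self.lr * err / self.n_tilings

    Because nearby inputs share most of their active cells, a network trained on scattered samples generalizes to neighbouring inputs, which is the generalization capacity the abstract refers to.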

  13. A real-time MPEG software decoder using a portable message-passing library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

    1995-12-31

    We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it runs. Several technical issues are discussed, including balancing of decoding speed, memory limitations, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible on a general-purpose parallel machine.

  14. A PIPO Boost Converter with Low Ripple and Medium Current Application

    NASA Astrophysics Data System (ADS)

    Bandri, S.; Sofian, A.; Ismail, F.

    2018-04-01

    This paper proposes a Parallel Input Parallel Output (PIPO) boost converter to increase the power capability of the converter and reduce the inductor currents. The proposed technique distributes the current across n parallel inductors and switching components. Four parallel boost converters operate from a 20.5 Vdc input to generate a 28.8 Vdc output. The PIPO boost converter applies phase-shifted pulse width modulation, which is compared with a conventional PIPO boost converter that uses an identical pulse for every switching component. The reduction in current ripple shows the advantage of the phase-shifted PIPO boost converter over the conventional one. Varying loads and duty cycles are simulated and analyzed to verify the performance of the PIPO boost converter. Finally, the inductor current imbalance is verified over four duty-cycle regions below 0.6.
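
    Interleaving is what produces the ripple reduction: each leg's triangular inductor-current ripple is shifted by T/n, so the AC components partially cancel where the legs' currents sum at the input node. A toy model of that cancellation (the inductance and switching frequency are placeholder values, not taken from the paper):

        import numpy as np

        def input_ripple_pp(n_phases, duty, v_in=20.5, L=100e-6, f_sw=50e3, points=2000):
            # Peak-to-peak summed inductor-current ripple of an n-phase
            # interleaved boost converter in CCM (0 < duty < 1).
            T = 1.0 / f_sw
            t = np.linspace(0.0, T, points, endpoint=False)
            total = np.zeros_like(t)
            for k in range(n_phases):
                tk = (t + k * T / n_phases) % T       # shift each leg by T/n
                rise = v_in / L * tk                   # switch on: di/dt = Vin/L
                fall = v_in * duty * T / L - v_in * duty / (L * (1 - duty)) * (tk - duty * T)
                total += np.where(tk < duty * T, rise, fall)
            return total.max() - total.min()

    For example, input_ripple_pp(4, 0.3) / input_ripple_pp(1, 0.3) shows that the four-phase ripple at a duty cycle of 0.3 is only a small fraction of the single-phase ripple; for ideal triangular ripple it cancels entirely when the duty cycle is a multiple of 1/n.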

  15. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.

  16. Parallelization of elliptic solver for solving 1D Boussinesq model

    NASA Astrophysics Data System (ADS)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by implementing a staggered grid scheme for the continuity, momentum, and elliptic equations of the model. The tridiagonal system emerging from the numerical scheme of the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two numerical test cases, i.e. propagation of a solitary wave and of a standing wave, are proposed to evaluate the parallel program. The numerical results are verified against the analytical solutions for the solitary and standing waves. The best speedups for the solitary and standing wave test cases are about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, both obtained using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
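
    Cyclic reduction eliminates every other unknown at each level, turning an N-unknown tridiagonal solve into about log₂ N levels whose updates are mutually independent; those per-level loops are what the OpenMP implementation parallelizes. A serial sketch of the algorithm (generic, not the paper's code; the size is restricted to 2^k - 1 for clarity):

        import numpy as np

        def cyclic_reduction(a, b, c, d):
            # Solve T x = d for tridiagonal T with sub-diagonal a, diagonal b,
            # super-diagonal c; a[0] and c[-1] must be zero.
            a, b, c, d = (np.asarray(v, float).copy() for v in (a, b, c, d))
            n = len(b)
            k = int(np.log2(n + 1))
            assert n == 2**k - 1, "sketch requires n = 2**k - 1"
            s = 1
            for _ in range(k - 1):                    # forward elimination
                for i in range(2 * s - 1, n, 2 * s):  # independent: parallel loop
                    al = -a[i] / b[i - s]
                    be = -c[i] / b[i + s] if i + s < n else 0.0
                    b[i] += al * c[i - s] + (be * a[i + s] if i + s < n else 0.0)
                    d[i] += al * d[i - s] + (be * d[i + s] if i + s < n else 0.0)
                    a[i] = al * a[i - s]
                    c[i] = be * c[i + s] if i + s < n else 0.0
                s *= 2
            x = np.zeros(n)
            while s >= 1:                             # back substitution
                for i in range(s - 1, n, 2 * s):      # also independent per level
                    lo = x[i - s] if i - s >= 0 else 0.0
                    hi = x[i + s] if i + s < n else 0.0
                    x[i] = (d[i] - a[i] * lo - c[i] * hi) / b[i]
                s //= 2
            return x

    The result can be checked against a dense solve of the same system; on shared memory the two inner loops map directly onto OpenMP parallel-for constructs.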

  17. Localization of basic fibroblast growth factor binding sites in the chick embryonic neural retina.

    PubMed

    Cirillo, A; Arruti, C; Courtois, Y; Jeanny, J C

    1990-12-01

    We have investigated the localization of basic fibroblast growth factor (bFGF) binding sites during the development of the neural retina in the chick embryo. The specificity of the affinity of bFGF for its receptors was assessed by competition experiments with unlabelled growth factor or with heparin, as well as by heparitinase treatment of the samples. Two different types of binding sites were observed in the neural retina by light-microscopic autoradiography. The first type, localized mainly to basement membranes, was highly sensitive to heparitinase digestion and to competition with heparin. It was not developmentally regulated. The second type of binding site, resistant to heparin competition, appeared to be associated with retinal cells from the earliest stages studied (3-day-old embryo, stages 21-22 of Hamburger and Hamilton). Its distribution was found to vary during embryonic development, paralleling layering of the neural retina. Binding of bFGF to the latter sites was observed throughout the retinal neuroepithelium at early stages but displayed a distinct pattern at the time when the inner and outer plexiform layers were formed. During the development of the inner plexiform layer, a banded pattern of bFGF binding was observed. These bands, lying parallel to the vitreal surface, seemed to codistribute with the synaptic bands existing in the inner plexiform layer. The presence of intra-retinal bFGF binding sites whose distribution varies with embryonic development suggests a regulatory mechanism involving differential actions of bFGF on neural retinal cells.

  18. Parallel algorithms for computation of the manipulator inertia matrix

    NASA Technical Reports Server (NTRS)

    Amin-Javaheri, Masoud; Orin, David E.

    1989-01-01

    The development of an O(log₂ N) parallel algorithm for the manipulator inertia matrix is presented. It is based on the most efficient serial algorithm, which uses the composite rigid body method. Recursive doubling is used to reformulate the linear recurrence equations which are required to compute the diagonal elements of the matrix. It results in O(log₂ N) levels of computation. Computation of the off-diagonal elements involves N linear recurrences of varying size, and a new method, which avoids redundant computation of position and orientation transforms for the manipulator, is developed. The O(log₂ N) algorithm is presented in both equation and graphic forms which clearly show the parallelism inherent in the algorithm.
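
    Recursive doubling collapses a serial linear recurrence into about log₂ N data-parallel sweeps. As a minimal illustration of the technique, here it is applied to a plain prefix sum (a toy recurrence, not the inertia-matrix recurrence itself):

        import numpy as np

        def prefix_sums(x):
            # Recursive doubling (Hillis-Steele scan): each sweep with offset s
            # adds the value s positions back; after ceil(log2 N) sweeps,
            # y[i] = x[0] + ... + x[i].
            y = np.asarray(x, dtype=float).copy()
            s = 1
            while s < len(y):
                y[s:] = y[s:] + y[:-s]   # every update in a sweep is independent
                s *= 2
            return y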

  19. Airborne electromagnetic detection of shallow seafloor topographic features, including resolution of multiple sub-parallel seafloor ridges

    NASA Astrophysics Data System (ADS)

    Vrbancich, Julian; Boyd, Graham

    2014-05-01

    The HoistEM helicopter time-domain electromagnetic (TEM) system was flown over waters in Backstairs Passage, South Australia, in 2003 to test the bathymetric accuracy and hence the ability to resolve seafloor structure in shallow and deeper waters (extending to ~40 m depth) that contain interesting seafloor topography. The topography that forms a rock peak (South Page) in the form of a mini-seamount that barely rises above the water surface was accurately delineated along its ridge from the start of its base (where the seafloor is relatively flat) in ~30 m water depth to its peak at the water surface, after an empirical correction was applied to the data to account for imperfect system calibration, consistent with earlier studies using the same HoistEM system. A much smaller submerged feature (Threshold Bank) of ~9 m peak height located in waters of 35 to 40 m depth was also accurately delineated. These observations when checked against known water depths in these two regions showed that the airborne TEM system, following empirical data correction, was effectively operating correctly. The third and most important component of the survey was flown over the Yatala Shoals region that includes a series of sub-parallel seafloor ridges (resembling large sandwaves rising up to ~20 m from the seafloor) that branch out and gradually decrease in height as the ridges spread out across the seafloor. These sub-parallel ridges provide an interesting topography because the interpreted water depths obtained from 1D inversion of TEM data highlight the limitations of the EM footprint size in resolving both the separation between the ridges (which vary up to ~300 m) and the height of individual ridges (which vary up to ~20 m), and possibly also the limitations of assuming a 1D model in areas where the topography is quasi-2D/3D.

  20. Human Exposure to Electromagnetic Fields from Parallel Wireless Power Transfer Systems.

    PubMed

    Wen, Feng; Huang, Xueliang

    2017-02-08

    The scenario of multiple wireless power transfer (WPT) systems working closely, synchronously or asynchronously with phase difference often occurs in power supply for household appliances and electric vehicles in parking lots. Magnetic field leakage from the WPT systems is also varied due to unpredictable asynchronous working conditions. In this study, the magnetic field leakage from parallel WPT systems working with phase difference is predicted, and the induced electric field and specific absorption rate (SAR) in a human body standing in the vicinity are also evaluated. Computational results are compared with the restrictions prescribed in the regulations established to limit human exposure to time-varying electromagnetic fields (EMFs). The results show that the middle region between the two WPT coils is safer for the two WPT systems working in-phase, and the peripheral regions are safer around the WPT systems working anti-phase. Thin metallic plates larger than the WPT coils can shield the magnetic field leakage well, while smaller ones may worsen the situation. The orientation of the human body will influence the maximum magnitude of induced electric field and its distribution within the human body. The induced electric field centralizes in the trunk, groin, and genitals with only one exception: when the human body is standing right at the middle of the two WPT coils working in-phase, the induced electric field focuses on lower limbs. The SAR value in the lungs always seems to be greater than in other organs, while the value in the liver is minimal. Human exposure to EMFs meets the guidelines of the International Committee on Non-Ionizing Radiation Protection (ICNIRP), specifically reference levels with respect to magnetic field and basic restrictions on induced electric fields and SAR, as the charging power is lower than 3.1 kW and 55.5 kW, respectively. These results are positive with respect to the safe applications of parallel WPT systems working simultaneously.

  2. Characterization of Harmonic Signal Acquisition with Parallel Dipole and Multipole Detectors

    NASA Astrophysics Data System (ADS)

    Park, Sung-Gun; Anderson, Gordon A.; Bruce, James E.

    2018-04-01

    Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) is a powerful instrument for the study of complex biological samples due to its high resolution and mass measurement accuracy. However, the relatively long signal acquisition periods needed to achieve high resolution can serve to limit applications of FTICR-MS. The use of multiple pairs of detector electrodes enables detection of harmonic frequencies present at integer multiples of the fundamental cyclotron frequency, and the obtained resolving power for a given acquisition period increases linearly with the order of harmonic signal. However, harmonic signal detection also increases spectral complexity and presents challenges for interpretation. In the present work, ICR cells with independent dipole and harmonic detection electrodes and preamplifiers are demonstrated. A benefit of this approach is the ability to independently acquire fundamental and multiple harmonic signals in parallel using the same ions under identical conditions, enabling direct comparison of achieved performance as parameters are varied. Spectra from harmonic signals showed generally higher resolving power than spectra acquired with fundamental signals and equal signal duration. In addition, the maximum observed signal to noise (S/N) ratio from harmonic signals exceeded that of fundamental signals by 50 to 100%. Finally, parallel detection of fundamental and harmonic signals enables deconvolution of overlapping harmonic signals since observed fundamental frequencies can be used to unambiguously calculate all possible harmonic frequencies. Thus, the present application of parallel fundamental and harmonic signal acquisition offers a general approach to improve utilization of harmonic signals to yield high-resolution spectra with decreased acquisition time.

  3. A fluorescence-based centrifugal microfluidic system for parallel detection of multiple allergens

    NASA Astrophysics Data System (ADS)

    Chen, Q. L.; Ho, H. P.; Cheung, K. L.; Kong, S. K.; Suen, Y. K.; Kwan, Y. W.; Li, W. J.; Wong, C. K.

    2010-02-01

    This paper reports a robust polymer-based centrifugal microfluidic analysis system that can provide parallel detection of multiple allergens in vitro. Many commercial food products (milk, bean, pollen, etc.) may trigger allergies in people, so a low-cost device for rapid detection of allergens is highly desirable. With this as the objective, we have studied the feasibility of using a rotating disk device incorporating centrifugal microfluidics for performing actuation-free and multi-analyte detection of different allergen species with minimum sample usage and fast response time. Degranulation in basophils or mast cells is an indicator of allergic reaction. In this connection, we used acridine orange (AO) to demonstrate degranulation in KU812 human basophils. It was found that the AO was released from granules when cells were stimulated by ionomycin, thus signifying the release of histamine, which accounts for allergy symptoms [1-2]. Within this rotating optical platform, major microfluidic components including sample reservoirs, reaction chambers, microchannels and flow-control compartments are integrated into a single bio-compatible polydimethylsiloxane (PDMS) substrate. The flow sequence and reaction time can be controlled precisely. By sequentially varying the spinning speed, the disk may perform a variety of steps for sample loading, reaction and detection. Our work demonstrates the feasibility of using centrifugation as a possible immunoassay system in the future.

  4. GPU-based ultra-fast dose calculation using a finite size pencil beam model.

    PubMed

    Gu, Xuejun; Choi, Dongju; Men, Chunhua; Pan, Hubert; Majumdar, Amitava; Jiang, Steve B

    2009-10-21

    Online adaptive radiation therapy (ART) is an attractive concept that promises the ability to deliver an optimal treatment in response to the inter-fraction variability in patient anatomy. However, it has yet to be realized due to technical limitations. Fast dose deposition coefficient calculation is a critical component of the online planning process that is required for plan optimization of intensity-modulated radiation therapy (IMRT). Computer graphics processing units (GPUs) are well suited to provide the requisite fast performance for the data-parallel nature of dose calculation. In this work, we develop a dose calculation engine based on a finite-size pencil beam (FSPB) algorithm and a GPU parallel computing framework. The developed framework can accommodate any FSPB model. We test our implementation in the case of a water phantom and the case of a prostate cancer patient with varying beamlet and voxel sizes. All testing scenarios achieved speedup ranging from 200 to 400 times when using a NVIDIA Tesla C1060 card in comparison with a 2.27 GHz Intel Xeon CPU. The computational time for calculating dose deposition coefficients for a nine-field prostate IMRT plan with this new framework is less than 1 s. This indicates that the GPU-based FSPB algorithm is well suited for online re-planning for adaptive radiotherapy.
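
    The data-parallel structure being exploited is that every voxel's dose is an independent weighted sum over beamlet kernels, so one GPU thread can own one voxel. A toy CPU sketch of such a superposition, with a Gaussian stand-in for the FSPB lateral kernel (the kernel shape, width and geometry are illustrative assumptions, not the paper's model):

        import numpy as np

        def dose_superposition(voxels, beamlet_axes, weights, sigma=5.0):
            # voxels: (n_vox, 3) positions [mm]; beamlet_axes: (n_beam, 3)
            # nearest beamlet-axis points [mm]; weights: (n_beam,) intensities.
            d2 = ((voxels[:, None, :] - beamlet_axes[None, :, :]) ** 2).sum(-1)
            # each row (voxel) is an independent reduction -> one GPU thread apiece
            return (weights * np.exp(-d2 / (2.0 * sigma**2))).sum(axis=1)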

  5. Cryogenic parallel, single phase flows: an analytical approach

    NASA Astrophysics Data System (ADS)

    Eichhorn, R.

    2017-02-01

    Managing the cryogenic flows inside a state-of-the-art accelerator cryomodule has become a demanding endeavour: In order to build highly efficient modules, all heat transfers are usually intercepted at various temperatures. For a multi-cavity module, operated at 1.8 K, this requires intercepts at 4 K and at 80 K at different locations with sometimes strongly varying heat loads which for simplicity reasons are operated in parallel. This contribution will describe an analytical approach, based on optimization theories.

  6. SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeff S.

    1992-01-01

    Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
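
    The core of Breathing Time Buckets is the event horizon: events are processed optimistically in timestamp order, and the earliest timestamp generated during the current cycle bounds what may safely be committed. A single-process cartoon of that loop (the real algorithm computes the horizon across nodes and exchanges the staged events; the handler and event tuples here are hypothetical):

        import heapq

        def breathing_time_buckets(pending, handler):
            # pending: list of (timestamp, event_id) tuples;
            # handler(ts, ev) processes one event and returns newly
            # scheduled (timestamp, event_id) tuples.
            heapq.heapify(pending)
            while pending:
                horizon = float("inf")    # earliest event generated this cycle
                staged = []
                while pending and pending[0][0] < horizon:
                    ts, ev = heapq.heappop(pending)
                    for new in handler(ts, ev):
                        horizon = min(horizon, new[0])
                        staged.append(new)
                # events processed so far all precede the horizon: commit them,
                # then release the newly generated events for the next cycle
                for new in staged:
                    heapq.heappush(pending, new)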

  7. Parallel processing architecture for computing inverse differential kinematic equations of the PUMA arm

    NASA Technical Reports Server (NTRS)

    Hsia, T. C.; Lu, G. Z.; Han, W. H.

    1987-01-01

    In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. Parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
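
    The computation being pipelined is, at its core, solving J(q) q̇ = ẋ for the joint rates at every control tick. A generic numerical sketch of that step (using damped least squares; the damping term is an added safeguard near singularities, not part of the paper's systolic design):

        import numpy as np

        def joint_rates(J, ee_twist, damping=1e-6):
            # J: (6, n) manipulator Jacobian; ee_twist: (6,) end-effector
            # linear and angular velocity; returns (n,) joint rates.
            JT = J.T
            return JT @ np.linalg.solve(J @ JT + damping * np.eye(J.shape[0]), ee_twist)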

  8. A model for optimizing file access patterns using spatio-temporal parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boonthanome, Nouanesengsy; Patchett, John; Geveci, Berk

    2013-01-01

    For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible file access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.

  9. Variation in the foraging behaviors of two flycatchers: associations with stage of the breeding cycle

    Treesearch

    H.F. Sakai; B.R. Noon

    1990-01-01

    The foraging characteristics of Hammond’s and Western flycatchers in northwestern California varied with different stages of the breeding cycle during the breeding seasons (early April-mid August) in 1984 and 1985. The species’ behaviors did not always vary in parallel nor were all foraging behaviors distributed equally during the breeding cycle. For example, the...

  10. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    NASA Astrophysics Data System (ADS)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  12. Poiseuille, thermal transpiration and Couette flows of a rarefied gas between plane parallel walls with nonuniform surface properties in the transverse direction and their reciprocity relations

    NASA Astrophysics Data System (ADS)

    Doi, Toshiyuki

    2018-04-01

    Slow flows of a rarefied gas between two plane parallel walls with nonuniform surface properties are studied based on kinetic theory. It is assumed that one wall is a diffuse reflection boundary and the other wall is a Maxwell-type boundary whose accommodation coefficient varies periodically in the direction perpendicular to the flow. The time-independent Poiseuille, thermal transpiration and Couette flows are considered. The flow behavior is numerically studied based on the linearized Bhatnagar-Gross-Krook-Welander model of the Boltzmann equation. The flow field, the mass and heat flow rates in the gas, and the tangential force acting on the wall surface are studied over a wide range of the gas rarefaction degree and the parameters characterizing the distribution of the accommodation coefficient. The locally convex velocity distribution is observed in Couette flow of a highly rarefied gas, similarly to Poiseuille flow and thermal transpiration. The reciprocity relations are numerically confirmed over a wide range of the flow parameters.

  13. A multiscale MDCT image-based breathing lung model with time-varying regional ventilation

    PubMed Central

    Yin, Youbing; Choi, Jiwoong; Hoffman, Eric A.; Tawhai, Merryn H.; Lin, Ching-Long

    2012-01-01

    A novel algorithm is presented that links local structural variables (regional ventilation and deforming central airways) to global function (total lung volume) in the lung over three imaged lung volumes, to derive a breathing lung model for computational fluid dynamics simulation. The algorithm constitutes the core of an integrative, image-based computational framework for subject-specific simulation of the breathing lung. For the first time, the algorithm is applied to three multi-detector row computed tomography (MDCT) volumetric lung images of the same individual. A key technique in linking global and local variables over multiple images is an in-house mass-preserving image registration method. Throughout breathing cycles, cubic interpolation is employed to ensure C1 continuity in constructing time-varying regional ventilation at the whole lung level, flow rate fractions exiting the terminal airways, and airway deformation. The imaged exit airway flow rate fractions are derived from regional ventilation with the aid of a three-dimensional (3D) and one-dimensional (1D) coupled airway tree that connects the airways to the alveolar tissue. An in-house parallel large-eddy simulation (LES) technique is adopted to capture turbulent-transitional-laminar flows in both normal and deep breathing conditions. The results obtained by the proposed algorithm when using three lung volume images are compared with those using only one or two volume images. The three-volume-based lung model produces physiologically-consistent time-varying pressure and ventilation distribution. The one-volume-based lung model under-predicts pressure drop and yields un-physiological lobar ventilation. The two-volume-based model can account for airway deformation and non-uniform regional ventilation to some extent, but does not capture the non-linear features of the lung. PMID:23794749
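
    The C1 requirement means both the interpolated lung volume and its time derivative (the flow rate) remain continuous across breathing cycles, which periodic cubic interpolation through the imaged states provides. A minimal sketch (the volume values and cycle timing are illustrative, not the subject's MDCT data):

        import numpy as np
        from scipy.interpolate import CubicSpline

        t_img = np.array([0.0, 1.6, 3.2, 4.8])   # s; imaged states over one cycle
        v_img = np.array([2.4, 3.1, 2.7, 2.4])   # L; v(0) == v(T) closes the cycle
        volume = CubicSpline(t_img, v_img, bc_type="periodic")
        flow = volume.derivative()                # dV/dt, continuous at cycle joins
        t = np.linspace(0.0, 9.6, 200)            # evaluate over two cycles
        v, q = volume(t % 4.8), flow(t % 4.8)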

  14. A Structure-Toxicity Study of Aβ42 Reveals a New Anti-Parallel Aggregation Pathway

    PubMed Central

    Vignaud, Hélène; Bobo, Claude; Lascu, Ioan; Sörgjerd, Karin Margareta; Zako, Tamotsu; Maeda, Mizuo; Salin, Benedicte; Lecomte, Sophie; Cullin, Christophe

    2013-01-01

    Amyloid beta (Aβ) peptides produced by APP cleavage are central to the pathology of Alzheimer's disease. Despite widespread interest in this issue, the relationship between the auto-assembly and toxicity of these peptides remains controversial. One intriguing feature stems from their capacity to form anti-parallel β-sheet oligomeric intermediates that can be converted into a parallel topology to allow the formation of protofibrillar and fibrillar Aβ. Here, we present a novel approach to determining the molecular aspects of Aβ assembly that are responsible for its in vivo toxicity. We selected Aβ mutants with varying intracellular toxicities. In vitro, only toxic Aβ (including wild-type Aβ42) formed urea-resistant oligomers. These oligomers were able to assemble into fibrils that are rich in anti-parallel β-sheet structures. Our results support the existence of a new pathway that depends on the folding capacity of Aβ. PMID:24244667

  15. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.

  16. A new parallel plate shear cell for in situ real-space measurements of complex fluids under shear flow.

    PubMed

    Wu, Yu Ling; Brand, Joost H J; van Gemert, Josephus L A; Verkerk, Jaap; Wisman, Hans; van Blaaderen, Alfons; Imhof, Arnout

    2007-10-01

    We developed and tested a parallel plate shear cell that can be mounted on top of an inverted microscope to perform confocal real-space measurements on complex fluids under shear. To follow structural changes in time, a plane of zero velocity is created by letting the plates move in opposite directions. The location of this plane is varied by changing the relative velocities of the plates. The gap width is variable between 20 and 200 μm with parallelism better than 1 μm. Such a small gap width enables us to examine the total sample thickness using high numerical aperture objective lenses. The achieved shear rates cover the range of 0.02-10³ s⁻¹. This shear cell can apply an oscillatory shear with adjustable amplitude and frequency. The maximum travel of each plate equals 1 cm, so that strains up to 500 can be applied. For most complex fluids, an oscillatory shear with such a large amplitude can be regarded as a continuous shear. We measured the flow profile of a suspension of silica colloids in this shear cell. It was linear except for a small deviation caused by sedimentation. To demonstrate the excellent performance and capabilities of this new setup we examined shear induced crystallization and melting of concentrated suspensions of 1 μm diameter silica colloids.

  17. The effect of cosmic-ray acceleration on supernova blast wave dynamics

    NASA Astrophysics Data System (ADS)

    Pais, M.; Pfrommer, C.; Ehlert, K.; Pakmor, R.

    2018-05-01

    Non-relativistic shocks accelerate ions to highly relativistic energies provided that the orientation of the magnetic field is closely aligned with the shock normal (quasi-parallel shock configuration). In contrast, quasi-perpendicular shocks do not efficiently accelerate ions. We model this obliquity-dependent acceleration process in a spherically expanding blast wave setup with the moving-mesh code AREPO for different magnetic field morphologies, ranging from homogeneous to turbulent configurations. A Sedov-Taylor explosion in a homogeneous magnetic field generates an oblate ellipsoidal shock surface due to the slower propagating blast wave in the direction of the magnetic field. This is because of the efficient cosmic ray (CR) production in the quasi-parallel polar cap regions, which softens the equation of state and increases the compressibility of the post-shock gas. We find that the solution remains self-similar because the ellipticity of the propagating blast wave stays constant in time. This enables us to derive an effective ratio of specific heats for a composite of thermal gas and CRs as a function of the maximum acceleration efficiency. We finally discuss the behavior of supernova remnants expanding into a turbulent magnetic field with varying coherence lengths. For a maximum CR acceleration efficiency of about 15 per cent at quasi-parallel shocks (as suggested by kinetic plasma simulations), we find an average efficiency of about 5 per cent, independent of the assumed magnetic coherence length.

  18. INVITED TOPICAL REVIEW: Parallel magnetic resonance imaging

    NASA Astrophysics Data System (ADS)

    Larkman, David J.; Nunes, Rita G.

    2007-04-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time-consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section (SENSE, SMASH, g-SMASH and GRAPPA) selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. We show how to recognize potential failure modes and their associated artefacts. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed, and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed.
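
    Of the reconstructions reviewed, SENSE has the most compact core: at reduction factor R, each aliased pixel is a small linear system in which the coil sensitivities mix R true pixel values, solved pixel by pixel. A per-pixel sketch (the Tikhonov regularization term is an added assumption, not part of the original formulation):

        import numpy as np

        def sense_unfold(aliased, sens, reg=1e-6):
            # aliased: (n_coils,) complex coil values at one aliased pixel;
            # sens: (n_coils, R) sensitivities at the R superimposed locations;
            # returns the R unaliased pixel values (regularized least squares).
            S = np.asarray(sens)
            SH = S.conj().T
            return np.linalg.solve(SH @ S + reg * np.eye(S.shape[1]), SH @ np.asarray(aliased))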

  19. "Tools For Analysis and Visualization of Large Time- Varying CFD Data Sets"

    NASA Technical Reports Server (NTRS)

    Wilhelms, Jane; vanGelder, Allen

    1999-01-01

    During the four years of this grant (including the one-year extension), we have explored many aspects of the visualization of large CFD (Computational Fluid Dynamics) datasets. These have included new direct volume rendering approaches, hierarchical methods, volume decimation, error metrics, parallelization, hardware texture mapping, and methods for analyzing and comparing images. First, we implemented an extremely general direct volume rendering approach that can be used to render rectilinear, curvilinear, or tetrahedral grids, including overlapping multiple-zone grids and time-varying grids. Next, we developed techniques for associating the sample data with a k-d tree, a simple hierarchical data model to approximate samples in the regions covered by each node of the tree, and an error metric for the accuracy of the model. We also explored a new method for determining the accuracy of approximate models based on the light field method described at ACM SIGGRAPH (Association for Computing Machinery Special Interest Group on Computer Graphics) '96. In our initial implementation, we automatically image the volume from 32 approximately evenly distributed positions on the surface of an enclosing tessellated sphere. We then calculate differences between these images under different conditions of volume approximation or decimation.

  20. Circuital characterisation of space-charge motion with a time-varying applied bias

    PubMed Central

    Kim, Chul; Moon, Eun-Yi; Hwang, Jungho; Hong, Hiki

    2015-01-01

    Understanding the behaviour of space-charge between two electrodes is important for a number of applications. The Shockley-Ramo theorem and equivalent circuit models are useful for this; however, fundamental questions of the microscopic nature of the space-charge remain, including the meaning of capacitance and its evolution into a bulk property. Here we show that the microscopic details of the space-charge in terms of resistance and capacitance evolve in a parallel topology to give the macroscopic behaviour via a charge-based circuit or electric-field-based circuit. We describe two approaches to this problem, both of which are based on energy conservation: the energy-to-current transformation rule, and an energy-equivalence-based definition of capacitance. We identify a significant capacitive current due to the rate of change of the capacitance. Further analysis shows that Shockley-Ramo theorem does not apply with a time-varying applied bias, and an additional electric-field-based current is identified to describe the resulting motion of the space-charge. Our results and approach provide a facile platform for a comprehensive understanding of the behaviour of space-charge between electrodes. PMID:26133999
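
    The "significant capacitive current due to the rate of change of the capacitance" follows directly from differentiating the stored charge Q(t) = C(t)V(t) when the capacitance itself is time-varying; as a standard identity (not the paper's full field-based derivation):

        i(t) = \frac{\mathrm{d}}{\mathrm{d}t}\bigl[C(t)\,V(t)\bigr]
             = C(t)\,\frac{\mathrm{d}V}{\mathrm{d}t} + V(t)\,\frac{\mathrm{d}C}{\mathrm{d}t}

    With a time-varying applied bias both terms act simultaneously, which is why the dC/dt term cannot be dropped as it is in fixed-capacitance circuit models.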

  1. Large-scale parallel lattice Boltzmann-cellular automaton model of two-dimensional dendritic growth

    NASA Astrophysics Data System (ADS)

    Jelinek, Bohumir; Eshraghi, Mohsen; Felicelli, Sergio; Peters, John F.

    2014-03-01

    An extremely scalable lattice Boltzmann (LB)-cellular automaton (CA) model for simulations of two-dimensional (2D) dendritic solidification under forced convection is presented. The model incorporates effects of phase change, solute diffusion, melt convection, and heat transport. The LB model represents the diffusion, convection, and heat transfer phenomena. The dendrite growth is driven by a difference between actual and equilibrium liquid composition at the solid-liquid interface. The CA technique is deployed to track the new interface cells. The computer program was parallelized using the Message Passing Interface (MPI) technique. Parallel scaling of the algorithm was studied and major scalability bottlenecks were identified. Efficiency loss attributable to the high memory bandwidth requirement of the algorithm was observed when using multiple cores per processor. Parallel writing of the output variables of interest was implemented in the binary Hierarchical Data Format 5 (HDF5) to improve the output performance and to simplify visualization. Calculations were carried out in single precision arithmetic without significant loss in accuracy, resulting in a 50% reduction of memory and computational time requirements. The presented solidification model shows very good scalability up to centimeter-size domains, including more than ten million dendrites.

    Catalogue identifier: AEQZ_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQZ_v1_0.html
    Program obtainable from: CPC Program Library, Queen’s University, Belfast, UK
    Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 29,767
    No. of bytes in distributed program, including test data, etc.: 3,131,367
    Distribution format: tar.gz
    Programming language: Fortran 90
    Computer: Linux PC and clusters
    Operating system: Linux
    Has the code been vectorized or parallelized?: Yes. Program is parallelized using MPI. Number of processors used: 1-50,000
    RAM: Memory requirements depend on the grid size
    Classification: 6.5, 7.7
    External routines: MPI (http://www.mcs.anl.gov/research/projects/mpi/), HDF5 (http://www.hdfgroup.org/HDF5/)
    Nature of problem: Dendritic growth in undercooled Al-3 wt% Cu alloy melt under forced convection.
    Solution method: The lattice Boltzmann model solves the diffusion, convection, and heat transfer phenomena. The cellular automaton technique is deployed to track the solid/liquid interface.
    Restrictions: Heat transfer is calculated uncoupled from the fluid flow. Thermal diffusivity is constant.
    Unusual features: A novel technique, utilizing periodic duplication of a pre-grown “incubation” domain, is applied for the scale-up test.
    Running time: Varies from minutes to days depending on the domain size and number of computational cores.
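
    For orientation, the lattice Boltzmann building block of such a model is a stream-and-collide update. The sketch below is a generic single-node D2Q9 BGK kernel for the flow field only, written for illustration; it is not the distributed Fortran 90/MPI code in the program summary and it omits solute diffusion, heat transport, and the CA interface tracking.

        import numpy as np

        # D2Q9 lattice velocities and weights.
        c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
                      [1, 1], [-1, 1], [-1, -1], [1, -1]])
        w = np.array([4/9] + [1/9]*4 + [1/36]*4)

        def lb_step(f, tau):
            """One BGK stream-and-collide step on a periodic D2Q9 grid.
            f : (9, ny, nx) distributions; tau : relaxation time."""
            # Streaming: shift each population one cell along its velocity.
            f = np.stack([np.roll(np.roll(f[i], c[i, 0], axis=1),
                                  c[i, 1], axis=0) for i in range(9)])
            rho = f.sum(axis=0)                          # density
            u = np.einsum('iab,id->dab', f, c) / rho     # velocity
            cu = np.einsum('id,dab->iab', c, u)
            usq = (u ** 2).sum(axis=0)
            # Second-order equilibrium, then BGK relaxation toward it.
            feq = w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)
            return f + (feq - f) / tau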

  2. The effects of heat treatment on technological properties in Red-bud maple (Acer trautvetteri Medw.) wood.

    PubMed

    Korkut, Süleyman; Kök, M Samil; Korkut, Derya Sevim; Gürleyen, Tuğba

    2008-04-01

    Heat treatment is often used to improve the dimensional stability of wood. In this study, the effects of heat treatment on technological properties of Red-bud maple (Acer trautvetteri Medw.) wood were examined. Samples obtained from Düzce Forest Enterprises, Turkey, were subjected to heat treatment at varying temperatures (120 degrees C, 150 degrees C and 180 degrees C) and for varying durations (2 h, 6 h and 10 h). The technological properties of heat-treated wood samples and control samples were tested. Compression strength parallel to grain, bending strength, modulus of elasticity in bending, Janka hardness, impact bending strength, and tension strength perpendicular to grain were determined. The results showed that technological strength values decreased with increasing treatment temperature and treatment time. Red-bud maple wood could be utilized, with minimal losses in strength values, by using proper heat treatment techniques in applications where working properties and stability, such as in window frames, are important factors.

  3. A derivation and scalable implementation of the synchronous parallel kinetic Monte Carlo method for simulating long-time dynamics

    NASA Astrophysics Data System (ADS)

    Byun, Hye Suk; El-Naggar, Mohamed Y.; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2017-10-01

    Kinetic Monte Carlo (KMC) simulations are used to study long-time dynamics of a wide variety of systems. Unfortunately, the conventional KMC algorithm is not scalable to larger systems, since its time scale is inversely proportional to the simulated system size. A promising approach to resolving this issue is the synchronous parallel KMC (SPKMC) algorithm, which makes the time scale size-independent. This paper introduces a formal derivation of the SPKMC algorithm based on local transition-state and time-dependent Hartree approximations, as well as its scalable parallel implementation based on a dual linked-list cell method. The resulting algorithm has achieved a weak-scaling parallel efficiency of 0.935 on 1024 Intel Xeon processors for simulating biological electron transfer dynamics in a 4.2 billion-heme system, as well as decent strong-scaling parallel efficiency. The parallel code has been used to simulate a lattice of cytochrome complexes on a bacterial-membrane nanowire, and it is broadly applicable to other problems such as computational synthesis of new materials.
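
    The scaling problem that SPKMC removes is easiest to see in the conventional serial KMC step, sketched below in a generic Gillespie-style form (illustrative names, not the paper's algorithm). The expected time increment is the inverse of the total rate, so doubling the system roughly halves the simulated time per step.

        import numpy as np

        def kmc_step(rates, rng):
            """One rejection-free KMC step (serial sketch, not SPKMC).

            rates : 1-D array of event rates for the current configuration.
            Returns (index of chosen event, time increment).
            """
            total = rates.sum()
            # Choose an event with probability proportional to its rate.
            event = np.searchsorted(np.cumsum(rates), rng.random() * total)
            # Advance the clock: dt is exponential with mean 1/total.
            # Since total grows with system size, dt shrinks; this is the
            # bottleneck SPKMC avoids by evolving spatial domains over
            # synchronized time windows.
            dt = rng.exponential(1.0 / total)
            return event, dt

        # Usage: event, dt = kmc_step(rates, np.random.default_rng(0))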

  4. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
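
    The flavor of these mapping problems can be shown with the classic contiguous formulation: assign a chain of m module weights to n processors in contiguous blocks so that the bottleneck (maximum per-processor load) is minimized. The sketch below uses binary search on the bottleneck value with a greedy feasibility test; it is a hedged illustration of the problem class, not the paper's O(nm log m) algorithm.

        def min_bottleneck_mapping(weights, n):
            """Contiguously map a chain of module weights onto n
            processors, minimizing the maximum per-processor load.
            Integer weights are assumed so the search can step by 1.
            """
            def feasible(cap):
                used, load = 1, 0
                for wt in weights:
                    if wt > cap:
                        return False            # one module overflows alone
                    if load + wt > cap:
                        used, load = used + 1, wt   # open a new processor
                    else:
                        load += wt
                return used <= n

            lo, hi = max(weights), sum(weights)
            while lo < hi:                      # binary search on bottleneck
                mid = (lo + hi) // 2
                if feasible(mid):
                    hi = mid
                else:
                    lo = mid + 1
            return lo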

  5. Efficient Simulation of Compressible, Viscous Fluids using Multi-rate Time Integration

    NASA Astrophysics Data System (ADS)

    Mikida, Cory; Kloeckner, Andreas; Bodony, Daniel

    2017-11-01

    In the numerical simulation of problems of compressible, viscous fluids with single-rate time integrators, the global timestep used is limited to that of the finest mesh point or fastest physical process. This talk discusses the application of multi-rate Adams-Bashforth (MRAB) integrators to an overset mesh framework to solve compressible viscous fluid problems of varying scale with improved efficiency, with emphasis on the strategy of timescale separation and the application of the resulting numerical method to two sample problems: subsonic viscous flow over a cylinder and a viscous jet in crossflow. The results presented indicate the numerical efficacy of MRAB integrators, outline a number of outstanding code challenges, demonstrate the expected reduction in time enabled by MRAB, and emphasize the need for proper load balancing through spatial decomposition in order for parallel runs to achieve the predicted time-saving benefit. This material is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374.
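
    The basic idea of multi-rate integration, letting fast components take small steps while slow components take large ones, can be sketched with a toy two-rate Adams-Bashforth-2 pair. In the hedged illustration below (our construction, not the MRAB scheme of the talk), the slow variable is simply frozen during the fast substeps; production MRAB methods couple the rates through interpolation and extrapolation.

        def mrab2(f_fast, f_slow, y0, H, m, nsteps):
            """Toy two-rate Adams-Bashforth-2 integrator (illustrative).

            The fast variable takes m substeps of size H/m per macro step
            while the slow variable is held frozen. f_fast and f_slow map
            (yf, ys) to dyf/dt and dys/dt respectively.
            """
            yf, ys = map(float, y0)
            h = H / m
            kf_prev = f_fast(yf, ys)  # seed AB2 history with initial slopes
            ks_prev = f_slow(yf, ys)
            for _ in range(nsteps):
                for _ in range(m):    # fast substeps, slow value frozen
                    kf = f_fast(yf, ys)
                    yf += h * (1.5 * kf - 0.5 * kf_prev)
                    kf_prev = kf
                ks = f_slow(yf, ys)   # one slow (macro) step per interval
                ys += H * (1.5 * ks - 0.5 * ks_prev)
                ks_prev = ks
            return yf, ys

        # Usage sketch: stiff fast decay driven by a slow envelope.
        # yf, ys = mrab2(lambda a, b: -50.0 * a + b,
        #                lambda a, b: -0.1 * b,
        #                (1.0, 1.0), H=0.01, m=10, nsteps=100)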

  6. Parallel coding of conjunctions in visual search.

    PubMed

    Found, A

    1998-10-01

    Two experiments investigated whether the conjunctive nature of nontarget items influenced search for a conjunction target. Each experiment consisted of two conditions. In both conditions, the target item was a red bar tilted to the right, among white tilted bars and vertical red bars. As well as color and orientation, display items also differed in terms of size. Size was irrelevant to search in that the size of the target varied randomly from trial to trial. In one condition, the size of items correlated with the other attributes of display items (e.g., all red items were big and all white items were small). In the other condition, the size of items varied randomly (i.e., some red items were small and some were big, and some white items were big and some were small). Search was more efficient in the size-correlated condition, consistent with the parallel coding of conjunctions in visual search.

  7. Factors related to the parallel use of complementary and alternative medicine with conventional medicine among patients with chronic conditions in South Korea.

    PubMed

    Choi, Byunghee; Han, Dongwoon; Na, Seonsam; Lim, Byungmook

    2017-06-01

    This study aims to examine the characteristics and behavioral patterns of patients with chronic conditions behind their parallel use of conventional medicine (CM) and complementary and alternative medicine (CAM), which includes traditional Korean Medicine (KM). This cross-sectional study used a self-administered anonymous survey to obtain results from inpatients staying in three hospitals in Gyeongnam province in Korea. Of the 423 participants surveyed, 334 (79.0%) used some form of CAM, among which KM therapies were the most common modalities. The results of a logistic regression analysis showed that the parallel use pattern was most apparent in the groups aged over 40. Patients with hypertension or joint diseases showed a higher propensity for parallel use, whereas patients with diabetes did not. In addition, many sociodemographic and health-related characteristics are related to the patterns of parallel use of CAM and CM. In the rural areas of Korea, most inpatients who used CM for the management of chronic conditions used CAM in parallel. KM was the most common CAM modality, and the pattern of parallel use varied according to the disease conditions.

  8. A Parallel Framework with Block Matrices of a Discrete Fourier Transform for Vector-Valued Discrete-Time Signals.

    PubMed

    Soto-Quiros, Pablo

    2015-01-01

    This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool to analyze the spectra of vector-valued discrete-time signals. This parallel implementation is developed in terms of a mathematical framework with a set of block matrix operations. These block matrix operations contribute to the analysis, design, and implementation of parallel algorithms in multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that there is an advantage to using multicore processors and a parallel computing environment to reduce the high execution time. Additionally, the speedup increases as the number of logical processors and the length of the signal increase.
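
    One plausible reading of the block-matrix formulation is sketched below (our notation, not necessarily the paper's exact definition): the vector-valued DFT of an (N, d) signal equals the block matrix (F_N kron I_d) applied to the stacked samples, which reduces to independent per-component FFTs, the embarrassingly parallel structure a multicore implementation can exploit.

        import numpy as np

        def vector_valued_dft(x):
            """DFT of a vector-valued discrete-time signal (sketch).

            x : (N, d) array of N time samples of a d-component signal.
            One length-N DFT per component.
            """
            return np.fft.fft(x, axis=0)

        # Block-matrix view (illustrative, O(N^2 d^2)): the transform
        # equals (F_N kron I_d) applied to the stacked samples.
        N, d = 8, 3
        x = np.random.randn(N, d) + 1j * np.random.randn(N, d)
        F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
        blocked = (np.kron(F, np.eye(d)) @ x.reshape(-1)).reshape(N, d)
        assert np.allclose(blocked, vector_valued_dft(x))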

  9. A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations

    NASA Technical Reports Server (NTRS)

    Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw

    2005-01-01

    A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel efficiency. The asynchronous algorithm is benchmarked on a cluster assembled from Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.
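
    The synchronous/asynchronous distinction can be made concrete with a small scheduler sketch. In the hedged illustration below (our code, not the paper's), a particle is moved and re-submitted as soon as its own evaluation finishes, using whatever global best exists at that moment; there is no barrier at the end of a swarm iteration, so fast evaluations are never held up by slow ones.

        import numpy as np
        from concurrent.futures import (ThreadPoolExecutor,
                                        FIRST_COMPLETED, wait)

        def async_pso(f, bounds, n_particles=8, budget=200, seed=0):
            """Asynchronous PSO sketch; budget counts total evaluations."""
            rng = np.random.default_rng(seed)
            lo, hi = (np.asarray(b, float) for b in zip(*bounds))
            dim = len(bounds)
            x = rng.uniform(lo, hi, (n_particles, dim))
            v = np.zeros_like(x)
            pbest, pval = x.copy(), np.full(n_particles, np.inf)
            gbest, gval = x[0].copy(), np.inf
            done = 0
            with ThreadPoolExecutor(max_workers=4) as pool:
                pending = {pool.submit(f, x[i].copy()): i
                           for i in range(n_particles)}
                while pending:
                    ready, _ = wait(list(pending),
                                    return_when=FIRST_COMPLETED)
                    for fut in ready:
                        i = pending.pop(fut)
                        y = fut.result()
                        done += 1
                        if y < pval[i]:
                            pval[i], pbest[i] = y, x[i].copy()
                        if y < gval:
                            gval, gbest = y, x[i].copy()
                        # Move particle i using the global best as of now.
                        r1, r2 = rng.random(dim), rng.random(dim)
                        v[i] = (0.7 * v[i] + 1.5 * r1 * (pbest[i] - x[i])
                                + 1.5 * r2 * (gbest - x[i]))
                        x[i] = np.clip(x[i] + v[i], lo, hi)
                        if done + len(pending) < budget:
                            pending[pool.submit(f, x[i].copy())] = i
            return gbest, gval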

  10. Non-Cartesian Parallel Imaging Reconstruction

    PubMed Central

    Wright, Katherine L.; Hamilton, Jesse I.; Griswold, Mark A.; Gulani, Vikas; Seiberlich, Nicole

    2014-01-01

    Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be employed to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the non-homogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian GRAPPA, and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. PMID:24408499

  11. Aggregation and Gelation of Aromatic Polyamides with Parallel and Anti-parallel Alignment of Molecular Dipole Along the Backbone

    NASA Astrophysics Data System (ADS)

    Zhu, Dan; Shang, Jing; Ye, Xiaodong; Shen, Jian

    2016-12-01

    The understanding of macromolecular structures and interactions is important but difficult, due to the fact that macromolecules adopt versatile conformations and aggregation states, which vary with environmental conditions and histories. In this work two polyamides with parallel or anti-parallel dipoles along the linear backbone, named ABAB (parallel) and AABB (anti-parallel), have been studied. By using a combination of methods, the phase behaviors of the polymers during aggregation and gelation, i.e., the formation or dissociation of nuclei, fibrils, clusters of fibrils, and cluster-cluster aggregates, have been revealed. Such abundant phase behaviors are dominated by the inter-chain interactions, including dispersion, polarity and hydrogen bonding, and are correlated with the solubility parameters of solvents, the temperature, and the polymer concentration. The results of X-ray diffraction and fast-mode dielectric relaxation indicate that AABB possesses a more rigid conformation than ABAB; because of this, AABB aggregates into long fibers while ABAB forms hairy fibril clusters, and the gelation concentration in toluene is 1 w/v% for AABB, lower than the 3 w/v% for ABAB.

  12. Fusion of Asynchronous, Parallel, Unreliable Data Streams

    DTIC Science & Technology

    2010-09-01

    channels that might be used. The two channels chosen for this study, galvanic skin response (GSR) and pulse rate, are convenient and reasonably well...vector as NA. The MDS software tool, PERMAP, uses this same abbreviation. The impact of the lack of information may vary depending on the situation...of how PERMAP (and MDS in general) functions when the input parameters are varied. That is outlined in this section; the impact of those choices is

  13. Externally Calibrated Parallel Imaging for 3D Multispectral Imaging Near Metallic Implants Using Broadband Ultrashort Echo Time Imaging

    PubMed Central

    Wiens, Curtis N.; Artz, Nathan S.; Jang, Hyungseok; McMillan, Alan B.; Reeder, Scott B.

    2017-01-01

    Purpose: To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. Theory and Methods: A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Results: Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. Conclusion: A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. PMID:27403613

  14. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    Real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool, and a very data-intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on a survey of approaches for the parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelizations of sequential watershed algorithms on shared memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models, comparing OpenMP (an application programming interface for multi-processing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  15. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    PubMed

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    It is very time-consuming to solve fractional differential equations. The computational complexity of solving the two-dimensional time fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(M_x M_y N^2). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and data layout with a virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We expect parallel computing to become a basic method for computationally intensive fractional applications in the near future.
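
    The N^2 factor in the complexity comes from the memory of the fractional derivative: every new time level requires a weighted sum over the full solution history. A standard Grunwald-Letnikov weight recurrence (shown as an illustration; the paper's discretization details may differ) makes this explicit.

        import numpy as np

        def gl_weights(alpha, N):
            """Grunwald-Letnikov weights g_k for a fractional derivative
            of order alpha: the coefficients of (1 - z)**alpha, computed
            by the standard recurrence g_0 = 1,
            g_k = g_{k-1} * (1 - (alpha + 1) / k).
            """
            g = np.empty(N + 1)
            g[0] = 1.0
            for k in range(1, N + 1):
                g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
            return g

        # History sum at time level n for one grid point (the O(N) work
        # per step that makes the total cost scale as N^2):
        #   D^alpha u(t_n) ~ dt**(-alpha) * sum_k g[k] * u[n - k]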

  16. String resistance detector

    NASA Technical Reports Server (NTRS)

    Hall, A. Daniel (Inventor); Davies, Francis J. (Inventor)

    2007-01-01

    Method and system are disclosed for determining individual string resistance in a network of strings when the current through a parallel connected string is unknown and when the voltage across a series connected string is unknown. The method/system of the invention involves connecting one or more frequency-varying impedance components with known electrical characteristics to each string and applying a frequency-varying input signal to the network of strings. The frequency-varying impedance components may be one or more capacitors, inductors, or both, and are selected so that each string is uniquely identifiable in the output signal resulting from the frequency-varying input signal. Numerical methods, such as non-linear regression, may then be used to resolve the resistance associated with each string.
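
    A minimal version of the numerical-methods step can be sketched as follows. Assume, purely for illustration, that each string is an unknown resistance R_i in series with a known, distinct capacitor C_i and that all strings are connected in parallel; sweeping frequency then gives each string a unique admittance signature, and nonlinear regression recovers the R_i. The circuit topology, names, and use of SciPy here are our assumptions, not the patent's specification.

        import numpy as np
        from scipy.optimize import least_squares

        def fit_string_resistances(freqs, Y_meas, caps):
            """Fit per-string resistances from swept-frequency admittance.

            freqs  : measurement frequencies in Hz.
            Y_meas : complex measured total admittance at each frequency.
            caps   : known, distinct capacitance per string (farads).
            """
            w = 2 * np.pi * np.asarray(freqs, float)
            caps = np.asarray(caps, float)

            def model(R):
                # Parallel strings: Y = sum_i 1 / (R_i + 1/(j w C_i)).
                Z = R[None, :] + 1.0 / (1j * w[:, None] * caps[None, :])
                return (1.0 / Z).sum(axis=1)

            def residual(R):
                r = model(R) - Y_meas
                return np.concatenate([r.real, r.imag])

            fit = least_squares(residual, x0=np.ones(len(caps)),
                                bounds=(0.0, np.inf))
            return fit.x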

  17. A PC parallel port button box provides millisecond response time accuracy under Linux.

    PubMed

    Stewart, Neil

    2006-02-01

    For psychologists, it is sometimes necessary to measure people's reaction times to the nearest millisecond. This article describes how to use the PC parallel port to receive signals from a button box to achieve millisecond response time accuracy. The workings of the parallel port, the corresponding port addresses, and a simple Linux program for controlling the port are described. A test of the speed and reliability of button box signal detection is reported. If the reader is moderately familiar with Linux, this article should provide sufficient instruction for him or her to build and test his or her own parallel port button box. This article also describes how the parallel port could be used to control an external apparatus.
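
    On Linux, the status register at base+1 can be read without special libraries by seeking into /dev/port (root required). The sketch below is our illustration in Python rather than the article's program, and the 0x378 base address is an assumption that depends on the machine.

        import os

        STATUS = 0x379  # status register of a legacy port at base 0x378
                        # (base address is an assumption; check hardware)

        def read_buttons():
            """Read the parallel-port status byte on Linux (needs root).

            Wire buttons to status-input pins (e.g., ACK, BUSY) and test
            the corresponding bits of the returned byte.
            """
            fd = os.open("/dev/port", os.O_RDONLY)
            try:
                os.lseek(fd, STATUS, os.SEEK_SET)
                return os.read(fd, 1)[0]
            finally:
                os.close(fd)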

  18. Synchronization Of Parallel Discrete Event Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

  19. Multigrid methods with space–time concurrency

    DOE PAGES

    Falgout, R. D.; Friedhoff, S.; Kolev, Tz. V.; ...

    2017-10-06

    Here, we consider the comparison of multigrid methods for parabolic partial differential equations that allow space–time concurrency. With current trends in computer architectures leading towards systems with more, but not faster, processors, space–time concurrency is crucial for speeding up time-integration simulations. In contrast, traditional time-integration techniques impose serious limitations on parallel performance due to the sequential nature of the time-stepping approach, allowing spatial concurrency only. This paper considers the three basic options of multigrid algorithms on space–time grids that allow parallelism in space and time: coarsening in space and time, semicoarsening in the spatial dimensions, and semicoarsening in the temporal dimension. We develop parallel software and performance models to study the three methods at scales of up to 16K cores and introduce an extension of one of them for handling multistep time integration. We then discuss advantages and disadvantages of the different approaches and their benefit compared to traditional space-parallel algorithms with sequential time stepping on modern architectures.

  1. Parallel implementation of all-digital timing recovery for high-speed and real-time optical coherent receivers.

    PubMed

    Zhou, Xian; Chen, Xue

    2011-05-09

    Digital coherent receivers combine coherent detection with digital signal processing (DSP) to compensate for transmission impairments, and are therefore a promising candidate for future high-speed optical transmission systems. However, the maximum symbol rate supported by such real-time receivers is limited by the processing rate of the hardware. In order to cope with this difficulty, parallel processing algorithms are imperative. In this paper, we propose a novel parallel digital timing recovery loop (PDTRL) based on our previous work. Furthermore, to increase the dynamic dispersion tolerance range of receivers, we embed a parallel adaptive equalizer in the PDTRL. This parallel joint scheme (PJS) can be used to complete synchronization, equalization and polarization de-multiplexing simultaneously. Finally, we demonstrate that the PDTRL and PJS allow the hardware to process a 112 Gbit/s POLMUX-DQPSK signal at clock rates in the hundreds-of-MHz range. © 2011 Optical Society of America

  2. Numerical modelling of the formation of fibrous bedding-parallel veins

    NASA Astrophysics Data System (ADS)

    Torremans, Koen; Muchez, Philippe; Sintubin, Manuel

    2014-05-01

    Bedding-parallel veins with a fibrous infill oriented orthogonal to the vein wall are often observed in fine-grained metasedimentary sequences. Several mechanisms have been proposed for their formation, mostly invoking the effects of fluid overpressures and anisotropy of the host-rock fabric to explain the inferred extensional failure with sub-vertical opening. Abundant pre-folding, bedding-parallel fibrous dolomite veins are found associated with the Nkana-Mindola stratiform Cu-Co deposit in Zambia. The goal of this study is to better understand the formation mechanisms of these veins and to explain their particular spatial and thickness distribution with respect to failure of transversely isotropic rocks. The spatial distribution and thickness variation of these veins were quantified during a field campaign in thirteen line transects perpendicular to undeformed veins in underground crosscuts. The fibrous dolomite veins studied are not related to lithological contrasts, but to a strong bedding-parallel shaly fabric, typical for the black shale facies of the Copperbelt Orebody Member. The host rock can hence be considered transversely isotropic. Growth morphologies vary from antitaxial with a pronounced median surface to asymmetric syntaxial, always with small but quantifiable growth competition. A microstructural fabric study reveals that the undeformed dolomite veins show low-tortuosity vein walls and quantifiable growth competition. Here, we use a Discrete Element Method numerical modelling approach with ESyS-Particle (http://launchpad.net/esys-particle) to simulate the observed properties of the veins. Calibrated numerical specimens with a transversely isotropic matrix are repeatedly brought to failure under constant strain rates by changing the effective strain rates at model boundaries. After each fracture event, fractures in the numerical model are filled with cohesive vein material and the experiment is repeated. By systematically varying stress states, fluid pressures and mechanical properties of materials (host rock, vein infill and interface), we attempt to reproduce the characteristics of spatial distribution and thickness variation of the veins. Four parameter sets of mechanical micro-properties are defined in the models, essentially yielding (1) a competent matrix, (2) an incompetent matrix, (3) a vein material and (4) a vein-matrix interface. Each combination of parameters and particle packings is calibrated to fit a predetermined Mohr-Coulomb type failure envelope, via an automated calibration procedure. Preliminary tests already show that by varying these parameters, we are able to simulate realistically distributed cracking through crack-seal processes. Different types of veins and vein generations can be modelled, ranging from single veins through crack-seal veins to anastomosing veins, by varying the mechanical strength of the competent and incompetent matrix, vein and interface material. Further results of this approach will be presented. We will discuss our results with respect to mechanisms proposed in the literature for bedding-parallel, fibrous veins in metasedimentary rock sequences.

  3. Parallel computations and control of adaptive structures

    NASA Technical Reports Server (NTRS)

    Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

    1991-01-01

    The equations of motion for structures with adaptive elements for vibration control are presented for parallel computation, to be used as a software package for real-time control of flexible space structures. A brief introduction to the state of the art in parallel computational capability is also presented. Time-marching strategies are developed for effective use of massively parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the presented approach on applications in disciplines other than the aerospace industry is assessed.

  4. Solution pH change in non-uniform alternating current electric fields at frequencies above the electrode charging frequency

    PubMed Central

    An, Ran; Massa, Katherine

    2014-01-01

    AC Faradaic reactions have been reported as a mechanism inducing non-ideal phenomena such as flow reversal and cell deformation in electrokinetic microfluidic systems. Prior published work described experiments in parallel electrode arrays below the electrode charging frequency (fc), the frequency for electrical double layer charging at the electrode. However, 2D spatially non-uniform AC electric fields are required for applications such as in plane AC electroosmosis, AC electrothermal pumps, and dielectrophoresis. Many microscale experimental applications utilize AC frequencies around or above fc. In this work, a pH sensitive fluorescein sodium salt dye was used to detect [H+] as an indicator of Faradaic reactions in aqueous solutions within non-uniform AC electric fields. Comparison experiments with (a) parallel (2D uniform fields) electrodes and (b) organic media were employed to deduce the electrode charging mechanism at 5 kHz (1.5fc). Time dependency analysis illustrated that Faradaic reactions exist above the theoretically predicted electrode charging frequency. Spatial analysis showed [H+] varied spatially due to electric field non-uniformities and local pH changed at length scales greater than 50 μm away from the electrode surface. Thus, non-uniform AC fields yielded spatially varied pH gradients as a direct consequence of ion path length differences while uniform fields did not yield pH gradients; the latter is consistent with prior published data. Frequency dependence was examined from 5 kHz to 12 kHz at 5.5 Vpp potential, and voltage dependency was explored from 3.5 to 7.5 Vpp at 5 kHz. Results suggest that Faradaic reactions can still proceed within electrochemical systems in the absence of well-established electrical double layers. This work also illustrates that in microfluidic systems, spatial medium variations must be considered as a function of experiment time, initial medium conditions, electric signal potential, frequency, and spatial position. PMID:25553200

  5. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    NASA Astrophysics Data System (ADS)

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-08-01

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f_{i-1}(x_{i-1})]_{i=1,…,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
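
    The structure that makes this formulation parallel is easy to exhibit. The sketch below (our illustration, not the paper's quasi-Newton solver) solves F(X) = 0 by plain Picard sweeps: each sweep propagates every time slice from the previous sweep's values, so the M applications of f within a sweep are independent and can be distributed across processors.

        import numpy as np

        def picard_time_parallel(f, x0, M, sweeps):
            """Solve F(X) = [x_i - f(x_{i-1})] = 0 by Picard iteration.

            f  : one-step propagator x_{i+1} = f(x_i) (illustrative: a
                 single f, whereas the paper allows a different f_i per
                 step).
            x0 : initial condition; M : number of time slices.
            Each sweep uses only previous-sweep values, so the M calls
            to f below are independent; in a real implementation each
            would run on its own processor.
            """
            X = [np.asarray(x0, float)] * (M + 1)  # crude initial guess
            for _ in range(sweeps):
                X = [X[0]] + [f(X[i - 1]) for i in range(1, M + 1)]
            return X

    After k sweeps the first k slices are exact, so M sweeps recover the serial answer; the preconditioned quasi-Newton iterations described in the paper converge far faster than this baseline.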

  6. Externally calibrated parallel imaging for 3D multispectral imaging near metallic implants using broadband ultrashort echo time imaging.

    PubMed

    Wiens, Curtis N; Artz, Nathan S; Jang, Hyungseok; McMillan, Alan B; Reeder, Scott B

    2017-06-01

    To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. Magn Reson Med 77:2303-2309, 2017. © 2016 International Society for Magnetic Resonance in Medicine.

  7. Branched Polymers for Enhancing Polymer Gel Strength and Toughness

    DTIC Science & Technology

    2013-02-01

    Molecular Massively Parallel Simulator (LAMMPS) program and the stress-strain relations were calculated with varying strain-rates (figure 6).

  8. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  9. Spatial-Temporal Heterogeneity in Regional Watershed Phosphorus Cycles Driven by Changes in Human Activity over the Past Century

    NASA Astrophysics Data System (ADS)

    Hale, R. L.; Grimm, N. B.; Vorosmarty, C. J.

    2014-12-01

    An ongoing challenge for society is to harness the benefits of phosphorus (P) while minimizing negative effects on downstream ecosystems. To meet this challenge we must understand the controls on the delivery of anthropogenic P from landscapes to downstream ecosystems. We used a model that incorporates P inputs to watersheds, hydrology, and infrastructure (sewers, waste-water treatment plants, and reservoirs) to reconstruct historic P yields for the northeastern U.S. from 1930 to 2002. At the regional scale, increases in P inputs were paralleled by increased fractional retention, thus P loading to the coast did not increase significantly. We found that temporal variation in regional P yield was correlated with P inputs. Spatial patterns of watershed P yields were best predicted by inputs, but the correlation between inputs and yields in space weakened over time, due to infrastructure development. Although the magnitude of infrastructure effect was small, its role changed over time and was important in creating spatial and temporal heterogeneity in input-yield relationships. We then conducted a hierarchical cluster analysis to identify a typology of anthropogenic P cycling, using data on P inputs (fertilizer, livestock feed, and human food), infrastructure (dams, wastewater treatment plants, sewers), and hydrology (runoff coefficient). We identified 6 key types of watersheds that varied significantly in climate, infrastructure, and the types and amounts of P inputs. Annual watershed P yields and retention varied significantly across watershed types. Although land cover varied significantly across typologies, clusters based on land cover alone did not explain P budget patterns, suggesting that this variable is insufficient to understand patterns of P cycling across large spatial scales. Furthermore, clusters varied over time as patterns of climate, P use, and infrastructure changed. Our results demonstrate that the drivers of P cycles are spatially and temporally heterogeneous, yet they also suggest that a relatively simple typology of watersheds can be useful for understanding regional P cycles and may help inform P management approaches.

  10. Magnetic-Flux-Compensated Voltage Divider

    NASA Technical Reports Server (NTRS)

    Mata, Carlos T.

    2005-01-01

    A magnetic-flux-compensated voltage-divider circuit has been proposed for use in measuring the true potential across a component that is exposed to large, rapidly varying electric currents like those produced by lightning strikes. An example of such a component is a lightning arrester, which is typically exposed to currents of the order of tens of kiloamperes, having rise times of the order of hundreds of nanoseconds. Traditional voltage-divider circuits are not designed for magnetic-flux-compensation: They contain uncompensated loops having areas large enough that the transient magnetic fluxes associated with large transient currents induce spurious voltages large enough to distort voltage-divider outputs significantly. A drawing of the proposed circuit was not available at the time of receipt of information for this article. What is known from a summary textual description is that the proposed circuit would contain a total of four voltage dividers: There would be two mixed dividers in parallel with each other and with the component of interest (e.g., a lightning arrester), plus two mixed dividers in parallel with each other and in series with the component of interest in the same plane. The electrical and geometric configuration would provide compensation for induced voltages, including those attributable to asymmetry in the volumetric density of the lightning or other transient current, canceling out the spurious voltages and measuring the true voltage across the component.

  11. Magnetic Helicity Injection and Thermal Transport

    NASA Astrophysics Data System (ADS)

    Moses, Ronald; Gerwin, Richard; Schoenberg, Kurt

    1999-11-01

    In magnetic helicity injection, a current is driven between electrodes, parallel to the magnetic field in the edge plasma of a machine.^1 Plasma instabilities distribute current throughout the plasma. To model the injection of magnetic helicity, K, into an arbitrary closed surface, K is defined as the volume integral of A·B. To make K unique, a gauge is chosen where the tangential surface components of A are purely solenoidal. If magnetic fields within a plasma are time varying, yet undergo no macroscopic changes over an extended period, and if the plasma is subject to an Ohm’s law with Hall terms, then it is shown that no closed magnetic surfaces with sustained internal currents can exist continuously within the plasma.^2 It is also shown that parallel thermal transport connects all parts of the plasma to the helicity injection electrodes and requires the electrode voltage difference to be at least 2.5 to 3 times the peak plasma temperature. This ratio is almost independent of the length of the electron mean-free path. If magnetic helicity injection is to be used for fusion-grade plasmas, then high-voltage, high-impedance injection techniques must be developed. ^1T. R. Jarboe, Plasma Physics and Controlled Fusion, V36, 945-990 (June 1994). ^2R. W. Moses, 1991 Sherwood International Fusion Theory Conference, Seattle, WA (April 22-24, 1991).

  12. Dynamics of an integral membrane peptide: a deuterium NMR relaxation study of gramicidin.

    PubMed Central

    Prosser, R S; Davis, J H

    1994-01-01

    Solid state deuterium (2H) NMR inversion-recovery and Jeener-Broekaert relaxation experiments were performed on oriented multilamellar dispersions consisting of 1,2-dilauroyl-sn-glycero-3-phosphatidylcholine and 2H exchange-labeled gramicidin D, at a lipid to protein molar ratio (L/P) of 15:1, in order to study the dynamics of the channel conformation of the peptide in a liquid crystalline phase. Our dynamic model for the whole-body motions of the peptide includes diffusion of the peptide around its helix axis and a wobbling diffusion around a second axis perpendicular to the local bilayer normal in a simple Maier-Saupe mean field potential. This anisotropic diffusion is characterized by the correlation times tau_R(parallel) and tau_R(perpendicular). Aligning the bilayer normal perpendicular to the magnetic field and graphing the relaxation rate, 1/T1Z, as a function of (1 - S_{N-2H}^2), where S_{N-2H} represents the orientational order parameter of the N-2H bond, we were able to estimate the correlation time, tau_R(parallel), for rotational diffusion. Although the quadrupolar splitting, which varies as (3 cos^2(theta_D) - 1), has in general two possible solutions for theta_D in the range 0 <= theta_D <= 90 degrees, the 1/T1Z vs. (1 - S_{N-2H}^2) curve can be used to determine a single value of theta_D in this range. Thus, the 1/T1Z vs. (1 - S_{N-2H}^2) profile can be used both to define the axial diffusion rate and to remove potential structural ambiguities in the splittings. The T1Z anisotropy permits us to solve for the two correlation times (tau_R(parallel) = 6.8 x 10^-9 s and tau_R(perpendicular) = 6 x 10^-6 s). The simulated parameters were corroborated by a Jeener-Broekaert experiment where the bilayer normal was parallel to the principal magnetic field. At this orientation the ratio J2(2 omega_0)/J1(omega_0) was obtained in order to estimate the strength of the restoring potential in a model-independent fashion. This measurement yields the rms angle <theta^2>^(1/2) (= 16 +/- 2 degrees at 34 degrees C) formed by the peptide helix axis and the average bilayer normal. PMID:7520294

  13. Knee Kinetics during Squats of Varying Loads and Depths in Recreationally Trained Females.

    PubMed

    Flores, Victoria; Becker, James; Burkhardt, Eric; Cotter, Joshua

    2018-03-06

    The back squat exercise is typically practiced with varying squat depths and barbell loads. However, depth has been inconsistently defined, resulting in unclear safety precautions when squatting with loads. Additionally, females exhibit anatomical and kinematic differences from males which may predispose them to knee joint injuries. The purpose of this study was to characterize peak knee extensor moments (pKEMs) at three commonly practiced squat depths (above parallel, parallel, and full depth) and with three loads of 0% (unloaded), 50%, and 85% of depth-specific one-repetition maximum (1RM) in recreationally active females. Nineteen females (age, 25.1 ± 5.8 years; body mass, 62.5 ± 10.2 kg; height, 1.6 ± 0.10 m; mean ± SD) performed squats of randomized depth and load. Inverse dynamics were used to obtain pKEMs from three-dimensional knee kinematics. Depth and load had significant interaction effects on pKEMs (p = 0.014). Significantly greater pKEMs were observed at full depth compared to parallel depth with 50% 1RM load (p = 0.001, d = 0.615) and 85% 1RM load (p = 0.010, d = 0.714). Greater pKEMs were also observed at full depth compared to above-parallel depth with 50% 1RM load (p = 0.003, d = 0.504). The results indicate that the effect of load on female pKEMs does not follow a progressively increasing pattern with either increasing depth or load. Therefore, when high knee loading is a concern, individuals must carefully consider both the depth of squat being performed and the relative load they are using.

  14. Parallel Evolution of Copy-Number Variation across Continents in Drosophila melanogaster.

    PubMed

    Schrider, Daniel R; Hahn, Matthew W; Begun, David J

    2016-05-01

    Genetic differentiation across populations that is maintained in the presence of gene flow is a hallmark of spatially varying selection. In Drosophila melanogaster, the latitudinal clines across the eastern coasts of Australia and North America appear to be examples of this type of selection, with recent studies showing that a substantial portion of the D. melanogaster genome exhibits allele frequency differentiation with respect to latitude on both continents. As yet there has been no genome-wide examination of differentiated copy-number variants (CNVs) in these geographic regions, despite their potential importance for phenotypic variation in Drosophila and other taxa. Here, we present an analysis of geographic variation in CNVs in D. melanogaster. We also present the first genomic analysis of geographic variation for copy-number variation in the sister species, D. simulans, in order to investigate patterns of parallel evolution in these close relatives. In D. melanogaster we find hundreds of CNVs, many of which show parallel patterns of geographic variation on both continents, lending support to the idea that they are influenced by spatially varying selection. These findings support the idea that polymorphic CNVs contribute to local adaptation in D. melanogaster. In contrast, we find very few CNVs in D. simulans that are geographically differentiated in parallel on both continents, consistent with earlier work suggesting that clinal patterns are weaker in this species. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Research on Parallel Three Phase PWM Converters base on RTDS

    NASA Astrophysics Data System (ADS)

    Xia, Yan; Zou, Jianxiao; Li, Kai; Liu, Jingbo; Tian, Jun

    2018-01-01

    Parallel operation of converters can increase the capacity of a system, but it may lead to zero-sequence circulating current, so controlling the circulating current is an important goal in the design of parallel inverters. In this paper, the Real Time Digital Simulator (RTDS) is used to model the parallel converter system in real time and to study suppression of the circulating current. The equivalent model of two parallel converters and the zero-sequence circulating current (ZSCC) were established and analyzed, and a strategy using variable zero vector control was then proposed to suppress the circulating current. For two parallel modular converters, a hardware-in-the-loop (HIL) study based on RTDS and a practical experiment were implemented; the results prove that the proposed control strategy is feasible and effective.

  16. Thermal-hydraulic simulation of natural convection decay heat removal in the High Flux Isotope Reactor (HFIR) using RELAP5 and TEMPEST: Part 2, Interpretation and validation of results

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruggles, A.E.; Morris, D.G.

    The RELAP5/MOD2 code was used to predict the thermal-hydraulic behavior of the HFIR core during decay heat removal through boiling natural circulation. The low system pressure and low mass flux values associated with boiling natural circulation are far from conditions for which RELAP5 is well exercised. Therefore, some simple hand calculations are used herein to establish the physics of the results. The interpretation and validation effort is divided between the time average flow conditions and the time varying flow conditions. The time average flow conditions are evaluated using a lumped parameter model and heat balance. The Martinelli-Nelson correlations are used to model the two-phase pressure drop and void fraction vs flow quality relationship within the core region. Systems of parallel channels are susceptible to both density wave oscillations and pressure drop oscillations. Periodic variations in the mass flux and exit flow quality of individual core channels are predicted by RELAP5. These oscillations are consistent with those observed experimentally and are of the density wave type. The impact of the time varying flow properties on local wall superheat is bounded herein. The conditions necessary for Ledinegg flow excursions are identified. These conditions do not fall within the envelope of decay heat levels relevant to HFIR in boiling natural circulation. 14 refs., 5 figs., 1 tab.

  17. Fast parallel approach for 2-D DHT-based real-valued discrete Gabor transform.

    PubMed

    Tao, Liang; Kwan, Hon Keung

    2009-12-01

    Two-dimensional fast Gabor transform algorithms are useful for real-time applications due to the high computational complexity of the traditional 2-D complex-valued discrete Gabor transform (CDGT). This paper presents two block time-recursive algorithms for 2-D DHT-based real-valued discrete Gabor transform (RDGT) and its inverse transform and develops a fast parallel approach for the implementation of the two algorithms. The computational complexity of the proposed parallel approach is analyzed and compared with that of the existing 2-D CDGT algorithms. The results indicate that the proposed parallel approach is attractive for real time image processing.

  18. Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

    NASA Technical Reports Server (NTRS)

    Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

    1989-01-01

    Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, the same calculations are generally repeated at every time step. However, Do-all or Do-across techniques cannot be applied to parallelize the simulation, since data dependencies exist from the end of one iteration to the beginning of the next, and furthermore data input and data output are required every sampling period. Therefore, parallelism inside the calculation required for a single time step, i.e., a large basic block consisting of arithmetic assignment statements, must be used. In the proposed method, near-fine-grain tasks, each of which consists of one or more floating point operations, are generated to extract this parallelism, and are assigned to processors by optimal static scheduling at compile time in order to avoid the large run-time overhead that the use of near-fine-grain tasks would otherwise incur. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR), which has been developed to exploit the advantages of static scheduling algorithms to the maximum extent.

  19. Irregular earthquake recurrence patterns and slip variability on a plate-boundary Fault

    NASA Astrophysics Data System (ADS)

    Wechsler, N.; Rockwell, T. K.; Klinger, Y.

    2015-12-01

    The Dead Sea fault in the Levant represents a simple, segmented plate boundary from the Gulf of Aqaba northward to the Sea of Galilee, where it changes character into a complex plate boundary with multiple sub-parallel faults in northern Israel, Lebanon and Syria. The studied Jordan Gorge (JG) segment is the northernmost part of the simple section, before the fault becomes more complex. Seven fault-crossing buried paleo-channels, offset by the Dead Sea fault, were investigated using paleoseismic and geophysical methods. The mapped offsets capture the long-term rupture history and slip-rate behavior of the JG fault segment for the past 4000 years. The ~20 km long JG segment appears to be more active (in terms of the number of earthquakes) than its neighboring segments to the south and north. The rate of movement on this segment varies considerably over the studied period: the long-term slip rate for the entire 4000 years is similar to previously observed rates (~4 mm/yr), yet over shorter time periods the rate varies from 3-8 mm/yr. Paleoseismic data on both timing and displacement indicate a high COV (>1, i.e., clustered), with displacement per event varying by nearly an order of magnitude. The rate of earthquake production does not produce a time-predictable pattern over a period of 2 kyr. We postulate that the seismic behavior of the JG fault is influenced by stress interactions with its neighboring faults to the north and south. Coulomb stress modelling demonstrates that an earthquake on any neighboring fault will increase the Coulomb stress on the JG fault and thus promote rupture. We conclude that deriving on-fault slip rates and earthquake recurrence patterns from a single site and/or over a short time period can produce misleading results. Defining an adequately long time period for resolving slip rate remains an open question and requires further work.

  20. Parallel Computation of the Regional Ocean Modeling System (ROMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, P; Song, Y T; Chao, Y

    2005-04-05

    The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.
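
    For illustration only (ROMS itself is a Fortran code; mpi4py stands in here as an assumption), the sketch below shows the basic pattern behind such an MPI parallelization: a one-dimensional domain decomposition with halo exchange between neighboring processes.

        import numpy as np
        from mpi4py import MPI   # run under mpirun/mpiexec

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()

        # Each rank owns a slab of the domain plus one-cell halos at both ends.
        n_local = 100
        u = np.full(n_local + 2, float(rank))   # u[0], u[-1] are halo cells

        left  = rank - 1 if rank > 0 else MPI.PROC_NULL
        right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

        # Swap halo cells with both neighbors; PROC_NULL makes the physical
        # boundaries no-ops. A non-blocking variant would let interior work
        # overlap with this communication.
        comm.Sendrecv(sendbuf=u[1:2],   dest=left,  recvbuf=u[-1:], source=right)
        comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)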

  1. Modelling and experimental evaluation of parallel connected lithium ion cells for an electric vehicle battery system

    NASA Astrophysics Data System (ADS)

    Bruen, Thomas; Marco, James

    2016-04-01

    Variations in cell properties are unavoidable and can be caused by manufacturing tolerances and usage conditions. As a result of this, cells connected in series may have different voltages and states of charge that limit the energy and power capability of the complete battery pack. Methods of removing this energy imbalance have been extensively reported within literature. However, there has been little discussion around the effect that such variation has when cells are connected electrically in parallel. This work aims to explore the impact of connecting cells, with varied properties, in parallel and the issues regarding energy imbalance and battery management that may arise. This has been achieved through analysing experimental data and a validated model. The main results from this study highlight that significant differences in current flow can occur between cells within a parallel stack that will affect how the cells age and the temperature distribution within the battery assembly.
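
    As a minimal illustration of why parallel-connected cells carry unequal currents, the sketch below models each cell as an open-circuit voltage behind an internal resistance, a far simpler model than the validated one in the paper; all values are invented for illustration.

        import numpy as np

        # First-order model: cell i = OCV source V_oc[i] behind resistance R[i].
        V_oc = np.array([3.70, 3.68, 3.71, 3.69])      # V, varies with state of charge
        R    = np.array([25e-3, 30e-3, 22e-3, 35e-3])  # ohm, manufacturing spread
        I_pack = 10.0                                  # A, total discharge current

        # KCL at the shared bus: sum((V_oc - V_bus)/R) = I_pack; solve for V_bus.
        V_bus = (np.sum(V_oc / R) - I_pack) / np.sum(1.0 / R)
        I_branch = (V_oc - V_bus) / R                  # per-cell currents (A)
        print(I_branch, I_branch.sum())                # sums to I_pack; unequal split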

  2. Time Parallel Solution of Linear Partial Differential Equations on the Intel Touchstone Delta Supercomputer

    NASA Technical Reports Server (NTRS)

    Toomarian, N.; Fijany, A.; Barhen, J.

    1993-01-01

    Evolutionary partial differential equations are usually solved by discretization in time and space, and by applying a marching-in-time procedure to data and algorithms potentially parallelized in the spatial domain.

  3. A Lightweight Remote Parallel Visualization Platform for Interactive Massive Time-varying Climate Data Analysis

    NASA Astrophysics Data System (ADS)

    Li, J.; Zhang, T.; Huang, Q.; Liu, Q.

    2014-12-01

    Today's climate datasets feature large volume, a high degree of spatiotemporal complexity, and fast evolution over time. Because visualizing large distributed climate datasets is computationally intensive, traditional desktop-based visualization applications fail to handle the computational load. Recently, scientists have developed remote visualization techniques to address this computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients over the network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built on ParaView, one of the most popular open-source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we employed cloud computing techniques to support its deployment. In this platform, all climate datasets are regular grid data stored in NetCDF format. Three types of data access methods are supported: accessing remote datasets provided by OpenDAP servers, accessing datasets hosted on the web visualization server, and accessing local datasets. Regardless of the data access method, all visualization tasks are completed on the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computational limitations of desktop-based visualization applications.

  4. Obtaining identical results with double precision global accuracy on different numbers of processors in parallel particle Monte Carlo simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cleveland, Mathew A., E-mail: cleveland7@llnl.gov; Brunner, Thomas A.; Gentile, Nicholas A.

    2013-10-15

    We describe and compare different approaches for achieving numerical reproducibility in photon Monte Carlo simulations. Reproducibility is desirable for code verification, testing, and debugging. Parallelism creates a unique problem for achieving reproducibility in Monte Carlo simulations because it changes the order in which values are summed. This is a numerical problem because double precision arithmetic is not associative. Parallel Monte Carlo simulations, both domain replicated and domain decomposed, will run their particles in a different order during different runs of the same simulation because of the non-reproducibility of communication between processors. In addition, runs of the same simulation using different domain decompositions will also result in particles being simulated in a different order. In [1], a way of eliminating non-associative accumulations using integer tallies was described. This approach successfully achieves reproducibility at the cost of lost accuracy, by rounding double precision numbers to fewer significant digits. This integer approach, and other extended- and reduced-precision reproducibility techniques, are described and compared in this work. Increased precision alone is not enough to ensure reproducibility of photon Monte Carlo simulations. Non-arbitrary precision approaches require a varying degree of rounding to achieve reproducibility. For the problems investigated in this work, double precision global accuracy was achievable by using 100 bits of precision or greater on all unordered sums, which were subsequently rounded to double precision at the end of every time-step.
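
    A minimal sketch of the integer-tally idea summarized above, under the assumption that rounding each addend onto a fixed-point grid is acceptable: integer addition is associative, so the total no longer depends on summation order.

        import random

        SCALE = 2**64          # fixed-point grid spacing = 1/SCALE (assumed, tunable)

        def reproducible_sum(values):
            # Each double is rounded once onto the grid; Python integers are
            # arbitrary precision, so accumulation is exact and any summation
            # order yields the same tally. Accuracy is lost only in rounding.
            tally = sum(int(round(v * SCALE)) for v in values)
            return tally / SCALE

        vals = [random.uniform(-1, 1) for _ in range(10_000)]
        shuffled = random.sample(vals, len(vals))
        assert reproducible_sum(vals) == reproducible_sum(shuffled)  # bitwise equal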

  5. Including trait-based early warning signals helps predict population collapse

    PubMed Central

    Clements, Christopher F.; Ozgul, Arpat

    2016-01-01

    Foreseeing population collapse is an on-going target in ecology, and this has led to the development of early warning signals based on expected changes in leading indicators before a bifurcation. Such signals have been sought in abundance time-series data on a population of interest, with varying degrees of success. Here we move beyond these established methods by including parallel time-series data of abundance and fitness-related trait dynamics. Using data from a microcosm experiment, we show that including information on the dynamics of phenotypic traits such as body size in composite early warning indices can produce more accurate inferences of whether a population is approaching a critical transition than using abundance time-series alone. By including fitness-related trait information alongside traditional abundance-based early warning signals in a single metric of risk, our generalizable approach provides a powerful new way to assess which populations may be on the verge of collapse. PMID:27009968
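
    The sketch below is a toy version of such a composite index; the window length, the chosen statistics, and the signs are illustrative assumptions, not the authors' exact metric. A rolling lag-1 autocorrelation of abundance (expected to rise before a transition) is combined with a rolling mean of body size (expected to fall) after z-scoring:

        import numpy as np

        def rolling_ac1(x, w):
            # Lag-1 autocorrelation of x in sliding windows of length w.
            return np.array([np.corrcoef(x[i:i+w-1], x[i+1:i+w])[0, 1]
                             for i in range(len(x) - w + 1)])

        def zscore(x):
            return (x - x.mean()) / x.std()

        def composite_ews(abundance, body_size, w=20):
            ac1  = rolling_ac1(abundance, w)             # rises before collapse
            size = np.array([body_size[i:i+w].mean()     # falls before collapse
                             for i in range(len(body_size) - w + 1)])
            return zscore(ac1) - zscore(size)            # higher = higher risk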

  6. Simple method for generating adjustable trains of picosecond electron bunches

    NASA Astrophysics Data System (ADS)

    Muggli, P.; Allen, B.; Yakimenko, V. E.; Park, J.; Babzien, M.; Kusche, K. P.; Kimura, W. D.

    2010-05-01

    A simple, passive method for producing an adjustable train of picosecond electron bunches is demonstrated. The key component of this method is an electron beam mask consisting of an array of parallel wires that selectively spoils the beam emittance. This mask is positioned in a high magnetic dispersion, low beta-function region of the beam line. The incoming electron beam striking the mask has a time/energy correlation that corresponds to a time/position correlation at the mask location. The mask pattern is transformed into a time pattern, or train of bunches, when the dispersion is brought back to zero downstream of the mask. Results are presented from a proof-of-principle experiment demonstrating this novel technique, performed at the Brookhaven National Laboratory Accelerator Test Facility. This technique allows for easy tailoring of the bunch train for a particular application, including varying the bunch width and spacing, and enabling the generation of a trailing witness bunch.

  7. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event-driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors in parallel, for the largest possible increase in speed. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.

  8. Parallel consistent labeling algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samal, A.; Henderson, T.

    Mackworth and Freuder have analyzed the time complexity of several constraint satisfaction algorithms. Mohr and Henderson have given new algorithms, AC-4 and PC-3, for arc and path consistency, respectively, and have shown that the arc consistency algorithm is optimal in time complexity and of the same order space complexity as the earlier algorithms. In this paper, the authors give parallel algorithms for solving node and arc consistency. They show that any parallel algorithm for enforcing arc consistency in the worst case must have O(na) sequential steps, where n is the number of nodes and a is the number of labels per node. They give several parallel algorithms to do arc consistency. It is also shown that they all have optimal time complexity. The results of running the parallel algorithms on a BBN Butterfly multiprocessor are also presented.
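
    For orientation, the sketch below shows a sequential arc-consistency baseline (the simple AC-3 formulation, rather than the optimal AC-4 analyzed in the record); it prunes any label that has no supporting label across some constraint arc.

        from collections import deque

        def ac3(domains, constraints):
            # domains: {var: set(labels)}; constraints: {(x, y): predicate(a, b)}.
            # Removes unsupported labels until every arc is consistent; returns
            # False if some domain is wiped out (no consistent labeling exists).
            queue = deque(constraints)
            while queue:
                x, y = queue.popleft()
                pred = constraints[(x, y)]
                pruned = {a for a in domains[x]
                          if not any(pred(a, b) for b in domains[y])}
                if pruned:
                    domains[x] -= pruned
                    if not domains[x]:
                        return False
                    # Shrinking domains[x] may break arcs that point at x.
                    queue.extend(arc for arc in constraints if arc[1] == x)
            return True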

  9. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In photoelectrical tracking systems, Bayer images have traditionally been decoded on the CPU. However, this is too slow when the images become large, for example 2K×2K×16 bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second is a task-parallelism part, and the last is a data-parallelism part including inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce the execution time, the task-parallelism part is optimized with OpenMP techniques. The data-parallelism part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared-memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, the IDWT can be significantly sped up by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16 bit Bayer image, the data-parallelism part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the serial CPU method.

  10. PRAIS: Distributed, real-time knowledge-based systems made easy

    NASA Technical Reports Server (NTRS)

    Goldstein, David G.

    1990-01-01

    This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.

  11. Spherical harmonic representation of the main geomagnetic field for world charting and investigations of some fundamental problems of physics and geophysics

    NASA Technical Reports Server (NTRS)

    Barraclough, D. R.; Hide, R.; Leaton, B. R.; Lowes, F. J.; Malin, S. R. C.; Wilson, R. L. (Principal Investigator)

    1981-01-01

    Quiet-day data from MAGSAT were examined for effects which might test the validity of Maxwell's equations. Both external and toroidal fields which might represent a violation of the equations appear to exist, though well within the associated errors. The external field might be associated with the ring current, and varies on a time-scale of one day or less. Its orientation is parallel to the geomagnetic dipole. The toroidal field can be confused with an orientation error (in yaw). If the toroidal field really exists, it can be related to either ionospheric currents, or to toroidal fields in the Earth's core in accordance with Einstein's unified field theory, or to both.

  12. Maps of interaural delay in the owl's nucleus laminaris

    PubMed Central

    Shah, Sahil; McColgan, Thomas; Ashida, Go; Kuokkanen, Paula T.; Brill, Sandra; Kempter, Richard; Wagner, Hermann

    2015-01-01

    Axons from the nucleus magnocellularis form a presynaptic map of interaural time differences (ITDs) in the nucleus laminaris (NL). These inputs generate a field potential that varies systematically with recording position and can be used to measure the map of ITDs. In the barn owl, the representation of best ITD shifts with mediolateral position in NL, so as to form continuous, smoothly overlapping maps of ITD with iso-ITD contours that are not parallel to the NL border. Frontal space (0°) is, however, represented throughout and thus overrepresented with respect to the periphery. Measurements of presynaptic conduction delay, combined with a model of delay line conduction velocity, reveal that conduction delays can account for the mediolateral shifts in the map of ITD. PMID:26224776

  13. Tunable plasmonic dual wavelength multi/demultiplexer based on graphene sheets and cylindrical resonator

    NASA Astrophysics Data System (ADS)

    Asgari, Somayyeh; Granpayeh, Nosrat

    2017-06-01

    Two parallel graphene sheet waveguides with a graphene cylindrical resonator between them are proposed, analyzed, and simulated numerically using the finite-difference time-domain method. The ends of the two graphene waveguides serve as the input and output ports. Resonance and a prominent mid-infrared band-pass filtering effect are achieved. The transmittance spectrum is tuned by varying the radius of the graphene cylindrical resonator, the dielectric inside it, and the chemical potential of the graphene via the gate voltage. Simulation results are in good agreement with theoretical calculations. As an application, a multi/demultiplexer is proposed and analyzed. Our studies demonstrate that graphene-based ultra-compact, nano-scale devices can be designed for optical processing and photonic integrated devices.

  14. Dynamic metrology and data processing for precision freeform optics fabrication and testing

    NASA Astrophysics Data System (ADS)

    Aftab, Maham; Trumper, Isaac; Huang, Lei; Choi, Heejoo; Zhao, Wenchuan; Graves, Logan; Oh, Chang Jin; Kim, Dae Wook

    2017-06-01

    Dynamic metrology holds the key to overcoming several challenging limitations of conventional optical metrology, especially with regards to precision freeform optical elements. We present two dynamic metrology systems: 1) adaptive interferometric null testing; and 2) instantaneous phase shifting deflectometry, along with an overview of a gradient data processing and surface reconstruction technique. The adaptive null testing method, utilizing a deformable mirror, adopts a stochastic parallel gradient descent search algorithm in order to dynamically create a null testing condition for unknown freeform optics. The single-shot deflectometry system implemented on an iPhone uses a multiplexed display pattern to enable dynamic measurements of time-varying optical components or optics in vibration. Experimental data, measurement accuracy / precision, and data processing algorithms are discussed.
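
    The stochastic parallel gradient descent update mentioned above is simple to state; the sketch below illustrates it with a quadratic stand-in for the interferometric quality metric. The gain, perturbation size, and the metric itself are illustrative assumptions, not the authors' parameters.

        import numpy as np

        rng = np.random.default_rng(0)
        target = rng.normal(size=32)          # unknown optimum (stand-in for
                                              # the null-test mirror figure)
        def metric(u):
            return np.sum((u - target) ** 2)  # proxy for interferometric error

        u = np.zeros(32)                      # deformable-mirror actuator commands
        gain, delta = 15.0, 1e-3
        for _ in range(3000):
            p = rng.choice([-1.0, 1.0], size=u.size) * delta  # parallel perturbation
            dJ = metric(u + p) - metric(u - p)                # two-sided probe
            u -= gain * dJ * p / (2 * delta)                  # descend along estimate
        print(metric(u))                                      # approaches zero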

  15. Methods for assisting recovery of damaged brain and spinal cord using arrays of X-Ray microplanar beams

    DOEpatents

    Dilmanian, F. Avraham; McDonald, III, John W.

    2007-12-04

    A method of assisting recovery of an injury site of brain or spinal cord injury includes providing a therapeutic dose of X-ray radiation to the injury site through an array of parallel microplanar beams. The dose at least temporarily removes regeneration inhibitors from the irradiated regions. Substantially unirradiated cells surviving between the microplanar beams migrate to the in-beam irradiated portion and assist in recovery. The dose may be administered in dose fractions over several sessions, separated in time, using angle-variable intersecting microbeam arrays (AVIMA). Additional doses may be administered by varying the orientation of the microplanar beams. The method may be enhanced by injecting stem cells into the injury site.

  16. Methods for assisting recovery of damaged brain and spinal cord using arrays of X-ray microplanar beams

    DOEpatents

    Dilmanian, F. Avraham; McDonald, III, John W.

    2007-01-02

    A method of assisting recovery of an injury site of brain or spinal cord injury includes providing a therapeutic dose of X-ray radiation to the injury site through an array of parallel microplanar beams. The dose at least temporarily removes regeneration inhibitors from the irradiated regions. Substantially unirradiated cells surviving between the microplanar beams migrate to the in-beam irradiated portion and assist in recovery. The dose may be administered in dose fractions over several sessions, separated in time, using angle-variable intersecting microbeam arrays (AVIMA). Additional doses may be administered by varying the orientation of the microplanar beams. The method may be enhanced by injecting stem cells into the injury site.

  17. Neurovision processor for designing intelligent sensors

    NASA Astrophysics Data System (ADS)

    Gupta, Madan M.; Knopf, George K.

    1992-03-01

    A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi-functional vision sensor that performs a variety of information processing operations on time-varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.

  18. Highly Parallel Alternating Directions Algorithm for Time Dependent Problems

    NASA Astrophysics Data System (ADS)

    Ganzha, M.; Georgiev, K.; Lirkov, I.; Margenov, S.; Paprzycki, M.

    2011-11-01

    In our work, we consider the time dependent Stokes equation on a finite time interval and on a uniform rectangular mesh, written in terms of velocity and pressure. For this problem, a parallel algorithm based on a novel direction splitting approach is developed. Here, the pressure equation is derived from a perturbed form of the continuity equation, in which the incompressibility constraint is penalized in a negative norm induced by the direction splitting. The scheme used in the algorithm is composed of two parts: (i) velocity prediction, and (ii) pressure correction. This is a Crank-Nicolson-type two-stage time integration scheme for two- and three-dimensional parabolic problems, in which the second-order derivative with respect to each space variable is treated implicitly while the remaining ones are treated explicitly at each time sub-step. In order to achieve good parallel performance, the solution of the Poisson problem for the pressure correction is replaced by solving a sequence of one-dimensional second-order elliptic boundary value problems in each spatial direction. The parallel code is implemented using the standard MPI functions and tested on two modern parallel computer systems. The performed numerical tests demonstrate a good level of parallel efficiency and scalability of the studied direction-splitting-based algorithm.
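
    Each implicit one-dimensional sweep in such a direction-splitting scheme reduces to a tridiagonal linear system, conventionally solved with the Thomas algorithm; a minimal sketch (illustrative, not the authors' code):

        import numpy as np

        def thomas(a, b, c, d):
            # Solve a tridiagonal system: a = sub-, b = main, c = super-diagonal.
            # a[0] and c[-1] are unused. O(n): forward elimination, then back
            # substitution.
            n = len(b)
            cp, dp = np.empty(n), np.empty(n)
            cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
            for i in range(1, n):
                m = b[i] - a[i] * cp[i - 1]
                cp[i] = c[i] / m
                dp[i] = (d[i] - a[i] * dp[i - 1]) / m
            x = np.empty(n)
            x[-1] = dp[-1]
            for i in range(n - 2, -1, -1):
                x[i] = dp[i] - cp[i] * x[i + 1]
            return x

        # Example: (I - dt * d^2/dx^2) u = d, the kind of system each implicit
        # one-dimensional sweep produces after discretization.
        n, dt = 50, 0.1
        u = thomas(np.full(n, -dt), np.full(n, 1 + 2 * dt), np.full(n, -dt),
                   np.ones(n))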

  19. Parallel grid library for rapid and flexible simulation development

    NASA Astrophysics Data System (ADS)

    Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

    2013-04-01

    We present an easy-to-use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement (AMR) and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time, allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously, allowing one to overlap computation and communication. This enables excellent scalability, at least up to 32 k cores in magnetohydrodynamic tests, depending on the problem and hardware. In the version of dccrg presented here, part of the mesh metadata is replicated between MPI processes, reducing the scalability of AMR to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg.

    Catalogue identifier: AEOM_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GNU Lesser General Public License version 3
    No. of lines in distributed program, including test data, etc.: 54975
    No. of bytes in distributed program, including test data, etc.: 974015
    Distribution format: tar.gz
    Programming language: C++
    Computer: PC, cluster, supercomputer
    Operating system: POSIX; the code has been parallelized using MPI and tested with 1-32768 processes
    RAM: 10 MB-10 GB per process
    Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20
    External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4]
    Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and load balancing.
    Solution method: The simulation grid is represented by an adjacency list (graph) with vertices stored in a hash table and edges in contiguous arrays. The Message Passing Interface standard is used for parallelization. Cell data is given as a template parameter when instantiating the grid.
    Restrictions: Logically Cartesian grid.
    Running time: Depends on the hardware, the problem and the solution method. Small problems can be solved in under a minute and very large problems can take weeks. The examples and tests provided with the package take less than about one minute using default options. In the version of dccrg presented here, the speed of adaptive mesh refinement is at most of the order of 10^6 total created cells per second.
    [1] http://www.mpi-forum.org/
    [2] http://www.boost.org/
    [3] K. Devine, E. Boman, R. Heaphy, B. Hendrickson, C. Vaughan, Zoltan data management services for parallel dynamic applications, Comput. Sci. Eng. 4 (2002) 90-97. http://dx.doi.org/10.1109/5992.988653
    [4] https://gitorious.org/sfc++

  20. TIMEDELN: A programme for the detection and parametrization of overlapping resonances using the time-delay method

    NASA Astrophysics Data System (ADS)

    Little, Duncan A.; Tennyson, Jonathan; Plummer, Martin; Noble, Clifford J.; Sunderland, Andrew G.

    2017-06-01

    TIMEDELN implements the time-delay method of determining resonance parameters from the characteristic Lorentzian form displayed by the largest eigenvalues of the time-delay matrix. TIMEDELN constructs the time-delay matrix from input K-matrices and analyses its eigenvalues. This new version implements multi-resonance fitting and may be run serially or as a high performance parallel code with three levels of parallelism. TIMEDELN takes K-matrices from a scattering calculation, either read from a file or calculated on a dynamically adjusted grid, and calculates the time-delay matrix. This is then diagonalized, with the largest eigenvalue representing the longest time-delay experienced by the scattering particle. A resonance shows up as a characteristic Lorentzian form in the time-delay: the programme searches the time-delay eigenvalues for maxima and traces resonances when they pass through different eigenvalues, separating overlapping resonances. It also performs the fitting of the calculated data to the Lorentzian form and outputs resonance positions and widths. Any remaining overlapping resonances can be fitted jointly. The branching ratios of decay into the open channels can also be found. The parallel code modules are abstracted from the main physics code and can be used independently.
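
    Up to convention-dependent constants, the largest time-delay eigenvalue near an isolated resonance follows a Lorentzian, q(E) ~ q_bg + Gamma / ((E - E_r)^2 + Gamma^2/4), so the position E_r and width Gamma come from a nonlinear least-squares fit. A minimal single-resonance sketch on synthetic data (TIMEDELN itself traces and jointly fits overlapping resonances):

        import numpy as np
        from scipy.optimize import curve_fit

        def time_delay(E, E_r, Gamma, q_bg):
            # Lorentzian profile of the largest time-delay eigenvalue.
            return q_bg + Gamma / ((E - E_r) ** 2 + Gamma ** 2 / 4)

        # Synthetic eigenvalue data standing in for a K-matrix calculation.
        E = np.linspace(0.0, 2.0, 400)
        rng = np.random.default_rng(2)
        q = time_delay(E, 1.2, 0.05, 0.3) + rng.normal(scale=0.5, size=E.size)

        (E_r, Gamma, q_bg), _ = curve_fit(time_delay, E, q, p0=[1.0, 0.1, 0.0])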

  1. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov; Weare, Jonathan Q., E-mail: weare@uchicago.edu; Weare, John H., E-mail: jweare@ucsd.edu

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0 to t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,...,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsened time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and an HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way, these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
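
    A minimal sketch of the root-finding formulation on a toy problem (a harmonic oscillator advanced by velocity Verlet, with SciPy's Jacobian-free Newton-Krylov solver standing in for the paper's quasi-Newton schemes); the point is that all M propagator evaluations inside F(X) are independent, so each time-step entry could be assigned to its own processor:

        import numpy as np
        from scipy.optimize import newton_krylov

        dt, M = 0.05, 200                 # time step and number of slices

        def f(x):
            # Velocity-Verlet step for a unit harmonic oscillator (force = -q).
            q, v = x
            a = -q
            q_new = q + dt * v + 0.5 * dt**2 * a
            v_new = v + 0.5 * dt * (a - q_new)   # a_new = -q_new
            return np.array([q_new, v_new])

        x0 = np.array([1.0, 0.0])

        def F(X):
            # Residual F(X) = [x_i - f(x_{i-1})]; zero exactly on the trajectory.
            # The M propagator evaluations below are mutually independent and
            # embarrassingly parallel.
            X = X.reshape(M, 2)
            prev = np.vstack([x0, X[:-1]])
            return (X - np.array([f(p) for p in prev])).ravel()

        X = newton_krylov(F, np.tile(x0, M))   # serial stand-in for the
        trajectory = X.reshape(M, 2)           # parallel-in-time solve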

  2. Techno-economic feasibility of waste biorefinery: Using slaughtering waste streams as starting material for biopolyester production.

    PubMed

    Shahzad, Khurram; Narodoslawsky, Michael; Sagir, Muhammad; Ali, Nadeem; Ali, Shahid; Rashid, Muhammad Imtiaz; Ismail, Iqbal Mohammad Ibrahim; Koller, Martin

    2017-09-01

    The utilization of industrial waste streams as input materials for bio-mediated production processes constitutes a current R&D objective, aiming not only to reduce process costs on the input side but also, in parallel, to minimize hazardous environmental emissions. In this context, the EU-funded project ANIMPOL elaborated a process for the production of polyhydroxyalkanoate (PHA) biopolymers starting from diverse waste streams of the animal processing industry. This article provides a detailed economic analysis of PHA production from this waste biorefinery concept, encompassing the utilization of low-quality biodiesel, offal material, and meat and bone meal (MBM). Techno-economic analysis reveals that the PHA production cost varies from 1.41 €/kg to 1.64 €/kg when considering offal either as waste or at its market price, while calculating with fixed costs for the co-products biodiesel (0.97 €/L) and MBM (350 €/t), respectively. The effect of fluctuating market prices for offal materials, biodiesel, and MBM on the final PHA production cost, as well as the investment payback time, have been evaluated. Depending on the current market situation, the calculated investment payback time varies from 3.25 to 4.5 years.

  3. Energization and Transport in 3D Kinetic Simulations of MMS Magnetopause Reconnection Site Encounters with Varying Guide Fields

    NASA Astrophysics Data System (ADS)

    Le, A.; Daughton, W. S.; Ohia, O.; Chen, L. J.; Liu, Y. H.

    2017-12-01

    We present 3D fully kinetic simulations of asymmetric reconnection with plasma parameters matching MMS magnetopause diffusion region crossings, with guide fields of 0.1 [Burch et al., Science (2016)], 0.4 [Chen et al., JGR (2017)], and 1 [Burch and Phan, GRL (2016)] times the reconnecting sheath field. Strong diamagnetic drifts across the magnetopause current sheet drive lower-hybrid drift instabilities (LHDI) over a range of wavelengths [Daughton, PoP (2003); Roytershteyn et al., PRL (2012)] that develop into a turbulent state. Magnetic field tracing diagnostics are employed to characterize the turbulent magnetic geometry and to evaluate the global reconnection rate. The contributions to Ohm's law are evaluated field line by field line, including time-averaged diagnostics that allow the quantification of anomalous resistivity and viscosity. We examine how fluctuating electric fields and chaotic magnetic field lines contribute to particle mixing across the separatrix, and we characterize the accelerated electron distributions that form under varying magnetic shear or guide field. The LHDI turbulence is found to strongly enhance transport and parallel electron heating in 3D compared to 2D, particularly along the magnetospheric separatrix [Le et al., GRL (2017)]. The PIC simulation results are compared to MMS observations.

  4. Suppressing Thermal Energy Drift in the LLNL Flash X-Ray Accelerator Using Linear Disk Resistor Stacks

    DTIC Science & Technology

    2011-06-01

    ...induction accelerator with a voltage output of 18 MeV at a current of 3 kA. The electron beam is focused onto a tantalum target to produce X-rays. The... capacitors in each bank, half of which are charged in parallel positively, and the other half of which are charged in parallel negatively. The charge voltage can... be varied from ±30 kV to ±40 kV. The Marx capacitors are fired in series into the Blumleins with up to 400 kV, 2 µs output. (Figure 1: FXR pulsed power.)

  5. Robust control of a parallel hybrid drivetrain with a CVT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, T.; Schroeder, D.

    1996-09-01

    In this paper the design of a robust control system for a parallel hybrid drivetrain is presented. The drivetrain is based on a continuously variable transmission (CVT) and is therefore a highly nonlinear multiple-input multiple-output (MIMO) system. Input-output linearization offers the possibility of linearizing and decoupling the system. Since, for example, the vehicle mass varies with the load and the efficiency of the gearbox depends strongly on the actual working point, an exact linearization of the plant will mostly fail. Therefore a robust control algorithm based on sliding mode is used to control the drivetrain.

  6. Evaluation of Proteus as a Tool for the Rapid Development of Models of Hydrologic Systems

    NASA Astrophysics Data System (ADS)

    Weigand, T. M.; Farthing, M. W.; Kees, C. E.; Miller, C. T.

    2013-12-01

    Models of modern hydrologic systems can be complex and involve a variety of operators with varying character. The goal is to implement approximations of such models that are both efficient for the developer and computationally efficient, which is a set of naturally competing objectives. Proteus is a Python-based toolbox that supports prototyping of model formulations as well as a wide variety of modern numerical methods and parallel computing. We used Proteus to develop numerical approximations for three models: Richards' equation, a brine flow model derived using the Thermodynamically Constrained Averaging Theory (TCAT), and a multiphase TCAT-based tumor growth model. For Richards' equation, we investigated discontinuous Galerkin solutions with higher order time integration based on the backward difference formulas. The TCAT brine flow model was implemented using Proteus and a variety of numerical methods were compared to hand coded solutions. Finally, an existing tumor growth model was implemented in Proteus to introduce more advanced numerics and allow the code to be run in parallel. From these three example models, Proteus was found to be an attractive open-source option for rapidly developing high quality code for solving existing and evolving computational science models.

  7. Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments.

    PubMed

    Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke E

    2018-03-01

    Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode, to explain the maximum variance of the data. Functional PARAFAC permits the entities in different modes to be smooth functions or curves, varying over a continuum, rather than a collection of unconnected responses. The existing functional PARAFAC methods handle functions of a one-dimensional argument (e.g., time) only. In this paper, we propose a new extension of functional PARAFAC for handling three-way data whose responses are sequenced along both a two-dimensional domain (e.g., a plane with x- and y-axis coordinates) and a one-dimensional argument. Technically, the proposed method combines PARAFAC with basis function expansion approximations, using a set of piecewise quadratic finite element basis functions for estimating two-dimensional smooth functions and a set of one-dimensional basis functions for estimating one-dimensional smooth functions. In a simulation study, the proposed method appeared to outperform the conventional PARAFAC. We apply the method to EEG data to demonstrate its empirical usefulness.

  8. A Two-dimensional Version of the Niblett-Bostick Transformation for Magnetotelluric Interpretations

    NASA Astrophysics Data System (ADS)

    Esparza, F.

    2005-05-01

    An imaging technique for two-dimensional magnetotelluric interpretations is developed following the well known Niblett-Bostick transformation for one-dimensional profiles. The algorithm uses a Hopfield artificial neural network to process series and parallel magnetotelluric impedances along with their analytical influence functions. The adaptive, weighted average approximation preserves part of the nonlinearity of the original problem. No initial model in the usual sense is required for the recovery of a functional model. Rather, the built-in relationship between model and data considers automatically, all at the same time, many half spaces whose electrical conductivities vary according to the data. The use of series and parallel impedances, a self-contained pair of invariants of the impedance tensor, avoids the need to decide on best angles of rotation for TE and TM separations. Field data from a given profile can thus be fed directly into the algorithm without much processing. The solutions offered by the Hopfield neural network correspond to spatial averages computed through rectangular windows that can be chosen at will. Applications of the algorithm to simple synthetic models and to the COPROD2 data set illustrate the performance of the approximation.
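
    For reference, the classical one-dimensional transformation that this 2-D scheme generalizes maps each apparent resistivity rho_a at period T to an effective depth and resistivity using the local slope of the sounding curve. A minimal sketch using the commonly quoted formulas h = sqrt(rho_a * T / (2*pi*mu0)) and rho_NB = rho_a * (1 + m)/(1 - m), with m = d log rho_a / d log T:

        import numpy as np

        MU0 = 4e-7 * np.pi   # vacuum permeability, H/m

        def niblett_bostick(T, rho_a):
            # 1-D Niblett-Bostick transform of a magnetotelluric sounding curve.
            # T: periods (s), rho_a: apparent resistivities (ohm m), T increasing.
            # Returns effective depths (m) and transformed resistivities (ohm m).
            h = np.sqrt(rho_a * T / (2 * np.pi * MU0))   # penetration depth
            m = np.gradient(np.log(rho_a), np.log(T))    # local log-log slope
            m = np.clip(m, -0.99, 0.99)                  # keep |m| < 1
            rho_nb = rho_a * (1 + m) / (1 - m)
            return h, rho_nb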

  9. Coil Compression for Accelerated Imaging with Cartesian Sampling

    PubMed Central

    Zhang, Tao; Pauly, John M.; Vasanawala, Shreyas S.; Lustig, Michael

    2012-01-01

    MRI using receiver arrays with many coil elements can provide high signal-to-noise ratio and increase parallel imaging acceleration. At the same time, the growing number of elements results in larger datasets and more computation in the reconstruction. This is of particular concern in 3D acquisitions and in iterative reconstructions. Coil compression algorithms are effective in mitigating this problem by compressing data from many channels into fewer virtual coils. In Cartesian sampling there often are fully sampled k-space dimensions. In this work, a new coil compression technique for Cartesian sampling is presented that exploits the spatially varying coil sensitivities in these non-subsampled dimensions for better compression and computation reduction. Instead of directly compressing in k-space, coil compression is performed separately for each spatial location along the fully-sampled directions, followed by an additional alignment process that guarantees the smoothness of the virtual coil sensitivities. This important step provides compatibility with autocalibrating parallel imaging techniques. Its performance is not susceptible to artifacts caused by a tight imaging field-of-view. High-quality compression of in-vivo 3D data from a 32-channel pediatric coil into 6 virtual coils is demonstrated. PMID:22488589
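
    A minimal sketch of plain SVD-based coil compression, the baseline this work builds on; the per-location compression and the alignment step along the fully sampled dimension that the paper adds are omitted here.

        import numpy as np

        def compress_coils(data, n_virtual):
            # data: complex k-space array of shape (n_coils, n_samples).
            # Returns (compressed data, compression matrix), keeping the
            # n_virtual channel combinations with the most signal energy.
            U, s, Vh = np.linalg.svd(data, full_matrices=False)
            A = U[:, :n_virtual].conj().T    # (n_virtual, n_coils) projection
            return A @ data, A

        rng = np.random.default_rng(1)
        raw = rng.normal(size=(32, 4096)) + 1j * rng.normal(size=(32, 4096))
        virt, A = compress_coils(raw, 6)     # 32 channels -> 6 virtual coils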

  10. Engineered plant biomass feedstock particles

    DOEpatents

    Dooley, James H [Federal Way, WA; Lanning, David N [Federal Way, WA; Broderick, Thomas F [Lake Forest Park, WA

    2012-04-17

    A new class of plant biomass feedstock particles characterized by consistent piece size and shape uniformity, high skeletal surface area, and good flow properties. The particles of plant biomass material having fibers aligned in a grain are characterized by a length dimension (L) aligned substantially parallel to the grain and defining a substantially uniform distance along the grain, a width dimension (W) normal to L and aligned cross grain, and a height dimension (H) normal to W and L. In particular, the L×H dimensions define a pair of substantially parallel side surfaces characterized by substantially intact longitudinally arrayed fibers, the W×H dimensions define a pair of substantially parallel end surfaces characterized by crosscut fibers and end checking between fibers, and the L×W dimensions define a pair of substantially parallel top and bottom surfaces. The L×W surfaces of particles with L/H dimension ratios of 4:1 or less are further elaborated by surface checking between longitudinally arrayed fibers. The length dimension L is preferably aligned within 30° of parallel to the grain, and more preferably within 10° of parallel to the grain. The plant biomass material is preferably selected from among wood, agricultural crop residues, plantation grasses, hemp, bagasse, and bamboo.

  11. Engineered plant biomass particles coated with biological agents

    DOEpatents

    Dooley, James H.; Lanning, David N.

    2014-06-24

    Plant biomass particles coated with a biological agent such as a bacterium or seed, characterized by a length dimension (L) aligned substantially parallel to a grain direction and defining a substantially uniform distance along the grain, a width dimension (W) normal to L and aligned cross grain, and a height dimension (H) normal to W and L. In particular, the L×H dimensions define a pair of substantially parallel side surfaces characterized by substantially intact longitudinally arrayed fibers, the W×H dimensions define a pair of substantially parallel end surfaces characterized by crosscut fibers and end checking between fibers, and the L×W dimensions define a pair of substantially parallel top and bottom surfaces.

  12. DNA looping by FokI: the impact of synapse geometry on loop topology at varied site orientations

    PubMed Central

    Rusling, David A.; Laurens, Niels; Pernstich, Christian; Wuite, Gijs J. L.; Halford, Stephen E.

    2012-01-01

    Most restriction endonucleases, including FokI, interact with two copies of their recognition sequence before cutting DNA. On DNA with two sites they act in cis looping out the intervening DNA. While many restriction enzymes operate symmetrically at palindromic sites, FokI acts asymmetrically at a non-palindromic site. The directionality of its sequence means that two FokI sites can be bridged in either parallel or anti-parallel alignments. Here we show by biochemical and single-molecule biophysical methods that FokI aligns two recognition sites on separate DNA molecules in parallel and that the parallel arrangement holds for sites in the same DNA regardless of whether they are in inverted or repeated orientations. The parallel arrangement dictates the topology of the loop trapped between sites in cis: the loop from inverted sites has a simple 180° bend, while that with repeated sites has a convoluted 360° turn. The ability of FokI to act at asymmetric sites thus enabled us to identify the synapse geometry for sites in trans and in cis, which in turn revealed the relationship between synapse geometry and loop topology. PMID:22362745

  13. Influence of Segmentation of Ring-Shaped NdFeB Magnets with Parallel Magnetization on Cylindrical Actuators

    PubMed Central

    Eckert, Paulo Roberto; Goltz, Evandro Claiton; Filho, Aly Ferreira Flores

    2014-01-01

    This work analyses the effects of segmentation followed by parallel magnetization of ring-shaped NdFeB permanent magnets used in slotless cylindrical linear actuators. The main purpose of the work is to evaluate the effects of that segmentation on the performance of the actuator and to present a general overview of the influence of parallel magnetization by varying the number of segments and comparing the results with ideal radially magnetized rings. The analysis is first performed by modelling mathematically the radial and circumferential components of magnetization for both radial and parallel magnetizations, followed by an analysis carried out by means of the 3D finite element method. Results obtained from the models are validated by measuring radial and tangential components of magnetic flux distribution in the air gap on a prototype which employs magnet rings with eight segments each with parallel magnetization. The axial force produced by the actuator was also measured and compared with the results obtained from numerical models. Although this analysis focused on a specific topology of cylindrical actuator, the observed effects on the topology could be extended to others in which surface-mounted permanent magnets are employed, including rotating electrical machines. PMID:25051032

  14. Influence of segmentation of ring-shaped NdFeB magnets with parallel magnetization on cylindrical actuators.

    PubMed

    Eckert, Paulo Roberto; Goltz, Evandro Claiton; Flores Filho, Aly Ferreira

    2014-07-21

    This work analyses the effects of segmentation followed by parallel magnetization of ring-shaped NdFeB permanent magnets used in slotless cylindrical linear actuators. The main purpose of the work is to evaluate the effects of that segmentation on the performance of the actuator and to present a general overview of the influence of parallel magnetization by varying the number of segments and comparing the results with ideal radially magnetized rings. The analysis is first performed by modelling mathematically the radial and circumferential components of magnetization for both radial and parallel magnetizations, followed by an analysis carried out by means of the 3D finite element method. Results obtained from the models are validated by measuring radial and tangential components of magnetic flux distribution in the air gap on a prototype which employs magnet rings with eight segments each with parallel magnetization. The axial force produced by the actuator was also measured and compared with the results obtained from numerical models. Although this analysis focused on a specific topology of cylindrical actuator, the observed effects on the topology could be extended to others in which surface-mounted permanent magnets are employed, including rotating electrical machines.

  15. A novel mobile monitoring approach to characterize spatial and temporal variation in traffic-related air pollutants in an urban community

    NASA Astrophysics Data System (ADS)

    Yu, Chang Ho; Fan, Zhihua; Lioy, Paul J.; Baptista, Ana; Greenberg, Molly; Laumbach, Robert J.

    2016-09-01

    Air concentrations of traffic-related air pollutants (TRAPs) vary in space and time within urban communities, presenting challenges for estimating human exposure and potential health effects. Conventional stationary monitoring stations/networks cannot effectively capture spatial characteristics. Alternatively, mobile monitoring approaches have become popular for measuring TRAPs along roadways or roadsides. However, these linear mobile monitoring approaches cannot thoroughly distinguish spatial variability from temporal variations in monitored TRAP concentrations. In this study, we used a novel mobile monitoring approach to simultaneously characterize spatial/temporal variations in roadside concentrations of TRAPs in urban settings. We evaluated the effectiveness of this mobile monitoring approach by performing concurrent measurements along two parallel paths perpendicular to a major roadway and/or along heavily trafficked roads at very narrow scale (one block apart) within a short time period (<30 min) in an urban community. Based on traffic and particulate matter (PM) source information, we selected 4 neighborhoods to study. The sampling activities utilized real-time monitors, including a battery-operated PM2.5 monitor (SidePak), a condensation particle counter (CPC 3007), a black carbon (BC) monitor (Micro-Aethalometer), a carbon monoxide (CO) monitor (Langan T15), a portable temperature/humidity data logger (HOBO U12), and a GPS-based tracker (Trackstick). Sampling was conducted for ~3 h in the morning (7:30-10:30) on 7 separate days in March/April and 6 days in May/June 2012. Two simultaneous samplings were made at 5 spatially-distributed locations on parallel roads, usually one block apart, in each neighborhood. The 5-min averaged BC concentrations (AVG ± SD, [range]) were 2.53 ± 2.47 [0.09-16.3] μg/m3, particle number concentrations (PNC) were 33,330 ± 23,451 [2512-159,130] particles/cm3, PM2.5 mass concentrations were 8.87 ± 7.65 [0.27-46.5] μg/m3, and CO concentrations were 1.22 ± 0.60 [0.22-6.29] ppm in the community. The traffic-related air pollutants BC and PNC, but not PM2.5 or CO, varied spatially depending on proximity to local stationary/mobile sources. Seasonal differences were observed for all four TRAPs, which were significantly higher in colder months than in warmer months. The coefficients of variation (CVs) of concurrent measurements from the two parallel routes were around 0.21 ± 0.17, with the variation attributed to meteorological variation (25%), temporal variability (19%), concentration level (6%), and spatial variability (2%), respectively. Overall, the study findings suggest this mobile monitoring approach can effectively capture and distinguish spatial/temporal characteristics in TRAP concentrations for communities impacted by heavy motor-vehicle traffic and mixed urban air pollution sources.

  16. Scalable Preconditioners for Structure Preserving Discretizations of Maxwell Equations in First Order Form

    DOE PAGES

    Phillips, Edward Geoffrey; Shadid, John N.; Cyr, Eric C.

    2018-05-01

    Here, we report that multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure-preserving (also termed physics-compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Lastly, results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.

  17. Scalable Preconditioners for Structure Preserving Discretizations of Maxwell Equations in First Order Form

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Phillips, Edward Geoffrey; Shadid, John N.; Cyr, Eric C.

    Here, we report that multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure-preserving (also termed physics-compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Lastly, results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.

  18. Real-time computation of parameter fitting and image reconstruction using graphical processing units

    NASA Astrophysics Data System (ADS)

    Locans, Uldis; Adelmann, Andreas; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Günther; Wang, Qiulin

    2017-06-01

    In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at the Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify the parts of the algorithms in need of optimization. Efficient GPU kernels were created to allow the applications to use a GPU and to speed up the previously identified parts. Benchmarking tests were performed to measure the achieved speedup. During this work, we focused on single-GPU systems to show that real-time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the currently used application for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU. The speedup may vary depending on the size and complexity of the problem. For PET image analysis, the obtained speedups of the GPU version were more than 40 times those of a single-core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.

  19. Redundant binary number representation for an inherently parallel arithmetic on optical computers.

    PubMed

    De Biase, G A; Massini, A

    1993-02-10

    A simple redundant binary number representation suitable for digital-optical computers is presented. By means of this representation it is possible to build an arithmetic with carry-free parallel algebraic sums carried out in constant time and parallel multiplication in log N time. This redundant number representation naturally fits the 2's complement binary number system and permits the construction of inherently parallel arithmetic units that are used in various optical technologies. Some properties of this number representation and several examples of computation are presented.
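
    The representation described above supports carry-free addition; the Python sketch below is a generic illustration of radix-2 signed-digit addition (not code from the paper). Each position's transfer digit is decided from its own digit sum and a one-position lookahead at the digit sum below it, so every position can be computed independently, which is what makes the algebraic sum a constant-time parallel operation.

    ```python
    # Carry-free addition in a radix-2 signed-digit (redundant binary)
    # representation with digits in {-1, 0, 1}; digit lists are
    # least-significant-first.

    def sd_add(x, y):
        n = max(len(x), len(y))
        x = x + [0] * (n - len(x))
        y = y + [0] * (n - len(y))
        s = [x[i] + y[i] for i in range(n)]      # position sums, in {-2, ..., 2}
        t = [0] * (n + 1)                        # transfer digits
        w = [0] * n                              # interim sum digits
        for i in range(n):                       # each i uses only s[i] and s[i-1],
            lower = s[i - 1] if i > 0 else 0     # so all i can run in parallel
            if s[i] >= 2:
                t[i + 1], w[i] = 1, s[i] - 2
            elif s[i] == 1:
                t[i + 1], w[i] = (1, -1) if lower >= 0 else (0, 1)
            elif s[i] == -1:
                t[i + 1], w[i] = (0, -1) if lower >= 0 else (-1, 1)
            elif s[i] <= -2:
                t[i + 1], w[i] = -1, s[i] + 2
        return [w[i] + t[i] for i in range(n)] + [t[n]]  # digits stay in {-1, 0, 1}

    def sd_value(d):
        return sum(di * 2**i for i, di in enumerate(d))

    # Example: 13 + (-6) = 7, with 13 = [1, 0, 1, 1] and -6 = [0, -1, -1].
    assert sd_value(sd_add([1, 0, 1, 1], [0, -1, -1])) == 7
    ```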

  20. A LONGITUDINAL PERSPECTIVE ON THE CONUNDRUM OF CENTRAL ARTERIAL STIFFNESS, BLOOD PRESSURE AND AGING

    PubMed Central

    Scuteri, Angelo; Morrell, Christopher H.; Orru, Marco; Strait, James B.; Tarasov, Kirill V.; AlGhatrif, Majd; Pina Ferreli, Liana Anna; Loi, Francesco; Pilia, Maria Grazia; Delitala, Alessandro; Spurgeon, Harold; Najjar, Samer S.; Lakatta, Edward G.

    2014-01-01

    The age-associated increase in arterial stiffness has long been considered to parallel or to cause the age-associated increase in blood pressure (BP). Yet, the rates at which pulse wave velocity (PWV), a measure of arterial stiffness, and BP trajectories change over time within individuals who differ by age and sex have not been assessed and compared. This study determined the evolution of BP and aortic PWV trajectories over a 9.4-year follow-up in over 4,000 community dwelling men and women of 20–100 years of age at entry into the SardiNIA Study. Linear mixed effects model analyses revealed that PWV accelerates with time over the observation period, at about the same rate over the entire age range in both men and women. In men, the longitudinal rate at which BP changed over time, however, did not generally parallel that of PWV acceleration: at ages above 40 years the rates of change in SBP and PP increase plateaued and then declined so that SBP, itself, also declined at older ages while PP plateaued. In women, SBP, DBP and MBP increased at constant rates across all ages, producing an increasing rate of increase in PP. Therefore, increased aortic stiffness is implicated in the age-associated increase in SBP and PP. These findings indicate that PWV is not a surrogate for BP and that arterial properties other than arterial wall stiffness that vary by age and sex also modulate the BP trajectories during aging and lead to the dissociation of PWV, PP and SBP trajectories in men. PMID:25225210

  1. Turbomachinery CFD on parallel computers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.

    1992-01-01

    The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.

  2. Information criteria for quantifying loss of reversibility in parallelized KMC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gourgoulias, Konstantinos, E-mail: gourgoul@math.umass.edu; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Rey-Bellet, Luc, E-mail: luc@math.umass.edu

    Parallel Kinetic Monte Carlo (KMC) is a potent tool to simulate stochastic particle systems efficiently. However, despite literature on quantifying domain decomposition errors of the particle system for this class of algorithms in the short and in the long time regime, no study yet explores and quantifies the loss of time-reversibility in Parallel KMC. Inspired by concepts from non-equilibrium statistical mechanics, we propose the entropy production per unit time, or entropy production rate, given in terms of an observable and a corresponding estimator, as a metric that quantifies the loss of reversibility. Typically, this is a quantity that cannot be computed explicitly for Parallel KMC, which is why we develop a posteriori estimators that have good scaling properties with respect to the size of the system. Through these estimators, we can connect the different parameters of the scheme, such as the communication time step of the parallelization, the choice of the domain decomposition, and the computational schedule, with its performance in controlling the loss of reversibility. From this point of view, the entropy production rate can be seen both as an information criterion to compare the reversibility of different parallel schemes and as a tool to diagnose reversibility issues with a particular scheme. As a demonstration, we use Sandia Lab's SPPARKS software to compare different parallelization schemes and different domain (lattice) decompositions.

  3. Information criteria for quantifying loss of reversibility in parallelized KMC

    NASA Astrophysics Data System (ADS)

    Gourgoulias, Konstantinos; Katsoulakis, Markos A.; Rey-Bellet, Luc

    2017-01-01

    Parallel Kinetic Monte Carlo (KMC) is a potent tool to simulate stochastic particle systems efficiently. However, despite literature on quantifying domain decomposition errors of the particle system for this class of algorithms in the short and in the long time regime, no study yet explores and quantifies the loss of time-reversibility in Parallel KMC. Inspired by concepts from non-equilibrium statistical mechanics, we propose the entropy production per unit time, or entropy production rate, given in terms of an observable and a corresponding estimator, as a metric that quantifies the loss of reversibility. Typically, this is a quantity that cannot be computed explicitly for Parallel KMC, which is why we develop a posteriori estimators that have good scaling properties with respect to the size of the system. Through these estimators, we can connect the different parameters of the scheme, such as the communication time step of the parallelization, the choice of the domain decomposition, and the computational schedule, with its performance in controlling the loss of reversibility. From this point of view, the entropy production rate can be seen both as an information criterion to compare the reversibility of different parallel schemes and as a tool to diagnose reversibility issues with a particular scheme. As a demonstration, we use Sandia Lab's SPPARKS software to compare different parallelization schemes and different domain (lattice) decompositions.

  4. Experiments on shells under base excitation

    NASA Astrophysics Data System (ADS)

    Pellicano, Francesco; Barbieri, Marco; Zippo, Antonio; Strozzi, Matteo

    2016-05-01

    The aim of the present paper is a deep experimental investigation of the nonlinear dynamics of circular cylindrical shells. The specific problem regards the response of circular cylindrical shells subjected to base excitation. The shells are mounted on a shaking table that furnishes a vertical vibration parallel to the cylinder axis; a heavy rigid disk is mounted on the top of the shells. The base vibration induces a rigid body motion, which mainly causes huge inertia forces exerted by the top disk on the shell. In-plane stresses due to the aforementioned inertias give rise to impressively large vibrations of the shell. An extremely violent dynamic phenomenon suddenly appears as the excitation frequency varies up and down close to the linear resonant frequency of the first axisymmetric mode. The dynamics are deeply investigated by varying the excitation level and frequency. Moreover, in order to generalise the investigation, two different geometries are analysed. The paper furnishes a complete dynamic scenario by means of: (i) amplitude frequency diagrams, (ii) bifurcation diagrams, (iii) time histories and spectra, (iv) phase portraits and Poincaré maps. It is to be stressed that all the results presented here are experimental.

  5. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

    PubMed

    Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,…,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsened time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol (TCP/IP) networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way, these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
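
    As a minimal illustration of this root-finding formulation (not the authors' implementation; a plain Picard sweep stands in for their quasi-Newton solvers), the sketch below recasts Verlet time stepping of a harmonic oscillator as F(X) = [x_i - f(x_{i-1})]_{i=1,…,M} = 0. All f evaluations within one sweep are mutually independent, which is exactly where a processor can be assigned to each time-step entry.

    ```python
    import numpy as np

    def f(x, dt=0.05, k=1.0):
        """One velocity-Verlet step of a unit-mass harmonic oscillator."""
        r, v = x
        a = -k * r
        r_new = r + dt * v + 0.5 * dt**2 * a
        v_new = v + 0.5 * dt * (a - k * r_new)   # average of old and new force
        return np.array([r_new, v_new])

    M = 200
    X = np.tile(np.array([1.0, 0.0]), (M + 1, 1))    # crude guess for the whole
                                                     # trajectory at once
    for sweep in range(M + 1):
        prop = np.array([f(x) for x in X[:-1]])      # independent -> parallelizable
        F = X[1:] - prop                             # residual of F(X) = 0
        if np.abs(F).max() < 1e-12:
            break
        X[1:] = prop                                 # Picard update X_i <- f(X_{i-1})

    print(f"converged after {sweep} sweeps")
    ```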

  6. Trapezius muscle activity increases during near work activity regardless of accommodation/vergence demand level.

    PubMed

    Richter, H O; Zetterberg, C; Forsman, M

    2015-07-01

    To investigate if trapezius muscle activity increases over time during visually demanding near work. The vision task consisted of sustained focusing on a contrast-varying black and white Gabor grating. Sixty-six participants with a median age of 38 (range 19-47) fixated the grating from a distance of 65 cm (1.5 D) during four counterbalanced 7-min periods: binocularly through -3.5 D lenses, and monocularly through -3.5 D, 0 D and +3.5 D. Accommodation, heart rate variability and trapezius muscle activity were recorded in parallel. General estimating equation analyses showed that trapezius muscle activity increased significantly over time in all four lens conditions. A concurrent effect of accommodation response on trapezius muscle activity was observed with the minus lenses irrespective of whether incongruence between accommodation and convergence was present or not. Trapezius muscle activity increased significantly over time during the near work task. The increase in muscle activity over time may be caused by an increased need of mental effort and visual attention to maintain performance during the visual tasks to counteract mental fatigue.

  7. Perceptual Real-Time 2D-to-3D Conversion Using Cue Fusion.

    PubMed

    Leimkuhler, Thomas; Kellnhofer, Petr; Ritschel, Tobias; Myszkowski, Karol; Seidel, Hans-Peter

    2018-06-01

    We propose a system to infer binocular disparity from a monocular video stream in real-time. Different from classic reconstruction of physical depth in computer vision, we compute perceptually plausible disparity that is numerically inaccurate but results in a very similar overall depth impression, with plausible overall layout, sharp edges, fine details, and agreement between luminance and disparity. We use several simple monocular cues to estimate disparity maps and confidence maps of low spatial and temporal resolution in real-time. These are complemented by spatially-varying, appearance-dependent and class-specific disparity prior maps, learned from example stereo images. Scene classification selects this prior at runtime. Fusion of prior and cues is done by means of robust MAP inference on a dense spatio-temporal conditional random field with high spatial and temporal resolution. Using normal distributions allows this in constant-time, parallel per-pixel work. We compare our approach to previous 2D-to-3D conversion systems in terms of different metrics, as well as a user study, and validate our notion of perceptually plausible disparity.

  8. Simulation Exploration through Immersive Parallel Planes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunhart-Lupo, Nicholas J; Bush, Brian W; Gruchalla, Kenny M

    We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest; a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selections, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.

  9. Simulation Exploration through Immersive Parallel Planes: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny

    We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest; a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selections, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.

  10. Rubus: A compiler for seamless and extensible parallelism.

    PubMed

    Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called the Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring a programmer's expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, while for a matrix multiplication benchmark an average execution speedup of 84 times has been achieved on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.

  11. Rubus: A compiler for seamless and extensible parallelism

    PubMed Central

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called the Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring a programmer's expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, while for a matrix multiplication benchmark an average execution speedup of 84 times has been achieved on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758

  12. Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations

    NASA Technical Reports Server (NTRS)

    Fijany, Amir

    1993-01-01

    In this paper, fast time- and space-parallel algorithms for the solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for the solution of the problem can be completely decoupled.

  13. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,…,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsened time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way, these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. By using these algorithms we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 seconds per time step to 6.9 seconds per time step.

  14. Parallel MR imaging: a user's guide.

    PubMed

    Glockner, James F; Hu, Houchun H; Stanley, David W; Angelos, Lisa; King, Kevin

    2005-01-01

    Parallel imaging is a recently developed family of techniques that take advantage of the spatial information inherent in phased-array radiofrequency coils to reduce acquisition times in magnetic resonance imaging. In parallel imaging, the number of sampled k-space lines is reduced, often by a factor of two or greater, thereby significantly shortening the acquisition time. Parallel imaging techniques have only recently become commercially available, and the wide range of clinical applications is just beginning to be explored. The potential clinical applications primarily involve reduction in acquisition time, improved spatial resolution, or a combination of the two. Improvements in image quality can be achieved by reducing the echo train lengths of fast spin-echo and single-shot fast spin-echo sequences. Parallel imaging is particularly attractive for cardiac and vascular applications and will likely prove valuable as 3-T body and cardiovascular imaging becomes part of standard clinical practice. Limitations of parallel imaging include reduced signal-to-noise ratio and reconstruction artifacts. It is important to consider these limitations when deciding when to use these techniques. (c) RSNA, 2005.

  15. Comparison between four dissimilar solar panel configurations

    NASA Astrophysics Data System (ADS)

    Suleiman, K.; Ali, U. A.; Yusuf, Ibrahim; Koko, A. D.; Bala, S. I.

    2017-12-01

    Several studies on photovoltaic systems have focused on how they operate and the energy required to operate them. Little attention has been paid to their configurations, the modeling of mean time to system failure, availability, cost benefit, and comparisons of parallel and series-parallel designs. In this research work, four system configurations were studied. Configuration I consists of two sub-components arranged in parallel with 24 V each, configuration II consists of four sub-components arranged logically in parallel with 12 V each, configuration III consists of four sub-components arranged in series-parallel with 8 V each, and configuration IV has six sub-components with 6 V each arranged in series-parallel. Comparative analysis was made using the Chapman-Kolmogorov method. Explicit expressions for the mean time to system failure, steady-state availability, and cost-benefit were derived for the comparison. A ranking method was used to determine the optimal configuration of the systems. The results of analytical and numerical solutions for system availability and mean time to system failure were determined, and it was found that configuration I is the optimal configuration.
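
    The Chapman-Kolmogorov-style derivation can be made concrete with a small sketch. The configuration, failure rate, and states below are assumed for illustration (they are not the paper's parameters): for a continuous-time Markov chain with generator Q, the mean time to system failure from state i is obtained by solving -Q_T t = 1 over the transient (working) states T.

    ```python
    import numpy as np

    lam = 0.01                       # assumed per-unit failure rate (1/hour)
    # Two-unit parallel system: state 0 = both units up, state 1 = one unit
    # up; the failed state is absorbing and omitted. Q_T is the generator
    # restricted to the transient states.
    Q_T = np.array([[-2 * lam, 2 * lam],
                    [0.0,      -lam  ]])
    mttf = np.linalg.solve(-Q_T, np.ones(2))
    print(mttf[0])                   # 150.0 hours = (1/lam) * (1/2 + 1)
    ```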

  16. Bimodal and multimodal plant biomass particle mixtures

    DOEpatents

    Dooley, James H.

    2013-07-09

    An industrial feedstock of plant biomass particles having fibers aligned in a grain, wherein the particles are individually characterized by a length dimension (L) aligned substantially parallel to the grain, a width dimension (W) normal to L and aligned cross grain, and a height dimension (H) normal to W and L, wherein the L×H dimensions define a pair of substantially parallel side surfaces characterized by substantially intact longitudinally arrayed fibers, the W×H dimensions define a pair of substantially parallel end surfaces characterized by crosscut fibers and end checking between fibers, and the L×W dimensions define a pair of substantially parallel top and bottom surfaces, and wherein the particles in the feedstock are collectively characterized by having a bimodal or multimodal size distribution.

  17. Engineered plant biomass particles coated with bioactive agents

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dooley, James H; Lanning, David N

    Plant biomass particles coated with a bioactive agent such as a fertilizer or pesticide, characterized by a length dimension (L) aligned substantially parallel to a grain direction and defining a substantially uniform distance along the grain, a width dimension (W) normal to L and aligned cross grain, and a height dimension (H) normal to W and L. In particular, the L×H dimensions define a pair of substantially parallel side surfaces characterized by substantially intact longitudinally arrayed fibers, the W×H dimensions define a pair of substantially parallel end surfaces characterized by crosscut fibers and end checking between fibers, and the L×W dimensions define a pair of substantially parallel top and bottom surfaces.

  18. Parallel Monte Carlo transport modeling in the context of a time-dependent, three-dimensional multi-physics code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Procassini, R.J.

    1997-12-31

    The fine-scale, multi-space resolution that is envisioned for accurate simulations of complex weapons systems in three spatial dimensions implies flop-rate and memory-storage requirements that will only be obtained in the near future through the use of parallel computational techniques. Since the Monte Carlo transport models in these simulations usually stress both of these computational resources, they are prime candidates for parallelization. The MONACO Monte Carlo transport package, which is currently under development at LLNL, will utilize two types of parallelism within the context of a multi-physics design code: decomposition of the spatial domain across processors (spatial parallelism) and distribution of particles in a given spatial subdomain across additional processors (particle parallelism). This implementation of the package will utilize explicit data communication between domains (message passing). Such a parallel implementation of a Monte Carlo transport model will result in non-deterministic communication patterns. The communication of particles between subdomains during a Monte Carlo time step may require a significant level of effort to achieve a high parallel efficiency.

  19. Constituent order and semantic parallelism in online comprehension: eye-tracking evidence from German.

    PubMed

    Knoeferle, Pia; Crocker, Matthew W

    2009-12-01

    Reading times for the second conjunct of and-coordinated clauses are faster when the second conjunct parallels the first conjunct in its syntactic or semantic (animacy) structure than when its structure differs (Frazier, Munn, & Clifton, 2000; Frazier, Taft, Roeper, & Clifton, 1984). What remains unclear, however, is the time course of parallelism effects, their scope, and the kinds of linguistic information to which they are sensitive. Findings from the first two eye-tracking experiments revealed incremental constituent order parallelism across the board: both during structural disambiguation (Experiment 1) and in sentences with unambiguously case-marked constituent order (Experiment 2), as well as for both marked and unmarked constituent orders (Experiments 1 and 2). Findings from Experiment 3 revealed effects of both constituent order and subtle semantic (noun phrase similarity) parallelism. Together our findings provide evidence for an across-the-board account of parallelism for processing and-coordinated clauses, in which both constituent order and semantic aspects of representations contribute towards incremental parallelism effects. We discuss our findings in the context of existing findings on parallelism and priming, as well as mechanisms of sentence processing.

  20. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

    1991-01-01

    Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run-time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing, and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
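
    The inspector/executor split can be sketched for a loop with a run-time dependence such as a[i] = g(a[dep[i]]), where dep is unknown at compile time. This is an illustration of the wavefront idea only, not the paper's Encore Multimax implementation; it assumes dep[i] <= i.

    ```python
    # Inspector: assign each iteration a wavefront level from its dependence.
    def inspector(dep):
        level = [0] * len(dep)
        for i in range(len(dep)):
            if dep[i] < i:                  # depends on an earlier iteration
                level[i] = level[dep[i]] + 1
        waves = {}
        for i, l in enumerate(level):
            waves.setdefault(l, []).append(i)
        return [waves[l] for l in sorted(waves)]

    # Executor: run the wavefronts in order; iterations inside one wavefront
    # are mutually independent and could be dispatched to parallel workers.
    def executor(a, dep, waves, g):
        for wave in waves:
            for i in wave:                  # parallelizable inner loop
                a[i] = g(a[dep[i]])
        return a

    dep = [0, 0, 1, 1, 3, 2, 5, 0]
    waves = inspector(dep)                  # [[0], [1, 7], [2, 3], [4, 5], [6]]
    print(executor(list(range(8)), dep, waves, lambda v: v + 1))
    ```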

  1. On the parallel solution of parabolic equations

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
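
    The partial-fraction mechanism can be illustrated as follows (a sketch under the assumption of a diagonal [m/m] Padé approximant with simple poles; not the authors' code). Expanding R(z) = (-1)^m + Σ_j r_j/(z - z_j) turns exp(A)v into a constant term plus m shifted linear solves, and those solves are mutually independent, which is the source of the parallelism.

    ```python
    import math
    import numpy as np

    def pade_poles_residues(m):
        # Numerator N(z) of the [m/m] Pade approximant to exp(z); D(z) = N(-z).
        c = [math.factorial(2 * m - k) * math.factorial(m)
             / (math.factorial(2 * m) * math.factorial(k) * math.factorial(m - k))
             for k in range(m + 1)]
        N = np.array(c[::-1])                           # highest degree first
        D = N * np.array([(-1.0) ** (m - i) for i in range(m + 1)])
        poles = np.roots(D)
        resid = np.polyval(N, poles) / np.polyval(np.polyder(D), poles)
        return poles, resid

    def expm_times_vector(A, v, m=8):
        poles, resid = pade_poles_residues(m)
        I = np.eye(A.shape[0])
        u = ((-1.0) ** m) * v.astype(complex)
        for z, r in zip(poles, resid):                  # independent shifted solves:
            u += r * np.linalg.solve(A - z * I, v)      # each can go to a processor
        return u.real

    A = np.array([[-2.0, 1.0], [1.0, -3.0]])
    v = np.array([1.0, 1.0])
    w, V = np.linalg.eigh(A)                            # dense reference solution
    print(np.allclose(expm_times_vector(A, v), V @ (np.exp(w) * (V.T @ v))))
    ```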

  2. An asymptotic induced numerical method for the convection-diffusion-reaction equation

    NASA Technical Reports Server (NTRS)

    Scroggs, Jeffrey S.; Sorensen, Danny C.

    1988-01-01

    A parallel algorithm for the efficient solution of a time dependent reaction convection diffusion equation with small parameter on the diffusion term is presented. The method is based on a domain decomposition that is dictated by singular perturbation analysis. The analysis is used to determine regions where certain reduced equations may be solved in place of the full equation. Parallelism is evident at two levels. Domain decomposition provides parallelism at the highest level, and within each domain there is ample opportunity to exploit parallelism. Run time results demonstrate the viability of the method.

  3. Performance Evaluation in Network-Based Parallel Computing

    NASA Technical Reports Server (NTRS)

    Dezhgosha, Kamyar

    1996-01-01

    Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing network of Sun SPARC workstations with PVM (Parallel Virtual Machine), a software system for linking clusters of machines. Second, a set of three basic applications was selected: a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes is in many cases the restricting factor to performance. That is, coarse-grain parallelism, which requires less frequent communication between processes, will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.

  4. Parallel 3D Multi-Stage Simulation of a Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Turner, Mark G.; Topp, David A.

    1998-01-01

    A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) so that the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee that the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than in the other two directions. This code uses MPI for message passing. The parallel speedup of the solver portion (no I/O or body force calculation) was measured for a grid which has 227 points axially.

  5. Electropneumatic rheostat regulates high current

    NASA Technical Reports Server (NTRS)

    Haacker, J. F.; Jedlicka, J. R.; Wagoner, C. B.

    1965-01-01

    Electropneumatic rheostat maintains a constant direct current in each of several high-power parallel loads, of variable resistance, across a single source. It provides current regulation at any preset value by dissipating the proper amount of energy thermally, and uses a column of mercury to vary the effective length of a resistance element.

  6. Magnetic arrays

    DOEpatents

    Trumper, David L.; Kim, Won-jong; Williams, Mark E.

    1997-05-20

    Electromagnet arrays which can provide selected field patterns in either two or three dimensions, and in particular, which can provide single-sided field patterns in two or three dimensions. These features are achieved by providing arrays which have current densities that vary in the windings both parallel to the array and in the direction of array thickness.

  7. A parallel time integrator for noisy nonlinear oscillatory systems

    NASA Astrophysics Data System (ADS)

    Subber, Waad; Sarkar, Abhijit

    2018-06-01

    In this paper, we adapt a parallel time integration scheme to track the trajectories of noisy non-linear dynamical systems. Specifically, we formulate a parallel algorithm to generate the sample path of a nonlinear oscillator defined by stochastic differential equations (SDEs) using the so-called parareal method for ordinary differential equations (ODEs). The presence of the Wiener process in SDEs causes difficulties in the direct application of any numerical integration technique for ODEs, including the parareal algorithm. The parallel implementation of the algorithm involves two SDE solvers, namely a fine-level scheme to integrate the system in parallel and a coarse-level scheme to generate and correct the required initial conditions to start the fine-level integrators. For the numerical illustration, a randomly excited Duffing oscillator is investigated in order to study the performance of the stochastic parallel algorithm with respect to a range of system parameters. The distributed implementation of the algorithm exploits the Message Passing Interface (MPI).
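
    A minimal sketch of the scheme, assuming a scalar Ornstein-Uhlenbeck SDE and Euler-Maruyama coarse and fine propagators that see the same Wiener increments on each time slice (illustrative only; not the authors' code):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    a, s = 1.0, 0.5                      # drift and diffusion of dX = -aX dt + s dW
    N, sub, DT = 40, 64, 0.05            # slices, fine steps per slice, slice length
    dW = rng.normal(0.0, np.sqrt(DT / sub), size=(N, sub))   # shared noise

    def G(x, n):                         # coarse: one Euler-Maruyama step per slice
        return x - a * x * DT + s * dW[n].sum()

    def F(x, n):                         # fine: 'sub' Euler-Maruyama steps per slice
        for j in range(sub):
            x = x - a * x * (DT / sub) + s * dW[n, j]
        return x

    X = np.zeros(N + 1); X[0] = 1.0
    for n in range(N):                   # serial coarse pass for initial conditions
        X[n + 1] = G(X[n], n)

    for k in range(5):                   # parareal iterations
        Fx = np.array([F(X[n], n) for n in range(N)])  # independent -> parallel
        Xn = X.copy()
        for n in range(N):               # cheap serial correction sweep
            Xn[n + 1] = G(Xn[n], n) + Fx[n] - G(X[n], n)
        X = Xn

    ref = np.zeros(N + 1); ref[0] = 1.0  # fully serial fine reference
    for n in range(N):
        ref[n + 1] = F(ref[n], n)
    print(np.abs(X - ref).max())         # shrinks with each parareal iteration
    ```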

  8. A new parallel-vector finite element analysis software on distributed-memory computers

    NASA Technical Reports Server (NTRS)

    Qin, Jiangning; Nguyen, Duc T.

    1993-01-01

    A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.

  9. Parallel transformation of K-SVD solar image denoising algorithm

    NASA Astrophysics Data System (ADS)

    Liang, Youwen; Tian, Yu; Li, Mei

    2017-02-01

    The images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time consuming task, due to the large size of the data involved and to the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version, using a data-parallelism model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance were tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easy to port to multi-core platforms.

  10. PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.

    PubMed

    Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar

    2014-01-01

    Principal component analysis or PCA has been traditionally used as one of the feature extraction techniques in face recognition systems, yielding high accuracy when requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation, resulting in a reduction of the stages' complexity. To improve the computational time, a novel parallel architecture was employed to exploit the benefits of parallelizing matrix computation during the feature extraction and classification stages, including parallel preprocessing and their combinations, in a so-called Parallel Expectation-Maximization PCA (PEM-PCA) architecture. Compared to traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems with speed-ups of over nine and three times relative to PCA and parallel PCA, respectively.
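
    The expectation-maximization step that replaces the covariance/eigendecomposition stage can be sketched as two matrix products per iteration (in the style of Roweis' EM algorithm for PCA; a sketch, not the paper's architecture). Dense products like these are precisely the operations that the parallel stages of such an architecture target.

    ```python
    import numpy as np

    def em_pca(Y, k, iters=200, seed=0):
        """Y: (d, n) centered data; returns an orthonormal basis of the top-k subspace."""
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(Y.shape[0], k))
        for _ in range(iters):
            X = np.linalg.solve(W.T @ W, W.T @ Y)    # E-step: latent coordinates
            W = Y @ X.T @ np.linalg.inv(X @ X.T)     # M-step: update the basis
        return np.linalg.qr(W)[0]                    # orthonormalize the span

    rng = np.random.default_rng(1)
    Y = rng.normal(size=(50, 1000)) * np.linspace(5.0, 0.1, 50)[:, None]
    Y -= Y.mean(axis=1, keepdims=True)
    Q = em_pca(Y, k=3)
    U = np.linalg.svd(Y, full_matrices=False)[0][:, :3]   # reference subspace
    # Singular values of Q^T U are cosines of the principal angles: all ~1.
    print(np.allclose(np.linalg.svd(Q.T @ U)[1], 1.0, atol=1e-3))
    ```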

  11. National Combustion Code: Parallel Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

    2000-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.

  12. A transient FETI methodology for large-scale parallel implicit computations in structural mechanics

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier

    1992-01-01

    Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.

  13. Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    PubMed Central

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-01-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868

  14. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

    PubMed

    Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6  mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.

  15. Small Arms - Hand and Shoulder Weapons and Machine Guns

    DTIC Science & Technology

    2016-06-24

    temperature with a wind speed less than 8 kilometers per hour (km/hr) (5 miles per hour (mph)) with no sunlight on the barrel or receiver. ... the aiming of the mount/weapon system. d. Meteorological Conditions. (1) Ensure that the velocity of the transverse wind is no greater than 8 km/hr (5 mph) or varies by more than 4 km/hr (2.5 mph); wind parallel to the LOF should not exceed 16 km/hr (10 mph) or vary by more than 8 km/hr (5

  16. A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. First, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involving nearly a hundred reactions, and for a field-scale coupled flow and transport model. In the first application, a single parallelizable loop is identified to consume over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code, the computational time is reduced about ten times on a compute node with 16 cores. The performance is further improved by selectively parallelizing a few more loops. For the field-scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified to take more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added into HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, compute nodes equal in number to the adjustable parameters (when the forward difference is used for the Jacobian approximation), or twice that number (if the center difference is used), are used to reduce the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization schemes and Monte Carlo analysis, where thousands of compute nodes can be efficiently utilized.
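
    The outer, MPI-level parallelism amounts to one independent forward-model run per Jacobian column. The sketch below uses Python's multiprocessing in place of MPI and a toy forward model, purely as an illustration of that structure (none of the names are from HGC5).

    ```python
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def forward_model(p):
        """Stand-in for an expensive groundwater model run."""
        t = np.linspace(0.0, 1.0, 20)
        return p[0] * np.exp(-p[1] * t) + p[2] * t

    def jacobian_column(args):
        p, j, h, base = args
        q = p.copy(); q[j] += h
        return (forward_model(q) - base) / h     # forward-difference column

    def parallel_jacobian(p, h=1e-6):
        base = forward_model(p)
        tasks = [(p, j, h, base) for j in range(len(p))]
        with ProcessPoolExecutor() as pool:      # one column per worker/node
            cols = list(pool.map(jacobian_column, tasks))
        return np.column_stack(cols)

    if __name__ == "__main__":
        J = parallel_jacobian(np.array([1.0, 2.0, 0.5]))
        print(J.shape)                           # (20, 3)
    ```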

  17. Parallel line raster eliminates ambiguities in reading timing of pulses less than 500 microseconds apart

    NASA Technical Reports Server (NTRS)

    Horne, A. P.

    1966-01-01

    Parallel horizontal line raster is used for precision timing of events occurring less than 500 microseconds apart for observation of hypervelocity phenomena. The raster uses a staircase vertical deflection and eliminates ambiguities in reading timing of pulses close to the end of each line.

  18. Comprehensive quantification of signal-to-noise ratio and g-factor for image-based and k-space-based parallel imaging reconstructions.

    PubMed

    Robson, Philip M; Grant, Aaron K; Madhuranthakam, Ananth J; Lattanzi, Riccardo; Sodickson, Daniel K; McKenzie, Charles A

    2008-10-01

    Parallel imaging reconstructions result in spatially varying noise amplification characterized by the g-factor, precluding conventional measurements of noise from the final image. A simple Monte Carlo based method is proposed for all linear image reconstruction algorithms, which allows measurement of signal-to-noise ratio and g-factor and is demonstrated for SENSE and GRAPPA reconstructions for accelerated acquisitions that have not previously been amenable to such assessment. Only a simple "prescan" measurement of noise amplitude and correlation in the phased-array receiver, and a single accelerated image acquisition are required, allowing robust assessment of signal-to-noise ratio and g-factor. The "pseudo multiple replica" method has been rigorously validated in phantoms and in vivo, showing excellent agreement with true multiple replica and analytical methods. This method is universally applicable to the parallel imaging reconstruction techniques used in clinical applications and will allow pixel-by-pixel image noise measurements for all parallel imaging strategies, allowing quantitative comparison between arbitrary k-space trajectories, image reconstruction, or noise conditioning techniques. (c) 2008 Wiley-Liss, Inc.
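
    A minimal sketch of the pseudo multiple replica measurement, assuming some linear reconstruction function recon() and a measured coil noise covariance psi (the names and dimensions are illustrative placeholders, not the paper's code):

    ```python
    import numpy as np

    def pseudo_replica_noise_map(recon, kshape, psi, n_replicas=256, seed=0):
        """Pixelwise noise std of a linear recon, via synthetic noise replicas."""
        rng = np.random.default_rng(seed)
        L = np.linalg.cholesky(psi)                  # coil noise correlation
        n_coils = psi.shape[0]
        images = []
        for _ in range(n_replicas):
            white = (rng.normal(size=(n_coils,) + kshape)
                     + 1j * rng.normal(size=(n_coils,) + kshape))
            corr = np.tensordot(L, white, axes=1)    # correlated coil noise only
            images.append(recon(corr))
        return np.std(np.stack(images), axis=0)

    # Toy linear recon: inverse FFT and a plain coil sum.
    recon = lambda k: np.fft.ifft2(k, axes=(-2, -1)).sum(axis=0)
    psi = np.array([[1.0, 0.2], [0.2, 1.0]])
    sigma = pseudo_replica_noise_map(recon, (32, 32), psi)
    # With maps from full and R-fold accelerated recons, the g-factor map is
    # g = (sigma_accel / sigma_full) / sqrt(R), pixel by pixel.
    print(sigma.mean())
    ```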

  19. The new moon illusion and the role of perspective in the perception of straight and parallel lines.

    PubMed

    Rogers, Brian; Naumenko, Olga

    2015-01-01

    In the new moon illusion, the sun does not appear to be in a direction perpendicular to the boundary between the lit and dark sides of the moon, and aircraft jet trails appear to follow curved paths across the sky. In both cases, lines that are physically straight and parallel to the horizon appear to be curved. These observations prompted us to investigate the neglected question of how we are able to judge the straightness and parallelism of extended lines. To do this, we asked observers to judge the 2-D alignment of three artificial "stars" projected onto the dome of the Saint Petersburg Planetarium that varied in both their elevation and their separation in horizontal azimuth. The results showed that observers make substantial, systematic errors, biasing their judgments away from the veridical great-circle locations and toward equal-elevation settings. These findings further demonstrate that whenever information about the distance of extended lines or isolated points is insufficient, observers tend to assume equidistance, and as a consequence, their straightness judgments are biased toward the angular separation of straight and parallel lines.

  20. Development and Validation of a Fast, Accurate and Cost-Effective Aeroservoelastic Method on Advanced Parallel Computing Systems

    NASA Technical Reports Server (NTRS)

    Goodwin, Sabine A.; Raj, P.

    1999-01-01

    Progress to date towards the development and validation of a fast, accurate and cost-effective aeroelastic method for advanced parallel computing platforms such as the IBM SP2 and the SGI Origin 2000 is presented in this paper. The ENSAERO code, developed at the NASA-Ames Research Center has been selected for this effort. The code allows for the computation of aeroelastic responses by simultaneously integrating the Euler or Navier-Stokes equations and the modal structural equations of motion. To assess the computational performance and accuracy of the ENSAERO code, this paper reports the results of the Navier-Stokes simulations of the transonic flow over a flexible aeroelastic wing body configuration. In addition, a forced harmonic oscillation analysis in the frequency domain and an analysis in the time domain are done on a wing undergoing a rigid pitch and plunge motion. Finally, to demonstrate the ENSAERO flutter-analysis capability, aeroelastic Euler and Navier-Stokes computations on an L-1011 wind tunnel model including pylon, nacelle and empennage are underway. All computational solutions are compared with experimental data to assess the level of accuracy of ENSAERO. As the computations described above are performed, a meticulous log of computational performance in terms of wall clock time, execution speed, memory and disk storage is kept. Code scalability is also demonstrated by studying the impact of varying the number of processors on computational performance on the IBM SP2 and the Origin 2000 systems.

  1. Inefficient conjunction search made efficient by concurrent spoken delivery of target identity.

    PubMed

    Reali, Florencia; Spivey, Michael J; Tyler, Melinda J; Terranova, Joseph

    2006-08-01

    Visual search based on a conjunction of two features typically elicits reaction times that increase linearly as a function of the number of distractors, whereas search based on a single feature is essentially unaffected by set size. These and related findings have often been interpreted as evidence of a serial search stage that follows a parallel search stage. However, a wide range of studies has shown a form of blending of these two processes. For example, when a spoken instruction identifies the conjunction target concurrently with the visual display, the effect of set size is significantly reduced, suggesting that incremental linguistic processing of the first feature adjective and then the second feature adjective may facilitate something approximating a parallel extraction of objects during search for the target. Here, we extend these results to a variety of experimental designs. First, we replicate the result with a mixed-trials design (ruling out potential strategies associated with the blocked design of the original study). Second, in a mixed-trials experiment, the order of adjective types in the spoken query varies randomly across conditions. In a third experiment, we extend the effect to a triple-conjunction search task. A fourth (control) experiment demonstrates that these effects are not due to an efficient odd-one-out search that ignores the linguistic input. This series of experiments, along with attractor-network simulations of the phenomena, provides further evidence toward understanding linguistically mediated influences in real-time visual search processing.

  2. Symplectic molecular dynamics simulations on specially designed parallel computers.

    PubMed

    Borstnik, Urban; Janezic, Dusanka

    2005-01-01

    We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. Combining these approaches reduces both the time per step and the number of steps required, enabling fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.

  3. Single-agent parallel window search

    NASA Technical Reports Server (NTRS)

    Powley, Curt; Korf, Richard E.

    1991-01-01

    Parallel window search is applied to single-agent problems by having different processes simultaneously perform iterations of Iterative-Deepening-A(asterisk) (IDA-asterisk) on the same problem but with different cost thresholds. This approach is limited by the time to perform the goal iteration. To overcome this disadvantage, the authors consider node ordering. They discuss how global node ordering by minimum h among nodes with equal f = g + h values can reduce the time complexity of serial IDA-asterisk by reducing the time to perform the iterations prior to the goal iteration. Finally, the two ideas of parallel window search and node ordering are combined to eliminate the weaknesses of each approach while retaining the strengths. The resulting approach, called simply parallel window search, can be used to find a near-optimal solution quickly, improve the solution until it is optimal, and then finally guarantee optimality, depending on the amount of time available.
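    A toy sketch of the basic parallel window search idea follows: several workers run the same cost-bounded depth-first search (one IDA* iteration each) with different thresholds, rather than executing the iterations serially. The problem domain (moves of +/-1 on the integer line) is invented purely to keep the example self-contained.

```python
# Minimal sketch of parallel window search: worker processes run IDA*-style
# cost-bounded depth-first searches on the same problem, each with its own
# cost threshold. Toy domain: move +/-1 on the integer line from 0 to GOAL,
# unit step cost, h(x) = |x - GOAL| (admissible).
from concurrent.futures import ProcessPoolExecutor

GOAL = 7

def h(x):
    return abs(x - GOAL)

def dfs(x, g, threshold, prev):
    f = g + h(x)
    if f > threshold:
        return None            # pruned: f exceeds this worker's window
    if x == GOAL:
        return g               # solution cost found within this window
    for nxt in (x - 1, x + 1):
        if nxt == prev:
            continue           # avoid trivially undoing the last move
        result = dfs(nxt, g + 1, threshold, x)
        if result is not None:
            return result
    return None

def search_window(threshold):
    return threshold, dfs(0, 0, threshold, None)

if __name__ == "__main__":
    # Each worker performs one IDA* iteration (one threshold) concurrently,
    # instead of the strictly increasing serial sequence of thresholds.
    with ProcessPoolExecutor() as pool:
        for threshold, cost in pool.map(search_window, range(h(0), h(0) + 4)):
            print(f"threshold {threshold}: solution cost {cost}")
```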

  4. Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

    NASA Technical Reports Server (NTRS)

    Goldstein, David

    1991-01-01

    Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.

  5. Near-ridge seamount chains in the northeastern Pacific Ocean

    NASA Astrophysics Data System (ADS)

    Clague, David A.; Reynolds, Jennifer R.; Davis, Alicé S.

    2000-07-01

    High-resolution bathymetry and side-scan data of the Vance, President Jackson, and Taney near-ridge seamount chains in the northeast Pacific were collected with a hull-mounted 30-kHz sonar. The central volcanoes in each chain consist of truncated cone-shaped volcanoes with steep sides and nearly flat tops. Several areas are characterized by frequent small eruptions that result in disorganized volcanic regions with numerous small cones and volcanic ridges but no organized truncated conical structure. Several volcanoes are crosscut by ridge-parallel faults, showing that they formed within 30-40 km of the ridge axis where ridge-parallel faulting is still active. Magmas that built the volcanoes were probably transported through the crust along active ridge-parallel faults. The volcanoes range in volume from 11 to 187 km3, and most have one or more multiple craters and calderas that modify their summits and flanks. The craters (<1 km diameter) and calderas (>1 km diameter) range from small pit craters to calderas as large as 6.5×8.5 km, although most are 2-4 km across. Crosscutting relationships commonly show a sequence of calderas stepping toward the ridge axis. The calderas overlie crustal magma chambers at least as large as those that underlie Kilauea and Mauna Loa Volcanoes in Hawaii, perhaps 4-5 km in diameter and ˜1-3 km below the surface. The nearly flat tops of many of the volcanoes have remnants of centrally located summit shields, suggesting that their flat tops did not form from eruptions along circumferential ring faults but instead form by filling and overflowing of earlier large calderas. The lavas retain their primitive character by residing in such chambers for only short time periods prior to eruption. Stored magmas are withdrawn, probably as dikes intruded into the adjacent ocean crust along active ridge-parallel faults, triggering caldera collapse, or solidified before the next batch of magma is intruded into the volcano, probably 1000-10,000 years later. The chains are oriented parallel to subaxial asthenospheric flow rather than absolute or relative plate motion vectors. The subaxial asthenospheric flow model yields rates of volcanic migration of 3.4, 3.3 and 5.9 cm yr-1 for the Vance, President Jackson, and Taney Seamounts, respectively. The modeled lifespans of the individual volcanoes in the three chains vary from 75 to 95 kyr. These lifespans, coupled with the geologic observations based on the bathymetry, allow us to construct models of magma supply through time for the volcanoes in the three chains.

  6. Temperature responsive transmitter

    NASA Technical Reports Server (NTRS)

    Kleinberg, Leonard L. (Inventor)

    1987-01-01

    A temperature responsive transmitter is provided in which frequency varies linearly with temperature. The transmitter includes two identically biased transistors connected in parallel. A capacitor, which reflects into the common bases to generate negative resistance effectively in parallel with the capacitor, is connected to the common emitters. A crystal is effectively in parallel with the capacitor and the negative resistance. Oscillations occur if the magnitude of the absolute value of the negative resistance is less than the positive resistive impedance of the capacitor and the inductance of the crystal. The crystal has a large linear temperature coefficient and a resonant frequency which is substantially less than the gain-bandwidth product of the transistors to ensure that the crystal primarily determines the frequency of oscillation. A high-Q tank circuit having an inductor and a capacitor is connected to the common collectors to increase the collector current flow which in turn enhances the radiation of the oscillator frequency by the inductor.

  7. Miniature Trailing Edge Effector for Aerodynamic Control

    NASA Technical Reports Server (NTRS)

    Lee, Hak-Tae (Inventor); Bieniawski, Stefan R. (Inventor); Kroo, Ilan M. (Inventor)

    2008-01-01

    Improved miniature trailing edge effectors for aerodynamic control are provided. Three types of devices having aerodynamic housings integrated to the trailing edge of an aerodynamic shape are presented, which vary in details of how the control surface can move. A bucket type device has a control surface which is the back part of a C-shaped member having two arms connected by the back section. The C-shaped section is attached to a housing at the ends of the arms, and is rotatable about an axis parallel to the wing trailing edge to provide up, down and neutral states. A flip-up type device has a control surface which rotates about an axis parallel to the wing trailing edge to provide up, down, neutral and brake states. A rotating type device has a control surface which rotates about an axis parallel to the chord line to provide up, down and neutral states.

  8. Light scattering of rectangular slot antennas: parallel magnetic vector vs perpendicular electric vector

    NASA Astrophysics Data System (ADS)

    Lee, Dukhyung; Kim, Dai-Sik

    2016-01-01

    We study light scattering off rectangular slot nano antennas on a metal film varying incident polarization and incident angle, to examine which field vector of light is more important: electric vector perpendicular to, versus magnetic vector parallel to the long axis of the rectangle. While vector Babinet’s principle would prefer magnetic field along the long axis for optimizing slot antenna function, convention and intuition most often refer to the electric field perpendicular to it. Here, we demonstrate experimentally that in accordance with vector Babinet’s principle, the incident magnetic vector parallel to the long axis is the dominant component, with the perpendicular incident electric field making a small contribution of the factor of 1/|ε|, the reciprocal of the absolute value of the dielectric constant of the metal, owing to the non-perfectness of metals at optical frequencies.

  9. Elliptically polarizing adjustable phase insertion device

    DOEpatents

    Carr, Roger

    1995-01-01

    An insertion device for extracting polarized electromagnetic energy from a beam of particles is disclosed. The insertion device includes four linear arrays of magnets which are aligned with the particle beam. The magnetic field strength to which the particles are subjected is adjusted by altering the relative alignment of the arrays in a direction parallel to that of the particle beam. Both the energy and polarization of the extracted energy may be varied by moving the relevant arrays parallel to the beam direction. The present invention requires a substantially simpler and more economical superstructure than insertion devices in which the magnetic field strength is altered by changing the gap between arrays of magnets.

  10. Low phase noise oscillator using two parallel connected amplifiers

    NASA Technical Reports Server (NTRS)

    Kleinberg, Leonard L.

    1987-01-01

    A high frequency oscillator is provided by connecting two amplifier circuits in parallel where each amplifier circuit provides the other amplifier circuit with the conditions necessary for oscillation. The inherent noise present in both amplifier circuits causes the quiescent current, and in turn, the generated frequency, to change. The changes in quiescent current cause the transconductance and the load impedance of each amplifier circuit to vary, and this in turn results in opposing changes in the input susceptance of each amplifier circuit. Because the changes in input susceptance oppose each other, the changes in quiescent current also oppose each other. The net result is that frequency stability is enhanced.

  11. Acceleration techniques and their impact on arterial input function sampling: Non-accelerated versus view-sharing and compressed sensing sequences.

    PubMed

    Benz, Matthias R; Bongartz, Georg; Froehlich, Johannes M; Winkel, David; Boll, Daniel T; Heye, Tobias

    2018-07-01

    The aim was to investigate the variation of the arterial input function (AIF) within and between various DCE MRI sequences. A dynamic flow-phantom and steady signal reference were scanned on a 3T MRI using fast low angle shot (FLASH) 2d, FLASH3d (parallel imaging factor (P) = P0, P2, P4), volumetric interpolated breath-hold examination (VIBE) (P = P0, P3, P2 × 2, P2 × 3, P3 × 2), golden-angle radial sparse parallel imaging (GRASP), and time-resolved imaging with stochastic trajectories (TWIST). Signal over time curves were normalized and quantitatively analyzed by full width half maximum (FWHM) measurements to assess variation within and between sequences. The coefficient of variation (CV) for the steady signal reference ranged from 0.07-0.8%. The non-accelerated gradient echo FLASH2d, FLASH3d, and VIBE sequences showed low within sequence variation with 2.1%, 1.0%, and 1.6%. The maximum FWHM CV was 3.2% for parallel imaging acceleration (VIBE P2 × 3), 2.7% for GRASP and 9.1% for TWIST. The FWHM CV between sequences ranged from 8.5-14.4% for most non-accelerated/accelerated gradient echo sequences except 6.2% for FLASH3d P0 and 0.3% for FLASH3d P2; GRASP FWHM CV was 9.9% versus 28% for TWIST. MRI acceleration techniques vary in reproducibility and quantification of the AIF. Incomplete coverage of the k-space with TWIST as a representative of view-sharing techniques showed the highest variation within sequences and might be less suited for reproducible quantification of the AIF. Copyright © 2018 Elsevier B.V. All rights reserved.
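    A minimal sketch of the FWHM/CV analysis described above: each signal-over-time curve is normalized, its full width at half maximum is measured, and repeated measurements are summarized by the coefficient of variation. The Gaussian "bolus" curves are synthetic stand-ins for the phantom data.

```python
# Sketch: normalize signal-time curves, measure FWHM, summarize by CV.
import numpy as np

def fwhm(t, y):
    y = (y - y.min()) / (y.max() - y.min())   # normalize to [0, 1]
    above = np.where(y >= 0.5)[0]             # samples at or above half max
    return t[above[-1]] - t[above[0]]

def coefficient_of_variation(values):
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean() * 100.0  # percent

# Toy example: three repeated Gaussian "bolus" curves with slight jitter.
t = np.linspace(0, 60, 601)
widths = [fwhm(t, np.exp(-((t - 20) / (5 + dw)) ** 2)) for dw in (0.0, 0.1, -0.1)]
print(f"FWHM CV = {coefficient_of_variation(widths):.2f}%")
```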

  12. Time-domain seismic modeling in viscoelastic media for full waveform inversion on heterogeneous computing platforms with OpenCL

    NASA Astrophysics Data System (ADS)

    Fabien-Ouellet, Gabriel; Gloaguen, Erwan; Giroux, Bernard

    2017-03-01

    Full Waveform Inversion (FWI) aims at recovering the elastic parameters of the Earth by matching recordings of the ground motion with the direct solution of the wave equation. Modeling the wave propagation for realistic scenarios is computationally intensive, which limits the applicability of FWI. The current hardware evolution brings increasing parallel computing power that can speed up the computations in FWI. However, to take advantage of the diversity of parallel architectures presently available, new programming approaches are required. In this work, we explore the use of OpenCL to develop a portable code that can take advantage of the many parallel processor architectures now available. We present a program called SeisCL for 2D and 3D viscoelastic FWI in the time domain. The code computes the forward and adjoint wavefields using finite differences and outputs the gradient of the misfit function given by the adjoint state method. To demonstrate the code's portability across architectures, the performance of SeisCL is tested on three different devices: Intel CPUs, NVidia GPUs and Intel Xeon PHI. Results show that the use of GPUs with OpenCL can speed up the computations by nearly two orders of magnitude over a single-threaded application on the CPU. Although OpenCL allows code portability, we show that some device-specific optimization is still required to get the best performance out of a specific architecture. Using OpenCL in conjunction with MPI allows the domain decomposition of large models on several devices located on different nodes of a cluster. For large enough models, the speedup of the domain decomposition varies quasi-linearly with the number of devices. Finally, we investigate two different approaches to compute the gradient by the adjoint state method and show the significant advantages of using OpenCL for FWI.
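    As a rough illustration of the kernels being ported, the sketch below shows the style of time-domain finite-difference modeling that such codes parallelize, reduced to a 2D acoustic (not viscoelastic) stencil in plain NumPy rather than OpenCL; the grid sizes and material values are arbitrary.

```python
# Illustrative 2D acoustic time-domain finite-difference modeling,
# a simplified stand-in for the viscoelastic kernels discussed above.
import numpy as np

nx = nz = 200
dx, dt, c = 10.0, 1e-3, 3000.0          # grid step (m), time step (s), velocity (m/s)
p_prev = np.zeros((nz, nx))
p = np.zeros((nz, nx))
p[nz // 2, nx // 2] = 1.0               # impulsive source at the center

for _ in range(300):
    lap = np.zeros_like(p)
    # Five-point Laplacian on interior points.
    lap[1:-1, 1:-1] = (p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2]
                       - 4.0 * p[1:-1, 1:-1]) / dx**2
    # Second-order leapfrog update of the scalar wave equation
    # (c*dt/dx = 0.3 satisfies the 2D stability condition).
    p_next = 2.0 * p - p_prev + (c * dt) ** 2 * lap
    p_prev, p = p, p_next

print(p.max(), p.min())
```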

  13. Parallel simulation of tsunami inundation on a large-scale supercomputer

    NASA Astrophysics Data System (ADS)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2013-12-01

    An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
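    The sketch below illustrates the 1-D domain decomposition pattern described above, assuming mpi4py: each rank holds a slab of the grid with one-row halos, exchanges halos with adjacent neighbours every step, and joins a global reduction analogous to agreeing on the CFL-limited time step. The stencil is a generic placeholder, not the actual TUNAMI-N2 leap-frog scheme.

```python
# Sketch: 1-D domain decomposition with halo exchange and a global reduction.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
nx, local_rows = 64, 16

# Local field with one halo row above and below the owned slab.
u = np.zeros((local_rows + 2, nx))
u[1:-1, :] = rank  # distinguishable initial data per rank

up = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(10):
    # (1) halo exchange with adjacent neighbours
    comm.Sendrecv(u[1, :].copy(), dest=up, recvbuf=u[-1, :], source=down)
    comm.Sendrecv(u[-2, :].copy(), dest=down, recvbuf=u[0, :], source=up)
    # (2) interior stencil update (generic averaging placeholder)
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                            + u[1:-1, :-2] + u[1:-1, 2:])
    # (3) global reduction, analogous to agreeing on the CFL time step
    umax = comm.allreduce(np.abs(u[1:-1]).max(), op=MPI.MAX)

if rank == 0:
    print("global max after 10 steps:", umax)
```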

  14. Parallelization and automatic data distribution for nuclear reactor simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  15. Assessing Coupled Social Ecological Flood Vulnerability from Uttarakhand, India, to the State of New York with Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Tellman, B.; Schwarz, B.

    2014-12-01

    This talk describes the development of a web application to predict and communicate vulnerability to floods given publicly available data, disaster science, and geotech cloud capabilities. The proof of concept in the Google Earth Engine API, with initial testing on case studies in New York and Uttarakhand, India, demonstrates the potential of highly parallelized cloud computing to model socio-ecological disaster vulnerability at high spatial and temporal resolution and in near real time. Cloud computing facilitates statistical modeling with variables derived from large public social and ecological data sets, including census data, nighttime lights (NTL), and WorldPop to derive social parameters, together with elevation, satellite imagery, rainfall, and observed flood data from the Dartmouth Flood Observatory to derive biophysical parameters. While more traditional, physically based hydrological models that rely on flow algorithms and numerical methods are currently unavailable in parallelized computing platforms like Google Earth Engine, there is high potential to explore "data driven" modeling that trades physics for statistics in a parallelized environment. A data-driven approach to flood modeling with geographically weighted logistic regression has been initially tested on Hurricane Irene in southeastern New York. Comparison of model results with observed flood data reveals a 97% accuracy of the model in predicting flooded pixels. Testing on multiple storms is required to further validate this initially promising approach. A statistical social-ecological flood model that could produce rapid vulnerability assessments to predict who might require immediate evacuation, and where, could serve as an early warning. This type of early warning system would be especially relevant in data-poor places lacking the computing power, high-resolution data such as LiDAR and stream gauges, or hydrologic expertise to run physically based models in real time. As the data-driven model presented relies on globally available data, the only real-time data input required would be typical data from a weather service, e.g. precipitation or coarse-resolution flood prediction. However, model uncertainty will vary locally depending upon the resolution and frequency of observed flood and socio-economic damage impact data.
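    As a rough sketch of the geographically weighted logistic regression mentioned above, the code below fits an ordinary logistic model with sample weights that decay with distance from a focal pixel via a Gaussian kernel. All predictors, coefficients, and the bandwidth are synthetic illustrations, not the study's actual data.

```python
# Sketch: geographically weighted logistic regression for flood prediction,
# implemented as distance-weighted fitting of a standard logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
coords = rng.uniform(0, 100, size=(n, 2))   # pixel locations (km), synthetic
elev = rng.uniform(0, 50, n)                # synthetic elevation predictor
rain = rng.uniform(0, 200, n)               # synthetic rainfall predictor
# Synthetic "observed flood" labels: lower, wetter pixels flood more often.
flooded = (rng.random(n) < 1 / (1 + np.exp(0.2 * elev - 0.02 * rain))).astype(int)

def gw_logit(focal_xy, bandwidth=20.0):
    d2 = ((coords - focal_xy) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * bandwidth**2))     # Gaussian spatial kernel
    X = np.column_stack([elev, rain])
    return LogisticRegression().fit(X, flooded, sample_weight=w)

m = gw_logit(np.array([50.0, 50.0]))
print("local coefficients:", m.coef_)
```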

  16. A high-order language for a system of closely coupled processing elements

    NASA Technical Reports Server (NTRS)

    Feyock, S.; Collins, W. R.

    1986-01-01

    The research reported in this paper was occasioned by the requirements of the Real-Time Digital Simulator (RTDS) project under way at NASA Lewis Research Center. The RTDS simulation scheme employs a network of CPUs running lock-step cycles in the parallel computations of jet airplane simulations. The project's need for a high order language (HOL) that would allow non-experts to write simulation applications and that could be implemented on a possibly varying network can best be fulfilled by using the programming language Ada. We describe how the simulation problems can be modeled in Ada, how to map a single, multi-processing Ada program into code for individual processors, regardless of network reconfiguration, and why some Ada language features are particularly well-suited to network simulations.

  17. Wolf-Rayet nebulae - Chemical enrichment and effective temperatures of the exciting stars

    NASA Technical Reports Server (NTRS)

    Rosa, Michael R.; Mathis, John S.

    1990-01-01

    Extensive new spectrophotometric observations of five Wolf-Rayet nebulae are analyzed by means of photoionization models using both plane-parallel and WR atmosphere models. Abundance ratios O/H and Ne, S, Cl, and Ar relative to O are close to solar. N/H is enriched relative to solar and variable over the faces of the nebulae. He/H varies from one to three times solar. The O(+)/O - S(+)/S(2+) diagram is used in estimating T(eff) for the exciting stars. It indicates that S 308, NGC 3199, NGC 6888, and NGC 2359 are ionized by hot stars. RCW 58, RCW 104, MR 26, and MR 100 have such low-excitation spectra that their stellar T(eff) and nebular He/H cannot be reliably determined.

  18. Matrix multiplication on the Intel Touchstone Delta

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huss-Lederman, S.; Jacobson, E.M.; Tsao, A.

    1993-12-31

    Matrix multiplication is a key primitive in block matrix algorithms such as those found in LAPACK. We present results from our study of matrix multiplication algorithms on the Intel Touchstone Delta, a distributed memory message-passing architecture with a two-dimensional mesh topology. We obtain an implementation that uses communication primitives highly suited to the Delta and exploits the single node assembly-coded matrix multiplication. Our algorithm is completely general, able to deal with arbitrary mesh aspect ratios and matrix dimensions, and has achieved parallel efficiency of 86% with overall peak performance in excess of 8 Gflops on 256 nodes for an 8800 × 8800 matrix. We describe our algorithm design and implementation, and present performance results that demonstrate scalability and robust behavior over varying mesh topologies.
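    Block-distributed multiplication on a mesh can be pictured with the following sketch, which simulates a SUMMA-style algorithm on a pr × pc process mesh serially in NumPy: at each outer step, a block-column of A and the matching block-row of B are (conceptually) broadcast along mesh rows and columns and multiplied locally. This is a generic mesh algorithm for illustration; the paper's Delta-specific communication primitives are not reproduced.

```python
# Sketch of SUMMA-style matrix multiplication on a simulated process mesh.
import numpy as np

def summa(A, B, pr=4, pc=4):
    n = A.shape[0]
    br, bc = n // pr, n // pc
    C = np.zeros((n, n))
    for k in range(pc):                      # one outer step per block index
        for i in range(pr):                  # "mesh row" i
            for j in range(pc):              # "mesh column" j
                # On a real machine, Aik is broadcast along mesh row i and
                # Bkj down mesh column j; here we just slice the blocks.
                Aik = A[i*br:(i+1)*br, k*bc:(k+1)*bc]
                Bkj = B[k*bc:(k+1)*bc, j*bc:(j+1)*bc]
                C[i*br:(i+1)*br, j*bc:(j+1)*bc] += Aik @ Bkj
    return C

A = np.random.rand(64, 64)
B = np.random.rand(64, 64)
print(np.allclose(summa(A, B), A @ B))      # verify against direct product
```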

  19. Integration of perception and reasoning in fast neural modules

    NASA Technical Reports Server (NTRS)

    Fritz, David G.

    1989-01-01

    Artificial neural systems promise to integrate symbolic and sub-symbolic processing to achieve real time control of physical systems. Two potential alternatives exist. In one, neural nets can be used to front-end expert systems. The expert systems, in turn, are developed with varying degrees of parallelism, including their implementation in neural nets. In the other, rule-based reasoning and sensor data can be integrated within a single hybrid neural system. The hybrid system reacts as a unit to provide decisions (problem solutions) based on the simultaneous evaluation of data and rules. Discussed here is a model hybrid system based on the fuzzy cognitive map (FCM). The operation of the model is illustrated with the control of a hypothetical satellite that intelligently alters its attitude in space in response to an intersecting micrometeorite shower.
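    A minimal FCM sketch follows: concept activations are repeatedly updated in parallel by squashing the weighted sum of causal influences until the state settles. The three concepts and the weight values are invented for illustration and are not the paper's satellite model.

```python
# Minimal fuzzy cognitive map (FCM) sketch with an invented concept set.
import numpy as np

# concepts: [meteor_shower_detected, attitude_ok, thruster_fire]
W = np.array([[0.0,  -0.8,  0.9],    # shower degrades attitude, triggers thrusters
              [0.0,   0.0, -0.5],    # good attitude suppresses thruster firing
              [0.0,   0.7,  0.0]])   # thruster firing restores attitude

def step(state):
    # Logistic squashing of the summed causal influences.
    return 1.0 / (1.0 + np.exp(-5.0 * (state @ W)))

state = np.array([1.0, 1.0, 0.0])    # shower detected, attitude currently fine
for _ in range(20):
    new = step(state)
    if np.allclose(new, state, atol=1e-4):
        break
    state = new
print("settled activations:", state.round(3))
```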

  20. Flight Results from the HST SM4 Relative Navigation Sensor System

    NASA Technical Reports Server (NTRS)

    Naasz, Bo; Eepoel, John Van; Queen, Steve; Southward, C. Michael; Hannah, Joel

    2010-01-01

    On May 11, 2009, Space Shuttle Atlantis roared off of Launch Pad 39A en route to the Hubble Space Telescope (HST) to undertake its final servicing of HST, Servicing Mission 4. Onboard Atlantis was a small payload called the Relative Navigation Sensor experiment, which included three cameras of varying focal ranges and avionics to record images and estimate, in real time, the relative position and attitude (aka "pose") of the telescope during rendezvous and deploy. The avionics package, known as SpaceCube and developed at the Goddard Space Flight Center, performed image processing using field programmable gate arrays to accelerate this process, and in addition executed two different pose algorithms in parallel: the Goddard Natural Feature Image Recognition and the ULTOR Passive Pose and Position Engine (P3E) algorithms.

  1. An overview of confounding. Part 2: how to identify it and special situations.

    PubMed

    Howards, Penelope P

    2018-04-01

    Confounding biases study results when the effect of the exposure on the outcome mixes with the effects of other risk and protective factors for the outcome that are present differentially by exposure status. However, not all differences between the exposed and unexposed group cause confounding. Thus, sources of confounding must be identified before they can be addressed. Confounding is absent in an ideal study where all of the population of interest is exposed in one universe and is unexposed in a parallel universe. In an actual study, an observed unexposed population represents the unobserved parallel universe. Thinking about differences between this substitute population and the unexposed parallel universe helps identify sources of confounding. These differences can then be represented in a diagram that shows how risk and protective factors for the outcome are related to the exposure. Sources of confounding identified in the diagram should be addressed analytically and through study design. However, treating all factors that differ by exposure status as confounders without considering the structure of their relation to the exposure can introduce bias. For example, conditions affected by the exposure are not confounders. There are also special types of confounding, such as time-varying confounding and unfixable confounding. It is important to evaluate carefully whether factors of interest contribute to confounding because bias can be introduced both by ignoring potential confounders and by adjusting for factors that are not confounders. The resulting bias can result in misleading conclusions about the effect of the exposure of interest on the outcome. © 2018 Nordic Federation of Societies of Obstetrics and Gynecology.

  2. Nerve Fiber Activation During Peripheral Nerve Field Stimulation: Importance of Electrode Orientation and Estimation of Area of Paresthesia.

    PubMed

    Frahm, Ken Steffen; Hennings, Kristian; Vera-Portocarrero, Louis; Wacnik, Paul W; Mørch, Carsten Dahl

    2016-04-01

    Low back pain is one of the indications for using peripheral nerve field stimulation (PNFS). However, the effect of PNFS varies between patients; several stimulation parameters have not been investigated in depth, such as orientation of the nerve fiber in relation to the electrode. While placing the electrode parallel to the nerve fiber may give lower activation thresholds, anodal blocking may occur when the propagating action potential passes an anode. A finite element model was used to simulate the extracellular potential during PNFS. This was combined with an active cable model of Aβ and Aδ nerve fibers. It was investigated how the angle between the nerve fiber and electrode affected the nerve activation and whether anodal blocking could occur. Finally, the area of paresthesia was estimated and compared with any concomitant Aδ fiber activation. The lowest threshold was found when nerve and electrode were in parallel, and that anodal blocking did not appear to occur during PNFS. The activation of Aβ fibers was within therapeutic range (<10V) of PNFS; however, within this range, Aδ fiber activation also may occur. The combined area of activated Aβ fibers (paresthesia) was at least two times larger than Aδ fibers for similar stimulation intensities. No evidence of anodal blocking was observed in this PNFS model. The thresholds were lowest when the nerves and electrodes were parallel; thus, it may be relevant to investigate the overall position of the target nerve fibers prior to electrode placement. © 2015 International Neuromodulation Society.

  3. Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

    NASA Astrophysics Data System (ADS)

    Zerr, Robert Joseph

    2011-12-01

    The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have been investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)
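    A sketch of this kind of trend-line performance model, with made-up timings: fit the construction time as a + b/P and the per-iteration time as linear in log2(P), then sum the fitted components to predict total execution time at larger P. The functional forms and numbers are assumptions for illustration, not those of the dissertation.

```python
# Sketch: fit trend lines to per-component timings vs. process count P,
# then combine the fits to predict total execution time at larger P.
import numpy as np

P = np.array([16, 32, 64, 128, 256], dtype=float)
t_construct = np.array([80.0, 42.0, 23.0, 13.0, 8.0])    # made-up, scales ~1/P
t_iter = np.array([1.0, 1.2, 1.5, 1.9, 2.4])             # made-up, grows with P

# Fit construction time as a + b/P via least squares.
A = np.column_stack([np.ones_like(P), 1.0 / P])
a, b = np.linalg.lstsq(A, t_construct, rcond=None)[0]
# Fit per-iteration time as linear in log2(P).
c, d = np.polyfit(np.log2(P), t_iter, 1)

def predict_total(p, n_iters=200):
    return (a + b / p) + n_iters * (c * np.log2(p) + d)

print("predicted total at P=1024:", predict_total(1024.0))
```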

  4. Communication library for run-time visualization of distributed, asynchronous data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rowlan, J.; Wightman, B.T.

    1994-04-01

    In this paper we present a method for collecting and visualizing data generated by a parallel computational simulation during run time. Data distributed across multiple processes is sent across parallel communication lines to a remote workstation, which sorts and queues the data for visualization. We have implemented our method in a set of tools called PORTAL (for Parallel aRchitecture data-TrAnsfer Library). The tools comprise generic routines for sending data from a parallel program (callable from either C or FORTRAN), a semi-parallel communication scheme currently built upon Unix Sockets, and a real-time connection to the scientific visualization program AVS. Our method is most valuable when used to examine large datasets that can be efficiently generated and do not need to be stored on disk. The PORTAL source libraries, detailed documentation, and a working example can be obtained by anonymous ftp from info.mcs.anl.gov from the file portal.tar.Z from the directory pub/portal.

  5. Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

    NASA Technical Reports Server (NTRS)

    Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

    2004-01-01

    In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.

  6. A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers

    NASA Technical Reports Server (NTRS)

    Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)

    1997-01-01

    The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from robust implementations that are transportable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented into the widely-used NASA multi-block CFD packages ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation itself introduces no changes to the numerical results, hence the original fidelity of the packages is identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures, from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains on up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft configurations achieve 75 percent of perfectly load-balanced execution using data coalescing and the two levels of parallelism. SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms where the robustness of the implementation is tested. The performance behavior on the other computer platforms with a variety of realistic problems will be included as this on-going study progresses.

  7. Testing for carryover effects after cessation of treatments: a design approach.

    PubMed

    Sturdevant, S Gwynn; Lumley, Thomas

    2016-08-02

    Recently, trials addressing noisy measurements with diagnosis occurring by exceeding thresholds (such as diabetes and hypertension) have been published which attempt to measure carryover - the impact that treatment has on an outcome after cessation. The design of these trials has been criticised, and simulations have been conducted which suggest that the parallel designs used are not adequate to test this hypothesis; two proposed solutions are that either a different parallel design or a cross-over design could allow for diagnosis of carryover. We undertook a systematic simulation study to determine the ability of a cross-over or a parallel-group trial design to detect carryover effects on incident hypertension in a population with prehypertension. We simulated blood pressure and focused on varying criteria to diagnose systolic hypertension. Using the difference in cumulative incidence of hypertension to analyse parallel-group or cross-over trials resulted in none of the designs having an acceptable Type I error rate: under the null hypothesis of no carryover, the error rate is well above the nominal 5% level. When a treatment is effective during the intervention period, reliable testing for a carryover effect is difficult. Neither parallel-group nor cross-over designs using the difference in cumulative incidence appear to be a feasible approach. Future trials should ensure their design and analysis is validated by simulation.
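    The flavor of such a simulation study can be sketched as below: with threshold-based, cumulative diagnosis, a treatment effect that vanishes completely at cessation (no true carryover) still produces a difference in cumulative incidence, so a naive test rejects far more often than 5%. All parameter values are illustrative assumptions, not those of the paper.

```python
# Sketch: why the difference in cumulative incidence inflates Type I error
# when diagnosis is cumulative and threshold-based, under no true carryover.
import numpy as np

rng = np.random.default_rng(42)

def one_trial(n=500, mean_sbp=135.0, sd_between=8.0, sd_noise=6.0,
              effect=-5.0, threshold=140.0):
    truth = rng.normal(mean_sbp, sd_between, size=(2, n))   # two arms
    # On treatment: treated arm (row 1) shifted down; after cessation the
    # effect is gone entirely, so there is no true carryover.
    on_rx = truth + np.array([[0.0], [effect]]) + rng.normal(0, sd_noise, (2, n))
    post = truth + rng.normal(0, sd_noise, (2, n))
    # Cumulative diagnosis: once over threshold, always diagnosed.
    diagnosed = (on_rx > threshold) | (post > threshold)
    p1, p2 = diagnosed.mean(axis=1)
    se = np.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return abs(p1 - p2) / se > 1.96      # nominal 5% two-sided test

rejections = np.mean([one_trial() for _ in range(2000)])
print(f"empirical Type I error: {rejections:.3f}")  # expected well above 0.05
```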

  8. GSRP/David Marshall: Fully Automated Cartesian Grid CFD Application for MDO in High Speed Flows

    NASA Technical Reports Server (NTRS)

    2003-01-01

    With the renewed interest in Cartesian gridding methodologies for the ease and speed of gridding complex geometries in addition to the simplicity of the control volumes used in the computations, it has become important to investigate ways of extending the existing Cartesian grid solver functionalities. This includes developing methods of modeling the viscous effects in order to utilize Cartesian grids solvers for accurate drag predictions and addressing the issues related to the distributed memory parallelization of Cartesian solvers. This research presents advances in two areas of interest in Cartesian grid solvers, viscous effects modeling and MPI parallelization. The development of viscous effects modeling using solely Cartesian grids has been hampered by the widely varying control volume sizes associated with the mesh refinement and the cut cells associated with the solid surface. This problem is being addressed by using physically based modeling techniques to update the state vectors of the cut cells and removing them from the finite volume integration scheme. This work is performed on a new Cartesian grid solver, NASCART-GT, with modifications to its cut cell functionality. The development of MPI parallelization addresses issues associated with utilizing Cartesian solvers on distributed memory parallel environments. This work is performed on an existing Cartesian grid solver, CART3D, with modifications to its parallelization methodology.

  9. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.

  10. Parallel performance of TORT on the CRAY J90: Model and measurement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnett, A.; Azmy, Y.Y.

    1997-10-01

    A limitation on the parallel performance of TORT on the CRAY J90 is the amount of extra work introduced by the multitasking algorithm itself. The extra work beyond that of the serial version of the code, called overhead, arises from the synchronization of the parallel tasks and the accumulation of results by the master task. The goal of recent updates to TORT was to reduce the time consumed by these activities. To help understand which components of the multitasking algorithm contribute significantly to the overhead, a parallel performance model was constructed and compared to measurements of actual timings of the code.
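    A sketch of such an overhead model, with invented constants: wall time is modeled as perfectly divided serial work plus synchronization and master-task accumulation terms that grow with the number of tasks, so the predicted speedup saturates as tasks are added.

```python
# Sketch of a multitasking overhead model: compute time shrinks with the
# task count while synchronization/accumulation overhead grows with it.
# All constants are illustrative, not measured TORT values.
def predicted_wall_time(n_tasks, t_serial=1200.0, t_sync=0.5, t_accum=0.8,
                        n_syncs=100):
    compute = t_serial / n_tasks                  # parallelizable work
    overhead = n_syncs * (t_sync + t_accum * n_tasks / 8.0)
    return compute + overhead

for n in (1, 2, 4, 8):
    t = predicted_wall_time(n)
    print(f"tasks={n}: predicted {t:7.1f} s, "
          f"speedup {predicted_wall_time(1) / t:4.2f}")
```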

  11. Efficient parallel implementation of active appearance model fitting algorithm on GPU.

    PubMed

    Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  12. Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU

    PubMed Central

    Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812

  13. Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

    PubMed

    Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

    2013-10-24

    Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms) moved to the center only one saccade later. Our data on eye movement positions provide novel evidence of parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.

  14. A visual parallel-BCI speller based on the time-frequency coding strategy.

    PubMed

    Xu, Minpeng; Chen, Long; Zhang, Lixin; Qi, Hongzhi; Ma, Lan; Tang, Jiabei; Wan, Baikun; Ming, Dong

    2014-04-01

    Spelling is one of the most important issues in brain-computer interface (BCI) research. This paper develops a visual parallel-BCI speller system based on a time-frequency coding strategy in which the switching among four simultaneously presented sub-spellers and the character selection are identified in a parallel mode. The parallel-BCI speller was constituted by four independent P300+SSVEP-B (P300 plus SSVEP blocking) spellers with different flicker frequencies, so that every character had a specific time-frequency code. To verify its effectiveness, 11 subjects were involved in the offline and online spellings. A classification strategy was designed to recognize the target character by jointly using canonical correlation analysis and stepwise linear discriminant analysis. Online spellings showed that the proposed parallel-BCI speller had a high performance, reaching a highest information transfer rate of 67.4 bit min(-1), with averages of 54.0 bit min(-1) and 43.0 bit min(-1) in the three-round and five-round settings, respectively. The results indicated that the proposed parallel-BCI could be effectively controlled by users, with attention shifting fluently among the sub-spellers, and substantially improved BCI spelling performance.
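    The CCA stage of such a classification strategy can be sketched as follows: multichannel EEG is correlated against sine/cosine reference templates at each sub-speller's flicker frequency, and the frequency with the largest canonical correlation is selected. The synthetic EEG, channel count, and frequency set are illustrative assumptions, not the paper's setup.

```python
# Sketch: SSVEP frequency identification with canonical correlation analysis.
import numpy as np
from sklearn.cross_decomposition import CCA

fs, dur = 250, 2.0                       # sampling rate (Hz), epoch length (s)
t = np.arange(0, dur, 1 / fs)
freqs = [8.0, 10.0, 12.0, 15.0]          # candidate flicker frequencies

def references(f):
    # Sine/cosine templates at the fundamental and second harmonic.
    return np.column_stack([np.sin(2*np.pi*f*t), np.cos(2*np.pi*f*t),
                            np.sin(4*np.pi*f*t), np.cos(4*np.pi*f*t)])

rng = np.random.default_rng(3)
eeg = (np.outer(np.sin(2*np.pi*10.0*t), np.ones(8))       # 10 Hz SSVEP
       + 0.8 * rng.standard_normal((t.size, 8)))          # noise, 8 channels

scores = []
for f in freqs:
    cca = CCA(n_components=1).fit(eeg, references(f))
    u, v = cca.transform(eeg, references(f))
    scores.append(np.corrcoef(u[:, 0], v[:, 0])[0, 1])
print("identified frequency:", freqs[int(np.argmax(scores))])
```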

  15. The role of parallelism in the real-time processing of anaphora.

    PubMed

    Poirier, Josée; Walenski, Matthew; Shapiro, Lewis P

    2012-06-01

    Parallelism effects refer to the facilitated processing of a target structure when it follows a similar, parallel structure. In coordination, a parallelism-related conjunction triggers the expectation that a second conjunct with the same structure as the first conjunct should occur. It has been proposed that parallelism effects reflect the use of the first structure as a template that guides the processing of the second. In this study, we examined the role of parallelism in real-time anaphora resolution by charting activation patterns in coordinated constructions containing anaphora, Verb-Phrase Ellipsis (VPE) and Noun-Phrase Traces (NP-traces). Specifically, we hypothesised that an expectation of parallelism would incite the parser to assume a structure similar to the first conjunct in the second, anaphora-containing conjunct. The speculation of a similar structure would result in early postulation of covert anaphora. Experiment 1 confirms that following a parallelism-related conjunction, first-conjunct material is activated in the second conjunct. Experiment 2 reveals that an NP-trace in the second conjunct is posited immediately where licensed, which is earlier than previously reported in the literature. In light of our findings, we propose an intricate relation between structural expectations and anaphor resolution.

  16. The role of parallelism in the real-time processing of anaphora

    PubMed Central

    Poirier, Josée; Walenski, Matthew; Shapiro, Lewis P.

    2012-01-01

    Parallelism effects refer to the facilitated processing of a target structure when it follows a similar, parallel structure. In coordination, a parallelism-related conjunction triggers the expectation that a second conjunct with the same structure as the first conjunct should occur. It has been proposed that parallelism effects reflect the use of the first structure as a template that guides the processing of the second. In this study, we examined the role of parallelism in real-time anaphora resolution by charting activation patterns in coordinated constructions containing anaphora, Verb-Phrase Ellipsis (VPE) and Noun-Phrase Traces (NP-traces). Specifically, we hypothesised that an expectation of parallelism would incite the parser to assume a structure similar to the first conjunct in the second, anaphora-containing conjunct. The speculation of a similar structure would result in early postulation of covert anaphora. Experiment 1 confirms that following a parallelism-related conjunction, first-conjunct material is activated in the second conjunct. Experiment 2 reveals that an NP-trace in the second conjunct is posited immediately where licensed, which is earlier than previously reported in the literature. In light of our findings, we propose an intricate relation between structural expectations and anaphor resolution. PMID:23741080

  17. A Comparison of Automatic Parallelization Tools/Compilers on the SGI Origin 2000 Using the NAS Benchmarks

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Frumkin, Michael; Hribar, Michelle; Jin, Hao-Qiang; Waheed, Abdul; Yan, Jerry

    1998-01-01

    Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is extremely time consuming and costly, porting codes would ideally be automated by using parallelization tools and compilers. In this paper, we compare the performance of the hand-written NAS Parallel Benchmarks against three parallel versions generated with the help of tools and compilers: 1) CAPTools, an interactive computer-aided parallelization tool that generates message passing code, 2) the Portland Group's HPF compiler, and 3) compiler directives with the native FORTRAN77 compiler on the SGI Origin2000.

  18. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

    1990-01-01

    Run time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases, where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run time, wave fronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.
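
    A minimal sketch of the inspector/executor idea, assuming flow dependences only: the inspector assigns each iteration to a wavefront based on the latest earlier iteration that wrote a location it touches, and the executor runs wavefronts in order (iterations within a wavefront are mutually independent, so a parallel runtime could execute them concurrently).

        from collections import defaultdict

        def inspector(reads, writes, n):
            # Wavefront of iteration i = 1 + max wavefront among earlier
            # iterations that wrote any location i touches (simplified).
            last_writer_front = {}
            front = [0] * n
            for i in range(n):
                deps = [last_writer_front.get(a, -1)
                        for a in reads[i] + writes[i]]
                front[i] = (max(deps) + 1) if deps else 0
                for a in writes[i]:
                    last_writer_front[a] = front[i]
            waves = defaultdict(list)
            for i, w in enumerate(front):
                waves[w].append(i)
            return [waves[w] for w in sorted(waves)]

        def executor(waves, body):
            for wave in waves:       # wavefronts execute in sequence
                for i in wave:       # iterations in a wave are independent
                    body(i)

        # Example: x[i] = x[idx[i]] + 1, where idx is known only at run time.
        idx = [0, 0, 1, 3, 2]
        x = [1, 2, 3, 4, 5]
        waves = inspector(reads=[[j] for j in idx],
                          writes=[[i] for i in range(5)], n=5)

        def body(i):
            x[i] = x[idx[i]] + 1

        executor(waves, body)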

  19. Computational time analysis of the numerical solution of 3D electrostatic Poisson's equation

    NASA Astrophysics Data System (ADS)

    Kamboh, Shakeel Ahmed; Labadin, Jane; Rigit, Andrew Ragai Henri; Ling, Tech Chaw; Amur, Khuda Bux; Chaudhary, Muhammad Tayyab

    2015-05-01

    3D Poisson's equation is solved numerically to simulate the electric potential in a prototype design of an electrohydrodynamic (EHD) ion-drag micropump. The finite difference method (FDM) is employed to discretize the governing equation. The system of linear equations resulting from the FDM is solved iteratively using the sequential Jacobi (SJ) and sequential Gauss-Seidel (SGS) methods, and the simulation results are compared to examine the differences between them. The main objective was to analyze the computational time required by both methods for different grid sizes and to parallelize the Jacobi method to reduce the computational time. In general, the SGS method is faster than the SJ method, but the data parallelism of the Jacobi method can produce a good speedup over the SGS method. In this study, the feasibility of a parallel Jacobi (PJ) method is assessed relative to the SGS method. The MATLAB Parallel/Distributed computing environment is used and a parallel code for the SJ method is implemented. It was found that for small grid sizes the SGS method remains dominant over the SJ and PJ methods, while for large grid sizes both sequential methods take prohibitively long to converge. The PJ method, however, reduces the computational time to some extent for large grid sizes.
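
    A side-by-side sketch of the two sweeps, assuming a cubic grid with Dirichlet boundaries and spacing h, makes the parallelism argument concrete: the Jacobi update reads only the previous iterate, so every interior point can be updated independently (the data parallelism PJ exploits), while Gauss-Seidel reuses freshly updated neighbours and therefore converges faster but serializes the sweep.

        import numpy as np

        def jacobi_step(u, f, h):
            # All updates read the old iterate u: fully data-parallel.
            new = u.copy()
            new[1:-1, 1:-1, 1:-1] = (
                u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1] +
                u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1] +
                u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2] -
                h * h * f[1:-1, 1:-1, 1:-1]) / 6.0
            return new

        def gauss_seidel_step(u, f, h):
            # Updates reuse fresh neighbour values: faster, but serial.
            n = u.shape[0]                      # assumes a cube
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    for k in range(1, n - 1):
                        u[i, j, k] = (u[i+1, j, k] + u[i-1, j, k] +
                                      u[i, j+1, k] + u[i, j-1, k] +
                                      u[i, j, k+1] + u[i, j, k-1] -
                                      h * h * f[i, j, k]) / 6.0
            return u

        n = 32; h = 1.0 / (n - 1)
        f = np.zeros((n, n, n)); f[n//2, n//2, n//2] = -1.0   # point source
        u = np.zeros((n, n, n))
        for _ in range(200):
            u = jacobi_step(u, f, h)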

  20. Criteria for approximating certain microgravity flow boiling characteristics in Earth gravity.

    PubMed

    Merte, Herman; Park, Jaeseok; Shultz, William W; Keller, Robert B

    2002-10-01

    The forces governing flow boiling, aside from system pressure, are buoyancy, liquid momentum, interfacial surface tensions, and liquid viscosity. Guidance for approximating certain aspects of the flow boiling process in microgravity can be obtained in Earth gravity research by the imposition of a liquid velocity parallel to a flat heater surface in the inverted position, horizontal, or nearly horizontal, by having buoyancy hold the heated liquid and vapor formed close to the heater surface. Bounds on the velocities of interest are obtained from several dimensionless numbers: a two-phase Richardson number, a two-phase Weber number, and a Bond number. For the fluid used in the experimental work here, liquid velocities in the range U = 5-10 cm/sec are judged to be critical for changes in behavior of the flow boiling process. Experimental results are presented for flow boiling heat transfer, concentrating on orientations that provide the largest reductions in buoyancy parallel to the heater surface, varying +/-5 degrees from facing horizontal downward. Results are presented for velocity, orientation, and subcooling effects on nucleation, dryout, and heat transfer. Two different heater surfaces were used: a thin gold film on a polished quartz substrate, acting as a heater and resistance thermometer, and a gold-plated copper heater. Both transient and steady measurements of surface heat flux and superheat were made with the quartz heater; only steady measurements were possible with the copper heater. R-113 was the fluid used; the velocity varied over the interval 4-16 cm/sec; bulk liquid subcooling varied over 2-20 degrees C; heat flux varied over 4-8 W/cm^2.

  1. Efficient Parallel Levenberg-Marquardt Model Fitting towards Real-Time Automated Parametric Imaging Microscopy

    PubMed Central

    Zhu, Xiang; Zhang, Dianwen

    2013-01-01

    We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
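
    The structure is easy to mimic on the CPU: every pixel's fit is independent, so a worker pool can run one small Levenberg-Marquardt fit per pixel (SciPy's method='lm'), just as GPU-LMFit assigns independent fits to GPU threads. A minimal sketch with a hypothetical exponential-decay model:

        import numpy as np
        from multiprocessing import Pool
        from scipy.optimize import least_squares

        def exp_decay(p, t):
            a, tau = p
            return a * np.exp(-t / tau)

        def fit_pixel(args):
            # One independent Levenberg-Marquardt fit per pixel.
            t, y = args
            res = least_squares(lambda p: exp_decay(p, t) - y,
                                x0=[float(y.max()), 1.0], method="lm")
            return res.x

        if __name__ == "__main__":
            rng = np.random.default_rng(2)
            t = np.linspace(0.1, 5.0, 64)
            pixels = [(t, 3.0 * np.exp(-t / 1.5)
                          + 0.05 * rng.standard_normal(64))
                      for _ in range(1000)]
            with Pool() as pool:               # one fit per worker process
                params = pool.map(fit_pixel, pixels)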

  2. Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Handhika, T.; Ernastuti; Kerami, D.

    2017-07-01

    The Continuum-Generalized Method of Moments (C-GMM) addresses the shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the maximum likelihood estimator, by using a continuum set of moment conditions in a GMM framework. However, this computation takes a very long time because of the optimization of the regularization parameter. Unfortunately, these calculations are usually processed sequentially, even though all modern computers are now supported by hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the C-GMM calculation by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. This parallel algorithm is implemented with a standard shared-memory application programming interface, Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
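
    The winning outer-loop strategy maps onto any worker pool. The sketch below imitates it in Python with multiprocessing standing in for OpenMP; the objective is a deliberately simple stand-in, since the real C-GMM criterion and its regularization-parameter search are far more involved.

        import numpy as np
        from multiprocessing import Pool

        def criterion(alpha):
            # Stand-in for the C-GMM objective at one regularization
            # parameter; the vectorized quadrature plays the inner loop.
            grid = np.linspace(-5.0, 5.0, 2001)
            return float(np.trapz(np.exp(-alpha * grid**2), grid))

        if __name__ == "__main__":
            alphas = np.linspace(0.01, 2.0, 64)        # outer loop
            with Pool() as pool:                       # outer-loop parallelism
                values = pool.map(criterion, alphas)
            best = alphas[int(np.argmin(values))]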

  3. Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform.

    PubMed

    Tao, Liang; Kwan, Hon Keung

    2012-07-01

    Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which make them attractive for real-time image processing.
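
    For context, the DHT at the core of the RDGT is cheap to obtain from an FFT: for real input, H = Re(F) - Im(F), where F is the DFT. A sketch of that identity and a separable 2-D version (rows then columns, assumed here purely for illustration) follows.

        import numpy as np

        def dht(x, axis=-1):
            # 1-D discrete Hartley transform of real input via the FFT.
            F = np.fft.fft(x, axis=axis)
            return F.real - F.imag

        def dht2(img):
            # Separable (cas-cas) 2-D DHT: transform rows, then columns.
            return dht(dht(img, axis=0), axis=1)

        img = np.random.default_rng(3).random((8, 8))
        out = dht2(img)
        # The DHT is an involution up to scaling: two applications
        # recover the input times the number of samples.
        assert np.allclose(dht2(out) / img.size, img)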

  4. Parallels in History.

    ERIC Educational Resources Information Center

    Mugleston, William F.

    2000-01-01

    Believes that by focusing on the recurrent situations and problems, or parallels, throughout history, students will understand the relevance of history to their own times and lives. Provides suggestions for parallels in history that may be introduced within lectures or as a means to class discussions. (CMK)

  5. Observing with HST V: Improvements to the Scheduling of HST Parallel Observations

    NASA Astrophysics Data System (ADS)

    Taylor, D. K.; Vanorsow, D.; Lucks, M.; Henry, R.; Ratnatunga, K.; Patterson, A.

    1994-12-01

    Recent improvements to the Hubble Space Telescope (HST) ground system have significantly increased the frequency of pure parallel observations, i.e. the simultaneous use of multiple HST instruments by different observers. Opportunities for parallel observations are limited by a variety of timing, hardware, and scientific constraints. Formerly, such opportunities were heuristically predicted prior to the construction of the primary schedule (or calendar), and lack of complete information resulted in high rates of scheduling failures and missed opportunities. In the current process the search for parallel opportunities is delayed until the primary schedule is complete, at which point new software tools are employed to identify places where parallel observations are supported. The result has been a considerable increase in parallel throughput. A new technique, known as "parallel crafting," is currently under development to streamline further the parallel scheduling process. This radically new method will replace the standard exposure logsheet with a set of abstract rules from which observation parameters will be constructed "on the fly" to best match the constraints of the parallel opportunity. Currently, parallel observers must specify a huge (and highly redundant) set of exposure types in order to cover all possible types of parallel opportunities. Crafting rules permit the observer to express timing, filter, and splitting preferences in a far more succinct manner. The issue of coordinated parallel observations (same PI using different instruments simultaneously), long a troublesome aspect of the ground system, is also being addressed. For Cycle 5, the Phase II Proposal Instructions now have an exposure-level PAR WITH special requirement. While only the primary's alignment will be scheduled on the calendar, new commanding will provide for parallel exposures with both instruments.

  6. Molecular events during the early stages of aggregation of GNNQQNY: An all atom MD simulation study of randomly dispersed peptides.

    PubMed

    Srivastava, Alka; Balaji, Petety V

    2015-12-01

    This study probes the early events during lag phase of aggregation of GNNQQNY using all atom MD simulations in explicit solvent. Simulations were performed by varying system size, temperature and starting configuration. Peptides dispersed randomly in the simulation box come together early on in the simulation and form aggregates. These aggregates are dynamic implying the absence of stabilizing interactions. This facilitates the exploration of alternate arrangements. The constituent peptides sample a variety of conformations, frequently re-orient and re-arrange with respect to each other and dissociate from/re-associate with the aggregate. The size and lifetime of aggregates vary depending upon the number of inter-peptide backbone H-bonds. Most of the aggregates formed are amorphous but crystalline aggregates of smaller size (mainly 2-mers) do appear and sustain for varying durations of time. The peptides in crystalline 2-mers are mostly anti-parallel. The largest crystalline aggregate that appears is a 4-mer in a single sheet and a 4-, 5-, or 6-mer in double layered arrangement. Crystalline aggregates grow either by the sequential addition of peptides, or by the head-on or lateral collision-adhesion of 2-mers. The formation of various smaller aggregates suggests the polymorphic nature of oligomers and heterogeneity in the lag phase. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. /S/ Variation as Accommodation.

    ERIC Educational Resources Information Center

    Coles, Felice Anne

    1993-01-01

    The few remaining fluent speakers of the isleno dialect of Spanish vary their casual pronunciation of /s/ in a manner consistent with, but not identical to, other Caribbean Spanish dialects. The behavior of /s/ in the speech of nonfluent islenos parallels that of fluent speakers, differing only in the higher degree of aspiration and deletion. This…

  8. Enhancing Established Counting Routines to Promote Place-Value Understanding: An Empirical Study in Early Elementary Classrooms

    ERIC Educational Resources Information Center

    Fraivillig, Judith L.

    2018-01-01

    Understanding place value is a critical and foundational competency for elementary mathematics. Classroom teachers who endeavor to promote place-value development adopt a variety of established practices to varying degrees of effectiveness. In parallel, researchers have validated models of how young children acquire place-value understanding.…

  9. Wood

    Treesearch

    David W. Green; Robert H. White; Antoni TenWolde; William Simpson; Joseph Murphy; Robert J. Ross; Roland Hernandez; Stan T. Lebow

    2006-01-01

    Wood is a naturally formed organic material consisting essentially of elongated tubular elements called cells arranged in a parallel manner for the most part. These cells vary in dimensions and wall thickness with position in the tree, age, conditions of growth, and kind of tree. The walls of the cells are formed principally of chain molecules of cellulose, polymerized...

  10. Magnetic arrays

    DOEpatents

    Trumper, D.L.; Kim, W.; Williams, M.E.

    1997-05-20

    Electromagnet arrays are disclosed which can provide selected field patterns in either two or three dimensions, and in particular, which can provide single-sided field patterns in two or three dimensions. These features are achieved by providing arrays which have current densities that vary in the windings both parallel to the array and in the direction of array thickness. 12 figs.

  11. HEVC real-time decoding

    NASA Astrophysics Data System (ADS)

    Bross, Benjamin; Alvarez-Mesa, Mauricio; George, Valeri; Chi, Chi Ching; Mayer, Tobias; Juurlink, Ben; Schierl, Thomas

    2013-09-01

    The new High Efficiency Video Coding Standard (HEVC) was finalized in January 2013. Compared to its predecessor H.264 / MPEG4-AVC, this new international standard is able to reduce the bitrate by 50% for the same subjective video quality. This paper investigates decoder optimizations that are needed to achieve HEVC real-time software decoding on a mobile processor. It is shown that HEVC real-time decoding up to high definition video is feasible using instruction extensions of the processor while decoding 4K ultra high definition video in real-time requires additional parallel processing. For parallel processing, a picture-level parallel approach has been chosen because it is generic and does not require bitstreams with special indication.

  12. A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nomura, K; Seymour, R; Wang, W

    2009-02-17

    A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based on a hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e., petaflops·day of computing) is estimated as NT = 2.14 (e.g., N = 2.14 million atoms for T = 1 microsecond).

  13. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

    PubMed

    Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up processing by approximately 3.4 times when handling large-scale datasets, which demonstrates the clear superiority of our method. The proposed algorithm thus demonstrates both better edge detection performance and improved time performance.
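
    The per-image map-side work is compact enough to sketch with OpenCV: Otsu's method supplies the Canny high threshold and the low threshold is set to half of it (a common heuristic; the paper's exact rule may differ). The key/value plumbing stands in for Hadoop-streaming boilerplate and is hypothetical.

        import cv2
        import numpy as np

        def otsu_canny(gray):
            # Otsu picks the high threshold from the grayscale histogram.
            high, _ = cv2.threshold(gray, 0, 255,
                                    cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            return cv2.Canny(gray, 0.5 * high, high)

        def mapper(key, value):
            # Map step: key = image id, value = encoded image bytes.
            buf = np.frombuffer(value, np.uint8)
            gray = cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)
            yield key, cv2.imencode(".png", otsu_canny(gray))[1].tobytes()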

  14. Long-term stability of radiotherapy dosimeters calibrated at the Polish Secondary Standard Dosimetry Laboratory.

    PubMed

    Ulkowski, Piotr; Bulski, Wojciech; Chełmiński, Krzysztof

    2015-10-01

    Unidos 10001, Unidos E (10008/10009) and Dose 1 electrometers from 14 radiotherapy centres were calibrated 3-4 times over a long period of time, together with Farmer type (PTW 30001, 30013, Nuclear Enterprises 2571 and Scanditronix-Wellhofer FC65G) cylindrical ionization chambers and plane-parallel chambers (PTW Markus 23343 and Scanditronix-Wellhofer PPC05). On the basis of this long period of repeated establishment of calibration coefficients for the same electrometers and ionization chambers, the accuracy of the electrometers and the long-term stability of the ionization chambers were examined. All measurements were carried out at the same laboratory, by the same staff, according to the same IAEA recommendations. Good accuracy and long-term stability of the dosimeters used in Polish radiotherapy centres were observed. The variations were within 0.1% for electrometers and 0.2% for the chambers with electrometers. Furthermore, these values were not observed to vary over time. The observations confirm the opinion that the requirement to calibrate dosimeters more often than every 2 years is not justified. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Automatic Fitting of Spiking Neuron Models to Electrophysiological Recordings

    PubMed Central

    Rossant, Cyrille; Goodman, Dan F. M.; Platkiewicz, Jonathan; Brette, Romain

    2010-01-01

    Spiking models can accurately predict the spike trains produced by cortical neurons in response to somatically injected currents. Since the specific characteristics of the model depend on the neuron, a computational method is required to fit models to electrophysiological recordings. The fitting procedure can be very time consuming both in terms of computer simulations and in terms of code writing. We present algorithms to fit spiking models to electrophysiological data (time-varying input and spike trains) that can run in parallel on graphics processing units (GPUs). The model fitting library is interfaced with Brian, a neural network simulator in Python. If a GPU is present it uses just-in-time compilation to translate model equations into optimized code. Arbitrary models can then be defined at script level and run on the graphics card. This tool can be used to obtain empirically validated spiking models of neurons in various systems. We demonstrate its use on public data from the INCF Quantitative Single-Neuron Modeling 2009 competition by comparing the performance of a number of neuron spiking models. PMID:20224819

  16. Parallel Implementation of a High Order Implicit Collocation Method for the Heat Equation

    NASA Technical Reports Server (NTRS)

    Kouatchou, Jules; Halem, Milton (Technical Monitor)

    2000-01-01

    We combine a high order compact finite difference approximation and collocation techniques to numerically solve the two dimensional heat equation. The resulting method is implicit and can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank-Nicolson method, where the parallelization is done across space only. Numerical experiments are carried out on the SGI Origin 2000.
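
    For reference, the Crank-Nicolson baseline averages the explicit and implicit second differences (written here in one space dimension for brevity, in standard notation; the paper treats the 2D heat equation):

        \frac{u_i^{n+1} - u_i^{n}}{\Delta t}
          = \frac{\alpha}{2} \left(
              \frac{u_{i+1}^{n+1} - 2u_i^{n+1} + u_{i-1}^{n+1}}{\Delta x^2}
            + \frac{u_{i+1}^{n} - 2u_i^{n} + u_{i-1}^{n}}{\Delta x^2} \right)

    Each new time level couples all spatial unknowns in one linear solve, which is why a Crank-Nicolson code parallelizes across space only; the collocation method above relaxes exactly that restriction.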

  17. Parallelization of the TRIGRS model for rainfall-induced landslides using the message passing interface

    USGS Publications Warehouse

    Alvioli, M.; Baum, R.L.

    2016-01-01

    We describe a parallel implementation of TRIGRS, the Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability Model for the timing and distribution of rainfall-induced shallow landslides. We have parallelized the four time-demanding execution modes of TRIGRS, namely both the saturated and unsaturated model with finite and infinite soil depth options, within the Message Passing Interface framework. In addition to new features of the code, we outline details of the parallel implementation and show the performance gain with respect to the serial code. Results are obtained both on commercial hardware and on a high-performance multi-node machine, showing the different limits of applicability of the new code. We also discuss the implications for the application of the model on large-scale areas and as a tool for real-time landslide hazard monitoring.
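
    Because each grid cell's infiltration and stability response is computed independently in TRIGRS-style models, the MPI version is at heart a block distribution of cells over ranks. A minimal mpi4py sketch of that structure is below; the per-cell function and all names are illustrative, not TRIGRS code.

        import numpy as np
        from mpi4py import MPI

        def factor_of_safety(slope_deg, depth):
            # Hypothetical stand-in for the per-cell computation.
            return (np.tan(np.radians(35.0)) / np.tan(np.radians(slope_deg))
                    + 0.1 * depth)

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        if rank == 0:
            slopes = np.linspace(10.0, 45.0, 10_000)   # one entry per cell
            chunks = np.array_split(slopes, size)
        else:
            chunks = None
        local = comm.scatter(chunks, root=0)           # block of cells per rank
        local_fs = factor_of_safety(local, depth=1.0)  # independent work
        all_fs = comm.gather(local_fs, root=0)         # reassemble on rank 0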

  18. A fully parallel in time and space algorithm for simulating the electrical activity of a neural tissue.

    PubMed

    Bedez, Mathieu; Belhachmi, Zakaria; Haeberlé, Olivier; Greget, Renaud; Moussaoui, Saliha; Bouteiller, Jean-Marie; Bischoff, Serge

    2016-01-15

    The resolution of a model describing the electrical activity of neural tissue and its propagation within this tissue is highly demanding in terms of computing time and requires strong computing power to achieve good results. In this study, we present a method to solve a model describing the electrical propagation in neuronal tissue using the parareal algorithm, coupled with spatial parallelization using CUDA on a graphics processing unit (GPU). We applied this resolution method to different dimensions of the geometry of our model (1-D, 2-D and 3-D). The GPU results are compared with simulations from a multi-core processor cluster using the message-passing interface (MPI), where the spatial scale was parallelized in order to reach a computation time comparable to that of the presented GPU method. A gain of a factor of 100 in computational time between the sequential results and those obtained using the GPU has been achieved in the case of the 3-D geometry. Given the structure of the GPU, this factor increases with the fineness of the geometry used in the computation. To the best of our knowledge, it is the first time such a method has been used, even in neuroscience. Parallelization in time coupled with GPU parallelization in space drastically reduces the computational time for a fine resolution of the model describing the propagation of the electrical signal in a neuronal tissue. Copyright © 2015 Elsevier B.V. All rights reserved.
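
    The parareal skeleton itself is short. Below is a sequential demonstration on a scalar ODE with Euler propagators; the fine solves inside each iteration are mutually independent, and those are exactly what such studies offload to GPU (or MPI) workers.

        import numpy as np

        def euler(y, a, b, n):
            # Fixed-step explicit Euler for dy/dt = -y on [a, b].
            h = (b - a) / n
            for _ in range(n):
                y = y + h * (-y)
            return y

        def parareal(y0, ts, fine, coarse, iters=5):
            y = [y0]
            for a, b in zip(ts[:-1], ts[1:]):          # initial coarse sweep
                y.append(coarse(y[-1], a, b))
            for _ in range(iters):
                # These fine solves are independent: parallelize here.
                F = [fine(y[i], ts[i], ts[i+1]) for i in range(len(ts) - 1)]
                y_new = [y0]
                for i in range(len(ts) - 1):           # cheap serial correction
                    y_new.append(coarse(y_new[i], ts[i], ts[i+1]) + F[i]
                                 - coarse(y[i], ts[i], ts[i+1]))
                y = y_new
            return np.array(y)

        ts = np.linspace(0.0, 2.0, 9)                  # 8 time slices
        y = parareal(1.0, ts,
                     fine=lambda y, a, b: euler(y, a, b, 100),
                     coarse=lambda y, a, b: euler(y, a, b, 1))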

  19. Workload capacity spaces: a unified methodology for response time measures of efficiency as workload is varied.

    PubMed

    Townsend, James T; Eidels, Ami

    2011-08-01

    Increasing the number of available sources of information may impair or facilitate performance, depending on the capacity of the processing system. Tests performed on response time distributions are proving to be useful tools in determining the workload capacity (as well as other properties) of cognitive systems. In this article, we develop a framework and relevant mathematical formulae that represent different capacity assays (Miller's race model bound, Grice's bound, and Townsend's capacity coefficient) in the same space. The new space allows a direct comparison between the distinct bounds and the capacity coefficient values and helps explicate the relationships among the different measures. An analogous common space is proposed for the AND paradigm, relating the capacity index to the Colonius-Vorberg bounds. We illustrate the effectiveness of the unified spaces by presenting data from two simulated models (standard parallel, coactive) and a prototypical visual detection experiment. A conversion table for the unified spaces is provided.
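
    For readers who want the quantities being unified: in the OR (first-terminating) design, with F a response-time CDF and H(t) = -ln S(t) the integrated hazard, the standard definitions are (recalled from the literature; the paper's notation may differ):

        C_{OR}(t) = \frac{H_{AB}(t)}{H_{A}(t) + H_{B}(t)}

        F_{AB}(t) \le F_{A}(t) + F_{B}(t)           (Miller's race-model bound)

        F_{AB}(t) \ge \max\{F_{A}(t), F_{B}(t)\}    (Grice's bound)

    Here C_OR(t) = 1 corresponds to unlimited-capacity parallel processing, with larger values indicating super capacity and smaller values limited capacity.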

  20. Multitasking the Davidson algorithm for the large, sparse eigenvalue problem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Umar, V.M.; Fischer, C.F.

    1989-01-01

    The authors report how the Davidson algorithm, developed for handling the eigenvalue problem for the large and sparse matrices arising in quantum chemistry, was modified for use in atomic structure calculations. To date these calculations have used traditional eigenvalue methods, which limit the range of feasible calculations because of their excessive memory requirements and unsatisfactory performance attributed to time-consuming and costly processing of zero-valued elements. The replacement of a traditional matrix eigenvalue method by the Davidson algorithm reduced these limitations. Significant speedup was found, which varied with the size of the underlying problem and its sparsity. Furthermore, the range of matrix sizes that can be manipulated efficiently was expanded by more than one order of magnitude. On the CRAY X-MP the code was vectorized and the importance of gather/scatter analyzed. A parallelized version of the algorithm obtained an additional 35% reduction in execution time. Speedup due to vectorization and concurrency was also measured on the Alliant FX/8.
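
    A bare-bones NumPy version of the Davidson iteration for the lowest eigenpair, assuming a symmetric matrix with a dominant diagonal (the regime the method targets); production codes add restarts, blocking, and sparse matrix-vector products:

        import numpy as np

        def davidson(A, tol=1e-8, max_iter=100):
            n = A.shape[0]
            diag = np.diag(A)
            V = np.eye(n, 1)                    # initial subspace: e_1
            for _ in range(max_iter):
                H = V.T @ A @ V                 # Rayleigh-Ritz projection
                theta, s = np.linalg.eigh(H)
                theta, s = theta[0], s[:, 0]
                x = V @ s
                r = A @ x - theta * x           # residual
                if np.linalg.norm(r) < tol:
                    return theta, x
                t = r / (theta - diag + 1e-12)  # diagonal preconditioner
                t -= V @ (V.T @ t)              # orthogonalize, then expand
                V = np.column_stack([V, t / np.linalg.norm(t)])
            return theta, x

        rng = np.random.default_rng(4)
        A = np.diag(np.arange(1.0, 401.0)) + 1e-3 * rng.standard_normal((400, 400))
        val, vec = davidson((A + A.T) / 2)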

  1. Sheathfolds in rheomorphic ignimbrites

    USGS Publications Warehouse

    Branney, M.J.; Barry, T.L.; Godchaux, Martha

    2004-01-01

    Structural reappraisal of several classic rheomorphic ignimbrites in Colorado, Idaho, the Canary Islands and Italy has, for the first time, revealed abundant oblique folds, curvilinear folds and sheathfolds which formed during emplacement. Like their equivalents in tectonic shear-zones, the sheathfold axes lie sub-parallel to a pervasive elongation lineation, and appear as eye structures on rock surfaces normal to the transport direction. With the recognition of sheathfolds, ignimbrites previously inferred to have undergone complex rheomorphic deformation histories are re-interpreted as recording a single, progressive deformation event. In some examples, the trends of sheathfolds and related lineations change with height through a single ignimbrite suggesting that rheomorphism did not affect the entire thickness of ignimbrite synchronously. Instead, we infer that in these ignimbrites a thin ductile shear-zone rose gradually through the aggrading agglutinating mass whilst the flow direction varied with time. This suggests that, in some cases, both welding and rheomorphism can be extremely rapid, with ductile strain rates significantly exceeding rates of ignimbrite aggradation. © Springer-Verlag 2004.

  2. Evaluation of 2 cognitive abilities tests in a dual-task environment

    NASA Technical Reports Server (NTRS)

    Vidulich, M. A.; Tsang, P. S.

    1986-01-01

    Most real-world operators are required to perform multiple tasks simultaneously. In some cases, such as flying a high performance aircraft or troubleshooting a failing nuclear power plant, the operator's ability to time-share or "process in parallel" can be driven to extremes. This has created interest in selection tests of cognitive abilities. Two tests that have been suggested are the Dichotic Listening Task and the Cognitive Failures Questionnaire. Correlations between these test results and time-sharing performance were obtained and the validity of these tests was examined. The primary task was a tracking task with dynamically varying bandwidth. This was performed either alone or concurrently with either another tracking task or a spatial transformation task. The results were: (1) An unexpected negative correlation was detected between the two tests; (2) The lack of correlation between either test and task performance made the predictive utility of the test scores appear questionable; (3) Pilots made more errors on the Dichotic Listening Task than college students.

  3. The Symptoms and Functioning Severity Scale (SFSS): Psychometric Evaluation and Discrepancies among Youth, Caregiver, and Clinician Ratings over Time

    PubMed Central

    Athay, M. Michele; Riemer, Manuel; Bickman, Leonard

    2012-01-01

    This paper describes the development and psychometric evaluation of the Symptoms and Functioning Severity Scale (SFSS), which includes three parallel forms to systematically capture clinician, youth, and caregiver perspectives of youth symptoms on a frequent basis. While there is widespread consensus that different raters of youth psychopathology vary significantly in their assessments, this is the first paper that specifically investigates the discrepancies among clinician, youth, and caregiver ratings in a community mental health setting throughout the treatment process. Results for all three respondent versions indicate the SFSS is a psychometrically sound instrument for use in this population. Significant discrepancies in scores exist at baseline among the three respondents. Longitudinal analyses reveal the youth-clinician and caregiver-clinician score discrepancies decrease significantly over time. Differences by youth gender exist for caregiver-clinician discrepancies. The average youth-caregiver score discrepancy remains consistent throughout treatment. Implications for future research and clinical practice are discussed. PMID:22407556

  4. Through thick and thin: a microfluidic approach for continuous measurements of biofilm viscosity and the effect of ionic strength.

    PubMed

    Paquet-Mercier, F; Parvinzadeh Gashti, M; Bellavance, J; Taghavi, S M; Greener, J

    2016-11-29

    Continuous, non-intrusive measurements of time-varying viscosity of Pseudomonas sp. biofilms are made using a microfluidic method that combines video tracking with a semi-empirical viscous flow model. The approach uses measured velocity and height of tracked biofilm segments, which move under the constant laminar flow of a nutrient solution. Following a low viscosity growth stage, rapid thickening was observed. During this stage, viscosity increased by over an order of magnitude in less than ten hours. The technique was also demonstrated as a promising platform for parallel experiments by subjecting multiple biofilm-laden microchannels to nutrient solutions containing NaCl in the range of 0 to 34 mM. Preliminary data suggest a strong relationship between ionic strength and biofilm properties, such as average viscosity and rapid thickening onset time. The technique opens the way for a combinatorial approach to study the response of biofilm viscosity under well-controlled physical, chemical and biological growth conditions.

  5. PARALLEL MEASUREMENT AND MODELING OF TRANSPORT IN THE DARHT II BEAMLINE ON ETA II

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chambers, F W; Raymond, B A; Falabella, S

    To successfully tune the DARHT II transport beamline requires the close coupling of a model of the beam transport and the measurement of the beam observables as the beam conditions and magnet settings are varied. For the ETA II experiment using the DARHT II beamline components this was achieved using the SUICIDE (Simple User Interface Connecting to an Integrated Data Environment) data analysis environment and the FITS (Fully Integrated Transport Simulation) model. The SUICIDE environment has direct access to the experimental beam transport data at acquisition and the FITS predictions of the transport for immediate comparison. The FITS model is coupled into the control system where it can read magnet current settings for real time modeling. We find this integrated coupling is essential for model verification and the successful development of a tuning aid for the efficient convergence on a useable tune. We show the real time comparisons of simulation and experiment and explore the successes and limitations of this close coupled approach.

  6. Size-controlled InGaN/GaN nanorod LEDs with an ITO/graphene transparent layer

    NASA Astrophysics Data System (ADS)

    Shim, Jae-Phil; Seong, Won-Seok; Min, Jung-Hong; Kong, Duk-Jo; Seo, Dong-Ju; Kim, Hyung-jun; Lee, Dong-Seon

    2016-11-01

    We introduce ITO on graphene as a current-spreading layer for separated InGaN/GaN nanorod LEDs for the purpose of passivation-free and high light-extraction efficiency. Transferred graphene on InGaN/GaN nanorods effectively blocks the diffusion of ITO atoms to nanorods, facilitating the production of transparent ITO/graphene contact on parallel-nanorod LEDs, without filling the air gaps, like a bridge structure. The ITO/graphene layer sufficiently spreads current in a lateral direction, resulting in uniform and reliable light emission observed from the whole area of the top surface. Using KOH treatment, we reduce series resistance and reverse leakage current in nanorod LEDs by recovering the plasma-damaged region. We also control the size of the nanorods by varying the KOH treatment time and observe strain relaxation via blueshift in electroluminescence. As a result, bridge-structured LEDs with 8 min of KOH treatment show 15 times higher light-emitting efficiency than with 2 min of KOH treatment.

  7. A queueing network model to analyze the impact of parallelization of care on patient cycle time.

    PubMed

    Jiang, Lixiang; Giachetti, Ronald E

    2008-09-01

    The total time a patient spends in an outpatient facility, called the patient cycle time, is a major contributor to overall patient satisfaction. A frequently recommended strategy to reduce the total time is to perform some activities in parallel, thereby shortening patient cycle time. To analyze patient cycle time, this paper extends and improves upon an existing multi-class open queueing network model (MOQN) so that the patient flow in an urgent care center can be modeled. Results of the model are analyzed using data from an urgent care center contemplating greater parallelization of patient care activities. The results indicate that parallelization can reduce the cycle time for those patient classes which require more than one diagnostic and/or treatment intervention. However, for many patient classes there would be little if any improvement, indicating the importance of tools to analyze business process reengineering rules. The paper makes contributions by implementing an approximation for fork/join queues in the network and by improving the approximation for multiple server queues in both low-traffic and high-traffic conditions. We demonstrate the accuracy of the MOQN results through comparisons to simulation results.
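
    The building block that such network models extend is the multi-server M/M/c queue. A sketch of the classical Erlang-C mean-wait computation is below; this is the textbook formula, not the paper's improved low/high-traffic approximation or its fork/join treatment.

        import math

        def erlang_c_wait(lam, mu, c):
            # Mean time in queue (Wq) for an M/M/c node.
            a = lam / mu                    # offered load
            rho = a / c                     # utilization, must be < 1
            if rho >= 1:
                raise ValueError("unstable queue")
            p0_terms = sum(a**k / math.factorial(k) for k in range(c))
            tail = a**c / (math.factorial(c) * (1 - rho))
            p_wait = tail / (p0_terms + tail)   # Erlang-C delay probability
            return p_wait / (c * mu - lam)

        # Example: 12 arrivals/hr, 10-minute service step, 3 parallel servers.
        print(erlang_c_wait(lam=12.0, mu=6.0, c=3))   # about 0.074 hr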

  8. A class of parallel algorithms for computation of the manipulator inertia matrix

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on the composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connections, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm was also developed for the linear array, attaining significantly higher efficiency.

  9. Pharmaceutical removal in tropical subsurface flow constructed wetlands at varying hydraulic loading rates.

    PubMed

    Zhang, Dong Qing; Gersberg, Richard M; Hua, Tao; Zhu, Junfei; Tuan, Nguyen Anh; Tan, Soon Keat

    2012-04-01

    Determining the fate of emerging organic contaminants in an aquatic ecosystem is important for developing constructed wetlands (CWs) treatment technology. Experiments were carried out in subsurface flow CWs in Singapore to evaluate the fate and transport of eight pharmaceutical compounds. The CW system included three parallel horizontal subsurface flow CWs and three parallel unplanted beds fed continuously with synthetic wastewater at different hydraulic retention times (HRTs). The findings of the tests at 2-6 d HRTs showed that the pharmaceuticals could be categorized as (i) efficiently removed compounds with removal higher than 85% (ketoprofen and salicylic acid); (ii) moderately removed compounds with removal efficiencies between 50% and 85% (naproxen, ibuprofen and caffeine); and (iii) poorly removed compounds with efficiency rate lower than 50% (carbamazepine, diclofenac, and clofibric acid). Except for carbamazepine and salicylic acid, removal efficiencies of the selected pharmaceuticals showed significant (p<0.05) enhancement in planted beds as compared to the unplanted beds. Removal of caffeine, ketoprofen and clofibric acid were found to follow first order decay kinetics with decay constants higher in the planted beds than the unplanted beds. Correlations between pharmaceutical removal efficiencies and log K(ow) were not significant (p>0.05), implying that their removal is not well related to the compound's hydrophobicity. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Follow-up of cortical activity and structure after lesion with laser speckle imaging and magnetic resonance imaging in nonhuman primates

    NASA Astrophysics Data System (ADS)

    Peuser, Jörn; Belhaj-Saif, Abderraouf; Hamadjida, Adjia; Schmidlin, Eric; Gindrat, Anne-Dominique; Völker, Andreas Charles; Zakharov, Pavel; Hoogewoud, Henri-Marcel; Rouiller, Eric M.; Scheffold, Frank

    2011-09-01

    The nonhuman primate model is suitable to study mechanisms of functional recovery following lesion of the cerebral cortex (motor cortex), on which therapeutic strategies can be tested. To interpret behavioral data (time course and extent of functional recovery), it is crucial to monitor the properties of the experimental cortical lesion, induced by infusion of the excitotoxin ibotenic acid. In two adult macaque monkeys, ibotenic acid infusions produced a restricted, permanent lesion of the motor cortex. In one monkey, the lesion was monitored over 3.5 weeks, combining laser speckle imaging (LSI) as metabolic readout (cerebral blood flow) and anatomical assessment with magnetic resonance imaging (T2-weighted MRI). The cerebral blood flow, measured online during subsequent injections of the ibotenic acid in the motor cortex, exhibited a dramatic increase, still present after one week, in parallel to a MRI hypersignal. After 3.5 weeks, the cerebral blood flow was strongly reduced (below reference level) and the hypersignal disappeared from the MRI scan, although the lesion was permanent as histologically assessed post-mortem. The MRI data were similar in the second monkey. Our experiments suggest that LSI and MRI, although they reflect different features, vary in parallel during a few weeks following an excitotoxic cortical lesion.

  11. A Well-Known But Still Surprising Generator

    NASA Astrophysics Data System (ADS)

    Haugland, Ole Anton

    2014-12-01

    The bicycle generator is often mentioned as an example of a method to produce electric energy. It is cheap and easily accessible, so it is a natural example to use in teaching. There are different types, but I prefer the old side-wall dynamo. The most common explanation of its working principle seems to be something like the illustration in Fig. 1. The illustration is taken from a popular textbook in the Norwegian junior high school.1 Typically it is explained as a system of a moving magnet or coils that directly results in a varying magnetic field through the coils. According to Faraday's law a voltage is induced in the coils. Simple and easy! A few times I have had a chance to glimpse into a bicycle generator, and I was somewhat surprised to sense that the magnet rotated parallel to the turns of the coil. How could the flux through the coil change and induce a voltage when the magnet rotated parallel to the turns of the coil? When teaching electromagnetic induction I have shown the students a dismantled generator and asked them how this could work. They naturally found that this was more difficult to understand than the principle illustrated in Fig. 1. Other authors in this journal have discussed even more challenging questions concerning electric generators.2,3

  12. Accounting for inherent variability of growth in microbial risk assessment.

    PubMed

    Marks, H M; Coleman, M E

    2005-04-15

    Risk assessments of pathogens need to account for the growth of small numbers of cells under varying conditions. In order to determine the possible risks that occur when there are small numbers of cells, stochastic models of growth are needed that capture the distribution of the number of cells over replicate trials of the same scenario or environmental conditions. This paper provides a simple stochastic growth model, accounting only for inherent cell-growth variability and assuming constant growth kinetic parameters, for an initial small number of cells assumed to be transforming from a stationary to an exponential phase. Two basic sets of microbial assumptions are considered: serial, where it is assumed that cells pass through a lag phase before entering the exponential phase of growth; and parallel, where it is assumed that lag and exponential phases develop in parallel. The model is based on first determining the distribution of the time when growth commences, and then modelling the conditional distribution of the number of cells. For the latter distribution, it is found that a Weibull distribution provides a simple approximation to the conditional distribution of the relative growth, so that the model developed in this paper can be easily implemented in risk assessments using commercial software packages.

  13. μπ: A Scalable and Transparent System for Simulating MPI Programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perumalla, Kalyan S

    2010-01-01

    μπ is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of μπ are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source code is available. The set of source-code interfaces supported by μπ is being expanded to support a wider set of applications, and MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, μπ has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of a purely discrete event style of execution and to the scalability and efficiency of the underlying parallel discrete event simulation engine, μsik. In the largest runs, μπ has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.

  14. Template-directed atomically precise self-organization of perfectly ordered parallel cerium silicide nanowire arrays on Si(110)-16 × 2 surfaces.

    PubMed

    Hong, Ie-Hong; Liao, Yung-Cheng; Tsai, Yung-Feng

    2013-11-05

    The perfectly ordered parallel arrays of periodic Ce silicide nanowires can self-organize with atomic precision on single-domain Si(110)-16 × 2 surfaces. The growth evolution of self-ordered parallel Ce silicide nanowire arrays is investigated over a broad range of Ce coverages on single-domain Si(110)-16 × 2 surfaces by scanning tunneling microscopy (STM). Three different types of well-ordered parallel arrays, consisting of uniformly spaced and atomically identical Ce silicide nanowires, are self-organized through the heteroepitaxial growth of Ce silicides on a long-range grating-like 16 × 2 reconstruction at the deposition of various Ce coverages. Each atomically precise Ce silicide nanowire consists of a bundle of chains and rows with different atomic structures. The atomic-resolution dual-polarity STM images reveal that the interchain coupling leads to the formation of the registry-aligned chain bundles within individual Ce silicide nanowire. The nanowire width and the interchain coupling can be adjusted systematically by varying the Ce coverage on a Si(110) surface. This natural template-directed self-organization of perfectly regular parallel nanowire arrays allows for the precise control of the feature size and positions within ±0.2 nm over a large area. Thus, it is a promising route to produce parallel nanowire arrays in a straightforward, low-cost, high-throughput process.

  15. Template-directed atomically precise self-organization of perfectly ordered parallel cerium silicide nanowire arrays on Si(110)-16 × 2 surfaces

    PubMed Central

    2013-01-01

    The perfectly ordered parallel arrays of periodic Ce silicide nanowires can self-organize with atomic precision on single-domain Si(110)-16 × 2 surfaces. The growth evolution of self-ordered parallel Ce silicide nanowire arrays is investigated over a broad range of Ce coverages on single-domain Si(110)-16 × 2 surfaces by scanning tunneling microscopy (STM). Three different types of well-ordered parallel arrays, consisting of uniformly spaced and atomically identical Ce silicide nanowires, are self-organized through the heteroepitaxial growth of Ce silicides on a long-range grating-like 16 × 2 reconstruction at the deposition of various Ce coverages. Each atomically precise Ce silicide nanowire consists of a bundle of chains and rows with different atomic structures. The atomic-resolution dual-polarity STM images reveal that the interchain coupling leads to the formation of the registry-aligned chain bundles within individual Ce silicide nanowire. The nanowire width and the interchain coupling can be adjusted systematically by varying the Ce coverage on a Si(110) surface. This natural template-directed self-organization of perfectly regular parallel nanowire arrays allows for the precise control of the feature size and positions within ±0.2 nm over a large area. Thus, it is a promising route to produce parallel nanowire arrays in a straightforward, low-cost, high-throughput process. PMID:24188092

  16. Anisotropic transverse mixing and its effect on reaction rates in multi-scale, 3D heterogeneous porous media

    NASA Astrophysics Data System (ADS)

    Engdahl, N. B.

    2016-12-01

    Mixing rates in porous media have been a heavily researched topic in recent years, covering analytic, random, and structured fields. However, there are some persistent assumptions and common features of these models that raise questions about the generality of the results. One of these commonalities is the orientation of the flow field with respect to the heterogeneity structure: the two are almost always defined to be parallel to each other if there is an elongated axis of permeability correlation. Given the vastly different tortuosities for flow parallel to bedding and flow transverse to bedding, this assumption of parallel orientation may have significant effects on reaction rates when natural flows deviate from this assumed setting. This study investigates the role of orientation on mixing and reaction rates in multi-scale, 3D heterogeneous porous media with varying degrees of anisotropy in the correlation structure. Ten realizations of a small flow field, with three anisotropy levels, were simulated for flow parallel and transverse to bedding. Transport was simulated in each model with an advective-diffusive random walk and reactions were simulated using the chemical Langevin equation. The reaction system is a vertically segregated, transverse mixing problem between two mobile reactants. The results show that different transport behaviors and reaction rates are obtained by simply rotating the direction of flow relative to bedding, even when the net flux in both directions is the same. This kind of behavior was observed for three different weightings of the initial condition: 1) uniform, 2) flux-based, and 3) travel-time based. The different schemes resulted in 20-50% more mass formation in the transverse direction than in the longitudinal. The greatest variability in mass was observed for the flux weights, and these were proportionate to the level of anisotropy. The implications of this study are that flux or travel-time weights do not provide any guarantee of a fair comparison in this kind of mixing scenario and that the role of directional tendencies on reaction rates can be significant. Further, it may be necessary to include anisotropy in future upscaled models to create robust methods that give representative reaction rates for any flow direction relative to geologic bedding.

  17. Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Candel, A.; Kabel, A.; Lee, L.

    In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).

  18. Solid State Mini-RPV Color Imaging System

    DTIC Science & Technology

    1975-09-12

    completed in the design and construction phase. Considerations are now in progress for conducting field tests of the equipment against "real world... [Remainder of the record is contents-listing residue from the report: Simplified Parallel Injection Configuration; CID Parallel Injection Configuration; Element Rate Timing; Horizontal Input and Phase Line Timing; Line Reset/Injection Timing; Line Rate Timing (Start of Readout); Driver A4 Block Diagram; Element Scan Time Base.]

  19. Real-time electron dynamics for massively parallel excited-state simulations

    NASA Astrophysics Data System (ADS)

    Andrade, Xavier

    The simulation of the real-time dynamics of electrons, based on time dependent density functional theory (TDDFT), is a powerful approach to study electronic excited states in molecular and crystalline systems. What makes the method attractive is its flexibility to simulate different kinds of phenomena beyond the linear-response regime, including strongly-perturbed electronic systems and non-adiabatic electron-ion dynamics. Electron-dynamics simulations are also attractive from a computational point of view. They can run efficiently on massively parallel architectures due to the low communication requirements. Our implementations of electron dynamics, based on the codes Octopus (real-space) and Qball (plane-waves), allow us to simulate systems composed of thousands of atoms and to obtain good parallel scaling up to 1.6 million processor cores. Due to the versatility of real-time electron dynamics and its parallel performance, we expect it to become the method of choice to apply the capabilities of exascale supercomputers for the simulation of electronic excited states.

  20. Parallel optoelectronic trinary signed-digit division

    NASA Astrophysics Data System (ADS)

    Alam, Mohammad S.

    1999-03-01

    The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.
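
    As a point of reference for the number system involved (this is not the paper's optoelectronic implementation, and the helper names are illustrative), the sketch below encodes integers as balanced-ternary signed digits and shows digit-wise negation; negation is carry-free and fully parallel, which is why subtraction reduces to adding a negated operand.

      def to_tsd(n):
          """Encode an integer as trinary signed digits (balanced ternary),
          least significant digit first, each digit in {-1, 0, 1}."""
          digits = []
          while n != 0:
              r = n % 3
              if r == 2:            # represent 2 as 3 - 1: emit -1, carry 1
                  digits.append(-1)
                  n = n // 3 + 1
              else:
                  digits.append(r)
                  n //= 3
          return digits or [0]

      def from_tsd(digits):
          return sum(d * 3**i for i, d in enumerate(digits))

      def negate(digits):
          """Digit-wise negation: embarrassingly parallel, so A - B can be
          computed as A + (-B) with no borrow propagation."""
          return [-d for d in digits]

      assert from_tsd(to_tsd(41)) == 41
      assert from_tsd(negate(to_tsd(41))) == -41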

  1. A time-parallel approach to strong-constraint four-dimensional variational data assimilation

    NASA Astrophysics Data System (ADS)

    Rao, Vishwas; Sandu, Adrian

    2016-05-01

    A parallel-in-time algorithm based on an augmented Lagrangian approach is proposed to solve four-dimensional variational (4D-Var) data assimilation problems. The assimilation window is divided into multiple sub-intervals, which allows the cost function and gradient computations to be parallelized. The solutions to the continuity equations across interval boundaries are added as constraints. The augmented Lagrangian approach leads to a formulation of the variational data assimilation problem that differs from weakly constrained 4D-Var. A combination of serial and parallel 4D-Vars to increase performance is also explored. The methodology is illustrated on data assimilation problems involving the Lorenz-96 and shallow water models.
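
    Schematically, and with notation assumed here rather than taken from the paper, the construction splits the window into N sub-intervals with independent states x_i, couples them through continuity constraints c_i, and penalizes constraint violations in an augmented Lagrangian:

      c_i = x_{i+1}(t_i) - \mathcal{M}_i(x_i), \qquad i = 1, \dots, N-1,

      \mathcal{L}_\rho(x, \lambda) = \sum_{i=1}^{N} J_i(x_i)
          + \sum_{i=1}^{N-1} \lambda_i^{\top} c_i
          + \frac{\rho}{2} \sum_{i=1}^{N-1} \lVert c_i \rVert^2 ,

    where \mathcal{M}_i advances the state across sub-interval i. Each sub-interval cost J_i and its gradient can then be evaluated on a different processor; only the constraint terms couple neighbouring sub-intervals.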

  2. Serial and parallel attentive visual searches: evidence from cumulative distribution functions of response times.

    PubMed

    Sung, Kyongje

    2008-12-01

    Participants searched a visual display for a target among distractors. Each of 3 experiments tested a condition proposed to require attention and for which certain models propose a serial search. Serial versus parallel processing was tested by examining effects on response time means and cumulative distribution functions. In 2 conditions, the results suggested parallel rather than serial processing, even though the tasks produced significant set-size effects. Serial processing was produced only in a condition with a difficult discrimination and a very large set-size effect. The results support C. Bundesen's (1990) claim that an extreme set-size effect leads to serial processing. Implications for parallel models of visual selection are discussed.

  3. A WENO-Limited, ADER-DT, Finite-Volume Scheme for Efficient, Robust, and Communication-Avoiding Multi-Dimensional Transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Norman, Matthew R

    2014-01-01

    The novel ADER-DT time discretization is applied to two-dimensional transport in a quadrature-free, WENO- and FCT-limited, Finite-Volume context. Emphasis is placed on (1) the serial and parallel computational properties of ADER-DT and this framework and (2) the flexibility of ADER-DT and this framework in efficiently balancing accuracy with other constraints important to transport applications. This study demonstrates a range of choices for the user when approaching their specific application while maintaining good parallel properties. In this method, genuine multi-dimensionality, single-step and single-stage time stepping, strict positivity, and a flexible range of limiting are all achieved with only one parallel synchronization and data exchange per time step. In terms of parallel data transfers per simulated time interval, this improves upon multi-stage time stepping and post-hoc filtering techniques such as hyperdiffusion. This method is evaluated with standard transport test cases over a range of limiting options to demonstrate quantitatively and qualitatively what a user should expect when employing this method in their application.

  4. Space shuttle system program definition. Volume 4: Cost and schedule report

    NASA Technical Reports Server (NTRS)

    1972-01-01

    The supporting cost and schedule data for the second half of the Space Shuttle System Phase B Extension Study is summarized. The major objective for this period was to address the cost/schedule differences affecting final selection of the HO orbiter space shuttle system. The contending options under study included the following booster launch configurations: (1) series burn ballistic recoverable booster (BRB), (2) parallel burn ballistic recoverable booster (BRB), (3) series burn solid rocket motors (SRM's), and (4) parallel burn solid rocket motors (SRM's). The implications of varying payload bay sizes for the orbiter, engine type for the ballistic recoverable booster, and SRM motors for the solid booster were examined.

  5. Elliptically polarizing adjustable phase insertion device

    DOEpatents

    Carr, R.

    1995-01-17

    An insertion device for extracting polarized electromagnetic energy from a beam of particles is disclosed. The insertion device includes four linear arrays of magnets which are aligned with the particle beam. The magnetic field strength to which the particles are subjected is adjusted by altering the relative alignment of the arrays in a direction parallel to that of the particle beam. Both the energy and polarization of the extracted energy may be varied by moving the relevant arrays parallel to the beam direction. The present invention requires a substantially simpler and more economical superstructure than insertion devices in which the magnetic field strength is altered by changing the gap between arrays of magnets. 3 figures.

  6. Estimating water flow through a hillslope using the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Devaney, Judy E.; Camillo, P. J.; Gurney, R. J.

    1988-01-01

    A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all modelled, and the rainfall rates are allowed to vary spatially. Previous models of this type had always been very limited computationally. This model takes less than a minute to model all the components of the hillslope water flow for a day. The model can now be used in sensitivity studies to specify which measurements should be taken and how accurate they should be to describe such flows for environmental studies.

  7. Tunable high-Q superconducting notch filter

    DOEpatents

    Pang, C.S.; Falco, C.M.; Kampwirth, R.T.; Schuller, I.K.

    1979-11-29

    A superconducting notch filter is made of three substrates disposed in a cryogenic environment. A superconducting material is disposed on the first substrate in a pattern of a circle and an annular ring connected together. The second substrate has a corresponding pattern, forming a parallel-plate capacitor with the first, and has its circle and annular ring connected by a superconducting spiral that forms an inductor. The third substrate has a superconducting spiral that is placed parallel to the first superconducting spiral to form a transformer. Relative motion of the first substrate with respect to the second is effected from outside the cryogenic environment to vary the capacitance and hence the frequency of the resonant circuit formed by the superconducting devices.
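
    The tuning mechanism follows from the standard resonance relation for the LC circuit formed by the spiral inductor and the parallel-plate capacitor; sliding one substrate changes the plate overlap, and hence C:

      f_0 = \frac{1}{2\pi\sqrt{LC}}, \qquad \text{so } f_0 \propto C^{-1/2} \text{ at fixed } L.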

  8. Jet Noise Source Localization Using Linear Phased Array

    NASA Technical Reports Server (NTRS)

    Agboola, Ferni A.; Bridges, James

    2004-01-01

    A study was conducted to further clarify the interpretation and application of linear phased array microphone results for localizing aeroacoustic sources in an aircraft exhaust jet. Two model engine nozzles were tested at varying power cycles with the array set up parallel to the jet axis. The array position was also varied to determine the best location for the array. The results showed that it is possible to resolve jet noise sources, separating the bypass and other components. The results also showed that a focused near-field image provides more realistic noise source localization at low to mid frequencies.

  9. MMS Observations of Parallel Electric Fields During a Quasi-Perpendicular Bow Shock Crossing

    NASA Astrophysics Data System (ADS)

    Goodrich, K.; Schwartz, S. J.; Ergun, R.; Wilder, F. D.; Holmes, J.; Burch, J. L.; Gershman, D. J.; Giles, B. L.; Khotyaintsev, Y. V.; Le Contel, O.; Lindqvist, P. A.; Strangeway, R. J.; Russell, C.; Torbert, R. B.

    2016-12-01

    Previous observations of the terrestrial bow shock have frequently shown large-amplitude fluctuations in the parallel electric field. These parallel electric fields are seen as both nonlinear solitary structures, such as double layers and electron phase-space holes, and short-wavelength waves, which can reach amplitudes greater than 100 mV/m. The Magnetospheric Multi-Scale (MMS) Mission has crossed the Earth's bow shock more than 200 times. The parallel electric field signatures observed in these crossings are seen in very discrete packets and evolve over time scales of less than a second, indicating the presence of a wealth of kinetic-scale activity. The high time resolution of the Fast Particle Instrument (FPI) available on MMS offers greater detail of the kinetic-scale physics that occur at bow shocks than ever before, allowing greater insight into the overall effect of these observed electric fields. We present a characterization of these parallel electric fields found in a single bow shock event and how it reflects the kinetic-scale activity that can occur at the terrestrial bow shock.

  10. Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM

    NASA Astrophysics Data System (ADS)

    de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl

    2002-03-01

    We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 10^5-10^6) in reasonable CPU times. The performance of our implementation of the parallel code on an Origin 2000 supercomputer is presented and discussed. It exhibits very good speedup behavior and low load imbalance. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.

  11. Program For Parallel Discrete-Event Simulation

    NASA Technical Reports Server (NTRS)

    Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

    1991-01-01

    User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.

  12. Real-time trajectory optimization on parallel processors

    NASA Technical Reports Server (NTRS)

    Psiaki, Mark L.

    1993-01-01

    A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm suitable for real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message-passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on three example problems: the Goddard problem; the acceleration-limited, planar minimum-time-to-the-origin problem; and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32 nodes instead of 1 node to solve a 64-stage Goddard problem.
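
    Written schematically (the notation here is assumed, not the paper's), the zero-order-hold discretization holds the control constant over each stage,

      u(t) = u_k \ \text{for } t \in [t_k, t_{k+1}), \qquad
      x_{k+1} = x_k + \int_{t_k}^{t_{k+1}} f\bigl(x(t), u_k\bigr)\, dt,

    which makes each stage's dynamics depend only on its own boundary state and control, the property that permits domain decomposition along the time axis.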

  13. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  14. Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

    PubMed Central

    Su, Huayou; Wen, Mei; Wu, Nan; Ren, Ju; Zhang, Chunyuan

    2014-01-01

    Through reorganizing the execution order and optimizing the data structure, we proposed an efficient parallel framework for an H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework with CUDA on NVIDIA's GPU. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also realized effectively. In addition, we proposed serial optimization methods, including multiresolution multiwindow motion estimation, a multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the workload of the H.264 encoder is offloaded to the GPU. Experimental results show that the parallel implementation outperforms the serial program with a speedup ratio of 20 and satisfies the requirement of real-time HD encoding at 30 fps. The loss of PSNR ranges from 0.14 dB to 0.77 dB at the same bitrate. Through analysis of the kernels, we found that the speedup ratios of the compute-intensive algorithms are proportional to the computational power of the GPU. However, the performance of the control-intensive parts (CAVLC) is closely tied to memory bandwidth, which gives an insight for new architecture design. PMID:24757432

  15. A visual parallel-BCI speller based on the time-frequency coding strategy

    NASA Astrophysics Data System (ADS)

    Xu, Minpeng; Chen, Long; Zhang, Lixin; Qi, Hongzhi; Ma, Lan; Tang, Jiabei; Wan, Baikun; Ming, Dong

    2014-04-01

    Objective. Spelling is one of the most important issues in brain-computer interface (BCI) research. This paper develops a visual parallel-BCI speller system based on a time-frequency coding strategy in which the switching among four simultaneously presented sub-spellers and the character selection are identified in parallel. Approach. The parallel-BCI speller was constituted of four independent P300+SSVEP-B (P300 plus SSVEP blocking) spellers with different flicker frequencies, so that every character had a specific time-frequency code. To verify its effectiveness, 11 subjects were involved in offline and online spellings. A classification strategy was designed to recognize the target character by jointly using canonical correlation analysis and stepwise linear discriminant analysis. Main results. Online spellings showed that the proposed parallel-BCI speller had a high performance, reaching a highest information transfer rate of 67.4 bit min-1, with averages of 54.0 bit min-1 and 43.0 bit min-1 in the three-round and five-round conditions, respectively. Significance. The results indicated that the proposed parallel-BCI could be controlled effectively by users, with attention shifting fluently among the sub-spellers, and substantially improved BCI spelling performance.

  16. A comparison between orthogonal and parallel plating methods for distal humerus fractures: a prospective randomized trial.

    PubMed

    Lee, Sang Ki; Kim, Kap Jung; Park, Kyung Hoon; Choy, Won Sik

    2014-10-01

    With the continuing improvements in implants for distal humerus fractures, it is expected that newer types of plates, which are anatomically precontoured, thinner and less irritating to soft tissue, would have comparable outcomes when used in a clinical study. The purpose of this study was to compare the clinical and radiographic outcomes in patients with distal humerus fractures who were treated with orthogonal and parallel plating methods using precontoured distal humerus plates. Sixty-seven patients with a mean age of 55.4 years (range 22-90 years) were included in this prospective study. The subjects were randomly assigned to receive 1 of 2 treatments: orthogonal or parallel plating. The following results were assessed: operating time, time to fracture union, presence of a step or gap at the articular margin, varus-valgus angulation, functional recovery, and complications. No intergroup differences were observed in the radiological and clinical results. In our practice, no significant differences were found between the orthogonal and parallel plating methods in terms of clinical outcomes, mean operation time, union time, or complication rates. There were no cases of fracture nonunion in either group; heterotopic ossification was found in 3 patients in the orthogonal plating group and 2 patients in the parallel plating group. However, the orthogonal plating method may be preferred in cases of coronal shear fractures, where posterior-to-anterior fixation may provide additional stability to the intraarticular fracture. Additionally, the parallel plating method may be preferred for fractures that occur at the most distal end of the humerus.

  17. Estimation of Time-Varying, Intrinsic and Reflex Dynamic Joint Stiffness during Movement. Application to the Ankle Joint

    PubMed Central

    Guarín, Diego L.; Kearney, Robert E.

    2017-01-01

    Dynamic joint stiffness determines the relation between joint position and torque, and plays a vital role in the control of posture and movement. Dynamic joint stiffness can be quantified during quasi-stationary conditions using disturbance experiments, where small position perturbations are applied to the joint and the torque response is recorded. Dynamic joint stiffness is composed of intrinsic and reflex mechanisms that act and change together, so that nonlinear mathematical models and specialized system identification techniques are necessary to estimate their relative contributions to overall joint stiffness. Quasi-stationary experiments have demonstrated that dynamic joint stiffness is heavily modulated by joint position and voluntary torque. Consequently, during movement, when joint position and torque change rapidly, dynamic joint stiffness will be Time-Varying (TV). This paper introduces a new method to quantify the TV intrinsic and reflex components of dynamic joint stiffness during movement. The algorithm combines ensemble and deterministic approaches for estimation of TV systems, and uses a TV, parallel-cascade, nonlinear system identification technique to separate overall dynamic joint stiffness into intrinsic and reflex components from position and torque records. Simulation studies of a stiffness model, whose parameters varied with time as is expected during walking, demonstrated that the new algorithm accurately tracked the changes in dynamic joint stiffness using as few as 40 gait cycles. The method was also used to estimate the intrinsic and reflex dynamic ankle stiffness from an experiment with a healthy subject, during which ankle movements were imposed while the subject maintained a constant muscle contraction. The method identified TV stiffness model parameters that predicted the measured torque very well, accounting for more than 95% of its variance. Moreover, both intrinsic and reflex dynamic stiffness were heavily modulated through the movement in a manner that could not be predicted from quasi-stationary experiments. The new method provides the tool needed to explore the role of dynamic stiffness in the control of movement. PMID:28649196
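
    One common way to write the parallel-cascade structure referred to above (the symbols are assumptions here, not the paper's notation) is as the sum of an intrinsic path, linear dynamics h_I driven by joint position \theta, and a reflex path, in which joint velocity is delayed by \Delta, passed through a static nonlinearity g(\cdot), and filtered by linear dynamics h_R:

      \tau(t) = (h_I * \theta)(t) + \bigl(h_R * g(\dot{\theta}(\cdot - \Delta))\bigr)(t).

    In the time-varying setting, the kernels and the nonlinearity are themselves functions of time, which is what the ensemble/deterministic estimation scheme must track.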

  18. Evolving Concepts of Asthma

    PubMed Central

    Ray, Anuradha; Wenzel, Sally E.

    2015-01-01

    Our understanding of asthma has evolved over time from a singular disease to a complex of various phenotypes, with varied natural histories, physiologies, and responses to treatment. Early therapies treated most patients with asthma similarly, with bronchodilators and corticosteroids, but these therapies had varying degrees of success. Similarly, despite initial studies that identified an underlying type 2 inflammation in the airways of patients with asthma, biologic therapies targeted toward these type 2 pathways were unsuccessful in all patients. These observations led to increased interest in phenotyping asthma. Clinical approaches, both biased and later unbiased/statistical approaches to large asthma patient cohorts, identified a variety of patient characteristics, but they also consistently identified the importance of age of onset of disease and the presence of eosinophils in determining clinically relevant phenotypes. These paralleled molecular approaches to phenotyping that developed an understanding that not all patients share a type 2 inflammatory pattern. Using biomarkers to select patients with type 2 inflammation, repeated trials of biologics directed toward type 2 cytokine pathways saw newfound success, confirming the importance of phenotyping in asthma. Further research is needed to clarify additional clinical and molecular phenotypes, validate predictive biomarkers, and identify new areas for possible interventions. PMID:26161792

  19. Feedback topology and XOR-dynamics in Boolean networks with varying input structure

    NASA Astrophysics Data System (ADS)

    Ciandrini, L.; Maffi, C.; Motta, A.; Bassetti, B.; Cosentino Lagomarsino, M.

    2009-08-01

    We analyze a model of fixed in-degree random Boolean networks in which the fraction of input-receiving nodes is controlled by the parameter γ . We investigate analytically and numerically the dynamics of graphs under a parallel XOR updating scheme. This scheme is interesting because it is accessible analytically and its phenomenology is at the same time under control and as rich as the one of general Boolean networks. We give analytical formulas for the dynamics on general graphs, showing that with a XOR-type evolution rule, dynamic features are direct consequences of the topological feedback structure, in analogy with the role of relevant components in Kauffman networks. Considering graphs with fixed in-degree, we characterize analytically and numerically the feedback regions using graph decimation algorithms (Leaf Removal). With varying γ , this graph ensemble shows a phase transition that separates a treelike graph region from one in which feedback components emerge. Networks near the transition point have feedback components made of disjoint loops, in which each node has exactly one incoming and one outgoing link. Using this fact, we provide analytical estimates of the maximum period starting from topological considerations.
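
    A minimal sketch of the model class follows, assuming that a fraction gamma of nodes receives exactly k inputs and that the remaining nodes simply hold their state (the rule for non-receiving nodes is an assumption of this sketch):

      import numpy as np

      def xor_network(n, k, gamma, rng):
          """Fixed in-degree Boolean network: a fraction gamma of nodes
          receives k inputs; the rest are frozen."""
          receives = rng.random(n) < gamma
          inputs = rng.integers(0, n, size=(n, k))
          return receives, inputs

      def parallel_xor_update(state, receives, inputs):
          """Synchronous (parallel) XOR rule: each input-receiving node
          becomes the XOR of its k inputs' previous states."""
          new = state.copy()
          new[receives] = np.bitwise_xor.reduce(state[inputs[receives]], axis=1)
          return new

      rng = np.random.default_rng(1)
      receives, inputs = xor_network(n=200, k=2, gamma=0.7, rng=rng)
      state = rng.integers(0, 2, size=200)
      for _ in range(50):
          state = parallel_xor_update(state, receives, inputs)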

  20. Feedback topology and XOR-dynamics in Boolean networks with varying input structure.

    PubMed

    Ciandrini, L; Maffi, C; Motta, A; Bassetti, B; Cosentino Lagomarsino, M

    2009-08-01

    We analyze a model of fixed in-degree random Boolean networks in which the fraction of input-receiving nodes is controlled by the parameter gamma. We investigate analytically and numerically the dynamics of graphs under a parallel XOR updating scheme. This scheme is interesting because it is accessible analytically and its phenomenology is at the same time under control and as rich as the one of general Boolean networks. We give analytical formulas for the dynamics on general graphs, showing that with a XOR-type evolution rule, dynamic features are direct consequences of the topological feedback structure, in analogy with the role of relevant components in Kauffman networks. Considering graphs with fixed in-degree, we characterize analytically and numerically the feedback regions using graph decimation algorithms (Leaf Removal). With varying gamma, this graph ensemble shows a phase transition that separates a treelike graph region from one in which feedback components emerge. Networks near the transition point have feedback components made of disjoint loops, in which each node has exactly one incoming and one outgoing link. Using this fact, we provide analytical estimates of the maximum period starting from topological considerations.

  1. Relationships between pathology and crystal structure in breast calcifications: an in situ X-ray diffraction study in histological sections

    PubMed Central

    Scott, Robert; Stone, Nicholas; Kendall, Catherine; Geraki, Kalotina; Rogers, Keith

    2016-01-01

    Calcifications are not only one of the most important early diagnostic markers of breast cancer, but are also increasingly believed to aggravate the proliferation of cancer cells and invasion of surrounding tissue. Moreover, this influence appears to vary with calcification composition. Despite this, remarkably little is known about the composition and crystal structure of the most common type of breast calcifications, and how this differs between benign and malignant lesions. We sought to determine how the phase composition and crystallographic parameters within calcifications vary with pathology, using synchrotron X-ray diffraction. This is the first time crystallite size and lattice parameters have been measured in breast calcifications, and we found that both closely parallel the changes observed with age in fetal bone. We also discovered that these calcifications contain a small proportion of magnesium whitlockite, and that this proportion increases from benign to in situ to invasive cancer. When combined with other recent evidence on the effect of magnesium on hydroxyapatite precipitation, this suggests a mechanism explaining observations that carbonate levels within breast calcifications are lower in malignant specimens. PMID:28721386

  2. Protecting complex infrastructures against multiple strategic attackers

    NASA Astrophysics Data System (ADS)

    Hausken, Kjell

    2011-01-01

    Infrastructures are analysed subject to defence by a strategic defender and attack by multiple strategic attackers. A framework is developed where each agent determines how much to invest in defending versus attacking each of multiple targets. A target can have economic, human and symbolic values, which generally vary across agents. Investment expenditure functions for each agent can be linear in the investment effort, concave, convex, logistic, can increase incrementally, or can be subject to budget constraints. Contest success functions (e.g., ratio and difference forms) determine the probability of a successful attack on each target, dependent on the relative investments of the defender and attackers on each target, and on characteristics of the contest. Targets can be in parallel, in series, interlinked, interdependent or independent. The defender minimises the expected damage plus the defence expenditures. Each attacker maximises the expected damage minus the attack expenditures. The number of free choice variables equals the number of agents times the number of targets, or lower if there are budget constraints. Each agent is interested in how his investments vary across the targets, and the impact on his utilities. Alternative optimisation programmes are discussed, together with repeated games, dynamic games and incomplete information. An example is provided for illustration.
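
    For concreteness, the ratio form of the contest success function mentioned above is conventionally written as

      p = \frac{T^{m}}{T^{m} + t^{m}},

    where T and t are the attacker's and defender's investments on a given target and m \geq 0 is the contest intensity (this parameterization is standard and assumed here, not quoted from the paper). As m \to 0 the outcome becomes insensitive to investment, while large m approaches a winner-take-all contest.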

  3. OceanXtremes: Scalable Anomaly Detection in Oceanographic Time-Series

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Armstrong, E. M.; Chin, T. M.; Gill, K. M.; Greguska, F. R., III; Huang, T.; Jacob, J. C.; Quach, N.

    2016-12-01

    The oceanographic community must meet the challenge to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we are developing an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across entire archives of 15- to 30-year ocean science datasets. Our parallel analytics engine extends the NEXUS system and exploits multiple open-source technologies: Apache Cassandra as a distributed spatial "tile" cache, Apache Spark for in-memory parallel computation, and Apache Solr for spatial search and for storing pre-computed tile statistics and other metadata. OceanXtremes provides these key capabilities: parallel generation (Spark on a compute cluster) of 15- to 30-year ocean climatologies (e.g., sea surface temperature or SST) in hours or overnight, using simple pixel averages or customizable Gaussian-weighted "smoothing" over latitude, longitude, and time; parallel pre-computation, tiling, and caching of anomaly fields (daily variables minus a chosen climatology) with pre-computed tile statistics; parallel detection (over the time-series of tiles) of anomalies or phenomena by regional area-averages exceeding a specified threshold (e.g., high SST in El Nino or SST "blob" regions), or more complex, custom data mining algorithms; shared discovery and exploration of ocean phenomena and anomalies (facet search using Solr), along with unexpected correlations between key measured variables; and scalable execution for all capabilities on a hybrid Cloud, using our on-premise OpenStack Cloud cluster or Amazon. The key idea is that the parallel data-mining operations will be run "near" the ocean data archives (a local "network" hop) so that we can efficiently access the thousands of files making up a three-decade time-series. The presentation will cover the architecture of OceanXtremes, parallelization of the climatology computation and anomaly detection algorithms using Spark, example results for SST and other time-series, and parallel performance metrics.
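
    A minimal PySpark sketch of the anomaly step, computing "daily minus climatology" per tile and flagging tiles whose regional mean exceeds a threshold. The tile generator and the threshold value are hypothetical stand-ins for the Cassandra-backed tile cache and a user-specified criterion; they are not part of the system described above.

      import numpy as np
      from pyspark import SparkContext

      sc = SparkContext(appName="anomaly-sketch")

      def load_tiles():
          """Hypothetical stand-in for the tile cache: yields
          (tile_id, (daily_field, climatology_field)) pairs."""
          rng = np.random.default_rng(0)
          for tid in range(100):
              clim = rng.normal(15.0, 2.0, size=(32, 32))    # e.g., SST climatology
              daily = clim + rng.normal(0.0, 1.0, size=(32, 32))
              yield tid, (daily, clim)

      THRESHOLD = 2.0   # regional-mean anomaly threshold (illustrative)

      anomalies = (sc.parallelize(list(load_tiles()))
                     .mapValues(lambda dc: dc[0] - dc[1])    # anomaly field
                     .filter(lambda kv: float(np.nanmean(kv[1])) > THRESHOLD))
      print(anomalies.count())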

  4. An embedded multi-core parallel model for real-time stereo imaging

    NASA Astrophysics Data System (ADS)

    He, Wenjing; Hu, Jian; Niu, Jingyu; Li, Chuanrong; Liu, Guangyu

    2018-04-01

    Real-time processing based on embedded systems will enhance the application capability of stereo imaging for LiDAR and hyperspectral sensors. Research on task partitioning and scheduling strategies for embedded multiprocessor systems started relatively late, compared with that for PC computers. In this paper, aimed at an embedded multi-core processing platform, a parallel model for stereo imaging is studied and verified. After analyzing the computation load, throughput capacity and buffering requirements, a two-stage pipeline parallel model based on message transmission is established. This model can be applied to fast stereo imaging for airborne sensors with various characteristics. To demonstrate the feasibility and effectiveness of the parallel model, parallel software was designed using test flight data, based on the 8-core DSP processor TMS320C6678. The results indicate that the design performed well in workload distribution and achieved a speed-up ratio of up to 6.4.
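
    The dataflow of the two-stage, message-passing pipeline can be illustrated in plain Python (the actual implementation targets the 8-core TMS320C6678 with its own inter-core messaging, so the queues below only stand in for that mechanism, and the stage contents are placeholders):

      import queue
      import threading

      def stage1(inbox, outbox):
          """First pipeline stage (e.g., per-line geometric computation)."""
          while (item := inbox.get()) is not None:
              outbox.put(("projected", item))
          outbox.put(None)                      # propagate shutdown downstream

      def stage2(inbox, results):
          """Second stage (e.g., resampling), overlapped with stage 1."""
          while (item := inbox.get()) is not None:
              results.append(item)

      feed = queue.Queue()
      q12 = queue.Queue(maxsize=64)             # bounded queue gives backpressure
      results = []
      t1 = threading.Thread(target=stage1, args=(feed, q12))
      t2 = threading.Thread(target=stage2, args=(q12, results))
      t1.start(); t2.start()
      for line in range(1000):
          feed.put(line)
      feed.put(None)
      t1.join(); t2.join()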

  5. Message-passing-interface-based parallel FDTD investigation on the EM scattering from a 1-D rough sea surface using uniaxial perfectly matched layer absorbing boundary.

    PubMed

    Li, J; Guo, L-X; Zeng, H; Han, X-B

    2009-06-01

    A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.
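
    A minimal mpi4py sketch of the communication pattern for a 1-D slab decomposition: each rank updates its slab and exchanges one ghost cell with each neighbour per half-step. The UPML truncation, sources, and physical coefficients are omitted; the update constant is illustrative.

      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      n_local = 1000                      # cells owned by this rank
      ez = np.zeros(n_local + 2)          # E-field with one ghost cell per side
      hy = np.zeros(n_local + 2)          # H-field with one ghost cell per side
      left = rank - 1 if rank > 0 else MPI.PROC_NULL
      right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

      for step in range(500):
          # refresh E ghost cells before the H update
          comm.Sendrecv(ez[1:2], dest=left, recvbuf=ez[-1:], source=right)
          comm.Sendrecv(ez[-2:-1], dest=right, recvbuf=ez[:1], source=left)
          hy[:-1] += 0.5 * (ez[1:] - ez[:-1])    # H update from curl of E
          # refresh the H ghost cell before the E update
          comm.Sendrecv(hy[-2:-1], dest=right, recvbuf=hy[:1], source=left)
          ez[1:] += 0.5 * (hy[1:] - hy[:-1])     # E update from curl of H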

  6. An Intrinsic Algorithm for Parallel Poisson Disk Sampling on Arbitrary Surfaces.

    PubMed

    Ying, Xiang; Xin, Shi-Qing; Sun, Qian; He, Ying

    2013-03-08

    Poisson disk sampling plays an important role in a variety of visual computing tasks, due to its useful statistical distribution properties and the absence of aliasing artifacts. While many effective techniques have been proposed to generate Poisson disk distributions in Euclidean space, relatively little work has been reported for the surface counterpart. This paper presents an intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces. We propose a new technique for parallelizing the dart throwing. Rather than the conventional approaches that explicitly partition the spatial domain to generate the samples in parallel, our approach assigns each sample candidate a random and unique priority that is unbiased with regard to the distribution. Hence, multiple threads can process the candidates simultaneously and resolve conflicts by checking the given priority values. It is worth noting that our algorithm is accurate, as the generated Poisson disks are uniformly and randomly distributed without bias. Our method is intrinsic in that all the computations are based on the intrinsic metric and are independent of the embedding space. This intrinsic feature allows us to generate Poisson disk distributions on arbitrary surfaces. Furthermore, by manipulating a spatially varying density function, we can obtain adaptive sampling easily.
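
    The priority idea can be emulated compactly: a candidate is accepted exactly when its random priority beats every still-alive candidate within the disk radius, after which everything within the radius of an accepted sample is removed. The sketch below runs the phases serially in the Euclidean plane; the paper's contribution is doing the equivalent with the intrinsic surface metric and genuinely concurrent threads.

      import numpy as np
      from scipy.spatial import cKDTree

      def poisson_disk_by_priority(candidates, r, rng):
          priority = rng.random(len(candidates))
          alive = np.ones(len(candidates), dtype=bool)
          accepted = np.zeros(len(candidates), dtype=bool)
          while alive.any():
              idx = np.flatnonzero(alive)
              tree = cKDTree(candidates[idx])
              pairs = tree.query_pairs(r, output_type='ndarray')
              a, b = pairs[:, 0], pairs[:, 1]
              # a live candidate loses if some conflicting live candidate
              # carries a higher priority
              loses = np.zeros(len(idx), dtype=bool)
              np.logical_or.at(loses, a, priority[idx[b]] > priority[idx[a]])
              np.logical_or.at(loses, b, priority[idx[a]] > priority[idx[b]])
              winners = idx[~loses]
              accepted[winners] = True
              # retire winners and every candidate within r of a winner
              for group in tree.query_ball_point(candidates[winners], r):
                  alive[idx[group]] = False
          return candidates[accepted]

      rng = np.random.default_rng(2)
      samples = poisson_disk_by_priority(rng.random((5000, 2)), r=0.05, rng=rng)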

  7. Dynamical Generation of Quasi-Stationary Alfvenic Double Layers and Charge Holes and Unified Theory of Quasi-Static and Alfvenic Auroral Arc Formation

    NASA Astrophysics Data System (ADS)

    Song, Y.; Lysak, R. L.

    2015-12-01

    Parallel E-fields play a crucial role in the acceleration of charged particles, creating discrete aurorae. However, once parallel electric fields are produced, they disappear right away unless they can be continuously generated and sustained for a fairly long time. Thus, the crucial question in auroral physics is how to generate such powerful and self-sustained parallel electric fields, which can effectively accelerate charged particles to high energy over a fairly long time. We propose that nonlinear interaction of incident and reflected Alfven wave packets in the inhomogeneous auroral acceleration region can produce quasi-stationary non-propagating electromagnetic plasma structures, such as Alfvenic double layers (DLs) and charge holes. Such Alfvenic quasi-static structures often constitute powerful high-energy particle accelerators. The Alfvenic DL consists of a localized, self-sustained, powerful electrostatic electric field nested in a low-density cavity and surrounded by enhanced magnetic and mechanical stresses. The enhanced magnetic and velocity fields carrying the free energy serve as a local dynamo, which continuously creates the electrostatic parallel electric field for a fairly long time. The generated parallel electric fields deepen the seed low-density cavity, which in turn quickly boosts stronger parallel electric fields, creating both Alfvenic and quasi-static discrete aurorae. The parallel electrostatic electric field can also cause ion outflow, perpendicular ion acceleration and heating, and may excite Auroral Kilometric Radiation.

  8. Parallel Spectral Acquisition with an Ion Cyclotron Resonance Cell Array.

    PubMed

    Park, Sung-Gun; Anderson, Gordon A; Navare, Arti T; Bruce, James E

    2016-01-19

    Mass measurement accuracy is a critical analytical figure-of-merit in most areas of mass spectrometry application. However, the time required for acquisition of high-resolution, high mass accuracy data limits many applications and is an aspect under continual pressure for development. Current efforts target implementation of higher electrostatic and magnetic fields because ion oscillatory frequencies increase linearly with field strength. As such, the time required for spectral acquisition of a given resolving power and mass accuracy decreases linearly with increasing fields. Mass spectrometer developments to include multiple high-resolution detectors that can be operated in parallel could further decrease the acquisition time by a factor of n, the number of detectors. Efforts described here resulted in development of an instrument with a set of Fourier transform ion cyclotron resonance (ICR) cells as detectors that constitute the first MS array capable of parallel high-resolution spectral acquisition. ICR cell array systems consisting of three or five cells were constructed with printed circuit boards and installed within a single superconducting magnet and vacuum system. Independent ion populations were injected and trapped within each cell in the array. Upon filling the array, all ions in all cells were simultaneously excited and ICR signals from each cell were independently amplified and recorded in parallel. Presented here are the initial results of successful parallel spectral acquisition, parallel mass spectrometry (MS) and MS/MS measurements, and parallel high-resolution acquisition with the MS array system.
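
    The linear field dependence invoked above is the unperturbed cyclotron relation for an ion of charge q and mass m in a magnetic field B:

      f_c = \frac{qB}{2\pi m},

    so doubling B doubles the oscillation frequency and halves the transient length needed for a given resolving power, while n parallel cells divide the total acquisition time for n spectra by n.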

  9. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

    PubMed Central

    Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance. PMID:29861711
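
    A single-image sketch of the Otsu-optimized Canny kernel using OpenCV. The mapping of the Otsu value to Canny's dual thresholds (high = Otsu, low = half of it) is a common heuristic assumed here, not necessarily the paper's exact rule; in the paper's design this per-image kernel is what each map task would execute, with the file name below a hypothetical input.

      import cv2

      img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

      # Otsu picks the global threshold minimizing intra-class variance;
      # reuse it to set Canny's hysteresis thresholds adaptively.
      otsu_thresh, _ = cv2.threshold(img, 0, 255,
                                     cv2.THRESH_BINARY + cv2.THRESH_OTSU)
      edges = cv2.Canny(img, 0.5 * otsu_thresh, otsu_thresh)
      cv2.imwrite("edges.png", edges)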

  10. ELECTROSTRICTION VALVE

    DOEpatents

    Kippenhan, D.O.

    1962-09-25

    An accurately controlled, pulsed gas valve is described that is capable of delivering output pulses which vary in length from one-tenth of a millisecond to one second or more, repeated at intervals of a few milliseconds or more. The pulsed gas valve comprises a column formed of barium titanate discs mounted in stacked relation and electrically connected in parallel, with means for applying voltage across the discs to cause them to expand and effect a mechanical elongation axially of the column. The column is mounted within an enclosure having an inlet port and an outlet port with an internal seat in communication with the outlet port, such that a plug secured to the end of the column will engage the seat and close the outlet port in response to the application of voltage, which is regulated by a conventional electronic timing circuit connected to the column. (AEC)

  11. Influence of the ablation plume on the removal process during ArF-excimer laser photoablation

    NASA Astrophysics Data System (ADS)

    Doerbecker, Christina; Lubatschowski, Holger; Lohmann, Stefan; Ruff, Christine; Kermani, Omid; Ertmer, Wolfgang

    1996-01-01

    Correction of myopia with the ArF-excimer laser (PRK) sometimes leads to a so-called 'central island' formation on the anterior corneal surface. The attenuation of the laser beam by the ablation plume might be one reason for this phenomenon. The attenuation properties of the ablation plume were investigated with a probe beam parallel to the surface of the tissue probe. By varying the laser parameters (fluence, repetition rate, spot size) and the target material (cornea, PMMA), the attenuation of the probe beam was measured with temporal and spatial resolution. As a result of this study, a significant influence on the removal process due to scattering and absorption within the ablation plume can be assumed as a function of repetition rate, spot size and air flow at the tissue surface.

  12. Method and apparatus for ultrasonic doppler velocimetry using speed of sound and reflection mode pulsed wideband doppler

    DOEpatents

    Shekarriz, Alireza; Sheen, David M.

    2000-01-01

    According to the present invention, a method and apparatus rely upon tomographic measurement of the speed of sound and fluid velocity in a pipe. The invention provides a more accurate profile of velocity within flow fields where the speed of sound varies within the cross-section of the pipe. This profile is obtained by reconstruction of the velocity profile from the local speed-of-sound measurement simultaneously with the flow velocity. The method of the present invention is real-time tomographic ultrasonic Doppler velocimetry utilizing a plurality of ultrasonic transmission and reflection measurements along two orthogonal sets of parallel acoustic lines-of-sight. The fluid velocity profile and the acoustic velocity profile are determined by iterating between determining a fluid velocity profile and measuring local acoustic velocity until convergence is reached.

  13. Unsteady Heat and Mass Transfer of Chemically Reacting Micropolar Fluid in a Porous Channel with Hall and Ion Slip Currents

    PubMed Central

    2014-01-01

    This paper presents an incompressible two-dimensional heat and mass transfer analysis of an electrically conducting micropolar fluid flow in a porous medium between two parallel plates, with chemical reaction, Hall and ion slip effects. Periodic injection or suction is applied at the lower and upper plates, and the nonuniform temperature and concentration at the plates vary periodically with time. The flow field equations are reduced to nonlinear ordinary differential equations using similarity transformations and then solved numerically by the quasilinearization technique. The profiles of the velocity components, microrotation, temperature distribution and concentration are studied for different values of fluid and geometric parameters, such as the Hartmann number, Hall and ion slip parameters, inverse Darcy parameter, Prandtl number, Schmidt number, and chemical reaction rate, and are shown in the form of graphs. PMID:27419211

  14. Photosynthetic light reactions increase total lipid accumulation in carbon-supplemented batch cultures of Chlorella vulgaris.

    PubMed

    Woodworth, Benjamin D; Mead, Rebecca L; Nichols, Courtney N; Kolling, Derrick R J

    2015-03-01

    Microalgae are an attractive biofuel feedstock because of their high lipid to biomass ratios, lipid compositions that are suitable for biodiesel production, and the ability to grow on varied carbon sources. While algae can grow autotrophically, supplying an exogenous carbon source can increase growth rates and allow heterotrophic growth in the absence of light. Time course analyses of dextrose-supplemented Chlorella vulgaris batch cultures demonstrate that light availability directly influences growth rate, chlorophyll production, and total lipid accumulation. Parallel photomixotrophic and heterotrophic cultures grown to stationary phase reached the same amount of biomass, but total lipid content was higher for algae grown in the presence of light (an average of 1.90 mg/mL vs. 0.77 mg/mL over 5 days of stationary phase growth). Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. Decomposition method for fast computation of gigapixel-sized Fresnel holograms on a graphics processing unit cluster.

    PubMed

    Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu

    2018-04-20

    A parallel computation method for large-size Fresnel computer-generated hologram (CGH) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.
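
    For a point-cloud object, the Fresnel CGH being computed has the schematic form (notation assumed here, not the paper's)

      H(u, v) = \sum_{j=1}^{M} a_j \exp\!\left[\frac{i\pi}{\lambda z_j}
          \Bigl((u - x_j)^2 + (v - y_j)^2\Bigr)\right],

    where each of the M object points at (x_j, y_j, z_j) with amplitude a_j contributes a quadratic-phase zone pattern at wavelength \lambda. Every hologram pixel (u, v) is an independent sum, so both the pixel grid and the point set can be decomposed across nodes and GPUs, which is the source of the layered parallelism described above.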

  16. Converging Oceanic Internal Waves, Somalia, Africa

    NASA Image and Video Library

    1988-10-03

    The arcuate fronts of these apparently converging internal waves off the northeast coast of Somalia (11.5N, 51.5E) probably were produced by interaction with two parallel submarine canyons off the Horn of Africa. Internal waves are packets of tidally generated waves traveling within the ocean at varying depths and are not detectable by any surface disturbance.

  17. Experimental and Numerical Analysis of Electric Currents and Electromagnetic Blunting of Cracks in Thin Plates

    DTIC Science & Technology

    1984-12-01

    currents are assumed to flow parallel to the midsurface of the plate. 6. The normal component of the induced magnetic field does not vary across the… is coincident with the midsurface of the plate. The relationship between the two coordinate systems is given by X = x(α, β), Y = y(α, β), Z = …

  18. Lexical Competition during Second-Language Listening: Sentence Context, but Not Proficiency, Constrains Interference from the Native Lexicon

    ERIC Educational Resources Information Center

    Chambers, Craig G.; Cooke, Hilary

    2009-01-01

    A spoken language eye-tracking methodology was used to evaluate the effects of sentence context and proficiency on parallel language activation during spoken language comprehension. Nonnative speakers with varying proficiency levels viewed visual displays while listening to French sentences (e.g., "Marie va decrire la poule" [Marie will…

  19. Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density

    NASA Astrophysics Data System (ADS)

    Hohl, A.; Delmelle, E. M.; Tang, W.

    2015-07-01

    Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data points that are within the spatial and temporal kernel bandwidths. Then, we quantify the computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable to other space-time analytical tests.
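
    The quantity being parallelized is the space-time kernel density estimate, conventionally written with spatial and temporal bandwidths h_s and h_t (the notation is assumed here):

      \hat{f}(x, y, t) = \frac{1}{n\, h_s^{2}\, h_t} \sum_{i=1}^{n}
          K_s\!\left(\frac{x - x_i}{h_s}, \frac{y - y_i}{h_s}\right)
          K_t\!\left(\frac{t - t_i}{h_t}\right).

    Because a point contributes only within h_s in space and h_t in time, buffering each subdomain by exactly these bandwidths guarantees that no subdomain misses a contributing point.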

  20. Accelerating EPI distortion correction by utilizing a modern GPU-based parallel computation.

    PubMed

    Yang, Yao-Hao; Huang, Teng-Yi; Wang, Fu-Nien; Chuang, Tzu-Chao; Chen, Nan-Kuei

    2013-04-01

    The combination of phase demodulation and field mapping is a practical method to correct echo planar imaging (EPI) geometric distortion. However, since phase dispersion accumulates in each phase-encoding step, the calculation complexity of phase demodulation is Ny-fold higher than that of conventional image reconstructions. Thus, correcting EPI images via phase demodulation is generally a time-consuming task. Parallel computing employing general-purpose calculations on graphics processing units (GPUs) can accelerate scientific computing if the algorithm is parallelized. This study proposes a method that incorporates the GPU-based technique into phase demodulation calculations to reduce computation time. The proposed parallel algorithm was applied to a PROPELLER-EPI diffusion tensor data set. The GPU-based phase demodulation method correctly reduced the EPI distortion and accelerated the computation. The total reconstruction time of the 16-slice PROPELLER-EPI diffusion tensor images with matrix size of 128 × 128 was reduced from 1,754 seconds to 101 seconds by utilizing the parallelized 4-GPU program. GPU computing is a promising method for accelerating EPI geometric correction. The resulting reduction in the computation time of phase demodulation should accelerate postprocessing for studies performed with EPI, and should make the PROPELLER-EPI technique practical for clinical use. Copyright © 2011 by the American Society of Neuroimaging.

  1. Process for hydraulically mining coal [28 claims]

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shoji, K.; Sieling, R.E.; Taylor, J.T.

    The invention is a method for the hydraulic mining of coal of varying hardness. It is described in particular as to coal of the type occurring in the Balmer seam in British Columbia. By the method, at least two parallel spaced entries are driven upward through a seam of coal. Monitors are positioned in each entry. Each monitor is horizontally and vertically pivotable, and has nozzle means from which a jet of water under a pressure of about 1900 to 2200 psi is emitted. The high pressure jet cuts the coal, which is then fed to a machine that breaks and crushes the coal into sizes such that the resultant coal/water slurry will flow down a sloped flume into a dewatering station. The method further embodies differentially retreating along adjacent parallel entries by increments of desirably at least about 40 feet each. By this differential retreat system, as a panel of coal is hydraulically mined in one entry, the monitor and associated equipment in a second adjacent parallel entry are moved back the desired increment to the next working position (retreated). When the panel of coal in the first entry is mined, the monitor is retreated in the same manner and hydraulic mining commences in the second adjacent parallel entry. The operation is thus alternated along the length of the parallel entries. 28 claims, 4 figures.

  2. Bayesian tomography by interacting Markov chains

    NASA Astrophysics Data System (ADS)

    Romary, T.

    2017-12-01

    In seismic tomography, we seek to determine the velocity structure of the underground from noisy first-arrival traveltime observations. In most situations, this is an ill-posed inverse problem that admits several imperfect solutions. Given an a priori distribution over the parameters of the velocity model, the Bayesian formulation allows us to state this problem as a probabilistic one, with a solution in the form of a posterior distribution. The posterior distribution is generally high dimensional and may exhibit multimodality. Moreover, as it is known only up to a constant, the only sensible way to address this problem is to try to generate simulations from the posterior. The natural tools to perform these simulations are Markov chain Monte Carlo (MCMC) methods. Classical implementations of MCMC algorithms generally suffer from slow mixing: the generated states are slow to enter the stationary regime, that is, to fit the observations, and when one mode of the posterior is eventually identified, it may become difficult to visit others. Using a varying temperature parameter to relax the constraint on the data may help the chain enter the stationary regime. Besides, the sequential nature of MCMC makes it ill suited to parallel implementation. Running a large number of chains in parallel may be suboptimal, as the information gathered by each chain is not mutualized. Parallel tempering (PT) can be seen as a first attempt to make parallel chains at different temperatures communicate, but they only exchange information between current states. In this talk, I will show that PT actually belongs to a general class of interacting Markov chain algorithms. I will also show that this class makes it possible to design interacting schemes that can take advantage of the whole history of the chains, by allowing exchanges toward already visited states. The algorithms will be illustrated with toy examples and an application to first-arrival traveltime tomography.
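
    The exchange mechanism that PT adds to otherwise independent chains can be seen in a toy example. The sketch below runs a small temperature ladder on an assumed bimodal 1D target and swaps the states of adjacent chains with the standard tempering acceptance rule; all tuning constants are illustrative.

```python
# Toy parallel tempering on a bimodal 1D posterior.
import numpy as np

def log_post(x):                            # two well-separated modes
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

rng = np.random.default_rng(0)
temps = np.array([1.0, 3.0, 9.0, 27.0])     # temperature ladder
x = rng.standard_normal(len(temps))         # one chain per temperature
samples = []
for it in range(20000):
    # local Metropolis move for each chain against its tempered target
    prop = x + rng.standard_normal(len(temps))
    accept = np.log(rng.random(len(temps))) < (log_post(prop) - log_post(x)) / temps
    x = np.where(accept, prop, x)
    # swap move: exchange the current states of a random adjacent pair
    i = rng.integers(len(temps) - 1)
    dlp = (log_post(x[i + 1]) - log_post(x[i])) * (1 / temps[i] - 1 / temps[i + 1])
    if np.log(rng.random()) < dlp:
        x[i], x[i + 1] = x[i + 1], x[i]
    samples.append(x[0])
print("fraction of time in right mode:", np.mean(np.array(samples) > 0))
```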

  3. Artifacts in time-resolved NUS: A case study of NOE build-up curves from 2D NOESY.

    PubMed

    Dass, Rupashree; Kasprzak, Paweł; Koźmiński, Wiktor; Kazimierczuk, Krzysztof

    2016-04-01

    Multidimensional NMR spectroscopy requires time-consuming sampling of indirect dimensions and so is usually used to study stable samples. However, dynamically changing compounds or their mixtures commonly occur in problems of natural science. Monitoring them requires the use of multidimensional NMR in a time-resolved manner - in other words, a series of quick spectra must be acquired at different points in time. Among the many solutions that have been proposed to achieve this goal, time-resolved non-uniform sampling (TR-NUS) is one of the simplest. In a TR-NUS experiment, the signal is sampled using a shuffled random schedule and then divided into overlapping subsets. These subsets are then processed using one of the NUS reconstruction methods, for example compressed sensing (CS). The resulting stack of spectra forms a temporal "pseudo-dimension" that shows the changes caused by the process occurring in the sample. CS enables the use of small subsets of data, which minimizes the averaging of the effects studied. Yet, even within these limited timeframes, the sample undergoes certain changes. In this paper we discuss the effect of varying signal amplitude in a TR-NUS experiment. Our theoretical calculations show that the variations within the subsets lead to t1-noise, which is dependent on the rate of change of the signal amplitude. We verify these predictions experimentally. As a model case we choose a novel 2D TR-NOESY experiment in which the mixing time is varied in parallel with shuffled NUS in the indirect dimension. The experiment, performed on a sample of strychnine, provides a near-continuous NOE build-up curve, whose shape closely reflects the t1-noise level. 2D TR-NOESY reduces the measurement time compared to the conventional approach and makes it possible to verify the theoretical predictions about signal variations during TR-NUS. Copyright © 2016 Elsevier Inc. All rights reserved.
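
    The bookkeeping of TR-NUS (shuffled schedule, overlapping subsets, one spectrum per subset) is easy to sketch. In the toy code below a zero-filled FFT stands in for the CS reconstruction, and the schedule, window and step sizes are assumed values.

```python
# Time-resolved NUS bookkeeping: a shuffled sampling schedule is cut into
# overlapping subsets, each reconstructed into one frame of the temporal
# pseudo-dimension.
import numpy as np

N = 256                                   # indirect-dimension grid size
rng = np.random.default_rng(2)
schedule = rng.permutation(N)[:128]       # shuffled NUS schedule, 50% sampling
signal = np.exp(2j * np.pi * 0.17 * np.arange(N))   # model FID, one peak

win, step = 48, 16                        # overlapping-subset geometry
frames = []
for start in range(0, len(schedule) - win + 1, step):
    idx = np.sort(schedule[start:start + win])
    zf = np.zeros(N, dtype=complex)
    zf[idx] = signal[idx]                 # samples acquired in this subset
    frames.append(np.fft.fft(zf))         # stand-in for CS reconstruction
print(len(frames), "frames of the pseudo-dimension")
```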

  4. Growth and recombinant protein expression with Escherichia coli in different batch cultivation media.

    PubMed

    Hortsch, Ralf; Weuster-Botz, Dirk

    2011-04-01

    Parallel operated milliliter-scale stirred tank bioreactors were applied for recombinant protein expression studies in simple batch experiments without pH titration. An enzymatic glucose release system (EnBase), a complex medium, and the frequently used LB and TB media were compared with regard to growth of Escherichia coli and recombinant protein expression (alcohol dehydrogenase (ADH) from Lactobacillus brevis and formate dehydrogenase (FDH) from Candida boidinii). Dissolved oxygen and pH were recorded online, optical densities were measured at-line, and the activities of ADH and FDH were analyzed offline. Best growth was observed in a complex medium with maximum dry cell weight concentrations of 14 g L(-1). EnBase cultivations enabled final dry cell weight concentrations between 6 and 8 g L(-1). The pH remained nearly constant in EnBase cultivations due to the continuous glucose release, showing the usefulness of this glucose release system especially for pH-sensitive bioprocesses. Cell-specific enzyme activities varied considerably depending on the different media used. Maximum specific ADH activities were measured with the complex medium, 6 h after induction with IPTG, whereas the highest specific FDH activities were achieved with the EnBase medium at low glucose release profiles 24 h after induction. Hence, depending on the recombinant protein, different medium compositions, times for induction, and times for cell harvest have to be evaluated to achieve efficient expression of recombinant proteins in E. coli. A rapid experimental evaluation can easily be performed with parallel batch operated small-scale stirred tank bioreactors.

  5. Using Clustering to Establish Climate Regimes from PCM Output

    NASA Technical Reports Server (NTRS)

    Oglesby, Robert; Arnold, James E. (Technical Monitor); Hoffman, Forrest; Hargrove, W. W.; Erickson, D.

    2002-01-01

    A multivariate statistical clustering technique, based on the k-means algorithm of Hartigan, has been used to extract patterns of climatological significance from 200 years of general circulation model (GCM) output. Originally developed and implemented on a Beowulf-style parallel computer constructed by Hoffman and Hargrove from surplus commodity desktop PCs, the high-performance parallel clustering algorithm was previously applied to the derivation of ecoregions from map stacks of 9 and 25 geophysical conditions or variables for the conterminous U.S. at a resolution of 1 sq km. Now applied both across space and through time, the clustering technique yields temporally-varying climate regimes predicted by transient runs of the Parallel Climate Model (PCM). Using a business-as-usual (BAU) scenario and clustering four fields of significance to the global water cycle (surface temperature, precipitation, soil moisture, and snow depth) from 1871 through 2098, the authors' analysis shows an increase in the spatial area occupied by the cluster or climate regime which typifies desert regions (i.e., an increase in desertification) and a decrease in the spatial area occupied by the climate regime typifying wintertime high-latitude permafrost regions. The patterns of cluster changes have been analyzed to understand the predicted variability in the water cycle on global and continental scales. In addition, representative climate regimes were determined by taking three 10-year averages of the fields 100 years apart for northern hemisphere winter (December, January, and February) and summer (June, July, and August). The result is global maps of typical seasonal climate regimes for 100 years in the past, for the present, and for 100 years into the future. Using three-dimensional data or phase-space representations of these climate regimes (i.e., the cluster centroids), the authors demonstrate the portion of this phase space occupied by the land surface at all points in space and time. Any single spot on the globe will exist in one of these climate regimes at any single point in time. By incrementing time, that same spot will trace out a trajectory or orbit between and among these climate regimes (or atmospheric states) in phase (or state) space. When a geographic region enters a state it never previously visited, a climatic change is said to have occurred. Tracing out the entire trajectory of a single spot on the globe yields a 'manifold' in state space representing the shape of its predicted climate occupancy. This sort of analysis enables a researcher to more easily grasp the multivariate behavior of the climate system.
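
    The core of the method is ordinary k-means over stacked field vectors. A minimal Lloyd's-iteration sketch on synthetic standardized fields is given below; the choice of k, the data, and the variable names are assumptions for illustration.

```python
# Lloyd's k-means over stacked climate fields: each grid cell at each time
# becomes a 4-vector (temperature, precipitation, soil moisture, snow
# depth) and the cluster labels act as "climate regimes".
import numpy as np

rng = np.random.default_rng(3)
ncell, ntime, nvar, k = 5000, 12, 4, 6
X = rng.standard_normal((ncell * ntime, nvar))      # standardized fields

centroids = X[rng.choice(len(X), k, replace=False)]
for _ in range(50):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)                       # assign to nearest regime
    new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                    else centroids[j] for j in range(k)])
    if np.allclose(new, centroids):
        break
    centroids = new

occupancy = labels.reshape(ncell, ntime)            # regime of each cell over time
print("area per climate regime:", np.bincount(labels, minlength=k))
```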

  6. PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions

    NASA Astrophysics Data System (ADS)

    Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin

    2017-01-01

    We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT) on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using benchmark problems from the literature and obtain reproducible results. Extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
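
    A one-dimensional analogue of the solver's propagation step illustrates the split-operator/FFT combination (the paper applies the same idea in 3D with an MPI-decomposed grid). The soft-core potential, grid and time step below are illustrative choices in atomic units.

```python
# Split-operator time stepping for a 1D Schrödinger equation:
# half potential step, full kinetic step in k-space via FFT, half
# potential step. Unitary, so the norm is preserved.
import numpy as np

N, L, dt = 512, 40.0, 0.01
x = (np.arange(N) - N // 2) * (L / N)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
V = -1.0 / np.sqrt(x ** 2 + 2.0)               # soft-core Coulomb potential
psi = np.exp(-x ** 2)                          # arbitrary initial packet
psi /= np.linalg.norm(psi)

expV = np.exp(-0.5j * dt * V)                  # half-step in position space
expT = np.exp(-0.5j * dt * k ** 2)             # full kinetic step (T = k^2/2)
for _ in range(1000):
    psi = expV * psi
    psi = np.fft.ifft(expT * np.fft.fft(psi))  # kinetic propagation via FFT
    psi = expV * psi
print("norm preserved:", np.round(np.linalg.norm(psi), 6))
```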

  7. Dynamic modeling of parallel robots for computed-torque control implementation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Codourey, A.

    1998-12-01

    In recent years, increased interest in parallel robots has been observed. Their control with modern theory, such as the computed-torque method, has, however, been restrained, essentially due to the difficulty in establishing a simple dynamic model that can be calculated in real time. In this paper, a simple method based on the virtual work principle is proposed for modeling parallel robots. The mass matrix of the robot, needed for decoupling control strategies, does not explicitly appear in the formulation; however, it can be computed separately, based on kinetic energy considerations. The method is applied to the DELTA parallel robot, leading to a very efficient model that has been implemented in a real-time computed-torque control algorithm.
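
    For context, the computed-torque law itself is compact once a dynamic model is available: the model is used to feedback-linearize the error dynamics. The sketch below uses toy stand-ins for the mass matrix and bias forces, not the virtual-work DELTA model of the paper.

```python
# Generic computed-torque control step for a model tau = M(q) qdd + h(q, qd).
# M and h are illustrative placeholders.
import numpy as np

def M(q):                       # toy 3-DOF mass matrix (symmetric positive definite)
    return np.diag(1.0 + 0.1 * np.cos(q))

def h(q, qd):                   # toy Coriolis/centrifugal + gravity vector
    return 0.05 * qd ** 2 + 9.81 * np.sin(q)

Kp, Kd = 100.0, 20.0

def computed_torque(q, qd, q_des, qd_des, qdd_des):
    """Feedback-linearizing torque: tracking errors decay with the chosen gains."""
    v = qdd_des + Kd * (qd_des - qd) + Kp * (q_des - q)
    return M(q) @ v + h(q, qd)

q, qd = np.zeros(3), np.zeros(3)
tau = computed_torque(q, qd, np.array([0.3, -0.2, 0.1]), np.zeros(3), np.zeros(3))
print("commanded torques:", tau)
```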

  8. National Combustion Code Parallel Performance Enhancements

    NASA Technical Reports Server (NTRS)

    Quealy, Angela; Benyo, Theresa (Technical Monitor)

    2002-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. The unstructured grid, reacting flow code uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC code to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This report describes recent parallel processing modifications to NCC that have improved the parallel scalability of the code, enabling a two hour turnaround for a 1.3 million element fully reacting combustion simulation on an SGI Origin 2000.

  9. Parallel implementation of an adaptive and parameter-free N-body integrator

    NASA Astrophysics Data System (ADS)

    Pruett, C. David; Ingham, William H.; Herman, Ralph D.

    2011-05-01

    Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies.
    Program summary
    Program title: PNB.f90
    Catalogue identifier: AEIK_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 3052
    No. of bytes in distributed program, including test data, etc.: 68 600
    Distribution format: tar.gz
    Programming language: Fortran 90 and OpenMPI
    Computer: All shared or distributed memory parallel processors
    Operating system: Unix/Linux
    Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized.
    RAM: Dependent upon N
    Classification: 4.3, 4.12, 6.5
    Nature of problem: High accuracy numerical evaluation of trajectories of N point masses, each subject to Newtonian gravitation.
    Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree.
    Running time: 5.1 s for the demo program supplied with the package.
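
    The data parallelism noted above (a handful of bodies per processor) amounts to block-partitioning the acceleration computation. A numpy sketch with a serial loop standing in for the parallel workers follows; softening, sizes and G = 1 units are illustrative assumptions.

```python
# Direct-summation Newtonian accelerations with the bodies split into
# per-processor blocks. The self-interaction term vanishes because the
# softened separation keeps the denominator finite while d = 0.
import numpy as np

rng = np.random.default_rng(4)
n, eps2 = 1024, 1e-6                               # bodies, softening^2
pos = rng.standard_normal((n, 3))
mass = rng.random(n)

def block_accel(block):
    """Accelerations of one contiguous block of bodies against all bodies."""
    d = pos[None, :, :] - pos[block, None, :]      # (nb, n, 3) separations
    r2 = (d ** 2).sum(axis=2) + eps2               # softened squared distances
    return (mass[None, :, None] * d / r2[..., None] ** 1.5).sum(axis=1)

blocks = np.array_split(np.arange(n), 8)           # e.g. 8 "processors"
acc = np.vstack([block_accel(b) for b in blocks])
print(acc.shape)
```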

  10. Experimental Studies Of Pilot Performance At Collision Avoidance During Closely Spaced Parallel Approaches

    NASA Technical Reports Server (NTRS)

    Pritchett, Amy R.; Hansman, R. John

    1997-01-01

    Efforts to increase airport capacity include studies of aircraft systems that would enable simultaneous approaches to closely spaced parallel runways in Instrument Meteorological Conditions (IMC). The time-critical nature of a parallel approach results in key design issues for current and future collision avoidance systems. Two part-task flight simulator studies have examined the procedural and display issues inherent in such a time-critical task, the interaction of the pilot with a collision avoidance system, and the alerting criteria and avoidance maneuvers preferred by subjects.

  11. Implementation of a 3D mixing layer code on parallel computers

    NASA Technical Reports Server (NTRS)

    Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.

    1995-01-01

    This paper summarizes our progress and experience in the development of a computational fluid dynamics code on parallel computers to simulate three-dimensional, spatially-developing mixing layers. In this initial study, the three-dimensional time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers. It was then converted to run on parallel computers using the conventional message-passing technique, although we have not yet been able to compile the code with the present version of HPF compilers.

  12. Design of a real-time wind turbine simulator using a custom parallel architecture

    NASA Technical Reports Server (NTRS)

    Hoffman, John A.; Gluck, R.; Sridhar, S.

    1995-01-01

    The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is far more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CUs) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CUs are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU allows several tasks to be done in each cycle, including an I/O operation and a combined multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CUs are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CUs can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors, which usually have a throughput limit because of rigid bus architecture.

  13. Thermal conductivity of layered organic superconductor β-(BDA-TTP)2SbF6 in a parallel magnetic field: Anomalous effect of coreless vortices

    NASA Astrophysics Data System (ADS)

    Tanatar, M. A.; Ishiguro, T.; Toita, T.; Yamada, J.

    2005-01-01

    Thermal conductivity κ of the organic superconductor β-(BDA-TTP)2SbF6 was studied down to 0.3 K in magnetic fields H of varying orientation with respect to the superconducting plane. An anomalous plateau in the field dependence, κ vs. H, is found when the magnetic field is oriented precisely parallel to the plane, in contrast to the usual behavior observed in perpendicular fields. We show that the lack of magnetic-field effect on the heat conduction results from the coreless structure of the vortices, which causes both negligible phonon scattering and field-independent electronic conduction up to fields close to the upper critical field Hc2. The usual behavior is recovered on approaching Hc2 and on slight inclination of the field from the parallel direction, when normal cores are restored. This behavior points to the lack of bulk quasiparticle excitations induced by the magnetic field, consistent with the conventional superconducting state.

  14. Development of iterative techniques for the solution of unsteady compressible viscous flows

    NASA Technical Reports Server (NTRS)

    Sankar, Lakshmi N.; Hixon, Duane

    1991-01-01

    Efficient iterative solution methods are being developed for the numerical solution of the two- and three-dimensional compressible Navier-Stokes equations. Iterative time-marching methods have several advantages over classical multi-step explicit time-marching schemes and non-iterative implicit time-marching schemes. Iterative schemes have better stability characteristics than non-iterative explicit and implicit schemes, and the extra work they require can be designed to perform efficiently on current and future generation scalable, massively parallel machines. An obvious candidate for iteratively solving the system of coupled nonlinear algebraic equations arising in CFD applications is the Newton method. Newton's method was implemented in existing finite difference and finite volume methods. Depending on the complexity of the problem, however, the number of Newton iterations needed per step to solve the discretized system of equations can vary dramatically from a few to several hundred. Another popular approach, based on the classical conjugate gradient method and known as the GMRES (Generalized Minimum Residual) algorithm, is investigated. The GMRES algorithm was used in the past by a number of researchers for solving steady viscous and inviscid flow problems with considerable success. Here, the suitability of this algorithm is investigated for solving the system of nonlinear equations that arises in unsteady Navier-Stokes solvers at each time step. Unlike the Newton method, which attempts to drive the error in the solution at each and every node down to zero, the GMRES algorithm only seeks to minimize the L2 norm of the error. In the GMRES algorithm the changes in the flow properties from one time step to the next are assumed to be the sum of a set of orthogonal vectors. By choosing a reasonably small number of vectors N (between 5 and 20), the work required for advancing the solution from one time step to the next may be kept to (N+1) times that of a non-iterative scheme. Many of the operations required by the GMRES algorithm, such as matrix-vector multiplies and matrix additions and subtractions, can be vectorized and parallelized efficiently.
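
    A bare-bones GMRES(m) in the spirit described, building m Arnoldi vectors and choosing the combination that minimizes the residual 2-norm, is sketched below on a dense toy system; in the flow solver the operator would be the Jacobian of the discretized unsteady equations.

```python
# GMRES(m): Arnoldi basis via modified Gram-Schmidt, then a small
# least-squares problem for the coefficients that minimize the residual.
import numpy as np

def gmres_m(A, b, x0, m=10):
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q = np.zeros((len(b), m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = r0 / beta
    for j in range(m):                       # Arnoldi iteration
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    # least squares: min || beta*e1 - H y ||_2 over the Krylov coefficients y
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return x0 + Q[:, :m] @ y

rng = np.random.default_rng(5)
n = 200
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))    # well-conditioned toy system
b = rng.standard_normal(n)
x = gmres_m(A, b, np.zeros(n), m=20)
print("residual norm:", np.linalg.norm(b - A @ x))
```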

  15. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with partitioned knowledge bases indicate that significant speed increases, including superlinear in some cases, are possible.

  16. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not guarantee that real-time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  17. An integrated runtime and compile-time approach for parallelizing structured and block structured applications

    NASA Technical Reports Server (NTRS)

    Agrawal, Gagan; Sussman, Alan; Saltz, Joel

    1993-01-01

    Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion is described. A runtime library which can be used to port these applications to distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPF-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results demonstrating the efficacy of our approach are presented, using a multiblock Navier-Stokes solver template and a multigrid code. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler-parallelized codes perform within 20 percent of the codes parallelized by manually inserting calls to the runtime library.

  18. Effects of a parallel resistor on electrical characteristics of a piezoelectric transformer in open-circuit transient state.

    PubMed

    Chang, Kuo-Tsai

    2007-01-01

    This paper investigates the electrical transient characteristics of a Rosen-type piezoelectric transformer (PT), including maximum voltages, time constants, energy losses and average powers, and their improvement immediately after turn-OFF. A parallel resistor connected across the input terminals of the PT is needed to improve the transient characteristics. An equivalent circuit for the PT is first given. Then, an open-circuit voltage, involving a direct current (DC) component and an alternating current (AC) component, and its related energy losses are derived from the equivalent circuit with initial conditions. Moreover, an AC power control system, including a DC-to-AC resonant inverter, a control switch and electronic instruments, is constructed to determine the electrical characteristics of the OFF transient state. Furthermore, the effects of the parallel resistor on the transient characteristics at different parallel resistances are measured. The advantages of adding the parallel resistor are also discussed. From the measured results, the DC time constant is greatly decreased, from 9 to 0.04 ms, by a 10 kΩ parallel resistance with the output open-circuited.
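
    A quick back-of-the-envelope check of the reported speed-up, using tau = RC: the input capacitance implied by the measured figures is about 4 nF, and with the 10 kΩ resistor dominating the discharge path the two quoted time constants are mutually consistent. The capacitance and leakage values below are inferred for illustration, not quoted from the paper.

```python
# Consistency check of the reported DC time constants via tau = R*C.
# Assumes the added resistor dominates the discharge path, so the implied
# values are rough inferences, not measured data from the paper.
R_par = 10e3                       # added parallel resistor [ohm]
tau_with = 0.04e-3                 # measured time constant with resistor [s]
C_in = tau_with / R_par            # implied input capacitance: ~4 nF
tau_without = 9e-3                 # measured open-circuit time constant [s]
R_leak = tau_without / C_in        # implied internal leakage: ~2.25 Mohm
print(f"C_in = {C_in:.2e} F, implied leakage resistance = {R_leak:.2e} ohm")
```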

  19. A hybrid parallel framework for the cellular Potts model simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximate, and cannot be used for large-scale, complex 3D simulations. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed across clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving, and SMP systems are increasingly common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).
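
    The shared-memory half of such a hybrid scheme rests on the fact that lattice sites of one checkerboard parity share no neighbours and can be updated concurrently. The sketch below runs a vectorized checkerboard Metropolis sweep on a plain q-state Potts lattice (not the full Cellular Potts model, and with the MPI layer omitted); parameters are toy choices.

```python
# Checkerboard Metropolis sweep of a q-state Potts lattice: sites of one
# parity have no shared nearest neighbours, so the whole sublattice can be
# updated at once -- the parallelism an OpenMP layer would exploit.
import numpy as np

L, q, beta = 128, 8, 1.2
rng = np.random.default_rng(6)
s = rng.integers(q, size=(L, L))
parity = (np.add.outer(np.arange(L), np.arange(L)) % 2).astype(bool)

def neigh_match(s, vals):
    """Number of the 4 nearest neighbours of each site equal to `vals`
    (periodic boundaries via np.roll)."""
    return sum((vals == np.roll(s, sh, ax)).astype(int)
               for sh in (1, -1) for ax in (0, 1))

for sweep in range(100):
    for sub in (parity, ~parity):              # two independent sublattices
        prop = rng.integers(q, size=(L, L))    # proposed new spin values
        # E = -(matching pairs), so dE = matches(old) - matches(proposed)
        dE = neigh_match(s, s) - neigh_match(s, prop)
        accept = sub & (rng.random((L, L)) < np.exp(-beta * dE))
        s = np.where(accept, prop, s)
print("distinct states remaining:", len(np.unique(s)))
```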

  20. NDL-v2.0: A new version of the numerical differentiation library for parallel architectures

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.

    2014-07-01

    We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library is based on the lightweight OpenMP tasking model, allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes.
    Program summary
    Catalogue identifier: AEDG_v2_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 63036
    No. of bytes in distributed program, including test data, etc.: 801872
    Distribution format: tar.gz
    Programming language: ANSI Fortran-77, ANSI C, Python.
    Computer: Distributed systems (clusters), shared memory systems.
    Operating system: Linux, Unix.
    Has the code been vectorized or parallelized?: Yes.
    RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N^2) internal storage for Hessian calculations if a task throttling factor has not been set by the user.
    Classification: 4.9, 4.14, 6.5.
    Catalogue identifier of previous version: AEDG_v1_0
    Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 1404
    Does the new version supersede the previous version?: Yes
    Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems.
    Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries.
    Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1] by incorporating higher order derivative information, as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters, as in [1,4]. The runtime of the N-body-type problem changes considerably with the introduction of a longer cut-off between the bodies.
    In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library, as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Due to the code restructure, the MPI-parallel implementation (and likewise the OpenMP-parallel one) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that the library subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronization, similarly to the BARRIER and TASKWAIT directives of OpenMP. The new MPI implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX-Threads and MPI programming interfaces. It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP, in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems.
    Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added.
    Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system.
    Specifically, the processes of a single MPI application must have identical address spaces, so that a user function resides at the same virtual address in every process. In addition, address space layout randomization should not be used for the application.
    Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed.
    Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP version with 2 threads, and 53 ms and 1.01 s for the MPI parallel distribution using 2 threads and 2 processes respectively, with the yield-time for idle workers equal to 10 ms.
    References: [1] P. Angelikopoulos, C. Papadimitriou, P. Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys. 137 (14). [2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432. [3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214. [4] P. Angelikopoulos, C. Papadimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816. [5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236. [6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492. [7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.
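
    The step-selection idea behind the solution method is worth a small demonstration: for a central difference the truncation error scales like h^2 while round-off scales like eps/h, so the total error is minimized near h ~ eps^(1/3). The generic scan below shows the classic V-shaped error curve; it is not NDL's actual step-selection code.

```python
# Central-difference error vs. step size for f(x) = exp(x) at x = 1:
# too large a step gives truncation error, too small a step gives
# round-off error, with the minimum near eps**(1/3).
import numpy as np

f, df, x = np.exp, np.exp, 1.0
eps = np.finfo(float).eps
h_opt = eps ** (1 / 3)                        # ~6e-6: balances both errors
for h in [1e-2, 1e-4, h_opt, 1e-8, 1e-10]:
    approx = (f(x + h) - f(x - h)) / (2 * h)  # central difference, O(h^2)
    print(f"h = {h:.1e}   error = {abs(approx - df(x)):.2e}")
```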

  1. On some methods for improving time of reachability sets computation for the dynamic system control problem

    NASA Astrophysics Data System (ADS)

    Zimovets, Artem; Matviychuk, Alexander; Ushakov, Vladimir

    2016-12-01

    The paper presents two different approaches to reducing the computation time of reachability sets. The first approach uses different data structures for storing the reachability sets in computer memory for calculation in single-threaded mode. The second approach is based on parallel algorithms operating on the data structures of the first approach. Within the framework of this paper, a parallel algorithm for approximate reachability-set calculation on a computer with SMP architecture is proposed. The results of numerical modelling are presented in the form of tables which demonstrate the high efficiency of parallel computing and also show how computing time depends on the data structure used.

  2. Buffered coscheduling for parallel programming and enhanced fault tolerance

    DOEpatents

    Petrini, Fabrizio [Los Alamos, NM; Feng, Wu-chun [Los Alamos, NM

    2006-01-31

    A computer-implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval, so that each processor is informed by all of the other processors of the number of incoming jobs to be received in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors.
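
    The strobe/exchange cycle of the claim can be sketched with a few MPI calls: each rank buffers the job descriptors it generates during an interval, then exchanges only the counts at the strobe. The mpi4py sketch below (run under mpiexec) uses assumed descriptor contents and interval lengths.

```python
# Buffered-coscheduling flavour: buffer outgoing work during the interval,
# then exchange control information (per-destination counts) at the strobe
# so every rank knows what to expect next interval. Run under mpiexec.
from mpi4py import MPI
import random

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
random.seed(rank)

for interval in range(3):
    # computation phase: buffer outgoing job destinations instead of sending eagerly
    outgoing = [random.randrange(size) for _ in range(random.randrange(8))]
    counts = [outgoing.count(dst) for dst in range(size)]
    # strobe: global exchange of control information only
    incoming = comm.alltoall(counts)
    print(f"rank {rank}, interval {interval}: expect {sum(incoming)} incoming jobs")
```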

  3. Two Parallel Olfactory Pathways for Processing General Odors in a Cockroach

    PubMed Central

    Watanabe, Hidehiro; Nishino, Hiroshi; Mizunami, Makoto; Yokohari, Fumio

    2017-01-01

    In animals, sensory processing via parallel pathways, including the olfactory system, is a common design. However, the mechanisms that parallel pathways use to encode highly complex and dynamic odor signals remain unclear. In the current study, we examined the anatomical and physiological features of parallel olfactory pathways in an evolutionally basal insect, the cockroach Periplaneta americana. In this insect, the entire system for processing general odors, from olfactory sensory neurons to higher brain centers, is anatomically segregated into two parallel pathways. Two separate populations of secondary olfactory neurons, type1 and type2 projection neurons (PNs), with dendrites in distinct glomerular groups relay olfactory signals to segregated areas of higher brain centers. We conducted intracellular recordings, revealing olfactory properties and temporal patterns of both types of PNs. Generally, type1 PNs exhibit higher odor-specificities to nine tested odorants than type2 PNs. Cluster analyses revealed that odor-evoked responses were temporally complex and varied in type1 PNs, while type2 PNs exhibited phasic on-responses with either early or late latencies to an effective odor. The late responses are 30–40 ms later than the early responses. Simultaneous intracellular recordings from two different PNs revealed that a given odor activated both types of PNs with different temporal patterns, and latencies of early and late responses in type2 PNs might be precisely controlled. Our results suggest that the cockroach is equipped with two anatomically and physiologically segregated parallel olfactory pathways, which might employ different neural strategies to encode odor information. PMID:28529476

  4. On Parallelizing Single Dynamic Simulation Using HPC Techniques and APIs of Commercial Software

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Diao, Ruisheng; Jin, Shuangshuang; Howell, Frederic

    Time-domain simulations are heavily used in today’s planning and operation practices to assess power system transient stability and post-transient voltage/frequency profiles following severe contingencies, in order to comply with industry standards. Because of the increased modeling complexity, state-of-the-art commercial packages complete a dynamic simulation of a large-scale model several times slower than real time. With the growing stochastic behavior introduced by emerging technologies, the power industry has seen a growing need for performing security assessment in real time. This paper presents a parallel implementation framework to speed up a single dynamic simulation by leveraging the existing stability model library in commercial tools through their application programming interfaces (APIs). Several high performance computing (HPC) techniques are explored, such as parallelizing the calculation of generator current injection, identifying fast linear solvers for the network solution, and parallelizing data outputs when interacting with the APIs of the commercial package TSAT. The proposed method has been tested on a WECC planning base case with detailed synchronous generator models and exhibits outstanding scalable performance with sufficient accuracy.

  5. Progress in the Simulation of Steady and Time-Dependent Flows with 3D Parallel Unstructured Cartesian Methods

    NASA Technical Reports Server (NTRS)

    Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)

    2002-01-01

    The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature-detection-based refinement parameters. The multigrid method is extended to time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.

  6. Parareal in time 3D numerical solver for the LWR Benchmark neutron diffusion transient model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baudron, Anne-Marie, E-mail: anne-marie.baudron@cea.fr; CEA-DRN/DMT/SERMA, CEN-Saclay, 91191 Gif sur Yvette Cedex; Lautard, Jean-Jacques, E-mail: jean-jacques.lautard@cea.fr

    2014-12-15

    In this paper we present a time-parallel algorithm for the 3D neutron calculation of a transient model in a nuclear reactor core. The neutron calculation consists of numerically solving the time-dependent diffusion approximation equation, which is a simplified transport equation. The numerical solution is obtained with a finite element method based on a tetrahedral meshing of the computational domain, representing the reactor core, and time discretization is achieved using a θ-scheme. The transient model features moving control rods during the time of the reaction. Therefore, cross-sections (piecewise constants) are taken into account by interpolations with respect to the velocity of the control rods. Parallelism across time is achieved by an appropriate application of the parareal-in-time algorithm to the problem at hand. This parallel method is a predictor-corrector scheme that iteratively combines the use of two kinds of numerical propagators, one coarse and one fine. Our method is made efficient by means of a coarse solver defined with a large time step and a fixed-position control-rod model, while the fine propagator is assumed to be a high-order numerical approximation of the full model. The parallel implementation of our method provides good scalability of the algorithm. Numerical results show the efficiency of the parareal method on a large light water reactor transient model corresponding to the Langenbuch–Maurer–Werner benchmark.
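
    The predictor-corrector structure of parareal is easy to show on a scalar test equation, with backward Euler as the coarse propagator and finely substepped RK4 as the fine one (standing in for the coarse fixed-rod and fine full models of the paper). Counts and rates below are illustrative.

```python
# Parareal iteration y_{n+1} = G(y_n) + F(y_n^old) - G(y_n^old) for the
# linear test ODE y' = lam*y. The fine solves (list comprehension) are the
# part that runs concurrently across time slices in a real implementation.
import numpy as np

lam, T, N = -1.0, 5.0, 20                  # decay rate, horizon, time slices
dT = T / N

def coarse(y, dt):                         # one backward-Euler step
    return y / (1 - lam * dt)

def fine(y, dt, substeps=100):             # many small RK4 steps
    h = dt / substeps
    for _ in range(substeps):
        k1 = lam * y
        k2 = lam * (y + 0.5 * h * k1)
        k3 = lam * (y + 0.5 * h * k2)
        k4 = lam * (y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

y = np.zeros(N + 1)
y[0] = 1.0
for n in range(N):                         # initial coarse prediction sweep
    y[n + 1] = coarse(y[n], dT)
for k in range(5):                         # parareal correction iterations
    f = np.array([fine(y[n], dT) for n in range(N)])     # parallel in practice
    g_old = np.array([coarse(y[n], dT) for n in range(N)])
    for n in range(N):                     # cheap sequential corrector sweep
        y[n + 1] = coarse(y[n], dT) + f[n] - g_old[n]
print("error vs. exact solution:", abs(y[-1] - np.exp(lam * T)))
```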

  7. Execution of a parallel edge-based Navier-Stokes solver on commodity graphics processor units

    NASA Astrophysics Data System (ADS)

    Corral, Roque; Gisbert, Fernando; Pueblas, Jesus

    2017-02-01

    The implementation of an edge-based three-dimensional Reynolds-averaged Navier-Stokes solver for unstructured grids able to run on multiple graphics processing units (GPUs) is presented. Loops over edges, which are the most time-consuming part of the solver, have been written to exploit the massively parallel capabilities of GPUs. Non-blocking communications between parallel processes and between the GPU and the central processing unit (CPU) have been used to enhance code scalability. The code is written in a mixture of C++ and OpenCL, to allow execution of the source code on GPUs. The Message Passing Interface (MPI) library is used to allow parallel execution of the solver on multiple GPUs. A comparative study of the solver's parallel performance is carried out using a cluster of CPUs and another of GPUs. It is shown that a single GPU is up to 64 times faster than a single CPU core. The parallel scalability of the solver is mainly degraded by the loss of computing efficiency of the GPU when the size of the case decreases. However, for large enough grid sizes, the scalability is strongly improved. A cluster featuring commodity GPUs and a high bandwidth network is ten times less costly and consumes 33% less energy than a CPU-based cluster with equivalent computational power.
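
    The edge loops mentioned above follow a gather-compute-scatter pattern: each edge reads its two end-node states, computes a flux, and accumulates it into both nodes. In the numpy sketch below, np.add.at stands in for the parallel scatter performed by the GPU kernels; the mesh and flux are toy assumptions.

```python
# Edge-based residual accumulation: gather nodal states per edge, compute
# a flux, scatter-add into the two end nodes.
import numpy as np

rng = np.random.default_rng(8)
nnode, nedge = 1000, 4000
edges = rng.integers(nnode, size=(nedge, 2))     # node pair per edge
u = rng.standard_normal(nnode)                   # nodal state

flux = 0.5 * (u[edges[:, 0]] + u[edges[:, 1]])   # toy edge flux
res = np.zeros(nnode)
np.add.at(res, edges[:, 0], flux)                # unbuffered scatter-add
np.add.at(res, edges[:, 1], -flux)               # equal and opposite
print(res.shape)
```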

  8. Crustal origin of trench-parallel shear-wave fast polarizations in the Central Andes

    NASA Astrophysics Data System (ADS)

    Wölbern, I.; Löbl, U.; Rümpker, G.

    2014-04-01

    In this study, SKS and local S phases are analyzed to investigate variations of shear-wave splitting parameters along two dense seismic profiles across the central Andean Altiplano and Puna plateaus. In contrast to previous observations, the vast majority of the measurements reveal fast polarizations sub-parallel to the subduction direction of the Nazca plate with delay times between 0.3 and 1.2 s. Local phases show larger variations of fast polarizations and exhibit delay times ranging between 0.1 and 1.1 s. Two 70 km and 100 km wide sections along the Altiplano profile exhibit larger delay times and are characterized by fast polarizations oriented sub-parallel to major fault zones. Based on finite-difference wavefield calculations for anisotropic subduction zone models we demonstrate that the observations are best explained by fossil slab anisotropy with fast symmetry axes oriented sub-parallel to the slab movement in combination with a significant component of crustal anisotropy of nearly trench-parallel fast-axis orientation. From the modeling we exclude a sub-lithospheric origin of the observed strong anomalies due to the short-scale variations of the fast polarizations. Instead, our results indicate that anisotropy in the Central Andes generally reflects the direction of plate motion while the observed trench-parallel fast polarizations likely originate in the continental crust above the subducting slab.

  9. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPNs) was studied. It was recognized that complex system development tools often transform system descriptions into TPNs or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPNs be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of automatically parallelizing TPNs for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis were two-fold: we showed that Monte Carlo simulation with importance sampling offers promise of joint analysis in the context of a single tool, and we developed methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  10. A multi-satellite orbit determination problem in a parallel processing environment

    NASA Technical Reports Server (NTRS)

    Deakyne, M. S.; Anderle, R. J.

    1988-01-01

    The Engineering Orbit Analysis Unit at GE Valley Forge used an Intel Hypercube parallel processor to investigate the performance of parallel processors and to gain experience with them on a multi-satellite orbit determination problem. A general study was selected in which major blocks of computation for the multi-satellite orbit computations were used as units to be assigned to the various processors on the Hypercube. Problems encountered or successes achieved in addressing the orbit determination problem would then be more likely to be transferable to other parallel processors. The prime objective was to study the algorithm to allow processing of observations later in time than those employed in the state update. Expertise in ephemeris determination was exploited in addressing these problems, and the facility was used to bring a realism to the study that would highlight problems which might not otherwise be anticipated. Secondary objectives were to gain experience with a non-trivial problem in a parallel processor environment; to explore the necessary interplay of serial and parallel sections of the algorithm through timing studies; and to explore granularity (coarse vs. fine grain), locating the limit above which there is a risk of starvation, with the majority of nodes idle, and below which the overhead associated with splitting the problem may require more work and communication time than is useful.

  11. Parallel computing method for simulating hydrological processesof large rivers under climate change

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, Y.

    2016-12-01

    Climate change is one of the most prominent global environmental problems. It has altered watershed hydrological processes in their distribution over time and space, especially in the world's large rivers. Watershed hydrological process simulation based on physically based distributed hydrological models can yield better results than lumped models. However, such simulation involves a large amount of calculation, especially for large rivers, and thus needs huge computing resources that may not be steadily available to researchers, or only at high expense; this has seriously restricted research and application. Current parallel methods mostly parallelize over the space and time dimensions: they process the natural features of the distributed hydrological model in order, grid by grid (unit by unit, or basin by basin), from upstream to downstream. This article proposes a high-performance computing method for hydrological process simulation with a high speedup ratio and parallel efficiency. It combines the spatial and temporal runoff characteristics of the distributed hydrological model with distributed data storage, an in-memory database, and distributed parallel computing based on units of computing power. The method is highly adaptable and extensible, meaning it can make full use of the available computing and storage resources even when those resources are limited, and its computing efficiency improves linearly as computing resources increase. This method can satisfy the parallel computing requirements of hydrological process simulation for small, medium and large rivers.

  12. A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Owen, Jeffrey E.

    1988-01-01

    A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code, since the need for a high-level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer, without the sacrifice in execution speed normally expected of digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

  13. Dynamic performance of high speed solenoid valve with parallel coils

    NASA Astrophysics Data System (ADS)

    Kong, Xiaowu; Li, Shizhen

    2014-07-01

    Methods of improving the dynamic performance of high-speed on/off solenoid valves include increasing the magnetic force on the armature and the slew rate of the coil current, and decreasing the mass and stroke of the moving parts. An increase in magnetic force usually leads to a decrease in current slew rate, which can increase the delay time of the valve's dynamic response. Using a high voltage to drive the coil can resolve this contradiction, but a high driving voltage also increases cost and decreases safety and reliability. In this paper, a new scheme of parallel coils is investigated, in which the single solenoid coil is replaced by parallel coils with the same ampere-turns. Based on the mathematical model of the high-speed solenoid valve, a theoretical formula for the valve's delay time is deduced. Both the theoretical analysis and the dynamic simulation show that the effect of dividing a single coil into N parallel sub-coils is close to that of driving the single coil with N times the original driving voltage, as far as the delay time is concerned. A specific test bench is designed to measure the dynamic performance of high-speed on/off solenoid valves. The experimental results also prove that both the delay time and the switching time of the solenoid valves can be decreased greatly by adopting the parallel-coil scheme. This research presents a simple and practical method to improve the dynamic performance of high-speed on/off solenoid valves.

  14. Multilayer gyroid cubic membrane organization in green alga Zygnema.

    PubMed

    Zhan, Ting; Lv, Wenhua; Deng, Yuru

    2017-09-01

    Biological cubic membranes (CM), which are fluid membranes draped onto 3D periodic parallel surface geometries with cubic symmetry, have been observed within subcellular organelles, including mitochondria, endoplasmic reticulum, and thylakoids. CM transitions tend to occur under various stress conditions; multilayer CM organizations, in particular, often appear under light stress. This report characterizes a projected gyroid CM in a transmission electron microscopy (TEM) study of the chloroplast membranes of the green alga Zygnema (LB923), whose lamellar thylakoid membranes started to fold into multilayer gyroid CM at the end of the log phase of cell growth. Using computer simulation of TEM images and a direct template matching method, we show that these CM are based on the gyroid parallel surfaces. Single, double, and multilayer gyroid CM morphologies are observed, in which space is continuously divided into two, three, or more subvolumes by one, two, or several parallel membranes. The gyroid CM are continuous with varying amounts of pseudo-grana with lamellar-like morphology. The relative amount and order of these two membrane morphologies seem to vary with the age of the cell culture and are insensitive to the ambient light condition. In addition, the thylakoid gyroid CM continuously interpenetrate the pyrenoid body through stalk-like bundles, inside which the membranes re-fold into gyroid CM. The appearance of these CM rearrangements as a consequence of the Zygnema cell response to various environmental stresses, including nutrient limitation, temperature fluctuation, and ultraviolet (UV) exposure, is discussed.
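
    For readers unfamiliar with the geometry, the gyroid family referred to here can be visualized with its standard nodal approximation; a short sketch (grid and offsets chosen arbitrarily) shows how offset level sets divide space into subvolumes, as the stacked membranes do.

    import numpy as np

    def gyroid(x, y, z, offset=0.0):
        # Nodal approximation to the gyroid; nonzero offsets trace the family
        # of roughly parallel surfaces that multilayer membranes drape onto.
        return (np.sin(x) * np.cos(y) + np.sin(y) * np.cos(z)
                + np.sin(z) * np.cos(x) - offset)

    g = np.linspace(0, 2 * np.pi, 64)
    X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
    for off in (0.0, 0.6, -0.6):  # three "parallel" membranes
        inside = gyroid(X, Y, Z, off) < 0
        print(off, round(inside.mean(), 3))  # volume fraction on one side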

  15. Mesh-free data transfer algorithms for partitioned multiphysics problems: Conservation, accuracy, and parallelism

    DOE PAGES

    Slattery, Stuart R.

    2015-12-02

    In this study we analyze and extend mesh-free algorithms for three-dimensional data transfer problems in partitioned multiphysics simulations. We first provide a direct comparison between a mesh-based weighted residual method using the common-refinement scheme and two mesh-free algorithms leveraging compactly supported radial basis functions: one using a spline interpolation and one using a moving least square reconstruction. Through the comparison we assess both the conservation and accuracy of the data transfer obtained from each of the methods. We do so for a varying set of geometries with and without curvature and sharp features and for functions with and without smoothness and with varying gradients. Our results show that the mesh-based and mesh-free algorithms are complementary with cases where each was demonstrated to perform better than the other. We then focus on the mesh-free methods by developing a set of algorithms to parallelize them based on sparse linear algebra techniques. This includes a discussion of fast parallel radius searching in point clouds and restructuring the interpolation algorithms to leverage data structures and linear algebra services designed for large distributed computing environments. The scalability of our new algorithms is demonstrated on a leadership class computing facility using a set of basic scaling studies. Finally, these scaling studies show that for problems with reasonable load balance, our new algorithms for both spline interpolation and moving least square reconstruction demonstrate both strong and weak scalability using more than 100,000 MPI processes with billions of degrees of freedom in the data transfer operation.
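
    As an illustration of the spline-interpolation variant, here is a minimal mesh-free transfer sketch using a compactly supported Wendland C2 radial basis function. The point clouds, support radius, and test field are invented, and no parallelism is shown.

    import numpy as np

    def wendland_c2(r):
        # Compactly supported Wendland C2 kernel on normalized distance r.
        return np.where(r < 1, (1 - r) ** 4 * (4 * r + 1), 0.0)

    def rbf_transfer(src_pts, src_vals, dst_pts, radius=0.3):
        # Spline-style mesh-free transfer: solve for RBF weights on the source
        # point cloud, then evaluate the interpolant at the destination points.
        d_ss = np.linalg.norm(src_pts[:, None] - src_pts[None, :], axis=-1)
        w = np.linalg.solve(wendland_c2(d_ss / radius), src_vals)
        d_ds = np.linalg.norm(dst_pts[:, None] - src_pts[None, :], axis=-1)
        return wendland_c2(d_ds / radius) @ w

    rng = np.random.default_rng(0)
    src, dst = rng.random((200, 2)), rng.random((50, 2))
    vals = np.sin(4 * src[:, 0]) * np.cos(4 * src[:, 1])
    approx = rbf_transfer(src, vals, dst)
    exact = np.sin(4 * dst[:, 0]) * np.cos(4 * dst[:, 1])
    print("max error:", np.abs(approx - exact).max())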

  16. Research on parallel algorithm for sequential pattern mining

    NASA Astrophysics Data System (ADS)

    Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

    2008-03-01

    Sequential pattern mining is the mining of frequent sequences, related to time or other orders, from a sequence database. Its initial motivation was to discover the laws of customer purchasing over a time span by finding frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field is no longer confined to business databases; it has extended to new data sources such as the Web and advanced scientific fields such as DNA analysis. The data of sequential pattern mining have the following characteristics: massive data volume and distributed storage. Most existing sequential pattern mining algorithms have not considered these characteristics together. According to the traits mentioned above and combining parallel theory, this paper puts forward a new distributed parallel algorithm, SPP (Sequential Pattern Parallel). The algorithm abides by the principle of pattern reduction and utilizes a divide-and-conquer strategy for parallelization. The first parallel task is to construct the frequent item sets, applying frequency concepts and search space partition theory, and the second task is to build the frequent sequences using depth-first search at each processor. The algorithm only needs to access the database twice and does not generate candidate sequences, which reduces the access time and improves the mining efficiency. Based on a random data generation procedure and different information structures, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm has an excellent speedup factor and efficiency.
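
    SPP itself is not specified beyond pattern reduction and divide-and-conquer, so the sketch below only shows the general shape of such a scheme: the frequent 1-items partition the search space, and each worker grows patterns depth-first from its assigned prefix. The PrefixSpan-style projection used here is a stand-in, not the SPP algorithm.

    from collections import Counter
    from concurrent.futures import ProcessPoolExecutor
    from functools import partial

    def frequent_items(db, minsup):
        counts = Counter()
        for seq in db:
            counts.update(set(seq))
        return sorted(it for it, c in counts.items() if c >= minsup)

    def project(db, item):
        # Projected database: suffixes following the first occurrence of item.
        return [seq[seq.index(item) + 1:] for seq in db if item in seq]

    def mine(prefix, db, minsup):
        # Depth-first pattern growth from the given prefix (one "divide" task).
        patterns = []
        for item in frequent_items(db, minsup):
            patterns.append(prefix + (item,))
            patterns.extend(mine(prefix + (item,), project(db, item), minsup))
        return patterns

    def mine_from(item, db, minsup):
        return [(item,)] + mine((item,), project(db, item), minsup)

    if __name__ == "__main__":
        db = [list("abcd"), list("acbd"), list("abd"), list("bcd")]
        starts = frequent_items(db, minsup=3)
        with ProcessPoolExecutor() as ex:  # one search-space slice per worker
            for pats in ex.map(partial(mine_from, db=db, minsup=3), starts):
                print(pats)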

  17. Improved hydrogeophysical characterization and monitoring through parallel modeling and inversion of time-domain resistivity and induced-polarization data

    USGS Publications Warehouse

    Johnson, Timothy C.; Versteeg, Roelof J.; Ward, Andy; Day-Lewis, Frederick D.; Revil, André

    2010-01-01

    Electrical geophysical methods have found wide use in the growing discipline of hydrogeophysics for characterizing the electrical properties of the subsurface and for monitoring subsurface processes in terms of the spatiotemporal changes in subsurface conductivity, chargeability, and source currents they govern. Presently, multichannel and multielectrode data collection systems can collect large data sets in relatively short periods of time. Practitioners, however, often are unable to fully utilize these large data sets and the information they contain because of standard desktop-computer processing limitations. These limitations can be addressed by utilizing the storage and processing capabilities of parallel computing environments. We have developed a parallel distributed-memory forward and inverse modeling algorithm for analyzing resistivity and time-domain induced polarization (IP) data. The primary components of the parallel computations include distributed computation of the pole solutions in forward mode, distributed storage and computation of the Jacobian matrix in inverse mode, and parallel execution of the inverse equation solver. We have tested the corresponding parallel code in three efforts: (1) resistivity characterization of the Hanford 300 Area Integrated Field Research Challenge site in Hanford, Washington, U.S.A., (2) resistivity characterization of a volcanic island in the southern Tyrrhenian Sea in Italy, and (3) resistivity and IP monitoring of biostimulation at a Superfund site in Brandywine, Maryland, U.S.A. Inverse analysis of each of these data sets would be limited or impossible in a standard serial computing environment, which underscores the need for parallel high-performance computing to fully utilize the potential of electrical geophysical methods in hydrogeophysical applications.
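
    The distributed Jacobian computation can be illustrated, at toy scale, by spreading finite-difference Jacobian columns across worker processes; the forward model below is an invented stand-in, not the resistivity/IP forward solver.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def forward(m):
        # Stand-in forward model mapping model parameters to predicted data.
        return np.array([m[0] ** 2 + m[1], np.sin(m[1]) * m[2], m.sum()])

    def jac_column(args):
        # One column of the Jacobian by forward finite differences.
        m, j, eps = args
        mp = m.copy()
        mp[j] += eps
        return (forward(mp) - forward(m)) / eps

    def jacobian_parallel(m, eps=1e-6, workers=3):
        # Each worker owns one column, mirroring distributed storage and
        # computation of J across processes.
        with ProcessPoolExecutor(workers) as ex:
            cols = list(ex.map(jac_column, [(m, j, eps) for j in range(m.size)]))
        return np.column_stack(cols)

    if __name__ == "__main__":
        print(jacobian_parallel(np.array([1.0, 0.5, 2.0])))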

  18. Community Detection on the GPU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh

    We present and evaluate a new GPU algorithm based on the Louvain method for community detection. Our algorithm is the first for this problem that parallelizes the access to individual edges. In this way we can fine tune the load balance when processing networks with nodes of highly varying degrees. This is achieved by scaling the number of threads assigned to each node according to its degree. Extensive experiments show that we obtain speedups up to a factor of 270 compared to the sequential algorithm. The algorithm consistently outperforms other recent shared memory implementations and is only one order of magnitude slower than the current fastest parallel Louvain method running on a Blue Gene/Q supercomputer using more than 500K threads.
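
    A host-side sketch of the degree-scaling idea, with invented group and cap sizes: each node is assigned a co-operating thread group whose size grows with its degree, so high-degree nodes get more threads to scan their edges.

    def threads_for_node(degree, max_threads=256):
        # Round the degree up to the next power of two, capped, so that the
        # edges of a node are processed by a group of that many threads.
        t = 1
        while t < min(degree, max_threads):
            t *= 2
        return t

    for d in (1, 3, 17, 40, 500):
        print(d, "->", threads_for_node(d), "threads")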

  19. The Role of Nonlinear Gradients in Parallel Imaging: A k-Space Based Analysis.

    PubMed

    Galiana, Gigi; Stockmann, Jason P; Tam, Leo; Peters, Dana; Tagare, Hemant; Constable, R Todd

    2012-09-01

    Sequences that encode the spatial information of an object using nonlinear gradient fields are a new frontier in MRI, with potential to provide lower peripheral nerve stimulation, windowed fields of view, tailored spatially-varying resolution, curved slices that mirror physiological geometry, and, most importantly, very fast parallel imaging with multichannel coils. The acceleration for multichannel images is generally explained by the fact that curvilinear gradient isocontours better complement the azimuthal spatial encoding provided by typical receiver arrays. However, the details of this complementarity have been more difficult to specify. We present a simple and intuitive framework for describing the mechanics of image formation with nonlinear gradients, and we use this framework to review some of the main classes of nonlinear encoding schemes.

  20. A proposed experimental search for chameleons using asymmetric parallel plates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burrage, Clare; Copeland, Edmund J.; Stevenson, James A., E-mail: Clare.Burrage@nottingham.ac.uk, E-mail: ed.copeland@nottingham.ac.uk, E-mail: james.stevenson@nottingham.ac.uk

    2016-08-01

    Light scalar fields coupled to matter are a common consequence of theories of dark energy and attempts to solve the cosmological constant problem. The chameleon screening mechanism is commonly invoked in order to suppress the fifth forces mediated by these scalars, sufficiently to avoid current experimental constraints, without fine tuning. The force is suppressed dynamically by allowing the mass of the scalar to vary with the local density. Recently it has been shown that near future cold atoms experiments using atom-interferometry have the ability to access a large proportion of the chameleon parameter space. In this work we demonstrate how experiments utilising asymmetric parallel plates can push deeper into the remaining parameter space available to the chameleon.

  1. Reverse time migration: A seismic processing application on the connection machine

    NASA Technical Reports Server (NTRS)

    Fiebrich, Rolf-Dieter

    1987-01-01

    The implementation of a reverse time migration algorithm on the Connection Machine, a massively parallel computer, is described. Essential architectural features of this machine as well as programming concepts are presented. The data structures and parallel operations for the implementation of the reverse time migration algorithm are described. The algorithm matches the Connection Machine architecture closely and executes almost at the peak performance of this machine.
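
    The abstract does not include the stencil itself; the sketch below shows the core of reverse time extrapolation for a constant-velocity 2D acoustic model, with invented grid and recording parameters, stepping recorded surface data backward in time. Real implementations add absorbing boundaries and an imaging condition.

    import numpy as np

    def reverse_time_extrapolate(rec, c, h, dt):
        # Backward-in-time 2D acoustic finite differences: recorded surface
        # traces rec[t, x] are injected at z = 0 while the wavefield is
        # stepped from the last time sample back to t = 0.
        nt, nx = rec.shape
        p_next = np.zeros((nx, nx))  # p at time t+1
        p = np.zeros((nx, nx))       # p at time t
        r = (c * dt / h) ** 2        # must satisfy the CFL stability limit
        for t in range(nt - 1, -1, -1):
            lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
                   np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4 * p)
            p_prev = 2 * p - p_next + r * lap
            p_prev[0, :] += rec[t, :]  # inject recorded data at the surface
            p_next, p = p, p_prev
        return p                       # back-propagated wavefield at t = 0

    rec = np.random.default_rng(0).random((100, 64)) * 1e-3  # toy "records"
    img = reverse_time_extrapolate(rec, c=1500.0, h=5.0, dt=0.001)
    print(img.shape)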

  2. Study of Electromagnetic Repulsion Switch to High Speed Reclosing and Recover Time Characteristics of Superconductor

    NASA Astrophysics Data System (ADS)

    Koyama, Tomonori; Kaiho, Katsuyuki; Yamaguchi, Iwao; Yanabu, Satoru

    Using a high-temperature superconductor, we constructed and tested a model superconducting fault current limiter (SFCL). The superconductor and a vacuum interrupter serving as the commutation switch were connected in parallel through a bypass coil. When a fault current flows in this equipment, the superconductor quenches and the current is transferred to the parallel coil by the voltage drop across the superconductor. The resulting large current in the parallel coil actuates the magnetic repulsion mechanism of the vacuum interrupter, and the current in the superconductor is interrupted. With this equipment, the current flow time in the superconductor can easily be minimized; at the same time, the fault current is limited by the large reactance of the parallel coil. Because this system has many merits, we adopted the electromagnetic repulsion switch. The electrical power system requires high speed reclosing after a fault current is interrupted, so the SFCL must return to the superconducting state before reclosing. However, the superconductor generates heat when it quenches and takes time to recover the superconducting state, so the recovery time is a concern. In this paper, we study the recovery time of the superconductor and propose an electromagnetic repulsion switch with a reclosing system.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chrisochoides, N.; Sukup, F.

    In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, because it avoids the need for global mesh refinement. Its implementation on distributed memory multicomputers using the traditional data-parallel model has proven very inefficient due to the excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate the synchronization costs inherent to data-parallel methods by exploiting concurrency at the processor level. Our preliminary performance data indicate that the task-parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimum overheads compared to the "best" sequential implementation of the BW algorithm.
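
    For reference, the sequential kernel that the task-parallel scheme distributes is Bowyer-Watson point insertion: find the triangles whose circumcircles contain the new point, remove that cavity, and re-triangulate its boundary. A minimal 2D sketch follows, with invented super-triangle coordinates and test points; the task-parallel scheduling itself is not shown.

    import numpy as np

    def in_circumcircle(a, b, c, p):
        # True if p lies strictly inside the circumcircle of CCW triangle abc.
        m = np.array([[a[0] - p[0], a[1] - p[1], (a[0] - p[0])**2 + (a[1] - p[1])**2],
                      [b[0] - p[0], b[1] - p[1], (b[0] - p[0])**2 + (b[1] - p[1])**2],
                      [c[0] - p[0], c[1] - p[1], (c[0] - p[0])**2 + (c[1] - p[1])**2]])
        return np.linalg.det(m) > 0

    def bowyer_watson(points):
        # Super-triangle (CCW) chosen large enough to enclose all points.
        pts = list(points) + [(-1e4, -1e4), (1e4, -1e4), (0.0, 1e4)]
        n = len(points)
        tris = [(n, n + 1, n + 2)]
        for i in range(n):
            p = pts[i]
            # Cavity: triangles whose circumcircle contains the new point.
            bad = [t for t in tris
                   if in_circumcircle(pts[t[0]], pts[t[1]], pts[t[2]], p)]
            # Cavity boundary: directed edges whose reverse is not in the cavity.
            directed = [e for (a, b, c) in bad for e in ((a, b), (b, c), (c, a))]
            dset = set(directed)
            boundary = [e for e in directed if (e[1], e[0]) not in dset]
            # Re-triangulate the cavity; CCW orientation is preserved.
            tris = [t for t in tris if t not in bad]
            tris += [(a, b, i) for (a, b) in boundary]
        # Drop triangles that touch the super-triangle vertices.
        return [t for t in tris if max(t) < n]

    pts = [tuple(p) for p in np.random.default_rng(0).random((30, 2))]
    print(len(bowyer_watson(pts)), "triangles")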

  4. Accelerating the discovery of space-time patterns of infectious diseases using parallel computing.

    PubMed

    Hohl, Alexander; Delmelle, Eric; Tang, Wenwu; Casas, Irene

    2016-11-01

    Infectious diseases have complex transmission cycles, and effective public health responses require the ability to monitor outbreaks in a timely manner. Space-time statistics facilitate the discovery of disease dynamics including rate of spread and seasonal cyclic patterns, but are computationally demanding, especially for datasets of increasing size, diversity and availability. High-performance computing reduces the effort required to identify these patterns, however heterogeneity in the data must be accounted for. We develop an adaptive space-time domain decomposition approach for parallel computation of the space-time kernel density. We apply our methodology to individual reported dengue cases from 2010 to 2011 in the city of Cali, Colombia. The parallel implementation reaches significant speedup compared to sequential counterparts. Density values are visualized in an interactive 3D environment, which facilitates the identification and communication of uneven space-time distribution of disease events. Our framework has the potential to enhance the timely monitoring of infectious diseases. Copyright © 2016 Elsevier Ltd. All rights reserved.
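
    A toy version of the approach: the evaluation grid is decomposed into chunks that are processed in parallel, each computing a product-kernel space-time density against the full event set. Events, bandwidths, and kernels are invented stand-ins, normalization constants are omitted, and the decomposition here is uniform rather than adaptive.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    EVENTS = np.random.default_rng(42).random((500, 3))  # (x, y, t) events

    def epanechnikov(u):
        return np.where(np.abs(u) < 1, 0.75 * (1 - u ** 2), 0.0)

    def density_chunk(grid_chunk, hs=0.1, ht=0.1):
        # Space-time kernel density at each grid point in this chunk.
        out = np.zeros(len(grid_chunk))
        for k, (gx, gy, gt) in enumerate(grid_chunk):
            ds = np.hypot(EVENTS[:, 0] - gx, EVENTS[:, 1] - gy) / hs
            dt = (EVENTS[:, 2] - gt) / ht
            out[k] = np.sum(epanechnikov(ds) * epanechnikov(dt))
        return out

    if __name__ == "__main__":
        g = np.linspace(0, 1, 20)
        grid = np.array([(x, y, t) for x in g for y in g for t in g])
        chunks = np.array_split(grid, 8)  # space-time domain decomposition
        with ProcessPoolExecutor(4) as ex:
            dens = np.concatenate(list(ex.map(density_chunk, chunks)))
        print("peak density value:", dens.max())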

  5. Feynman’s clock, a new variational principle, and parallel-in-time quantum dynamics

    PubMed Central

    McClean, Jarrod R.; Parkhill, John A.; Aspuru-Guzik, Alán

    2013-01-01

    We introduce a discrete-time variational principle inspired by the quantum clock originally proposed by Feynman and use it to write down quantum evolution as a ground-state eigenvalue problem. The construction allows one to apply ground-state quantum many-body theory to quantum dynamics, extending the reach of many highly developed tools from this fertile research area. Moreover, this formalism naturally leads to an algorithm to parallelize quantum simulation over time. We draw an explicit connection between previously known time-dependent variational principles and the time-embedded variational principle presented. Sample calculations are presented, applying the idea to a hydrogen molecule and the spin degrees of freedom of a model inorganic compound, demonstrating the parallel speedup of our method as well as its flexibility in applying ground-state methodologies. Finally, we take advantage of the unique perspective of this variational principle to examine the error of basis approximations in quantum dynamics. PMID:24062428
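
    The clock construction can be written down directly. A small numpy sketch, following the standard form of the construction, builds the clock operator for a single qubit and checks that the history state sits at zero energy; the rotation and step count are arbitrary.

    import numpy as np

    def clock_hamiltonian(psi0, unitaries):
        # Ground (null) states of H encode the full history
        # sum_t |t> (x) U_{t-1} ... U_0 |psi0|>.
        d, T = len(psi0), len(unitaries)
        I_s = np.eye(d)
        H = np.zeros((d * (T + 1), d * (T + 1)), dtype=complex)
        def P(t):                 # clock projector |t><t|
            p = np.zeros((T + 1, T + 1)); p[t, t] = 1.0; return p
        def hop(t1, t0):          # clock hopping |t1><t0|
            h = np.zeros((T + 1, T + 1)); h[t1, t0] = 1.0; return h
        # Penalize histories that start in the wrong initial state.
        H += np.kron(P(0), I_s - np.outer(psi0, psi0.conj()))
        for t, U in enumerate(unitaries):
            H += 0.5 * (np.kron(P(t) + P(t + 1), I_s)
                        - np.kron(hop(t + 1, t), U)
                        - np.kron(hop(t, t + 1), U.conj().T))
        return H

    theta = 0.3   # one qubit evolving under repeated small rotations
    U = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    psi0 = np.array([1.0, 0.0])
    w, V = np.linalg.eigh(clock_hamiltonian(psi0, [U] * 4))
    print("ground-state energy (should be ~0):", w[0])
    history = V[:, 0].reshape(5, 2)  # block t is proportional to U^t psi0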

  6. Using Hadoop MapReduce for Parallel Genetic Algorithms: A Comparison of the Global, Grid and Island Models.

    PubMed

    Ferrucci, Filomena; Salza, Pasquale; Sarro, Federica

    2017-06-29

    The need to improve the scalability of Genetic Algorithms (GAs) has motivated the research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Based on the fact that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, the parallel GAs solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model can be most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs' computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the use of PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid model, also in terms of costs when executed on a commercial cloud provider.
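
    Hadoop specifics aside, the island model itself is easy to sketch: independent subpopulations evolve in parallel and periodically exchange their best individuals. The sketch below uses Python processes instead of MapReduce, with an invented OneMax fitness and ring migration.

    import random
    from concurrent.futures import ProcessPoolExecutor

    GENOME_LEN = 20   # OneMax: maximize the number of 1s in a bit string

    def fitness(ind):
        return sum(ind)

    def evolve(pop, gens=20):
        # Simple elitist GA epoch run independently on each island.
        for _ in range(gens):
            pop.sort(key=fitness, reverse=True)
            elite = pop[: len(pop) // 2]
            kids = []
            while len(elite) + len(kids) < len(pop):
                a, b = random.sample(elite, 2)
                cut = random.randrange(1, GENOME_LEN)   # one-point crossover
                child = a[:cut] + b[cut:]
                if random.random() < 0.2:               # bit-flip mutation
                    child[random.randrange(GENOME_LEN)] ^= 1
                kids.append(child)
            pop = elite + kids
        return pop

    def island_model(n_islands=4, pop_size=30, epochs=5):
        islands = [[[random.randint(0, 1) for _ in range(GENOME_LEN)]
                    for _ in range(pop_size)] for _ in range(n_islands)]
        for _ in range(epochs):
            with ProcessPoolExecutor(n_islands) as ex:   # islands in parallel
                islands = list(ex.map(evolve, islands))
            # Ring migration: each island's best replaces a neighbour's worst.
            bests = [max(isl, key=fitness) for isl in islands]
            for i, isl in enumerate(islands):
                isl[isl.index(min(isl, key=fitness))] = bests[i - 1]
        return max((max(isl, key=fitness) for isl in islands), key=fitness)

    if __name__ == "__main__":
        best = island_model()
        print(fitness(best), best)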

  7. Relation of Parallel Discrete Event Simulation algorithms with physical models

    NASA Astrophysics Data System (ADS)

    Shchur, L. N.; Shchur, L. V.

    2015-09-01

    We extend the concept of local simulation times in parallel discrete event simulation (PDES) in order to take into account the architecture of current hardware and software in high-performance computing. We briefly review previous research on the mapping of PDES onto physical problems, and emphasise how physical results may help to predict the behaviour of parallel algorithms.
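
    The PDES-to-physics mapping this line of work builds on treats the local virtual times as a growing surface. A minimal sketch of the conservative protocol on a ring measures the fraction of processors able to advance per step (the utilization), assuming exponential time increments; for this topology the steady-state utilization is known to settle near 0.25.

    import numpy as np

    def pdes_utilization(n_proc=1000, steps=2000, seed=0):
        # Conservative PDES: processor i may advance its local virtual time
        # only if it does not exceed either neighbour's time (ring topology).
        rng = np.random.default_rng(seed)
        tau = np.zeros(n_proc)
        util = []
        for _ in range(steps):
            left, right = np.roll(tau, 1), np.roll(tau, -1)
            active = (tau <= left) & (tau <= right)
            tau[active] += rng.exponential(1.0, active.sum())
            util.append(active.mean())
        return np.mean(util[steps // 2:])   # discard the transient

    print("steady-state utilization:", pdes_utilization())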

  8. On the Parallel Deterioration of Lexico-Semantic Processes in the Bilinguals' Two Languages: Evidence from Alzheimer's Disease

    ERIC Educational Resources Information Center

    Costa, Albert; Calabria, Marco; Marne, Paula; Hernandez, Mireia; Juncadella, Montserrat; Gascon-Bayarri, Jordi; Lleo, Alberto; Ortiz-Gil, Jordi; Ugas, Lidia; Blesa, Rafael; Rene, Ramon

    2012-01-01

    In this article we aimed to assess how Alzheimer's disease (AD), which is neurodegenerative, affects the linguistic performance of early, high-proficient bilinguals in their two languages. To this end, we compared the Picture Naming and Word Translation performances of two groups of AD patients varying in disease progression (Mild and Moderate)…

  9. Elementary School Teachers as "Targets and Agents of Change": Teachers' Learning in Interaction with Reform Science Curriculum

    ERIC Educational Resources Information Center

    Metz, Kathleen E.

    2009-01-01

    This article examines teachers' perspectives on the challenges of using a science reform curriculum, as well as their learning in interaction with the curriculum and parallel professional development program. As case studies, I selected 4 veteran teachers of 2nd or 3rd grade, with varying science backgrounds (including 2 with essentially none).…

  10. S-HARP: A parallel dynamic spectral partitioner

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sohn, A.; Simon, H.

    1998-01-01

    Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Message Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.
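
    The fast inertial step of such a partitioner is easy to sketch: project vertices onto the principal axis of their coordinates and split at the median, recursively. This toy version omits S-HARP's spectral refinement and its parallelism.

    import numpy as np

    def inertial_bisect(coords):
        # Split at the median projection onto the principal inertia axis.
        centered = coords - coords.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        proj = centered @ vt[0]          # first right singular vector = axis
        return proj <= np.median(proj)   # True = first half, False = second

    def recursive_partition(coords, levels):
        parts = np.zeros(len(coords), dtype=int)
        for lvl in range(levels):
            for p in np.unique(parts):
                idx = np.where(parts == p)[0]
                mask = inertial_bisect(coords[idx])
                parts[idx[~mask]] += 2 ** lvl   # give the far half a new id
        return parts

    coords = np.random.default_rng(0).random((1000, 2))
    parts = recursive_partition(coords, levels=3)   # 8 partitions
    print(np.bincount(parts))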

  11. Sandstone dykes in siwalik sandstone-sedimentology and basin analysis-subansiri district (NEFA), Eastern Himalaya

    NASA Astrophysics Data System (ADS)

    Kumar, Surendar; Singh, Trilochan

    1982-11-01

    Sandstone dykes (including sills) of varied thickness and with tapering ends are present in the Siwalik sandstone of Arunachal Pradesh (NEFA), Eastern Himalaya, either transecting the bedding or lying parallel to it (as sills). The different sedimentary and microstructural analyses show varied conditions of deposition, with facies changing from fluvial channel, to alluvial fan, to coastal plain-fan delta. Non-marine and shallow marine environments are indicated by the presence of organised and disorganised gradation and by the presence of sandstone dykes in the interface regions. The orientations of the longer axes of the conglomerate clasts, along with the sand bedding, indicate the palaeoflow.

  12. Solving complex band structure problems with the FEAST eigenvalue algorithm

    NASA Astrophysics Data System (ADS)

    Laux, S. E.

    2012-08-01

    With straightforward extension, the FEAST eigenvalue algorithm [Polizzi, Phys. Rev. B 79, 115112 (2009)] is capable of solving the generalized eigenvalue problems representing traveling-wave problems—as exemplified by the complex band-structure problem—even though the matrices involved are complex, non-Hermitian, and singular, and hence outside the originally stated range of applicability of the algorithm. The obtained eigenvalues/eigenvectors, however, contain spurious solutions which must be detected and removed. The efficiency and parallel structure of the original algorithm are unaltered. The complex band structures of Si layers of varying thicknesses and InAs nanowires of varying radii are computed as test problems.
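
    A minimal sketch of the FEAST idea for a real symmetric matrix: approximate the spectral projector onto an interval by numerical quadrature of the resolvent around a circular contour, then apply Rayleigh-Ritz in the filtered subspace. A production FEAST iterates this filter to convergence; quadrature counts and matrix sizes here are arbitrary, and the complex non-Hermitian extension from the abstract is not shown.

    import numpy as np

    def feast_sketch(A, lo, hi, m0=20, nquad=8, seed=0):
        n = A.shape[0]
        c, r = 0.5 * (lo + hi), 0.5 * (hi - lo)
        Y = np.random.default_rng(seed).standard_normal((n, m0))
        Q = np.zeros((n, m0))
        theta = np.pi * (np.arange(nquad) + 0.5) / nquad  # upper half circle
        for t in theta:
            z = c + r * np.exp(1j * t)
            Rz = np.linalg.solve(z * np.eye(n) - A, Y)    # resolvent solve
            # Lower half-circle nodes contribute the complex conjugate for
            # symmetric A, so summing real parts over the upper half suffices.
            Q += np.real(r * np.exp(1j * t) * Rz) / nquad
        Qo, _ = np.linalg.qr(Q)                  # orthonormal filtered basis
        w, V = np.linalg.eigh(Qo.T @ A @ Qo)     # Rayleigh-Ritz step
        keep = (w > lo) & (w < hi)               # discard spurious Ritz values
        return w[keep], Qo @ V[:, keep]

    rng = np.random.default_rng(1)
    A = rng.standard_normal((60, 60)); A = 0.5 * (A + A.T)
    w, _ = feast_sketch(A, -1.0, 1.0)
    exact = np.linalg.eigvalsh(A)
    print(np.sort(w))
    print(exact[(exact > -1.0) & (exact < 1.0)])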

  13. Facial Redness Increases Men's Perceived Healthiness and Attractiveness.

    PubMed

    Thorstenson, Christopher A; Pazda, Adam D; Elliot, Andrew J; Perrett, David I

    2017-06-01

    Past research has shown that peripheral and facial redness influences perceptions of attractiveness for men viewing women. The current research investigated whether a parallel effect is present when women rate men with varying facial redness. In four experiments, women judged the attractiveness of men's faces, which were presented with varying degrees of redness. We also examined perceived healthiness and other candidate variables as mediators of the red-attractiveness effect. The results show that facial redness positively influences ratings of men's attractiveness. Additionally, perceived healthiness was documented as a mediator of this effect, independent of other potential mediator variables. The current research emphasizes facial coloration as an important feature of social judgments.

  14. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

    PubMed Central

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-01-01

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. For the CPU parallel imaging part, the advanced vector extension (AVX) method is introduced into the multi-core CPU parallel method for higher efficiency. For the GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfers overcome, but several optimization strategies are also applied, such as streaming and parallel pipelining. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging by 270 times over a single-core CPU and realizes real-time imaging, in that the imaging rate outperforms the raw data generation rate. PMID:27070606

  15. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.

    PubMed

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-04-07

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. For the CPU parallel imaging part, the advanced vector extension (AVX) method is introduced into the multi-core CPU parallel method for higher efficiency. For the GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfers overcome, but several optimization strategies are also applied, such as streaming and parallel pipelining. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging by 270 times over a single-core CPU and realizes real-time imaging, in that the imaging rate outperforms the raw data generation rate.
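
    One of the listed optimizations, overlapping IO with computation via streaming, can be sketched generically with a bounded queue: a producer thread stands in for raw-data reads while a consumer stands in for the imaging kernels. This is an illustration of the pipelining idea only, not the paper's CPU/GPU scheduler.

    import queue
    import threading
    import time

    def producer(q, n_blocks):
        # Simulates reading raw SAR data blocks from disk (the IO stage).
        for i in range(n_blocks):
            time.sleep(0.01)              # stand-in for disk latency
            q.put(f"block-{i}")
        q.put(None)                       # sentinel: no more data

    def consumer(q, results):
        # Simulates the compute stage; runs concurrently with the IO above.
        while True:
            block = q.get()
            if block is None:
                break
            results.append(block.upper())  # stand-in for imaging kernels

    q, results = queue.Queue(maxsize=4), []
    t1 = threading.Thread(target=producer, args=(q, 16))
    t2 = threading.Thread(target=consumer, args=(q, results))
    t1.start(); t2.start(); t1.join(); t2.join()
    print(len(results), "blocks processed")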

  16. Developing parallel GeoFEST(P) using the PYRAMID AMR library

    NASA Technical Reports Server (NTRS)

    Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Tisdale, Robert E.

    2004-01-01

    The PYRAMID parallel unstructured adaptive mesh refinement (AMR) library has been coupled with the GeoFEST geophysical finite element simulation tool to support parallel active tectonics simulations. Specifically, we have demonstrated modeling of coseismic and postseismic surface displacement due to a simulated earthquake for the Landers system of interacting faults in Southern California. The new software demonstrated a 25-times resolution improvement and a 4-times reduction in time to solution over the sequential baseline milestone case. Simulations on workstations using a few tens of thousands of stress displacement finite elements can now be expanded to multiple millions of elements with greater than 98% scaled efficiency on various parallel platforms over many hundreds of processors. Our most recent work has demonstrated that we can dynamically adapt the computational grid as stress grows on a fault. In this paper, we describe the major issues and challenges associated with coupling these two programs to create GeoFEST(P). Performance and visualization results are also described.

  17. The role of bed-parallel slip in the development of complex normal fault zones

    NASA Astrophysics Data System (ADS)

    Delogkos, Efstratios; Childs, Conrad; Manzocchi, Tom; Walsh, John J.; Pavlides, Spyros

    2017-04-01

    Normal faults exposed in Kardia lignite mine, Ptolemais Basin, NW Greece formed at the same time as bed-parallel slip-surfaces, so that while the normal faults grew they were intermittently offset by bed-parallel slip. Following offset by a bed-parallel slip-surface, further fault growth is accommodated by reactivation on one or both of the offset fault segments. Where one fault is reactivated the site of bed-parallel slip is a bypassed asperity. Where both faults are reactivated, they propagate past each other to form a volume between overlapping fault segments that displays many of the characteristics of relay zones, including elevated strains and transfer of displacement between segments. Unlike conventional relay zones, however, these structures contain either a repeated or a missing section of stratigraphy which has a thickness equal to the throw of the fault at the time of the bed-parallel slip event, and the displacement profiles along the relay-bounding fault segments have discrete steps at their intersections with bed-parallel slip-surfaces. With further increase in displacement, the overlapping fault segments connect to form a fault-bound lens. Conventional relay zones form during initial fault propagation, but with coeval bed-parallel slip, relay-like structures can form later in the growth of a fault. Geometrical restoration of cross-sections through selected faults shows that repeated bed-parallel slip events during fault growth can lead to complex internal fault zone structure that masks its origin. Bed-parallel slip, in this case, is attributed to flexural-slip arising from hanging-wall rollover associated with a basin-bounding fault outside the study area.

  18. Estimation of snow albedo reduction by light absorbing impurities using Monte Carlo radiative transfer model

    NASA Astrophysics Data System (ADS)

    Sengupta, D.; Gao, L.; Wilcox, E. M.; Beres, N. D.; Moosmüller, H.; Khlystov, A.

    2017-12-01

    Radiative forcing and climate change depend greatly on the earth's surface albedo and its temporal and spatial variation. The surface albedo varies widely with surface characteristics, ranging from 5-10% for calm ocean waters to 80% for some snow-covered areas. Clean, fresh snow surfaces have the highest albedo and are most sensitive to contamination with light absorbing impurities, which can greatly reduce the surface albedo and change overall radiative forcing estimates. Accurate estimation of snow albedo, as well as understanding of climate feedbacks from changes in snow-covered areas, is important for radiative forcing, the snow energy balance, and predicting seasonal snowmelt and runoff rates. Such information is essential to inform timely decision making by stakeholders and policy makers. Light absorbing particles deposited onto the snow surface can greatly alter snow albedo and have been identified as a major contributor to regional climate forcing where seasonal snow cover is involved. However, the uncertainty associated with quantifying the albedo reduction by these light absorbing particles is high. Here, we use Mie theory (under the assumption of spherical snow grains) to reconstruct the single scattering parameters of snow (i.e., the single scattering albedo ῶ and the asymmetry parameter g) from observation-based size distribution information and retrieved refractive index values. The single scattering parameters of impurities are extracted with the same approach from datasets obtained during laboratory combustion of biomass samples. Instead of using plane-parallel approximation methods to account for multiple scattering, we use a simple Monte Carlo ray/photon tracing approach to calculate the snow albedo. This approach treats multiple scattering as a collection of single scattering events. Using this approach, we vary the effective snow grain size and impurity concentrations to explore the evolution of snow albedo over a wide wavelength range (300 nm - 2000 nm). Results will be compared with the SNICAR model to better understand the differences in snow albedo computation between plane-parallel methods and statistical Monte Carlo methods.
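
    The "collection of single scattering events" view translates directly into code: photons take exponential free flights, are absorbed with probability 1 - ω, and otherwise scatter through a Henyey-Greenstein angle. The sketch below estimates the albedo of a semi-infinite, single-wavelength slab; the parameter values are illustrative only, not the paper's retrieved snow properties.

    import numpy as np

    def snow_albedo(omega, g, n_photons=5000, seed=1):
        # Monte Carlo photon tracing: multiple scattering simulated as a
        # chain of single-scattering events in a semi-infinite slab.
        rng = np.random.default_rng(seed)
        escaped = 0
        for _ in range(n_photons):
            z, ux, uy, uz = 0.0, 0.0, 0.0, 1.0   # depth and direction (down)
            while True:
                z += uz * rng.exponential(1.0)    # free flight (mean free paths)
                if z < 0:
                    escaped += 1                  # left through the top surface
                    break
                if rng.random() > omega:          # absorbed by impurities/ice
                    break
                # Sample the Henyey-Greenstein scattering angle.
                if abs(g) > 1e-6:
                    s = (1 - g * g) / (1 - g + 2 * g * rng.random())
                    ct = (1 + g * g - s * s) / (2 * g)
                else:
                    ct = 2 * rng.random() - 1
                st = np.sqrt(max(0.0, 1 - ct * ct))
                phi = 2 * np.pi * rng.random()
                if abs(uz) > 0.99999:             # avoid the polar singularity
                    ux, uy, uz = st * np.cos(phi), st * np.sin(phi), ct * np.sign(uz)
                else:
                    den = np.sqrt(1 - uz * uz)
                    ux, uy, uz = (
                        st * (ux * uz * np.cos(phi) - uy * np.sin(phi)) / den + ux * ct,
                        st * (uy * uz * np.cos(phi) + ux * np.sin(phi)) / den + uy * ct,
                        -den * st * np.cos(phi) + uz * ct,
                    )
        return escaped / n_photons

    for omega in (0.9999, 0.999, 0.99):   # dirtier snow (lower omega) darkens
        print(omega, round(snow_albedo(omega, g=0.85), 3))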

  19. The paradigm compiler: Mapping a functional language for the connection machine

    NASA Technical Reports Server (NTRS)

    Dennis, Jack B.

    1989-01-01

    The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.

  20. Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fijany, A.; Milman, M.; Redding, D.

    1994-12-31

    In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optics system (SELENE) are presented. The authors have already shown that the computation of a near optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although this represents an optimal computation, due to the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem, since it demands a sustained computational throughput on the order of 10 GFlops. They develop a novel algorithm, designated the Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other fast Poisson solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.
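
    The Fast Invariant Imbedding algorithm itself is not given in the abstract. As a point of comparison, the discrete Poisson equation it targets can be solved with a classic fast solver that diagonalizes the 5-point Laplacian with sine transforms, sketched below for homogeneous Dirichlet boundaries; this is a generic stand-in, not the authors' method.

    import numpy as np
    from scipy.fft import dstn, idstn

    def poisson_dirichlet(f, h):
        # Solve the 5-point discrete Laplacian L u = f on an (n x m) interior
        # grid with zero boundary values, via the type-I discrete sine
        # transform, which diagonalizes L exactly.
        n, m = f.shape
        fhat = dstn(f, type=1)
        j = np.arange(1, n + 1)
        k = np.arange(1, m + 1)
        lam = (2 * np.cos(np.pi * j / (n + 1)) - 2)[:, None] + \
              (2 * np.cos(np.pi * k / (m + 1)) - 2)[None, :]
        return idstn(fhat * h * h / lam, type=1)

    # Check against u = sin(pi x) sin(pi y), for which laplacian(u) = -2 pi^2 u.
    n = 63
    h = 1.0 / (n + 1)
    x = np.arange(1, n + 1) * h
    X, Y = np.meshgrid(x, x, indexing="ij")
    u_exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
    u = poisson_dirichlet(-2 * np.pi ** 2 * u_exact, h)
    print("max error:", np.abs(u - u_exact).max())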
