ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
A tool for simulating parallel branch-and-bound methods
NASA Astrophysics Data System (ADS)
Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail
2016-01-01
The Branch-and-Bound method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore, the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
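The abstract describes the simulator's core idea without code; the sketch below illustrates that idea in Python under stated assumptions (the branching probability, depth limit, and steal-from-a-random-peer policy are all hypothetical, and the interconnect model is reduced to one logical time unit per step):

```python
import random

# Hypothetical parameters: branching probability, maximum search-tree depth,
# and number of simulated processors.
P_BRANCH, MAX_DEPTH, N_PROC = 0.7, 20, 8

def simulate(seed=42):
    rng = random.Random(seed)
    pools = [[] for _ in range(N_PROC)]  # local pools of subproblem depths
    pools[0].append(0)                   # root of the search tree on proc 0
    clock = 0                            # logical time
    while any(pools):
        clock += 1
        for pool in pools:
            if pool:
                depth = pool.pop()
                # Stochastic branching stands in for real bound evaluation.
                if depth < MAX_DEPTH and rng.random() < P_BRANCH:
                    pool.extend((depth + 1, depth + 1))
            else:
                # Idle processor: grab a subproblem from a random peer
                # (one of many load balancing policies one could plug in).
                victim = pools[rng.randrange(N_PROC)]
                if victim:
                    pool.append(victim.pop(0))
    return clock

print("logical steps until the search tree is exhausted:", simulate())
```

Different load balancing policies can be swapped into the idle branch and compared by the resulting logical completion time, which is the kind of experiment such a simulator is built for.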
NASA Astrophysics Data System (ADS)
Liu, Wei; Li, Ying-jun; Jia, Zhen-yuan; Zhang, Jun; Qian, Min
2011-01-01
In the working process of huge heavy-load manipulators, such as free forging machines, hydraulic die-forging presses, forging manipulators, heavy grasping manipulators, and large-displacement manipulators, measurement of the six-dimensional heavy force/torque and real-time force feedback at the operation interface are the basis for realizing coordinated operation control and force compliance control. They are also an effective way to raise control accuracy and achieve highly efficient manufacturing. To solve the dynamic measurement problem of six-dimensional, time-varying heavy loads in extreme manufacturing processes, a novel principle of parallel load sharing for six-dimensional heavy force/torque is put forward. The measuring principle of the six-dimensional force sensor is analyzed, and its spatial model is built and decoupled. The load sharing ratios in the vertical and horizontal directions are analyzed and calculated. The mapping relationship between the six-dimensional heavy force/torque to be measured and the output force values is established. A finite element model of the parallel piezoelectric six-dimensional heavy force/torque sensor is set up, and its static characteristics are analyzed with ANSYS software. The main parameters that affect the load sharing ratio are analyzed. Experiments on load sharing with different diameters of the parallel axis are designed. The results show that the six-dimensional heavy force/torque sensor has good linearity, with non-linearity errors of less than 1%. The parallel axis provides a good load sharing effect: the larger the diameter, the better the load sharing. The experimental results are in accordance with the FEM analysis. The sensor has the advantages of a large measuring range, good linearity, high inherent frequency, and high rigidity. It can be widely used in extreme environments for real-time, accurate measurement of six-dimensional, time-varying huge loads on manipulators.
Load balancing for massively-parallel soft-real-time systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hailperin, M.
1988-09-01
Global load balancing, if practical, would allow the effective use of massively-parallel ensemble architectures for large soft-real-time problems. The challenge is to replace quick global communications, which are impractical in a massively-parallel system, with statistical techniques. In this vein, the author proposes a novel approach to decentralized load balancing based on statistical time-series analysis. Each site estimates the system-wide average load using information about past loads of individual sites and attempts to match that average. This estimation process is practical because the soft-real-time systems of interest naturally exhibit loads that are periodic, in a statistical sense akin to seasonality in econometrics. It is shown how this load-characterization technique can be the foundation for a load-balancing system in an architecture employing cut-through routing and an efficient multicast protocol.
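A minimal sketch of the seasonal estimation idea, assuming a known load period (the class name, window size, and offload threshold are illustrative, not from the paper):

```python
from collections import deque

class SiteLoadEstimator:
    """One site's decentralized estimate of the system-wide average load,
    exploiting the assumed periodicity of soft-real-time workloads."""

    def __init__(self, period, history_cycles=4):
        self.period = period
        self.history = deque(maxlen=period * history_cycles)  # (time, load)

    def observe(self, t, load):
        self.history.append((t, load))

    def estimate(self, t):
        # Average the loads previously seen at the same phase of the cycle,
        # akin to a seasonal estimate in econometrics.
        phase = t % self.period
        same = [load for s, load in self.history if s % self.period == phase]
        return sum(same) / len(same) if same else 0.0

    def should_offload(self, my_load, t, slack=1.2):
        # Shed work when this site sits clearly above the estimated average.
        return my_load > slack * self.estimate(t)
```

In a full system the observed samples would arrive via the multicast protocol mentioned in the abstract rather than being measured locally.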
Data decomposition method for parallel polygon rasterization considering load balancing
NASA Astrophysics Data System (ADS)
Zhou, Chen; Chen, Zhenjie; Liu, Yongxue; Li, Feixue; Cheng, Liang; Zhu, A.-xing; Li, Manchun
2015-12-01
It is essential to adopt parallel computing technology to rapidly rasterize massive polygon data. In parallel rasterization, it is difficult to design an effective data decomposition method. Conventional methods ignore load balancing of polygon complexity in parallel rasterization and thus fail to achieve high parallel efficiency. In this paper, a novel data decomposition method based on polygon complexity (DMPC) is proposed. First, four factors that possibly affect the rasterization efficiency were investigated. Then, a metric represented by the boundary number and raster pixel number in the minimum bounding rectangle was developed to calculate the complexity of each polygon. Using this metric, polygons were rationally allocated according to the polygon complexity, and each process could achieve balanced loads of polygon complexity. To validate the efficiency of DMPC, it was used to parallelize different polygon rasterization algorithms and tested on different datasets. Experimental results showed that DMPC could effectively parallelize polygon rasterization algorithms. Furthermore, the implemented parallel algorithms with DMPC could achieve good speedup ratios of at least 15.69 and generally outperformed conventional decomposition methods in terms of parallel efficiency and load balancing. In addition, the results showed that DMPC exhibited consistently better performance for different spatial distributions of polygons.
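A sketch of the two steps the abstract names, scoring each polygon's complexity from its boundary and MBR, then allocating by complexity, with an assumed raster cell size and a greedy least-loaded assignment standing in for the paper's allocation procedure:

```python
import heapq

CELL = 1.0  # assumed raster resolution

def complexity(polygon):
    # Metric in the spirit of DMPC: boundary (vertex) count plus the number
    # of raster pixels covered by the minimum bounding rectangle.
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    mbr_pixels = ((max(xs) - min(xs)) / CELL + 1) * ((max(ys) - min(ys)) / CELL + 1)
    return len(polygon) + mbr_pixels

def decompose(polygons, n_procs):
    # Greedy allocation: hand each polygon, largest complexity first,
    # to the currently least-loaded process.
    heap = [(0.0, p) for p in range(n_procs)]  # (summed complexity, proc id)
    heapq.heapify(heap)
    parts = [[] for _ in range(n_procs)]
    for poly in sorted(polygons, key=complexity, reverse=True):
        load, proc = heapq.heappop(heap)
        parts[proc].append(poly)
        heapq.heappush(heap, (load + complexity(poly), proc))
    return parts
```

Each process then rasterizes its own list, holding roughly equal summed complexity rather than an equal polygon count.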
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Parallel Processing of Adaptive Meshes with Load Balancing
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology, and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than those under PLUM by overlapping processing and data migration.
Method of up-front load balancing for local memory parallel processors
NASA Technical Reports Server (NTRS)
Baffes, Paul Thomas (Inventor)
1990-01-01
In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balanced load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of sixty to seventy-five percent.
Dynamic Load Balancing Based on Constrained K-D Tree Decomposition for Parallel Particle Tracing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jiang; Guo, Hanqi; Yuan, Xiaoru
Particle tracing is a fundamental technique in flow field data visualization. In this work, we present a novel dynamic load balancing method for parallel particle tracing. Specifically, we employ a constrained k-d tree decomposition approach to dynamically redistribute tasks among processes. Each process is initially assigned a regularly partitioned block along with a duplicated ghost layer under the memory limit. During particle tracing, the k-d tree decomposition is dynamically performed by constraining the cutting planes to the overlap range of the duplicated data. This ensures that each process is reassigned particles as evenly as possible and, on the other hand, that the newly assigned particles for a process always lie within its block. Results show good load balance and high efficiency of our method.
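A toy sketch of the constrained cut (the clamping rule follows the abstract; everything else, including the median choice, is an assumption):

```python
def constrained_cut(coords, lo, hi):
    # One k-d split: take the median of the particle coordinates along an
    # axis, then clamp it to [lo, hi], the overlap range of the duplicated
    # ghost data, so reassigned particles still fall inside data the
    # receiving process already holds.
    median = sorted(coords)[len(coords) // 2]
    return min(max(median, lo), hi)

# Particles cluster near x = 9 but the ghost overlap only covers [4, 6],
# so the cut is clamped to 6 instead of following the median.
xs = [1.0, 8.5, 8.7, 9.0, 9.2, 9.5]
print(constrained_cut(xs, lo=4.0, hi=6.0))  # -> 6.0
```

Recursing on the two halves with their own overlap ranges yields the full k-d tree decomposition.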
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems
NASA Technical Reports Server (NTRS)
Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.
2000-01-01
The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many sub-domains called blocks, and solve the governing equations over these blocks. The dynamic load balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computations. In environments with computers of different architecture, operating systems, CPU speed, memory size, load, and network speed, balancing the loads and managing the communication between processors becomes crucial. Load balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of the changes in the computer environment during execution time. More recently, these tools were extended to a second operating system: NT. In this paper, the problems associated with this application will be discussed. Also, the developed algorithms were combined with the load sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results will be presented on running a NASA-based code, ADPAC, to demonstrate the developed tools for dynamic load balancing.
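A greedy sketch of heterogeneous block placement under assumptions the abstract only hints at (per-block costs and relative processor speeds as the sole inputs; the real tools also track memory, background load, and network speed):

```python
def assign_blocks(block_costs, proc_speeds):
    # Place each block, largest first, on the processor whose finish time
    # (current load + block cost) / speed would be smallest.
    loads = [0.0] * len(proc_speeds)
    owner = {}
    for block, cost in sorted(enumerate(block_costs), key=lambda x: -x[1]):
        p = min(range(len(proc_speeds)),
                key=lambda i: (loads[i] + cost) / proc_speeds[i])
        loads[p] += cost
        owner[block] = p
    return owner

# Five blocks on two processors, the second twice as fast:
print(assign_blocks([5, 3, 8, 2, 7], proc_speeds=[1.0, 2.0]))
```

Re-running such an assignment periodically, with costs re-measured at run time, is what makes the balancing dynamic.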
Klemen, Jane; Büchel, Christian; Bühler, Mira; Menz, Mareike M; Rose, Michael
2010-03-01
Attentional interference between tasks performed in parallel is known to have strong and often undesired effects. As yet, however, the mechanisms by which interference operates remain elusive. A better knowledge of these processes may facilitate our understanding of the effects of attention on human performance and the debilitating consequences that disruptions to attention can have. According to the load theory of cognitive control, processing of task-irrelevant stimuli is increased by attending in parallel to a relevant task with high cognitive demands. This is due to the relevant task engaging cognitive control resources that are, hence, unavailable to inhibit the processing of task-irrelevant stimuli. However, it has also been demonstrated that a variety of types of load (perceptual and emotional) can result in a reduction of the processing of task-irrelevant stimuli, suggesting a uniform effect of increased load irrespective of the type of load. In the present study, we concurrently presented a relevant auditory matching task [n-back working memory (WM)] of low or high cognitive load (1-back or 2-back WM) and task-irrelevant images at one of three object visibility levels (0%, 50%, or 100%). fMRI activation during the processing of the task-irrelevant visual stimuli was measured in the lateral occipital cortex and found to be reduced under high, compared to low, WM load. In combination with previous findings, this result is suggestive of a more generalized load theory, whereby cognitive load, as well as other types of load (e.g., perceptual), can result in a reduction of the processing of task-irrelevant stimuli, in line with a uniform effect of increased load irrespective of the type of load.
A parallel implementation of an off-lattice individual-based model of multicellular populations
NASA Astrophysics Data System (ADS)
Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe
2015-07-01
As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
Chen, Weiliang; De Schutter, Erik
2017-01-01
Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346
A transient-enhanced NMOS low dropout voltage regulator with parallel feedback compensation
NASA Astrophysics Data System (ADS)
Han, Wang; Lin, Tan
2016-02-01
This paper presents a transient-enhanced NMOS low-dropout regulator (LDO) for portable applications with parallel feedback compensation. The parallel feedback structure adds a dynamic zero to get an adequate phase margin with a load current variation from 0 to 1 A. A class-AB error amplifier and a fast charging/discharging unit are adopted to enhance the transient performance. The proposed LDO has been implemented in a 0.35 μm BCD process. From experimental results, the regulator can operate with a minimum dropout voltage of 150 mV at a maximum 1 A load and IQ of 165 μA. Under the full range load current step, the voltage undershoot and overshoot of the proposed LDO are reduced to 38 mV and 27 mV respectively.
ERIC Educational Resources Information Center
Klemen, Jane; Buchel, Christian; Buhler, Mira; Menz, Mareike M.; Rose, Michael
2010-01-01
Attentional interference between tasks performed in parallel is known to have strong and often undesired effects. As yet, however, the mechanisms by which interference operates remain elusive. A better knowledge of these processes may facilitate our understanding of the effects of attention on human performance and the debilitating consequences…
NASA Technical Reports Server (NTRS)
Harper, Richard
1989-01-01
In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.
Probabilistic structural mechanics research for parallel processing computers
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Martin, William R.
1991-01-01
Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intense nature. Solution of PSM problems requires repeated analyses of structures that are often large, and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large scale PSM problems practical.
Parallel processing methods for space based power systems
NASA Technical Reports Server (NTRS)
Berry, F. C.
1993-01-01
This report presents a method for doing load-flow analysis of a power system by using a decomposition approach. The power system for the Space Shuttle is used as a basis to build a model for the load-flow analysis. To test the decomposition method for doing load-flow analysis, simulations were performed on power systems of 16, 25, 34, 43, 52, 61, 70, and 79 nodes. Each of the power systems was divided into subsystems and simulated under steady-state conditions. The results from these tests have been found to be as accurate as tests performed using a standard serial simulator. The division of the power systems into different subsystems was done by assigning a processor to each area. There were 13 transputers available; therefore, up to 13 different subsystems could be simulated at the same time. This report has preliminary results for a load-flow analysis using a decomposition principle. The report shows that the decomposition algorithm for load-flow analysis is well suited for parallel processing and provides increases in the speed of execution.
NASA Astrophysics Data System (ADS)
Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.
2018-05-01
In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.
Multiprocessing the Sieve of Eratosthenes
NASA Technical Reports Server (NTRS)
Bokhari, S.
1986-01-01
The Sieve of Eratosthenes for finding prime numbers in recent years has seen much use as a benchmark algorithm for serial computers while its intrinsically parallel nature has gone largely unnoticed. The implementation of a parallel version of this algorithm for a real parallel computer, the Flex/32, is described and its performance discussed. It is shown that the algorithm is sensitive to several fundamental performance parameters of parallel machines, such as spawning time, signaling time, memory access, and overhead of process switching. Because of the nature of the algorithm, it is impossible to get any speedup beyond 4 or 5 processors unless some form of dynamic load balancing is employed. We describe the performance of our algorithm with and without load balancing and compare it with theoretical lower bounds and simulated results. It is straightforward to understand this algorithm and to check the final results. However, its efficient implementation on a real parallel machine requires thoughtful design, especially if dynamic load balancing is desired. The fundamental operations required by the algorithm are very simple: this means that the slightest overhead appears prominently in performance data. The Sieve thus serves not only as a very severe test of the capabilities of a parallel processor but is also an interesting challenge for the programmer.
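A sketch of the dynamic load balancing the abstract calls for, in Python rather than the original Flex/32 implementation: a serial base sieve up to sqrt(n), then worker threads that pull segments from a shared queue, so faster workers naturally take more segments:

```python
import math
import queue
import threading

def parallel_sieve(n, n_workers=4, seg_size=100_000):
    # Serial base sieve up to sqrt(n).
    root = math.isqrt(n)
    base = bytearray([1]) * (root + 1)
    base[:2] = b"\x00\x00"
    for p in range(2, math.isqrt(root) + 1):
        if base[p]:
            base[p * p :: p] = bytes(len(range(p * p, root + 1, p)))
    primes = [p for p in range(2, root + 1) if base[p]]

    flags = bytearray([1]) * (n + 1)
    flags[: root + 1] = base
    segments = queue.Queue()
    for lo in range(root + 1, n + 1, seg_size):
        segments.put((lo, min(lo + seg_size - 1, n)))

    def worker():
        while True:
            try:
                lo, hi = segments.get_nowait()  # dynamic: pull the next chunk
            except queue.Empty:
                return
            for p in primes:
                start = max(p * p, (lo + p - 1) // p * p)
                for m in range(start, hi + 1, p):
                    flags[m] = 0

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [i for i in range(2, n + 1) if flags[i]]

print(len(parallel_sieve(1_000_000)))  # 78498 primes below one million
```

As the abstract warns, the per-operation work here is tiny, so queue overhead and thread scheduling dominate any timing measurement; that sensitivity is exactly what makes the sieve a severe benchmark.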
Operation Compatibility: A Neglected Contribution to Dual-Task Costs
ERIC Educational Resources Information Center
Pannebakker, Merel M.; Band, Guido P. H.; Ridderinkhof, K. Richard
2009-01-01
Traditionally, dual-task interference has been attributed to the consequences of task load exceeding capacity limitations. However, the current study demonstrates that in addition to task load, the mutual compatibility of the concurrent processes modulates whether 2 tasks can be performed in parallel. In 2 psychological refractory period…
NASA Astrophysics Data System (ADS)
Work, Paul R.
1991-12-01
This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
Image matrix processor for fast multi-dimensional computations
Roberson, George P.; Skeate, Michael F.
1996-01-01
An apparatus for multi-dimensional computation which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination.
Load Balancing Strategies for Multi-Block Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)
2002-01-01
The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies for overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
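The paper's three strategies are not spelled out in the abstract; one plausible member of the family is sketched below, with grid point counts as the load proxy and an assumed pre-splitting of oversized grids (both assumptions, not the paper's method):

```python
def partition_grids(grid_sizes, n_procs):
    # Split any grid larger than the per-processor target into pieces,
    # then do largest-first bin packing onto the least-loaded processor.
    target = sum(grid_sizes) / n_procs
    pieces = []
    for g, size in enumerate(grid_sizes):
        n_split = max(1, round(size / target))
        pieces += [(size / n_split, g)] * n_split
    loads = [0.0] * n_procs
    bins = [[] for _ in range(n_procs)]
    for size, g in sorted(pieces, reverse=True):
        p = loads.index(min(loads))
        loads[p] += size
        bins[p].append(g)
    return bins, loads

bins, loads = partition_grids([90, 40, 30, 20, 10], n_procs=4)
print(bins, loads)
```

In a real overset solver the load proxy would also weight the interpolation (boundary exchange) work, which is one axis along which such strategies differ.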
Work stealing for GPU-accelerated parallel programs in a global address space framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram
Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
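A minimal sketch of the work-stealing primitive the paper builds on (the general technique, not its CPU-GPU variant; a coarse lock stands in for the lock-free deques production runtimes use):

```python
import collections
import random
import threading

class WorkStealingDeque:
    # Owner pushes/pops at the tail (LIFO, good locality); thieves steal
    # from the head (FIFO), keeping owner and thieves mostly apart.
    def __init__(self):
        self._dq = collections.deque()
        self._lock = threading.Lock()

    def push(self, task):      # owner only
        with self._lock:
            self._dq.append(task)

    def pop(self):             # owner only
        with self._lock:
            return self._dq.pop() if self._dq else None

    def steal(self):           # any thief
        with self._lock:
            return self._dq.popleft() if self._dq else None

def next_task(my_id, deques, rng=random):
    task = deques[my_id].pop()
    if task is None:  # local deque empty: try one random victim
        victim = rng.choice([d for i, d in enumerate(deques) if i != my_id])
        task = victim.steal()
    return task
```

The CPU-GPU design questions the paper studies sit on top of this primitive: which device's queue a task lands in, how large a task must be to be worth moving, and whether its data already resides in that device's memory.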
Multidimensional spectral load balancing
Hendrickson, Bruce A.; Leland, Robert W.
1996-12-24
A method of and apparatus for graph partitioning involving the use of a plurality of eigenvectors of the Laplacian matrix of the graph of the problem for which load balancing is desired. The invention is particularly useful for optimizing parallel computer processing of a problem and for minimizing total pathway lengths of integrated circuits in the design stage.
Digital Optical Circuit Technology.
1985-03-01
...computers and data distribution systems that are at once digital, all-optical, very fast, and immune to interference and ... F. A. Hopf. SESSION II - OPTICAL LOGIC: PROSPECTS FOR PARALLEL NONLINEAR OPTICAL SIGNAL PROCESSING USING GaAs ETALONS AND ZnS INTERFERENCE FILTERS by ... (talks 1, 8, and 9) interference filters for room-temperature parallel processing. If one imposes a maximum heat load of 100 W/cm², consistent with
An architecture for real-time vision processing
NASA Technical Reports Server (NTRS)
Chien, Chiun-Hong
1994-01-01
To study the feasibility of developing an architecture for real time vision processing, a task queue server and parallel algorithms for two vision operations were designed and implemented on an i860-based Mercury Computing System 860VS array processor. The proposed architecture treats each vision function as a task or set of tasks which may be recursively divided into subtasks and processed by multiple processors coordinated by a task queue server accessible by all processors. Each idle processor subsequently fetches a task and associated data from the task queue server for processing and posts the result to shared memory for later use. Load balancing can be carried out within the processing system without the requirement for a centralized controller. The author concludes that real time vision processing cannot be achieved without both sequential and parallel vision algorithms and a good parallel vision architecture.
Parallel adaptive wavelet collocation method for PDEs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048³ using as many as 2048 CPU cores.
Collectively loading programs in a multiple program multiple data environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
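The same load-leader pattern can be sketched with mpi4py (an assumption: the patent targets hardware class routes, which sub-communicators only approximate; all names below are illustrative):

```python
from mpi4py import MPI

def collective_load(my_program_id, path_for):
    world = MPI.COMM_WORLD
    # Nodes needing the same program join one communicator, the software
    # analogue of a per-program class route.
    route = world.Split(color=my_program_id, key=world.Get_rank())
    program = None
    if route.Get_rank() == 0:  # the elected load leader
        with open(path_for(my_program_id), "rb") as f:
            program = f.read()  # only this node touches the file system
    # One broadcast along the route replaces N independent file reads.
    return route.bcast(program, root=0)
```

The payoff is at scale: file-system contention grows with the number of programs rather than the number of nodes.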
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Parallel processing approach to transform-based image coding
NASA Astrophysics Data System (ADS)
Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.
1991-06-01
This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform-based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, and results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification described.
NASA Astrophysics Data System (ADS)
Chan, Chia-Hsin; Tu, Chun-Chuan; Tsai, Wen-Jiin
2017-01-01
High efficiency video coding (HEVC) not only improves the coding efficiency drastically compared to the well-known H.264/AVC but also introduces coding tools for parallel processing, one of which is tiles. Tile partitioning is allowed to be arbitrary in HEVC, but how to decide tile boundaries remains an open issue. An adaptive tile boundary (ATB) method is proposed to select a better tile partitioning to improve load balancing (ATB-LoadB) and coding efficiency (ATB-Gain) with a unified scheme. Experimental results show that, compared to ordinary uniform-space partitioning, the proposed ATB can save up to 17.65% of encoding times in parallel encoding scenarios and can reduce up to 0.8% of total bit rates for coding efficiency.
Image matrix processor for fast multi-dimensional computations
Roberson, G.P.; Skeate, M.F.
1996-10-15
An apparatus for multi-dimensional computation is disclosed which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination. 10 figs.
NASA Astrophysics Data System (ADS)
Wang, Liping; Jiang, Yao; Li, Tiemin
2014-09-01
Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used for advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that can easily suffer deformations, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy with consideration of the deformations of its links during the motion process. Based on the dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. Then the kinematic errors of the machine are derived with consideration of the deformations of the links. Through further derivation, the accuracy of the machine is given in a simple explicit expression, which helps increase the calculation speed. The accuracy of this machine when following a selected circular path is simulated. The influences of the magnitude of the maximum acceleration and of external loads on the running accuracy of the machine are investigated. The results show that external loads deteriorate the accuracy of the machine tremendously when their direction coincides with the direction of the worst stiffness of the machine. The proposed method provides a solution for predicting the running accuracy of parallel kinematic machines and can also be used in their design optimization as well as in the selection of suitable running parameters.
Parallel volume ray-casting for unstructured-grid data on distributed-memory architectures
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu
1995-01-01
As computing technology continues to advance, computational modeling of scientific and engineering problems produces data of increasing complexity: large in size and unstructured in shape. Volume visualization of such data is a challenging problem. This paper proposes a distributed parallel solution that makes ray-casting volume rendering of unstructured-grid data practical. Both the data and the rendering process are distributed among processors. At each processor, ray-casting of local data is performed independent of the other processors. The global image composing processes, which require inter-processor communication, are overlapped with the local ray-casting processes to achieve maximum parallel efficiency. This algorithm differs from previous ones in four ways: it is completely distributed, less view-dependent, reasonably scalable, and flexible. Without using dynamic load balancing, test results on the Intel Paragon using from two to 128 processors show, on average, about 60% parallel efficiency.
NASA Astrophysics Data System (ADS)
Furuichi, M.; Nishiura, D.
2015-12-01
Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM) have been widely used to solve continuum and particle motions in the computational geodynamics field. These mesh-free methods are suitable for problems with complex geometry and boundaries. In addition, their Lagrangian nature allows non-diffusive advection, useful for tracking history-dependent properties (e.g., rheology) of the material. These potential advantages over mesh-based methods offer effective numerical applications to geophysical flow and tectonic processes, for example, tsunamis with free surfaces and floating bodies, magma intrusion with fracture of rock, and shear zone pattern generation in granular deformation. In order to investigate such geodynamical problems with particle based methods, millions to billions of particles are required for realistic simulation. Parallel computing is therefore important for handling such huge computational costs. An efficient parallel implementation of the SPH and DEM methods is, however, known to be difficult, especially for distributed-memory architectures. Lagrangian methods inherently show a workload imbalance problem for parallelization with domains fixed in space, because particles move around and workloads change during the simulation. Therefore, dynamic load balancing is a key technique for performing large scale SPH and DEM simulations. In this work, we present a parallel implementation technique for the SPH and DEM methods utilizing dynamic load balancing algorithms, toward high-resolution simulation over large domains using massively parallel supercomputer systems. Our method utilizes the imbalances of the executed time of each MPI process as the nonlinear term of parallel domain decomposition and minimizes them with a Newton-like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our approach is suitable for solving particles with different calculation costs (e.g., boundary particles) as well as for heterogeneous computer architectures. We analyze the parallel efficiency and scalability on supercomputer systems (K computer, Earth Simulator 3, etc.).
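The slice-grid rebalancing step can be sketched as follows, under the simplifying assumptions that cost is uniform inside each current slice and that one correction is applied per step (the paper wraps such a step in a Newton-like iteration on measured execution-time imbalances):

```python
def rebalance_slices(edges, times):
    # edges: current slice boundaries along one axis (len(times) + 1 values)
    # times: measured execution time of each slice
    # Returns new edges placed at equal quantiles of cumulative time.
    n = len(times)
    total = sum(times)
    new_edges = [edges[0]]
    cum, k = 0.0, 0
    for rank in range(1, n):
        target = total * rank / n
        while cum + times[k] < target:  # old slice containing the target
            cum += times[k]
            k += 1
        frac = (target - cum) / times[k]  # interpolate inside that slice
        new_edges.append(edges[k] + frac * (edges[k + 1] - edges[k]))
    new_edges.append(edges[-1])
    return new_edges

# The slice that took 3x longer shrinks: the cut moves into its territory.
print(rebalance_slices([0.0, 1.0, 2.0], times=[1.0, 3.0]))  # [0.0, 1.333..., 2.0]
```

Repeating measure-then-move over successive time steps drives the per-process times toward equality even as particles migrate.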
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences in the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++
NASA Technical Reports Server (NTRS)
Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.
1996-01-01
This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
Perceptual load interacts with stimulus processing across sensory modalities.
Klemen, J; Büchel, C; Rose, M
2009-06-01
According to perceptual load theory, processing of task-irrelevant stimuli is limited by the perceptual load of a parallel attended task if both the task and the irrelevant stimuli are presented to the same sensory modality. However, it remains a matter of debate whether the same principles apply to cross-sensory perceptual load and, more generally, what form cross-sensory attentional modulation in early perceptual areas takes in humans. Here we addressed these questions using functional magnetic resonance imaging. Participants undertook an auditory one-back working memory task of low or high perceptual load, while concurrently viewing task-irrelevant images at one of three object visibility levels. The processing of the visual and auditory stimuli was measured in the lateral occipital cortex (LOC) and auditory cortex (AC), respectively. Cross-sensory interference with sensory processing was observed in both the LOC and AC, in accordance with previous results of unisensory perceptual load studies. The present neuroimaging results therefore warrant the extension of perceptual load theory from a unisensory to a cross-sensory context: a validation of this cross-sensory interference effect through behavioural measures would consolidate the findings.
Scalable loading of a two-dimensional trapped-ion array
Bruzewicz, Colin D.; McConnell, Robert; Chiaverini, John; Sage, Jeremy M.
2016-01-01
Two-dimensional arrays of trapped-ion qubits are attractive platforms for scalable quantum information processing. Sufficiently rapid reloading capable of sustaining a large array, however, remains a significant challenge. Here with the use of a continuous flux of pre-cooled neutral atoms from a remotely located source, we achieve fast loading of a single ion per site while maintaining long trap lifetimes and without disturbing the coherence of an ion quantum bit in an adjacent site. This demonstration satisfies all major criteria necessary for loading and reloading extensive two-dimensional arrays, as will be required for large-scale quantum information processing. Moreover, the already high loading rate can be increased by loading ions in parallel with only a concomitant increase in photo-ionization laser power and no need for additional atomic flux. PMID:27677357
Modelling of loading, stress relaxation and stress recovery in a shape memory polymer.
Sweeney, J; Bonner, M; Ward, I M
2014-09-01
A multi-element constitutive model for a lactide-based shape memory polymer has been developed that represents loading to large tensile deformations, stress relaxation and stress recovery at 60, 65 and 70°C. The model consists of parallel Maxwell arms each comprising neo-Hookean and Eyring elements. Guiu-Pratt analysis of the stress relaxation curves yields Eyring parameters. When these parameters are used to define the Eyring process in a single Maxwell arm, the resulting model yields at too low a stress, but gives good predictions for longer times. Stress dip tests show a very stiff response on unloading by a small strain decrement. This would create an unrealistically high stress on loading to large strain if it were modelled by an elastic element. Instead it is modelled by an Eyring process operating via a flow rule that introduces strain hardening after yield. When this process is incorporated into a second parallel Maxwell arm, there results a model that fully represents both stress relaxation and stress dip tests at 60°C. At higher temperatures a third arm is required for valid predictions.
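For reference, the standard forms behind the names in this abstract, in the usual conventions (a sketch; the authors' exact notation and parameter values are not given here):

```latex
% Eyring flow rate of the dashpot in one Maxwell arm
% (V: activation volume, k_B: Boltzmann constant, T: temperature):
\dot{\varepsilon}_p = \dot{\varepsilon}_0 \sinh\!\left(\frac{V\sigma}{k_B T}\right)

% Guiu--Pratt stress relaxation: a plot of \sigma against \ln t becomes
% linear at long times with slope -k_B T / V, which is how Eyring
% parameters are extracted from relaxation curves (c is a time constant):
\sigma(t) = \sigma_0 - \frac{k_B T}{V}\,\ln\!\left(1 + \frac{t}{c}\right)
```

The neo-Hookean spring in each arm supplies the large-strain elasticity that a linear spring could not.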
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.
1999-01-01
The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
Real-time multiplicity counter
Rowland, Mark S [Alamo, CA; Alvarez, Raymond A [Berkeley, CA
2010-07-13
A neutron multi-detector array feeds pulses in parallel to individual inputs that are tied to individual bits in a digital word. Data is collected by loading a word at the individual bit level in parallel. The word is read at regular intervals, all bits simultaneously, to minimize latency. The electronics then pass the word to a number of storage locations for subsequent processing, thereby removing the front-end problem of pulse pileup.
NASA Technical Reports Server (NTRS)
Telesman, Jack; Kantzos, Peter
1988-01-01
An in situ fatigue loading stage inside a scanning electron microscope (SEM) was used to determine the fatigue crack growth behavior of a PWA 1480 single-crystal nickel-based superalloy. The loading stage permits real-time viewing of the fatigue damage processes at high magnification. The PWA 1480 single-crystal, single-edge notch specimens were tested with the load axis parallel to the (100) orientation. Two distinct fatigue failure mechanisms were identified. The crack growth rate differed substantially when the failure occurred on a single slip system in comparison to multislip system failure. Two processes by which crack branching is produced were identified and are discussed. Also discussed are the observed crack closure mechanisms.
Automatic mesh refinement and parallel load balancing for Fokker-Planck-DSMC algorithm
NASA Astrophysics Data System (ADS)
Küchlin, Stephan; Jenny, Patrick
2018-06-01
Recently, a parallel Fokker-Planck-DSMC algorithm for rarefied gas flow simulation in complex domains at all Knudsen numbers was developed by the authors. Fokker-Planck-DSMC (FP-DSMC) is an augmentation of the classical DSMC algorithm, which mitigates the near-continuum deficiencies in terms of computational cost of pure DSMC. At each time step, based on a local Knudsen number criterion, the discrete DSMC collision operator is dynamically switched to the Fokker-Planck operator, which is based on the integration of continuous stochastic processes in time, and has fixed computational cost per particle, rather than per collision. In this contribution, we present an extension of the previous implementation with automatic local mesh refinement and parallel load-balancing. In particular, we show how the properties of discrete approximations to space-filling curves enable an efficient implementation. Exemplary numerical studies highlight the capabilities of the new code.
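The space-filling-curve idea can be sketched in a few lines: order cells by their discrete Morton (Z-order) key, then cut the ordered list into contiguous chunks of roughly equal work. This is only the generic technique the authors build on, not their implementation.

```python
def morton2d(x, y, bits=16):
    """Interleave the bits of integer cell coordinates (x, y)."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

def partition(cells, weights, nproc):
    """Cut the Morton-ordered cell list into nproc contiguous chunks."""
    order = sorted(range(len(cells)), key=lambda i: morton2d(*cells[i]))
    target = sum(weights) / nproc
    chunks, acc, cur = [], 0.0, []
    for i in order:
        cur.append(cells[i]); acc += weights[i]
        if acc >= target and len(chunks) < nproc - 1:
            chunks.append(cur); cur, acc = [], 0.0
    chunks.append(cur)
    return chunks

cells = [(x, y) for x in range(8) for y in range(8)]
weights = [1.0] * len(cells)   # uniform work per cell, for illustration
print([len(c) for c in partition(cells, weights, 4)])   # [16, 16, 16, 16]
```

Because the curve preserves spatial locality, contiguous chunks along it are also spatially compact, which keeps inter-process communication low after refinement changes the per-cell weights.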
FPGA-based protein sequence alignment : A review
NASA Astrophysics Data System (ADS)
Isa, Mohd. Nazrin Md.; Muhsen, Ku Noor Dhaniah Ku; Saiful Nurdin, Dayana; Ahmad, Muhammad Imran; Anuar Zainol Murad, Sohiful; Nizam Mohyar, Shaiful; Harun, Azizi; Hussin, Razaidi
2017-11-01
Sequence alignment has been optimized with several techniques that accelerate the computation of the optimal score, notably by implementing DP-based algorithms in hardware such as FPGA platforms. Hardware implementations face performance challenges such as frequent memory accesses and strong data dependences in the computation. This paper therefore focuses on processing element (PE) configuration, which involves memory accesses to load data (the substitution matrix and query-sequence characters), and on the PE configuration time. Previous works have enhanced PE configuration performance with serial and parallel configuration chains, in which the configuration data are loaded into the PEs sequentially or simultaneously, respectively. Some researchers have shown that a parallel configuration chain improves both the configuration time and the area.
Scheduling Jobs with Variable Job Processing Times on Unrelated Parallel Machines
Zhang, Guang-Qian; Wang, Jian-Jun; Liu, Ya-Jing
2014-01-01
m unrelated parallel machine scheduling problems with variable job processing times are considered, where the processing time of a job is a function of its position in a sequence, its starting time, and its resource allocation. The objective is to determine the optimal resource allocation and the optimal schedule to minimize a total cost function that depends on the total completion (waiting) time, the total machine load, the total absolute differences in completion (waiting) times on all machines, and the total resource cost. If the number of machines is a given constant, we propose a polynomial-time algorithm to solve the problem. PMID:24982933
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; ...
2015-06-01
This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores of the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Dynamic load balancing for petascale quantum Monte Carlo applications: The Alias method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sudheer, C. D.; Krishnan, S.; Srinivasan, A.
Diffusion Monte Carlo is the most accurate widely used Quantum Monte Carlo method for the electronic structure of materials, but it requires frequent load balancing or population redistribution steps to maintain efficiency and avoid accumulation of systematic errors on parallel machines. The load balancing step can be a significant factor affecting performance, and will become more important as the number of processing elements increases. We propose a new dynamic load balancing algorithm, the Alias Method, and evaluate it theoretically and empirically. An important feature of the new algorithm is that the load can be perfectly balanced with each process receiving at most one message. It is also optimal in the maximum size of messages received by any process. We also optimize its implementation to reduce network contention, a process facilitated by the low messaging requirement of the algorithm. Empirical results on the petaflop Cray XT Jaguar supercomputer at ORNL show up to 30% improvement in performance on 120,000 cores. The load balancing algorithm may be straightforwardly implemented in existing codes, and may also be employed by any method with many nearly identical computational tasks that requires load balancing.
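For context, the classical Walker alias-table construction that gives the method its name can be sketched as follows: O(n) setup and O(1) sampling of a discrete distribution. This is textbook code, not the authors' parallel redistribution scheme itself.

```python
import random

def build_alias(weights):
    """Walker's alias table: each slot keeps a threshold and an alias."""
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]     # scaled so the mean is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                            # slot s tops up from l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def sample(prob, alias):
    """O(1) draw: pick a slot, then accept it or take its alias."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias([5, 1, 2, 2])
counts = [0] * 4
for _ in range(100_000):
    counts[sample(prob, alias)] += 1
print(counts)   # roughly proportional to 5:1:2:2
```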
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erdmann, Thorsten; Albert, Philipp J.; Schwarz, Ulrich S.
2013-11-07
Non-processive molecular motors have to work together in ensembles in order to generate appreciable levels of force or movement. In skeletal muscle, for example, hundreds of myosin II molecules cooperate in thick filaments. In non-muscle cells, by contrast, small groups with few tens of non-muscle myosin II motors contribute to essential cellular processes such as transport, shape changes, or mechanosensing. Here we introduce a detailed and analytically tractable model for this important situation. Using a three-state crossbridge model for the myosin II motor cycle and exploiting the assumptions of fast power stroke kinetics and equal load sharing between motors in equivalent states, we reduce the stochastic reaction network to a one-step master equation for the binding and unbinding dynamics (parallel cluster model) and derive the rules for ensemble movement. We find that for constant external load, ensemble dynamics is strongly shaped by the catch bond character of myosin II, which leads to an increase of the fraction of bound motors under load and thus to firm attachment even for small ensembles. This adaptation to load results in a concave force-velocity relation described by a Hill relation. For external load provided by a linear spring, myosin II ensembles dynamically adjust themselves towards an isometric state with constant average position and load. The dynamics of the ensembles is then determined mainly by the distribution of motors over the different kinds of bound states. For increasing stiffness of the external spring, there is a sharp transition beyond which myosin II can no longer perform the power stroke. Slow unbinding from the pre-power-stroke state protects the ensembles against detachment.
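The one-step binding/unbinding dynamics can be sketched with a small Gillespie simulation in which the catch bond is modeled as an unbinding rate that decreases with per-motor load; all rate constants below are illustrative assumptions, not the paper's parameters.

```python
import math, random

N, F = 20, 10.0               # ensemble size, constant external load
k_on, k0, f0 = 40.0, 80.0, 2.5  # assumed binding/unbinding constants

def k_off(i):
    """Catch-bond unbinding rate per motor when i motors share load F:
    fewer bound motors -> higher per-motor load -> slower unbinding."""
    return k0 * math.exp(-(F / i) / f0)

t, i, t_end = 0.0, 1, 10.0
while t < t_end and i > 0:
    r_on, r_off = (N - i) * k_on, i * k_off(i)
    rate = r_on + r_off
    t += random.expovariate(rate)            # time to the next event
    i += 1 if random.random() < r_on / rate else -1

print(f"final bound fraction ~ {i / N:.2f}")
```

The catch-bond feedback is visible directly in `k_off`: under higher load the bound fraction rises, which is the firm-attachment behavior described above.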
The Distributed Diagonal Force Decomposition Method for Parallelizing Molecular Dynamics Simulations
Boršnik, Urban; Miller, Benjamin T.; Brooks, Bernard R.; Janežič, Dušanka
2011-01-01
Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007
Collectively loading an application in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
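A hedged mpi4py sketch of the pattern: the job leader reads the application image once and broadcasts it to the other compute nodes, so only one node touches the filesystem. The file path and leader rank are illustrative assumptions.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
LEADER = 0                                  # job leader compute node

payload = None
if comm.Get_rank() == LEADER:
    # hypothetical application image path, read only on the leader
    with open("/scratch/app_image.bin", "rb") as f:
        payload = f.read()

payload = comm.bcast(payload, root=LEADER)  # broadcast to the subset
print(f"rank {comm.Get_rank()} received {len(payload)} bytes")
```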
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2010 CFR
2010-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2012 CFR
2012-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2011 CFR
2011-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2014 CFR
2014-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2013 CFR
2013-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
Increased Energy Delivery for Parallel Battery Packs with No Regulated Bus
NASA Astrophysics Data System (ADS)
Hsu, Chung-Ti
In this dissertation, a new approach to paralleling different battery types is presented. A method for controlling the charging/discharging of different battery packs using low-cost bi-directional switches instead of DC-DC converters is proposed. The proposed system architecture, algorithms, and control techniques allow batteries with different chemistry, voltage, and SOC to be properly charged and discharged in parallel without causing safety problems. The physical design and cost of the energy management system are substantially reduced. Additionally, specific types of failures in maximum power point tracking (MPPT) in a photovoltaic (PV) system when tracking only the load current of a DC-DC converter are analyzed. A periodic nonlinear load current makes MPPT realized by the conventional perturb-and-observe (P&O) algorithm problematic. A modified MPPT algorithm is proposed that still requires only typically measured signals, yet is suitable for both linear and periodic nonlinear loads. Moreover, for a modular DC-DC converter using several converters in parallel, the input power from PV panels is processed and distributed at the module level. Methods for properly implementing distributed MPPT are studied. A new approach to efficient MPPT under partial shading conditions is presented. The power stage architecture achieves a fast input-current change rate by combining a current-adjustable converter with a few converters operating at constant current.
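For reference, the conventional P&O loop whose failure modes the dissertation analyzes can be sketched in a few lines; the PV curve and step size here are toy assumptions.

```python
def po_step(v_ref, p_prev, v_prev, v_now, p_now, dv=0.1):
    """One P&O iteration: keep perturbing in the direction that raised power."""
    if p_now > p_prev:
        step = dv if v_now > v_prev else -dv      # same direction
    else:
        step = -dv if v_now > v_prev else dv      # reverse direction
    return v_ref + step

# Usage with a toy PV power curve p(v) peaking at v = 17 V:
p = lambda v: max(0.0, v * (34 - v))              # illustrative only
v_prev, v_now, v_ref = 10.0, 10.1, 10.1
for _ in range(200):
    v_next = po_step(v_ref, p(v_prev), v_prev, v_now, p(v_now))
    v_prev, v_now, v_ref = v_now, v_next, v_next
print(f"operating point oscillates near v = {v_now:.1f} V")
```

On a smooth curve the loop hill-climbs to the maximum and then dithers around it; a periodic nonlinear load perturbs the measured power between samples, which is what misleads this classical update rule.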
A Domain Decomposition Parallelization of the Fast Marching Method
NASA Technical Reports Server (NTRS)
Herrmann, M.
2003-01-01
In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on load balancing separately the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G(sub 0)-based parallelization will be investigated.
2017-04-13
Several applications were ported to OmpSs: a basic algorithm from image processing, a mini application representative of an ocean modelling code, a parallel benchmark, and a communication-avoiding version of the QR algorithm. Further, several improvements to the OmpSs model were made, including ... data movement, and a port of the dynamic load balancing library to OmpSs. Finally, several updates to the tools infrastructure were accomplished.
A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics
NASA Technical Reports Server (NTRS)
Kurdila, Andrew J.
1990-01-01
While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combination range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the features that: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.
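A textbook preconditioned conjugate gradient kernel, shown here with simple Jacobi preconditioning standing in for the paper's range-space-metric preconditioner, may make the discussion concrete:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for SPD A; M_inv applies M^-1."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

n = 50   # SPD tridiagonal test matrix, for illustration only
A = (np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
M_inv = np.diag(1.0 / np.diag(A))      # Jacobi preconditioner
b = np.ones(n)
x = pcg(A, b, M_inv)
print(np.linalg.norm(A @ x - b))       # residual near machine precision
```

The matrix-vector products and vector updates in this loop are exactly the fine-grained, easily load-balanced operations the text argues make the approach attractive for parallel hardware.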
Formalization, equivalence and generalization of basic resonance electrical circuits
NASA Astrophysics Data System (ADS)
Penev, Dimitar; Arnaudov, Dimitar; Hinov, Nikolay
2017-12-01
This work presents the basic resonant circuits used in resonant energy converters. The following circuits are considered: series, series with a parallel-loaded capacitor, parallel, and parallel with a series-loaded inductance. For the circuits under consideration, expressions are derived for the natural oscillation frequencies and for the equivalence of the active power delivered to the load. The mathematical expressions are plotted and verified using computer simulations. The results are used in the model-based design of resonant energy converters with DC or AC output, which guarantees the output ratings of the power electronic devices.
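For orientation, the familiar textbook expressions for the undamped resonant frequency and loaded quality factor of the basic series resonant tank are (generic formulas, not the equivalence expressions derived in the paper):

```latex
\omega_0 = \frac{1}{\sqrt{LC}}, \qquad
f_0 = \frac{\omega_0}{2\pi}, \qquad
Q = \frac{\omega_0 L}{R} = \frac{1}{R}\sqrt{\frac{L}{C}}
```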
Farana, Roman; Jandacka, Daniel; Uchytil, Jaroslav; Zahradnik, David; Irwin, Gareth
2017-01-01
The aim of this study was to examine biomechanical injury risk factors at the wrist, including joint kinetics, kinematics, and stiffness, in the first and second contact limbs for the parallel and T-shape round-off (RO) techniques. Seven international-level female gymnasts performed 10 trials of the RO to back handspring with parallel and T-shape hand positions. Synchronised kinematic (3D motion analysis system; 247 Hz) and kinetic (two force plates; 1235 Hz) data were collected for each trial. A two-way repeated-measures analysis of variance (ANOVA) assessed differences in the kinematic and kinetic parameters between the techniques for each contact limb. The main findings were that in both RO techniques the second contact limb wrist joint is exposed to higher mechanical loads than the first, as demonstrated by increased axial compression force and loading rate, and that in the parallel technique the second contact limb wrist joint is exposed to a higher axial compression load. The differences in wrist joint kinetics suggest that the T-shape technique may reduce these physical loads and consequently protect the second contact limb wrist joint from overload and biological failure. Highlighting such biomechanical risk factors helps make technique selection more objective and safe.
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Sohn, Andrew
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35% of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives almost a sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3% off the optimal solutions but requires only 1% of the computational time.
Knee Joint Kinetics in Relation to Commonly Prescribed Squat Loads and Depths
Cotter, Joshua A.; Chaudhari, Ait M.; Jamison, Steve T.; Devor, Steven T.
2014-01-01
Controversy exists regarding the safety and performance benefits of performing the squat exercise to depths beyond 90° of knee flexion. Our aim was to compare the net peak external knee flexion moments (pEKFM) experienced over typical ranges of squat loads and depths. Sixteen recreationally trained males (n = 16; 22.7 ± 1.1 yrs; 85.4 ± 2.1 kg; 177.6 ± 0.96 cm; mean ± SEM) with no previous lower-limb surgeries or other orthopedic issues and at least one year of consistent resistance-training experience with the squat exercise performed single-repetition squat trials in random order at squat depths of above parallel, parallel, and below parallel. Less than one week before testing, one-repetition-maximum (1RM) values were found for each squat depth. Subsequent testing required subjects to perform squats at the three depths with three different loads: unloaded, 50% 1RM, and 85% 1RM (nine total trials). Force platform and kinematic data were collected to calculate pEKFM. To assess differences among loads and depths, a two-factor (load and depth) repeated-measures ANOVA with significance set at the P < 0.05 level was used. Squat 1RM significantly decreased 13.6% from the above-parallel to the parallel squat and another 3.6% from the parallel to the below-parallel squat (P < 0.05). Net peak external knee flexion moments significantly increased as both squat depth and load were increased (P ≤ 0.02). Slopes of pEKFM were greater from unloaded to 50% 1RM than from 50% to 85% 1RM (P < 0.001). The results suggest that the typical decreases in squat loads used with increasing depths are not enough to offset the increases in pEKFM. PMID:23085977
Planning and Resource Management in an Intelligent Automated Power Management System
NASA Technical Reports Server (NTRS)
Morris, Robert A.
1991-01-01
Power system management is a process of guiding a power system towards the objective of continuous supply of electrical power to a set of loads. Spacecraft power system management requires planning and scheduling, since electrical power is a scarce resource in space. The automation of power system management for future spacecraft has been recognized as an important R&D goal. Several automation technologies have emerged, including the use of expert systems for automating human problem-solving capabilities, such as rule-based expert systems for fault diagnosis and load scheduling. It is questionable whether current-generation expert system technology is applicable to power system management in space. The objective of ADEPTS (ADvanced Electrical Power management Techniques for Space systems) is to study new techniques for power management automation. These techniques involve integrating current expert system technology with parallel and distributed computing, as well as a distributed, object-oriented approach to software design. The focus of the current study is the integration of new procedures for automatically planning and scheduling loads with procedures for performing fault diagnosis and control. The objective is the concurrent execution of both sets of tasks on separate transputer processors, thus adding parallelism to the overall management process.
NASA Astrophysics Data System (ADS)
Furuichi, Mikito; Nishiura, Daisuke
2017-10-01
We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.
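A one-dimensional sketch of the iterative idea: treat the per-process work imbalance as a residual and relax the sub-domain boundaries until it vanishes. The real method adjusts flexible 2-D row/column boundaries with a multigrid-accelerated nonlinear solver; this toy keeps only the fixed-point iteration, with all step sizes assumed.

```python
import numpy as np

density = np.abs(np.sin(np.linspace(0, 3, 1000))) + 0.1   # work per cell
nproc = 4
bounds = np.linspace(0, len(density), nproc + 1).astype(int)

def loads(bounds):
    """Total work assigned to each of the nproc sub-domains."""
    return np.array([density[bounds[i]:bounds[i + 1]].sum()
                     for i in range(nproc)])

for _ in range(50):
    w = loads(bounds)
    if w.max() - w.min() < 0.01 * w.mean():
        break
    for i in range(1, nproc):
        excess = w[i - 1] - w[i]          # positive: left domain overloaded
        cell = density[bounds[i] - 1]     # local work density at the boundary
        shift = int(np.clip(0.5 * excess / cell, -20, 20))  # damped step
        bounds[i] = np.clip(bounds[i] - shift,
                            bounds[i - 1] + 1, bounds[i + 1] - 1)

print(loads(bounds))   # near-equal work per process
```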
Dynamic load balancing of applications
Wheat, Stephen R.
1997-01-01
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated.
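A toy illustration of why overlapping neighborhoods suffice: each processor exchanges load only with its immediate neighbors, yet the global load flattens because the neighborhoods overlap. This diffusion-style exchange with an assumed coefficient is a stand-in for, not a reproduction of, the patented method.

```python
# Local, neighbor-only balancing on a chain of 8 processors.
loads = [40.0, 5.0, 5.0, 5.0, 30.0, 5.0, 5.0, 5.0]
alpha = 0.25   # assumed diffusion coefficient

for step in range(100):
    deltas = [0.0] * len(loads)
    for i in range(len(loads) - 1):          # neighboring pairs overlap
        flow = alpha * (loads[i] - loads[i + 1])
        deltas[i] -= flow
        deltas[i + 1] += flow
    loads = [l + d for l, d in zip(loads, deltas)]

print([round(l, 1) for l in loads])          # ~12.5 everywhere
```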
A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database
Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...
2013-01-01
Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication, and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes, which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm for creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in a 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process, thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n ghost layers, up to the point where the whole partitioned mesh is ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.
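The neighborhood-only communication pattern can be sketched for the simplest case, one ghost layer on a 1-D partitioned array, using mpi4py (the script name in the run command is hypothetical):

```python
# Run with e.g. `mpiexec -n 4 python ghost1d.py`
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = [10 * rank + i for i in range(5)]       # this rank's owned cells
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Send my right edge to the right neighbor, receive my left ghost;
# PROC_NULL endpoints make the boundary ranks no-ops automatically.
left_ghost = comm.sendrecv(local[-1], dest=right, source=left)
# Send my left edge to the left neighbor, receive my right ghost.
right_ghost = comm.sendrecv(local[0], dest=left, source=right)

print(f"rank {rank}: ghosts ({left_ghost}, {right_ghost})")
```

Only nearest neighbors exchange messages, which is the essential property the article's n-layer, 3-D algorithm preserves while avoiding all-to-all communication.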
Rosso, Diego; Lothman, Sarah E; Jeung, Matthew K; Pitt, Paul; Gellner, W James; Stone, Alan L; Howard, Don
2011-11-15
Integrated fixed-film activated sludge (IFAS) processes are becoming more popular for both secondary and sidestream treatment in wastewater facilities. These processes are a combination of biofilm reactors and activated sludge processes, achieved by introducing and retaining biofilm carrier media in activated sludge reactors. A full-scale train of three IFAS reactors equipped with AnoxKaldnes media and coarse-bubble aeration was tested using off-gas analysis. This train was operated independently, in parallel to an existing full-scale activated sludge process. Both processes achieved the same percent removal of COD and ammonia, despite the doubled oxygen demand on the IFAS reactors. To prevent kinetic limitations associated with DO diffusional gradients through the IFAS biofilm, this system was operated at an elevated dissolved oxygen concentration, in line with the manufacturer's recommendation. Also, to avoid media coalescence at the reactor surface and to promote biofilm contact with the substrate, high mixing requirements are specified. Therefore, the air flux in the IFAS reactors was much higher than in the parallel activated sludge reactors. However, the standardized oxygen transfer efficiency in process water was almost the same for both processes. In theory, when the oxygen transfer efficiency is the same, the air used per unit load removed should be the same. However, due to the high DO and mixing requirements, the IFAS reactors were characterized by elevated air flux and air use per unit load treated. This was directly reflected in the relative energy footprint for aeration, which in this case was much higher for the IFAS system than for activated sludge.
Visualization Co-Processing of a CFD Simulation
NASA Technical Reports Server (NTRS)
Vaziri, Arsi
1999-01-01
OVERFLOW, a widely used CFD simulation code, is combined with a visualization system, pV3, to experiment with an environment for simulation/visualization co-processing on an SGI Origin 2000 (O2K) system. The shared-memory version of the solver is used, with the O2K 'pfa' preprocessor invoked to automatically discover parallelism in the source code; no other explicit parallelism is enabled. To study the scaling and performance of the visualization co-processing system, sample runs are made with different processor groups in the range of 1 to 254 processors. The data exchange between the visualization system and the simulation system is rapid enough for user interactivity when the problem size is small. This shared-memory version of OVERFLOW, with minimal parallelization, does not scale well to an increasing number of available processors. The visualization task takes about 18 to 30% of the total processing time and does not appear to be a major contributor to the poor scaling; improper load balancing and inter-processor communication overhead are contributors. Work is in progress aimed at improving the parallel performance of the solver and removing the limitation of serial data transfer to pV3 by examining various parallelization/communication strategies, including the use of explicit message passing.
Gathmann, Bettina; Schulte, Frank P; Maderwald, Stefan; Pawlikowski, Mirko; Starcke, Katrin; Schäfer, Lena C; Schöler, Tobias; Wolf, Oliver T; Brand, Matthias
2014-03-01
Stress and additional load on the executive system, produced by a parallel working memory task, impair decision making under risk. However, the combination of stress and a parallel task seems to protect decision-making performance [e.g., operationalized by the Game of Dice Task (GDT)] from decreasing, probably via a switch from serial to parallel processing. The question remains how the brain manages such demanding decision-making situations. The current study used a 7-tesla magnetic resonance imaging (MRI) system to investigate the neural correlates underlying the interaction between stress (induced by the Trier Social Stress Test), risky decision making (GDT), and a parallel executive task (2-back task), in order to better understand those behavioral findings. On the behavioral level, stressed participants did not show significant differences in task performance. Interestingly, compared with the control group, the stress group (SG) showed a greater increase in neural activation in the anterior prefrontal cortex when performing the 2-back task simultaneously with the GDT than when performing each task alone. This brain area is associated with parallel processing. Thus, the results suggest that in stressful dual-tasking situations, in which a decision has to be made while working memory is demanded in parallel, a brain area associated with parallel processing is more strongly activated. The findings are in line with the idea that stress triggers a switch from serial to parallel processing in demanding dual-tasking situations.
NASA Astrophysics Data System (ADS)
Koltsov, A. G.; Shamutdinov, A. H.; Blokhin, D. A.; Krivonos, E. V.
2018-01-01
A new classification of parallel kinematic mechanisms based on a symmetry coefficient, which is proportional to mechanism stiffness and hence to the machining accuracy achievable with the equipment under study, is proposed. A new version of the Stewart platform with a high symmetry coefficient is presented for analysis. The workspace of the mechanism is described; it is a complex solid figure whose end points are reached by the center of the mobile platform as it moves parallel to the base plate. Parameters affecting processing accuracy, namely the static and dynamic stiffness and the natural vibration frequencies, are determined. The mechanism's operation under various loads was assessed, taking into account resonance phenomena at different points of the workspace. The study showed that the stiffness, and therefore the processing accuracy, of such mechanisms is comparable with that of medium-sized series-produced machines.
Performance enhancement of various real-time image processing techniques via speculative execution
NASA Astrophysics Data System (ADS)
Younis, Mohamed F.; Sinha, Purnendu; Marlowe, Thomas J.; Stoyenko, Alexander D.
1996-03-01
In real-time image processing, an application must satisfy a set of timing constraints while ensuring the semantic correctness of the system. Because of the natural structure of digital data, pure data and task parallelism have been used extensively in real-time image processing to accelerate the handling time of image data. These types of parallelism are based on splitting the execution load performed by a single processor across multiple nodes. However, execution of all parallel threads is mandatory for correctness of the algorithm. On the other hand, speculative execution is an optimistic execution of part(s) of the program based on assumptions on program control flow or variable values. Rollback may be required if the assumptions turn out to be invalid. Speculative execution can enhance average, and sometimes worst-case, execution time. In this paper, we target various image processing techniques to investigate applicability of speculative execution. We identify opportunities for safe and profitable speculative execution in image compression, edge detection, morphological filters, and blob recognition.
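A small sketch of the speculative pattern using Python futures: both candidate branches run while the slow predicate is still being evaluated, and the mispredicted result is simply discarded, a cheap stand-in for rollback. All functions here are illustrative stand-ins for real image-processing stages.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_predicate(frame):
    time.sleep(0.05)                  # e.g., an expensive edge-density test
    return sum(frame) % 2 == 0

def branch_compress(frame):           # taken if the predicate is true
    return ("compressed", len(frame))

def branch_detect(frame):             # taken if the predicate is false
    return ("edges", max(frame))

frame = [3, 1, 4, 1, 5, 9]
with ThreadPoolExecutor(max_workers=3) as pool:
    f_pred = pool.submit(slow_predicate, frame)
    f_a = pool.submit(branch_compress, frame)    # speculative
    f_b = pool.submit(branch_detect, frame)      # speculative
    # Both branch results exist by now; keep only the predicted-correct one.
    result = f_a.result() if f_pred.result() else f_b.result()

print(result)
```

When the speculatively executed branch turns out wrong, its work is wasted but the critical path is unchanged; when it is right, the branch latency is hidden behind the predicate, which is where the average-case speedup comes from.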
Multi-Kilowatt Power Module for High-Power Hall Thrusters
NASA Technical Reports Server (NTRS)
Pinero, Luis R.; Bowers, Glen E.
2005-01-01
Future NASA missions will require high-performance electric propulsion systems. Hall thrusters are being developed at NASA Glenn for high-power, high-specific impulse operation. These thrusters operate at power levels up to 50 kW of power and discharge voltages in excess of 600 V. A parallel effort is being conducted to develop power electronics for these thrusters that push the technology beyond the 5kW state-of-the-art power level. A 10 kW power module was designed to produce an output of 500 V and 20 A from a nominal 100 V input. Resistive load tests revealed efficiencies in excess of 96 percent. Load current share and phase synchronization circuits were designed and tested that will allow connecting multiple modules in parallel to process higher power.
NASA Technical Reports Server (NTRS)
Ishai, O.; Garg, A.; Nelson, H. G.
1986-01-01
The critical load levels and associated cracking beyond which a multidirectional laminate can be considered structurally failed have been determined by loading graphite-fiber-reinforced epoxy laminates to different strain levels up to ultimate failure. Transverse matrix cracking was monitored by acoustic and optical methods. The residual stiffness and strength parallel and perpendicular to the cracks were determined and related to the environmental/loading history. Within the range of experimental conditions studied, it is concluded that the transverse cracking process does not have a crucial effect on the structural performance of multidirectional composite laminates.
NASA Astrophysics Data System (ADS)
Li, S. H.; Zhu, W. C.; Niu, L. L.; Yu, M.; Chen, C. F.
2018-06-01
A split Hopkinson pressure bar apparatus driven by a pendulum hammer was used to perform uniaxial compression tests to examine the degradation process of green sandstone subjected to repetitive impact loading. The acoustic characteristics, dissipated energy, deformation characteristics, and microstructure evolution were investigated. The representative stress-strain curve can be broken into five stages that were characterized by changes in the axial strain response during impact loading. Both the ultrasonic wave velocity and cumulative dissipated energy exhibited obvious three-stage behavior with respect to the impact number. As the impact number increased, more than one peak was observed in the frequency spectra, and the relative weight of the peak frequency increased in the low-frequency range. According to the evolution of the ultrasonic wave velocity, the degradation process was divided into three stages. By comparing the intact stage I and early stage II microcrack development patterns, the initiation of new cracks and elongation of existing cracks were identified as the main degradation mechanisms. Furthermore, a slight increase in the number of cracks was observed, and microcrack lengths steadily increased. Moreover, due to the low level of microcrack damage, the deformation mechanism was mainly characterized by volume compression during impact loading. In late stage II, the main degradation mechanism was the elongation of existing cracks. Additionally, as microcracks accumulated in the rock samples, cracks were arranged parallel to the loading direction, which led to volume dilation. In stage III, microcracks continued to elongate nearly parallel to the loading direction and then linked to each other, which led to intense degradation in the rock samples. In this stage, rock sample deformation was mainly characterized by volume dilation during impact loading. Finally, rock samples were split into blocks with fractures oriented subparallel to the loading direction. These results can improve the understanding of the stability evaluations of rock structures subjected to repetitive impact loading.
Algorithms for parallel flow solvers on message passing architectures
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1995-01-01
The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique that has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these it can be determined what is the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of uncareful grid partitioning in flow solvers that employ pipeline algorithms. If grid blocks at boundaries are not at least as large in the wall-normal direction as those immediately adjacent to them, then the first processor in the pipeline will receive a computational load that is less than that of subsequent processors, magnifying the pipeline slowdown effect. Extra compensation is needed for grid boundary effects, even if all grid blocks are equally sized.
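The pipeline fill/drain argument can be made concrete with a small discrete model: processor p starts chunk k only after finishing chunk k-1 locally and after processor p-1 delivers chunk k. The numbers below illustrate the boundary effect described above, where a lighter-loaded first processor gains almost nothing.

```python
def pipeline_finish(work, nchunks):
    """work[p] = time per chunk on processor p; returns completion time."""
    nproc = len(work)
    finish = [[0.0] * nchunks for _ in range(nproc)]
    for p in range(nproc):
        for k in range(nchunks):
            ready_local = finish[p][k - 1] if k > 0 else 0.0  # own previous chunk
            ready_up = finish[p - 1][k] if p > 0 else 0.0     # upstream delivery
            finish[p][k] = max(ready_local, ready_up) + work[p]
    return finish[-1][-1]

balanced = pipeline_finish([1.0, 1.0, 1.0, 1.0], nchunks=16)     # 19.0
light_first = pipeline_finish([0.5, 1.0, 1.0, 1.0], nchunks=16)  # 18.5
print(balanced, light_first)
```

Halving the first processor's per-chunk work saves only half a chunk time in total: the downstream processors still pace the pipeline, which is the imbalance effect the analytical expressions quantify.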
Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)
1994-01-01
Computational requirements of full-scale computational fluid dynamics change as a computation progresses on a parallel machine. The change in computational intensity causes workload imbalance among processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework for dynamic load balancing for CFD applications, called Jove, is presented. One processor is designated as the decision maker, Jove, while the others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove at a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while the other processors continue working with the current data and load distribution. Jove goes through several steps to decide whether the new distribution should be adopted, including preliminary evaluation, partitioning, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full-scale grid partitioning on the target machine, the IBM SP2.
NASA Technical Reports Server (NTRS)
O'Keefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Knee Kinetics during Squats of Varying Loads and Depths in Recreationally Trained Females.
Flores, Victoria; Becker, James; Burkhardt, Eric; Cotter, Joshua
2018-03-06
The back squat exercise is typically practiced with varying squat depths and barbell loads. However, depth has been inconsistently defined, resulting in unclear safety precautions when squatting with loads. Additionally, females exhibit anatomical and kinematic differences from males which may predispose them to knee joint injuries. The purpose of this study was to characterize peak knee extensor moments (pKEMs) at three commonly practiced squat depths of above parallel, parallel, and full depth, and with three loads of 0% (unloaded), 50%, and 85% depth-specific one-repetition maximum (1RM) in recreationally active females. Nineteen females (age, 25.1 ± 5.8 years; body mass, 62.5 ± 10.2 kg; height, 1.6 ± 0.10 m; mean ± SD) performed squats of randomized depth and load. Inverse dynamics were used to obtain pKEMs from three-dimensional knee kinematics. Depth and load had significant interaction effects on pKEMs (p = 0.014). Significantly greater pKEMs were observed at full depth compared to parallel depth with 50% 1RM load (p = 0.001, d = 0.615) and 85% 1RM load (p = 0.010, d = 0.714). Greater pKEMs were also observed at full depth compared to above-parallel depth with 50% 1RM load (p = 0.003, d = 0.504). The results indicate that the effect of load on female pKEMs does not follow a progressively increasing pattern with either increasing depth or load. Therefore, when high knee loading is a concern, individuals must carefully consider both the depth of squat being performed and the relative load they are using.
Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.
1999-01-01
The ability to dynamically adapt an unstructured grid is a powerful tool for efficiently solving computational problems with evolving physical features. In this paper, we report on our experience parallelizing an edge-based adaptation scheme, called 3D_TAG, using message passing. Results show excellent speedup when a realistic helicopter rotor mesh is randomly refined. However, performance deteriorates when the mesh is refined using a solution-based error indicator, since mesh adaptation for practical problems occurs in a localized region, creating a severe load imbalance. To address this problem, we have developed PLUM, a global dynamic load balancing framework for adaptive numerical computations. Even though PLUM primarily balances processor workloads for the solution phase, it reduces the load imbalance problem within mesh adaptation by repartitioning the mesh after targeting edges for refinement but before the actual subdivision. This dramatically improves the performance of parallel 3D_TAG, since refinement occurs in a more load-balanced fashion. We also present optimal and heuristic algorithms that, when applied to the default mapping of a parallel repartitioner, significantly reduce the data redistribution overhead. Finally, portability is examined by comparing performance on three state-of-the-art parallel machines.
Data Partitioning and Load Balancing in Parallel Disk Systems
NASA Technical Reports Server (NTRS)
Scheuermann, Peter; Weikum, Gerhard; Zabback, Peter
1997-01-01
Parallel disk systems provide opportunities for exploiting I/O parallelism in two possible ways, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent, self-reliant file system that aims to optimize striping by taking into account the requirements of the applications, and performs load balancing by judicious file allocation and dynamic redistribution of the data when access patterns change. Our system uses simple but effective heuristics that incur only little overhead. We present performance experiments based on synthetic workloads and real-life traces.
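The striping component reduces to a simple block-to-disk mapping; a round-robin sketch with an assumed stripe unit:

```python
D = 4                     # number of disks
STRIPE_UNIT = 64 * 1024   # bytes per stripe unit (assumed)

def locate(offset):
    """Map a byte offset in a striped file to (disk, offset_on_disk)."""
    block = offset // STRIPE_UNIT
    disk = block % D                          # round-robin placement
    local_block = block // D                  # position within that disk
    return disk, local_block * STRIPE_UNIT + offset % STRIPE_UNIT

for off in (0, 70_000, 140_000, 300_000):
    print(off, "->", locate(off))
```

A request spanning several stripe units engages several disks at once (intra-request parallelism), while independent small requests tend to land on different disks (inter-request parallelism); the stripe unit size trades one against the other.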
Jayaprakash, Namita; Ali, Rashid; Kashyap, Rahul; Bennett, Courtney; Kogan, Alexander; Gajic, Ognjen
2016-08-31
Diagnostic error and delay are critical impediments to the safety of critically ill patients. The Checklist for Early Recognition and Treatment of Acute Illness and Injury (CERTAIN) has been developed as a tool that facilitates timely and error-free evaluation of critically ill patients. While the focused history is an essential part of the CERTAIN framework, it is not clear how best to choreograph this step in the evaluation and treatment of an acutely decompensating patient. An unblinded crossover clinical simulation study was designed in which volunteer critical care clinicians (fellows and attendings) were randomly assigned to start by obtaining a focused history either in series with (after) or in parallel to the primary survey. The focused history was obtained using the standardized SAMPLE model incorporated into Advanced Trauma Life Support (ATLS) and Pediatric Advanced Life Support (PALS). Clinicians were asked to assess six acutely decompensating patients using predetermined clinical scenarios (three with series choreography, three with parallel). Once the initial choreography was completed, the clinician crossed over to the alternative choreography. The primary outcome was cognitive burden, assessed with the NASA task load index; the secondary outcome was time to completion of the focused history. A total of 84 simulated cases (42 parallel, 42 series) were tested on 14 clinicians. Both the overall cognitive load and the time to completion improved with each successive practice scenario, but no difference was observed between the series and parallel choreographies. The median (IQR) overall NASA TLX task load was 39 (17-58) for series and 43 (27-52) for parallel, p = 0.57. The median (IQR) time to completion was 125 (112-158) seconds for series and 122 (108-158) seconds for parallel, p = 0.92. In this clinical simulation study of incorporating a focused history into the primary survey of a non-trauma critically ill patient, there was no difference in cognitive burden or time to task completion between series choreography (after the exam) and parallel choreography (concurrent with the primary-survey physical exam). However, with repetition of the task, both overall task load and time to completion improved in each of the choreographies.
Parallel-Batch Scheduling and Transportation Coordination with Waiting Time Constraint
Gong, Hua; Chen, Daheng; Xu, Ke
2014-01-01
This paper addresses a parallel-batch scheduling problem that incorporates transportation of raw materials or semifinished products before processing with waiting time constraint. The orders located at the different suppliers are transported by some vehicles to a manufacturing facility for further processing. One vehicle can load only one order in one shipment. Each order arriving at the facility must be processed in the limited waiting time. The orders are processed in batches on a parallel-batch machine, where a batch contains several orders and the processing time of the batch is the largest processing time of the orders in it. The goal is to find a schedule to minimize the sum of the total flow time and the production cost. We prove that the general problem is NP-hard in the strong sense. We also demonstrate that the problem with equal processing times on the machine is NP-hard. Furthermore, a dynamic programming algorithm in pseudopolynomial time is provided to prove its ordinarily NP-hardness. An optimal algorithm in polynomial time is presented to solve a special case with equal processing times and equal transportation times for each order. PMID:24883385
Comprehensive analysis of helicopters with bearingless rotors
NASA Technical Reports Server (NTRS)
Murthy, V. R.
1988-01-01
A modified Galerkin method is developed to analyze the dynamic problems of multiple-load-path bearingless rotor blades. The development and selection of functions closely parallel CAMRAD procedures, greatly facilitating the implementation of the method into the CAMRAD program. Software implementing the modified Galerkin method is developed to determine the free vibration characteristics of multiple-load-path rotor blades undergoing coupled flapwise bending, chordwise bending, twisting, and extensional motions. Results are being obtained as the software is debugged.
Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER.
Ferreira, Miguel; Roma, Nuno; Russo, Luis M S
2014-05-30
HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar's striped processing pattern with Intel SSE2 instruction set extension. A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. Besides this alternative vectorization scheme, the proposed implementation also introduces a new partitioning of the Markov model that allows a significantly more efficient exploitation of the cache locality. Such optimization, together with an improved loading of the emission scores, allows the achievement of a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder to process DNA and protein datasets, proving to be a rather competitive alternative implementation. Being always faster than the already highly optimized ViterbiFilter implementation of HMMER3, the proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation provides a constant throughput and offers a processing speedup as high as two times faster, depending on the model's size.
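For readers unfamiliar with the kernel being vectorized, the following is a plain, unoptimized log-space Viterbi decoder for a toy HMM; COPS and HMMER3's ViterbiFilter evaluate essentially this recurrence, but striped across SIMD lanes and partitioned so the working set stays in cache. The two-state model below is purely illustrative and is not HMMER's profile HMM architecture.

```python
import numpy as np

def viterbi(obs, log_start, log_trans, log_emit):
    """Plain log-space Viterbi: returns the most probable state path.
    log_start[i]: log P(state i at t=0); log_trans[i, j]: log P(i -> j);
    log_emit[i, o]: log P(symbol o | state i)."""
    n_states = log_start.shape[0]
    T = len(obs)
    score = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        # cand[i, j] = score of being in state i at t-1 and moving to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(n_states)] + log_emit[:, obs[t]]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy 2-state model over a binary alphabet
ls = np.log(np.array([0.6, 0.4]))
lt = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))
le = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
print(viterbi([0, 0, 1, 1], ls, lt, le))
```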
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers
NASA Technical Reports Server (NTRS)
Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)
1997-01-01
The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from robust implementations that are transportable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented in the widely used NASA multi-block CFD packages ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning, as well as data coalescing, to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation introduces no changes to the numerical results, hence the original fidelity of the packages is identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only at the execution stage, the PENS solver adapts to different computer architectures, from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed-memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains on up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft configurations achieve 75 percent of perfectly load-balanced execution using data coalescing and the two levels of parallelism. The SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms on which the robustness of the implementation has been tested. Performance behavior on these platforms with a variety of realistic problems will be reported as this ongoing study progresses.
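The partition-then-coalesce step can be pictured with a small sketch: split each block into sub-blocks (fine grain), then greedily coalesce sub-blocks onto processors, largest first, to even out the work. This is the classic LPT heuristic, shown here only to illustrate the load-balancing idea, not the PENS implementation itself; all names are assumptions.

```python
import heapq

def partition_and_coalesce(block_sizes, n_procs, split_factor):
    """Split each block into `split_factor` sub-blocks, then assign
    sub-blocks to processors largest-first (LPT) to balance the load.
    Returns (load, proc id, source blocks) per processor."""
    subblocks = []
    for b, size in enumerate(block_sizes):
        subblocks += [(size / split_factor, b)] * split_factor
    subblocks.sort(reverse=True)                    # largest work first
    heap = [(0.0, p, []) for p in range(n_procs)]   # (load, proc id, pieces)
    heapq.heapify(heap)
    for work, b in subblocks:
        load, p, pieces = heapq.heappop(heap)       # least-loaded processor
        heapq.heappush(heap, (load + work, p, pieces + [b]))
    return sorted(heap, key=lambda x: x[1])

# One big block and several small ones, coalesced onto 4 processors
for load, p, pieces in partition_and_coalesce([100, 30, 30, 20, 20], 4, 2):
    print(f"proc {p}: load={load:.1f} sub-blocks of blocks {pieces}")
```

Splitting first (fine grain) is what lets the coalescing stage find an even packing; assigning whole blocks alone would strand the large block on one processor.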
Identifying the Root Causes of Wait States in Large-Scale Parallel Applications
Böhme, David; Geimer, Markus; Arnold, Lukas; ...
2016-07-20
Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. Ultimately, by replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.
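As a toy illustration of the kind of attribution involved (not the paper's parallel trace-replay implementation), the sketch below scans matched send/receive timestamps and charges "late sender" wait states to the process that caused them: the time a receiver blocks before the matching send is issued is attributed to the sender. The trace format is an assumption for illustration.

```python
from collections import defaultdict

def late_sender_costs(messages):
    """messages: matched communication events with the time the receive
    was posted and the time the matching send was issued. A wait state
    occurs when the send comes after the receive was posted; its cost
    is attributed to the sender, the root cause."""
    cost = defaultdict(float)
    for m in messages:
        wait = m["send_time"] - m["recv_posted"]
        if wait > 0:                      # receiver blocked: late sender
            cost[m["sender"]] += wait
    return dict(cost)

trace = [
    {"sender": 0, "receiver": 1, "send_time": 5.0, "recv_posted": 2.0},
    {"sender": 0, "receiver": 2, "send_time": 5.0, "recv_posted": 4.5},
    {"sender": 2, "receiver": 1, "send_time": 1.0, "recv_posted": 3.0},  # no wait
]
print(late_sender_costs(trace))   # {0: 3.5}: process 0 caused 3.5s of waiting
```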
The Feasibility of Adaptive Unstructured Computations On Petaflops Systems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Heber, Gerd; Gao, Guang; Saini, Subhash (Technical Monitor)
1999-01-01
This viewgraph presentation covers the advantages of mesh adaptation, unstructured grids, and dynamic load balancing. It illustrates parallel adaptive communication and explains PLUM (Parallel dynamic load balancing for adaptive unstructured meshes) and PSAW (Proper Self Avoiding Walks).
3-D modeling of ductile tearing using finite elements: Computational aspects and techniques
NASA Astrophysics Data System (ADS)
Gullerud, Arne Stewart
This research focuses on the development and application of computational tools to perform large-scale, 3-D modeling of ductile tearing in engineering components under quasi-static to mild loading rates. Two standard models for ductile tearing---the computational cell methodology and crack growth controlled by the crack tip opening angle (CTOA)---are described and their 3-D implementations are explored. For the computational cell methodology, quantification of the effects of several numerical issues---computational load step size, procedures for force release after cell deletion, and the porosity for cell deletion---enables construction of computational algorithms to remove the dependence of predicted crack growth on these issues. This work also describes two extensions of the CTOA approach into 3-D: a general 3-D method and a constant front technique. Analyses compare the characteristics of the extensions, and a validation study explores the ability of the constant front extension to predict crack growth in thin aluminum test specimens over a range of specimen geometries, absolute sizes, and levels of out-of-plane constraint. To provide a computational framework suitable for the solution of these problems, this work also describes the parallel implementation of a nonlinear, implicit finite element code. The implementation employs an explicit message-passing approach using the MPI standard to maintain portability, a domain decomposition of element data to provide parallel execution, and a master-worker organization of the computational processes to enhance future extensibility. A linear preconditioned conjugate gradient (LPCG) solver serves as the core of the solution process. The parallel LPCG solver utilizes an element-by-element (EBE) structure of the computations to permit a dual-level decomposition of the element data: domain decomposition of the mesh provides efficient coarse-grain parallel execution, while decomposition of the domains into blocks of similar elements (same type, constitutive model, etc.) provides fine-grain parallel computation on each processor. A major focus of the LPCG solver is a new implementation of the Hughes-Winget element-by-element (HW) preconditioner. The implementation employs a weighted dependency graph combined with a new coloring algorithm to provide load-balanced scheduling for the preconditioner and overlapped communication/computation. This approach enables efficient parallel application of the HW preconditioner for arbitrary unstructured meshes.
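The coloring idea behind scheduling the HW preconditioner can be sketched briefly: elements that share a node must not be processed concurrently, so the element dependency graph is colored and each color class becomes one parallel batch. Below is a minimal greedy coloring in Python, not the weighted, load-balanced algorithm developed in this work; the adjacency structure is an assumption.

```python
def greedy_color(adjacency):
    """adjacency[e] = set of elements sharing a node with element e.
    Returns color classes; elements within one class touch disjoint
    nodes and can be processed in parallel by an element-by-element
    preconditioner."""
    color = {}
    for e in sorted(adjacency, key=lambda e: -len(adjacency[e])):  # high degree first
        used = {color[n] for n in adjacency[e] if n in color}
        c = 0
        while c in used:
            c += 1
        color[e] = c
    classes = {}
    for e, c in color.items():
        classes.setdefault(c, []).append(e)
    return classes

# Four elements in a row; neighboring elements share nodes
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(greedy_color(adj))   # {0: [1, 3], 1: [2, 0]}: two parallel batches
```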
Comparing the Performance of Two Dynamic Load Distribution Methods
NASA Technical Reports Server (NTRS)
Kale, L. V.
1987-01-01
Parallel processing of symbolic computations on a message-passing multi-processor presents one challenge: to effectively utilize the available processors, the load must be distributed uniformly to all the processors. However, the structure of these computations cannot be predicted in advance, so static scheduling methods are not applicable. In this paper, we compare the performance of two dynamic, distributed load balancing methods with extensive simulation studies. The two schemes are: the Contracting Within a Neighborhood (CWN) scheme proposed by us, and the Gradient Model proposed by Lin and Keller. We conclude that although simpler, the CWN is significantly more effective at distributing the work than the Gradient Model.
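A rough flavor of neighborhood-based contraction, much simplified from CWN, is sketched below: a newly spawned task is handed hop by hop to the lightest neighbor until no strictly lighter neighbor exists, so work diffuses toward idle processors without any global load information. The ring topology and hop limit are assumptions for illustration.

```python
def contract(loads, neighbors, node, max_hops=3):
    """CWN-style contraction: forward one new task from `node` toward
    lighter processors within successive neighborhoods, at most
    `max_hops` hops, then place it where no lighter neighbor exists."""
    here = node
    for _ in range(max_hops):
        best = min(neighbors[here], key=lambda p: loads[p])
        if loads[best] < loads[here]:
            here = best                    # hand the task one hop onward
        else:
            break
    loads[here] += 1

# 8 processors in a ring; all 40 tasks are spawned at processor 0
neighbors = {p: [(p - 1) % 8, (p + 1) % 8] for p in range(8)}
loads = [0] * 8
for _ in range(40):
    contract(loads, neighbors, 0)
print(loads)   # work diffuses outward from processor 0 without a global view
```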
NASA Technical Reports Server (NTRS)
Krasteva, Denitza T.
1998-01-01
Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.). This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
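To make two of the named ingredients concrete, here is a toy sequential simulation (not the original implementation) of random-polling work stealing combined with global-task-count termination detection: idle processors poll random victims for work, and the run terminates when the number of completed tasks equals the number created. All parameters are assumptions.

```python
import random

def simulate(n_procs, initial_tasks, spawn_prob=0.3, seed=0):
    """Random polling + global task count termination, simulated
    sequentially: one round gives each processor one step."""
    rng = random.Random(seed)
    queues = [[] for _ in range(n_procs)]
    queues[0] = list(range(initial_tasks))  # seed all work on processor 0
    created, completed, steals = initial_tasks, 0, 0
    while completed < created:              # global task count termination
        for p in range(n_procs):
            if queues[p]:
                queues[p].pop()             # execute one task
                completed += 1
                if rng.random() < spawn_prob:
                    queues[p].append(created)   # task spawns a child
                    created += 1
            else:                           # idle: poll a random victim
                victim = rng.randrange(n_procs)
                if victim != p and len(queues[victim]) > 1:
                    queues[p].append(queues[victim].pop())
                    steals += 1
    return created, completed, steals

print(simulate(n_procs=8, initial_tasks=20))
```

The termination test is safe here because every created task is either completed or sitting in some queue, so empty queues imply the counts match.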
Distributing an executable job load file to compute nodes in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gooding, Thomas M.
Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications link over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.
Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Rizk, Yehia M.
1999-01-01
The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed-memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable on distributed-shared-memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary condition update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of a "Gauss-Seidel" fashion. The programming effort for the _MPI code is greater than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter touches only the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones to a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which run in parallel. The total volume of grid points in each group is approximately balanced. A proper number of threads is initially allocated to each group, and in subsequent iterations during run-time, the number of threads is adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in OVERFLOW.
Rabalais, R David; Burger, Evalina; Lu, Yun; Mansour, Alfred; Baratta, Richard V
2008-02-01
This study compared the biomechanical properties of 2 tension-band techniques with stainless steel wire and ultra high molecular weight polyethylene (UHMWPE) cable in a patella fracture model. Transverse patella fractures were simulated in 8 cadaver knees and fixated with figure-of-8 and parallel wire configurations in combination with Kirschner wires. Identical configurations were tested with UHMWPE cable. Specimens were mounted to a testing apparatus and the quadriceps was used to extend the knees from 90 degrees to 0 degrees; 4 knees were tested under monotonic loading, and 4 knees were tested under cyclic loading. Under monotonic loading, average fracture gap was 0.50 and 0.57 mm for steel wire and UHMWPE cable, respectively, in the figure-of-8 construct compared with 0.16 and 0.04 mm, respectively, in the parallel wire construct. Under cyclic loading, average fracture gap was 1.45 and 1.66 mm for steel wire and UHMWPE cable, respectively, in the figure-of-8 construct compared with 0.45 and 0.60 mm, respectively, in the parallel wire construct. A statistically significant effect of technique was found, with the parallel wire construct performing better than the figure-of-8 construct in both loading models. There was no effect of material or interaction. In this biomechanical model, parallel wires performed better than the figure-of-8 configuration in both loading regimens, and UHMWPE cable performed similarly to 18-gauge steel wire.
NASA Technical Reports Server (NTRS)
Watson, Brian; Kamat, M. P.
1990-01-01
Element-by-element preconditioned conjugate gradient (EBE-PCG) algorithms have been advocated for use in parallel/vector processing environments as being superior to the conventional LDL^T decomposition algorithm for single load cases. Although there may be some advantages in using such algorithms for a single load case, when it comes to situations involving multiple load cases, the LDL^T decomposition algorithm would appear to be decidedly more cost-effective. The authors have outlined an EBE-PCG algorithm suitable for multiple load cases and compared its effectiveness to the highly efficient LDL^T decomposition scheme. The proposed algorithm offers almost no advantages over the LDL^T algorithm for the linear problems investigated on the Alliant FX/8. However, there may be some merit in the algorithm in solving nonlinear problems with load incrementation, but that remains to be investigated.
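The cost argument is easy to demonstrate: a direct method pays for the factorization once, and every additional load case costs only a pair of triangular solves, whereas an iterative method pays its full iteration count per load case. A minimal factor-once, solve-many sketch with scipy follows (illustrative; Cholesky here stands in for the LDL^T factorization, and this is not the authors' EBE-PCG variant or the Alliant implementation).

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
n, n_loads = 200, 10

# A symmetric positive definite "stiffness" matrix and multiple load cases
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)
loads = rng.standard_normal((n, n_loads))

# Factor once (O(n^3)), then back-substitute per load case (O(n^2) each)
factor = cho_factor(A)
solutions = np.column_stack([cho_solve(factor, loads[:, k])
                             for k in range(n_loads)])

residual = np.linalg.norm(A @ solutions - loads)
print(f"residual over {n_loads} load cases: {residual:.2e}")
```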
O'Sullivan, G.A.; O'Sullivan, J.A.
1999-07-27
In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.
Mechanical signals in plant development: a new method for single cell studies
NASA Technical Reports Server (NTRS)
Lynch, T. M.; Lintilhac, P. M.
1997-01-01
Cell division, which is critical to plant development and morphology, requires the orchestration of hundreds of intracellular processes. In the end, however, cells must make critical decisions, based on a discrete set of mechanical signals such as stress, strain, and shear, to divide in such a way that they will survive the mechanical loads generated by turgor pressure and cell enlargement within the growing tissues. Here we report on a method whereby tobacco protoplasts swirled into a 1.5% agarose entrapment medium will survive and divide. The application of a controlled mechanical load to agarose blocks containing protoplasts orients the primary division plane of the embedded cells. Photoelastic analysis of the agarose entrapment medium can identify the lines of principal stress within the agarose, confirming the hypothesis that cells divide either parallel or perpendicular to the principal stress tensors. The coincidence between the orientation of the new division wall and the orientation of the principal stress tensors suggests that the perception of mechanical stress is a characteristic of individual plant cells. The ability of a cell to determine a shear-free orientation for a new partition wall may be related to the applied load through the deformation of the matrix material. In an isotropic matrix a uniaxial load will produce a rotationally symmetric strain field, which will define a shear-free plane. Where high stress intensities combine with the loading geometry to produce multiaxial loads there will be no axis of rotational symmetry and hence no shear-free plane. This suggests that two mechanisms may orient the division plane: one that works in rotationally symmetric fields, yielding divisions perpendicular to the compressive tensor and parallel to the long axis of the cell, and one in asymmetric fields, yielding divisions parallel to the short axis of the cell and the compressive tensor.
NASA Astrophysics Data System (ADS)
Kala, Jiří; Kala, Zdeněk
2011-09-01
The objective of the paper is to analyze the influence of initial imperfections on the behaviour of thin-walled girders welded from slender plate elements. In parallel with the experiments, one of the ultimate load tests was modelled computationally: the girder was modelled with the ANSYS program using the geometrically and materially non-linear variant of the shell finite element method. The shape change during the loading process is often accompanied by a sudden "snap-through", i.e., a rapid change of curvature.
A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam
In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task-parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller units of work, represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load balancing runtime. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.
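The task decomposition can be pictured with a small sketch: each contraction is tiled into independent block-tasks, tasks from several independent contractions share one pool, and workers pull from the pool so a short contraction does not leave processors idle. The numpy/threading sketch below only illustrates the idea; it is not the DLTC library or its iterator abstraction, and all names are assumptions.

```python
import numpy as np
from queue import Queue
from threading import Thread

def make_tile_tasks(A, B, C, tile):
    """Tile C = A @ B into independent block-tasks over tiles of C."""
    n = A.shape[0]
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            yield (A, B, C, i, j, tile)

def worker(pool):
    while True:
        task = pool.get()
        if task is None:          # sentinel: shut down
            return
        A, B, C, i, j, t = task
        # each task writes a disjoint tile of C
        C[i:i+t, j:j+t] = np.einsum('ik,kj->ij', A[i:i+t, :], B[:, j:j+t])
        pool.task_done()

n, tile, n_workers = 128, 32, 4
pool = Queue()
mats = [(np.random.rand(n, n), np.random.rand(n, n), np.zeros((n, n)))
        for _ in range(3)]                  # three independent contractions
for A, B, C in mats:                        # one shared pool for all of them
    for task in make_tile_tasks(A, B, C, tile):
        pool.put(task)
threads = [Thread(target=worker, args=(pool,)) for _ in range(n_workers)]
for th in threads: th.start()
pool.join()
for _ in threads: pool.put(None)
for th in threads: th.join()
print(all(np.allclose(C, A @ B) for A, B, C in mats))  # True
```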
Hielscher, Andreas H; Bartel, Sebastian
2004-02-01
Optical tomography (OT) is a fast-developing, novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computationally intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of the iterative image reconstruction schemes commonly applied in OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performance are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kodavasal, Janardhan; Harms, Kevin; Srivastava, Priyesh
A closed-cycle gasoline compression ignition engine simulation near top dead center (TDC) was used to profile the performance of a parallel commercial engine computational fluid dynamics code, as it was scaled on up to 4096 cores of an IBM Blue Gene/Q supercomputer. The test case has 9 million cells near TDC, with a fixed mesh size of 0.15 mm, and was run on configurations ranging from 128 to 4096 cores. Profiling was done for a small duration of 0.11 crank angle degrees near TDC during ignition. Optimization of input/output performance resulted in a significant speedup in reading restart files, and in an over 100-times speedup in writing restart files and files for post-processing. Improvements to communication resulted in a 1400-times speedup in the mesh load balancing operation during initialization, on 4096 cores. An improved, "stiffness-based" algorithm for load balancing chemical kinetics calculations was developed, which results in an over 3-times faster run-time near ignition on 4096 cores relative to the original load balancing scheme. With this improvement to load balancing, the code achieves over 78% scaling efficiency on 2048 cores, and over 65% scaling efficiency on 4096 cores, relative to 256 cores.
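The intuition behind a "stiffness-based" balance can be sketched simply: rather than giving every rank the same number of cells, weight each cell by an estimated chemistry cost (stiff cells near the ignition front are far more expensive) and cut the cell list at equal cumulative cost. A toy version follows; the cost model and the 50x factor are assumptions, not the paper's algorithm.

```python
import numpy as np

def balance_by_cost(costs, n_ranks):
    """Split cells into contiguous chunks of near-equal total estimated
    cost rather than equal cell count."""
    cum = np.cumsum(costs)
    targets = cum[-1] * np.arange(1, n_ranks) / n_ranks
    cuts = np.searchsorted(cum, targets)
    return np.split(np.arange(len(costs)), cuts)

rng = np.random.default_rng(2)
costs = np.where(rng.random(10000) < 0.05, 50.0, 1.0)  # 5% stiff cells, 50x cost
parts = balance_by_cost(costs, 8)
print([len(p) for p in parts])                  # unequal cell counts...
print([round(costs[p].sum()) for p in parts])   # ...but near-equal cost per rank
```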
García-Grajales, Julián A.; Rucabado, Gabriel; García-Dopico, Antonio; Peña, José-María; Jérusalem, Antoine
2015-01-01
With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite—explicit and implicit—were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large-scale scenarios. Cable theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented dendritic tree, and a damaged axon. The capabilities of the program to deal with large-scale scenarios, segmented neuronal structures, and functional deficits under mechanical loading are specifically highlighted. PMID:25680098
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations
NASA Technical Reports Server (NTRS)
Chrisochoides, Nikos
1995-01-01
We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDE's) on multiprocessors. Multithreading is used as a means of exploiting concurrency at the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDE's. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data-parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify programs: it often makes code re-usability difficult and increases software complexity.
14 CFR 23.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2011 CFR
2011-01-01
(1) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W = weight of the movable ...
14 CFR 23.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2013 CFR
2013-01-01
(1) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W = weight of the movable ...
14 CFR 23.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2014 CFR
2014-01-01
(1) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W = weight of the movable ...
14 CFR 23.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2012 CFR
2012-01-01
(1) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W = weight of the movable ...
14 CFR 23.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2010 CFR
2010-01-01
(1) K=24 for vertical surfaces; (2) K=12 for horizontal surfaces; and (3) W = weight of the movable ...
Tile-based Level of Detail for the Parallel Age
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niski, K; Cohen, J D
Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach based on hierarchical, screen-space tiles to parallelizing rendering with level of detail. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level of detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Parallel discrete-event simulation of FCFS stochastic queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments) which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. We show how lookahead can be computed for FCFS queueing network simulations, give performance data demonstrating the method's effectiveness under moderate to heavy loads, and discuss performance tradeoffs between the quality of lookahead and the cost of computing it.
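The lookahead idea can be made concrete: in a non-preemptive FCFS server, a job that has not yet arrived cannot depart before all the work already present is finished, so the server can make an "appointment" promising its neighbors no event earlier than that time. A minimal sketch under these assumptions (function and parameter names are illustrative):

```python
def fcfs_lookahead(now, remaining_service, queued_service_times):
    """Earliest possible departure time of a job that has not yet
    arrived at this FCFS server: it must wait behind the job in
    service and everything already queued."""
    return now + remaining_service + sum(queued_service_times)

# Server is partway through a job (3.0s left) with two queued jobs.
appointment = fcfs_lookahead(now=10.0, remaining_service=3.0,
                             queued_service_times=[3.0, 1.5])
print(appointment)  # 17.5: neighbors may safely simulate up to this time
```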
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jain, Atul K.
The overall objective of this DOE-funded project is to combine scientific and computational challenges in climate modeling by expanding our understanding of the biogeophysical-biogeochemical processes and their interactions in the northern high latitudes (NHLs) using an earth system modeling (ESM) approach, and by adopting an adaptive parallel runtime system in an ESM to achieve efficient and scalable climate simulations through improved load balancing algorithms.
Deformation and fracture of explosion-welded Ti/Al plates: A synchrotron-based study
E, J. C.; Huang, J. Y.; Bie, B. X.; ...
2016-08-02
Here, explosion-welded Ti/Al plates are characterized with energy dispersive spectroscopy and x-ray computed tomography, and exhibit a smooth, well-jointed interface. We perform dynamic and quasi-static uniaxial tension experiments on Ti/Al with the loading direction either perpendicular or parallel to the Ti/Al interface, using a mini split Hopkinson tension bar and a material testing system in conjunction with time-resolved synchrotron x-ray imaging. X-ray imaging and strain-field mapping reveal different deformation mechanisms responsible for anisotropic bulk-scale responses, including yield strength, ductility and rate sensitivity. Deformation and fracture occur predominantly in the Al layer for perpendicular loading, but both the Ti and Al layers as well as the interface play a role for parallel loading. The rate sensitivity of Ti/Al follows those of the constituent metals. For perpendicular loading, a single deformation band develops in the Al layer under quasi-static loading, while multiple deformation bands nucleate simultaneously under dynamic loading, leading to a higher dynamic fracture strain. For parallel loading, the interface impedes the growth of deformation and results in increased ductility of Ti/Al under quasi-static loading, while interface fracture occurs under dynamic loading due to the disparity in Poisson's contraction.
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters
Bajaj, Chandrajit
2009-01-01
Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction and rendering in conjunction with a new specialized piece of image compositing hardware called the Metabuffer. In this paper, we focus on back-end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations needed to load large data from disk. We also describe an isosurface compression scheme that is efficient for progressive extraction, transmission and storage of isosurfaces. PMID:19756231
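At the core of such a pipeline is the query "which cells does the isovalue cross?", an interval-stabbing problem over each cell's (min, max) scalar range. The paper's external interval trees answer it I/O-optimally on disk; the small in-memory sketch below shows only the query logic, using a sort by minimum plus a filter on the maximum. Class and variable names are assumptions.

```python
import bisect

class CellRangeIndex:
    """Answers 'which cells straddle isovalue v?' for cells with scalar
    ranges (lo, hi). A sorted-by-lo list plus a max filter; external
    interval trees answer the same query with optimal disk I/O."""
    def __init__(self, ranges):
        self.ranges = ranges
        self.cells = sorted(range(len(ranges)), key=lambda c: ranges[c][0])
        self.los = [ranges[c][0] for c in self.cells]

    def active_cells(self, v):
        k = bisect.bisect_right(self.los, v)      # cells with lo <= v
        return [c for c in self.cells[:k] if self.ranges[c][1] >= v]

ranges = [(0.0, 1.0), (0.8, 2.0), (1.5, 3.0), (2.5, 4.0)]
idx = CellRangeIndex(ranges)
print(idx.active_cells(1.7))   # [1, 2]: only these cells contain the isovalue
```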
Scan Directed Load Balancing for Highly-Parallel Mesh-Connected Computers
1991-07-01
Biagioni, Edoardo S.; Prins, Jan F. (Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599-3175, USA). DTIC accession number AD-A242 045; the scanned abstract text is not recoverable.
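Although the scanned abstract is not recoverable, the technique named in the title is standard: a parallel prefix sum (scan) over per-processor load gives every work item a global rank, which determines the processor it should move to for an even distribution. A sequential sketch of that computation, assuming a 1-D processor array and unit-cost items:

```python
import numpy as np

def scan_directed_targets(items_per_proc):
    """For each processor, use an exclusive prefix scan of the load to
    compute where its items land after balancing: the item with global
    rank r moves to processor r * P // total."""
    P = len(items_per_proc)
    total = int(np.sum(items_per_proc))
    offset = np.concatenate(([0], np.cumsum(items_per_proc)[:-1]))  # exclusive scan
    moves = []
    for p, n in enumerate(items_per_proc):
        ranks = offset[p] + np.arange(n)        # global ranks of p's items
        moves.append(ranks * P // total)        # destination processor per item
    return moves

for p, dest in enumerate(scan_directed_targets([7, 0, 1, 4])):
    print(f"proc {p} sends its items to: {dest.tolist()}")
```

On a mesh-connected machine the scan itself is computed in parallel, and each item then travels to its destination through neighbor-to-neighbor shifts.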
Analysis and performance of paralleling circuits for modular inverter-converter systems
NASA Technical Reports Server (NTRS)
Birchenough, A. G.; Gourash, F.
1972-01-01
As part of a modular inverter-converter development program, control techniques were developed to provide load sharing among paralleled inverters or converters. An analysis of the requirements of paralleling circuits and a discussion of the circuits developed and their performance are included in this report. The current sharing was within 5.6 percent of rated-load current for the ac modules and 7.4 percent for the dc modules for an initial output voltage unbalance of 5 volts.
Multicoil resonance-based parallel array for smart wireless power delivery.
Mirbozorgi, S A; Sawan, M; Gosselin, B
2013-01-01
This paper presents a novel resonance-based multicoil structure as a smart power surface to wirelessly power up apparatus such as mobile devices, animal headstages, implanted devices, etc. The proposed powering system is based on a 4-coil resonance-based inductive link, the resonance coil of which is formed by an array of several paralleled coils acting as a smart power transmitter. The power transmitter employs simple circuit connections and includes only one power driver circuit per multicoil resonance-based array, which enables higher power transfer efficiency and power delivery to the load. The power transmitted by the driver circuit is proportional to the load seen by each individual coil in the array. Thus, the transmitted power scales with the load of the electric/electronic system being powered, and does not divide equally over all the parallel coils that form the array. Instead, only the loaded coils of the parallel array transmit a significant part of the total transmitted power to the receiver. Such adaptive behavior enables superior power, size and cost efficiency compared with other solutions, since it does not need complex detection circuitry to find the location of the load. The performance of the proposed structure is verified by measurement results. Natural load detection and coverage of a 4-times larger area than conventional topologies, with a power transfer efficiency of 55%, are the novelties of the presented work.
A Parallel Pipelined Renderer for the Time-Varying Volume Data
NASA Technical Reports Server (NTRS)
Chiueh, Tzi-Cker; Ma, Kwan-Liu
1997-01-01
This paper presents a strategy for efficiently rendering time-varying volume data sets on a distributed-memory parallel computer. Time-varying volume data take large storage space and visualizing them requires reading large files continuously or periodically throughout the course of the visualization process. Instead of using all the processors to collectively render one volume at a time, a pipelined rendering process is formed by partitioning processors into groups to render multiple volumes concurrently. In this way, the overall rendering time may be greatly reduced because the pipelined rendering tasks are overlapped with the I/O required to load each volume into a group of processors; moreover, parallelization overhead may be reduced as a result of partitioning the processors. We modify an existing parallel volume renderer to exploit various levels of rendering parallelism and to study how the partitioning of processors may lead to optimal rendering performance. Two factors which are important to the overall execution time are resource utilization efficiency and pipeline startup latency. The optimal partitioning configuration is the one that balances these two factors. Tests on Intel Paragon computers show that in general optimal partitionings do exist for a given rendering task and result in 40-50% saving in overall rendering time.
Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC
NASA Astrophysics Data System (ADS)
Alruwaili, Manal
With the growing technology, the number of processors is becoming massive. Current supercomputer-class processing will be available on desktops in the next decade. For mass-scale application software development on the massively parallel computing available on desktops, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massively parallel computing and distributed memory models while retaining user-friendliness. Currently available object-oriented languages for massively parallel computing, such as Chapel, X10 and UPC++, exploit distributed computing, data-parallel computing and thread-parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they lack: 1) extensions for object distribution that exploit the PGAS model; 2) the flexibility of migrating or cloning an object between places to support load balancing; and 3) the programming paradigms that result from integrating data- and thread-level parallelism with object distribution. In the proposed thesis, I compare different languages in the PGAS model; propose new constructs that extend C++ with object distribution, object migration and object cloning; and integrate PGAS-based process constructs with these extensions on distributed objects. A new paradigm, MIDD (Multiple Invocation Distributed Data), is also presented, in which different copies of the same class can be invoked and work concurrently on different elements of distributed data using remote method invocations. I present the new constructs, their grammar and their behavior.
High loading uranium fuel plate
Wiencek, Thomas C.; Domagala, Robert F.; Thresh, Henry R.
1990-01-01
Two embodiments of a high uranium fuel plate are disclosed which contain a meat comprising structured uranium compound confined between a pair of diffusion bonded ductile metal cladding plates uniformly covering the meat, the meat having a uniform high fuel loading comprising a content of uranium compound greater than about 45 Vol. % at a porosity not greater than about 10 Vol. %. In a first embodiment, the meat is a plurality of parallel wires of uranium compound. In a second embodiment, the meat is a dispersion compact containing uranium compound. The fuel plates are fabricated by a hot isostatic pressing process.
Hyttinen, Mika M; Holopainen, Jaakko; René van Weeren, P; Firth, Elwyn C; Helminen, Heikki J; Brama, Pieter A J
2009-01-01
The aim of this study was to record growth-related changes in collagen network organization and proteoglycan distribution in intermittently peak-loaded and continuously lower-level-loaded articular cartilage. Cartilage from the proximal phalangeal bone of the equine metacarpophalangeal joint at birth, at 5, 11 and 18 months, and at 6–10 years of age was collected from two sites. Site 1, at the joint margin, is unloaded at slow gaits but is subjected to high-intensity loading during athletic activity; site 2 is a continuously but less intensively loaded site in the centre of the joint. The degree of collagen parallelism was determined with quantitative polarized light microscopy and the parallelism index for collagen fibrils was computed from the cartilage surface to the osteochondral junction. Concurrent changes in the proteoglycan distribution were quantified with digital densitometry. We found that the parallelism index increased significantly with age (up to 90%). At birth, site 2 exhibited a more organized collagen network than site 1. In adult horses this situation was reversed. The superficial and intermediate zones exhibited the greatest reorganization of collagen. Site 1 had a higher proteoglycan content than site 2 at birth but here too the situation was reversed in adult horses. We conclude that large changes in joint loading during growth and maturation in the period from birth to adulthood profoundly affect the architecture of the collagen network in equine cartilage. In addition, the distribution and content of proteoglycans are modified significantly by altered joint use. Intermittent peak-loading with shear seems to induce higher collagen parallelism and a lower proteoglycan content in cartilage than more constant weight-bearing. Therefore, we hypothesize that the formation of mature articular cartilage with a highly parallel collagen network and relatively low proteoglycan content in the peak-loaded area of a joint is needed to withstand intermittent stress and shear, whereas a constantly weight-bearing joint area benefits from lower collagen parallelism and a higher proteoglycan content. PMID:19732210
Lattice strains and load partitioning in bovine trabecular bone.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Akhtar, R.; Daymond, M. R.; Almer, J. D.
2012-02-01
Microdamage and failure mechanisms have been well characterized in bovine trabecular bone. However, little is known about how elastic strains develop in the apatite crystals of the trabecular struts and their relationship with different deformation mechanisms. In this study, wide-angle high-energy synchrotron X-ray diffraction has been used to determine bulk elastic strains under in situ compression. Dehydrated bone is compared to hydrated bone in terms of their response to load. During compression, load is initially borne by trabeculae aligned parallel to the loading direction, with non-parallel trabeculae deforming by bending. Ineffective load partitioning is noted in dehydrated bone, whereas hydrated bone behaves like a plastically yielding foam.
Hallock, Michael J.; Stone, John E.; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida
2014-01-01
Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems. PMID:24882911
Towards scalable Byzantine fault-tolerant replication
NASA Astrophysics Data System (ADS)
Zbierski, Maciej
2017-08-01
Byzantine fault-tolerant (BFT) replication is a powerful technique, enabling distributed systems to remain available and correct even in the presence of arbitrary faults. Unfortunately, existing BFT replication protocols are mostly load-unscalable, i.e. they fail to respond with adequate performance increase whenever new computational resources are introduced into the system. This article proposes a universal architecture facilitating the creation of load-scalable distributed services based on BFT replication. The suggested approach exploits parallel request processing to fully utilize the available resources, and uses a load balancer module to dynamically adapt to the properties of the observed client workload. The article additionally provides a discussion on selected deployment scenarios, and explains how the proposed architecture could be used to increase the dependability of contemporary large-scale distributed systems.
Load Balancing Strategies for Multiphase Flows on Structured Grids
NASA Astrophysics Data System (ADS)
Olshefski, Kristopher; Owkes, Mark
2017-11-01
The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are only designed for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated including brute force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies are tested with a simple one-dimensional model prior to implementation into the three-dimensional NGA code. Current results show load balancing will reduce computational time by at least 30%.
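The imbalance these strategies target is easy to quantify in a toy model: weight each cell by its cost (interface cells cost more), decompose the grid into equal-size slabs, and compare the mean per-processor load with the maximum, since every processor waits for the slowest. The sketch below is illustrative only; the 20x interface cost factor is an assumption, not a number from this work.

```python
import numpy as np

def parallel_efficiency(cell_costs, n_procs):
    """Equal-size slab decomposition of a 1-D grid: efficiency is mean
    load over max load, since all ranks wait for the slowest one."""
    slabs = np.array_split(cell_costs, n_procs)
    loads = np.array([s.sum() for s in slabs])
    return loads.mean() / loads.max()

n = 1 << 16
costs = np.ones(n)
costs[n // 2 - 64 : n // 2 + 64] = 20.0   # interface cells cost 20x (assumed)
for p in (4, 16, 64):
    print(p, round(parallel_efficiency(costs, p), 3))
# efficiency degrades as more processors concentrate the interface
# cost on fewer slabs, which is exactly what rebalancing addresses
```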
Kinematics and dynamics analysis of a quadruped walking robot with parallel leg mechanism
NASA Astrophysics Data System (ADS)
Wang, Hongbo; Sang, Lingfeng; Hu, Xing; Zhang, Dianfan; Yu, Hongnian
2013-09-01
It is desired that a walking robot for the elderly and the disabled have large load capacity, high stiffness, stability, etc. However, existing walking robots cannot achieve these requirements because of their weight-payload ratio and simple functionality. Therefore, enhancing the capacity and functions of walking robots is an important research issue. According to walking requirements, and combining modularization and reconfigurable ideas, a quadruped/biped reconfigurable walking robot with a parallel leg mechanism is proposed. The proposed robot can be used as both a biped and a quadruped walking robot. The kinematics and performance analysis of a 3-UPU parallel mechanism, the basic leg mechanism of the quadruped walking robot, are conducted and the structural parameters are optimized. The results show that the performance of the walking robot is optimal when the circumradii R, r of the upper and lower platforms of the leg mechanism are 161.7 mm and 57.7 mm, respectively. Based on the optimal results, the kinematics and dynamics of the quadruped walking robot in the static walking mode are derived with the application of parallel mechanism and influence coefficient theory, and the optimal coordinated distribution of the dynamic load for the quadruped walking robot with over-determinate inputs is analyzed, which solves the dynamic load coupling caused by the branches' constraints on the robot in the walking process. Besides laying a theoretical foundation for development of the prototype, the kinematics and dynamics studies on the quadruped walking robot also boost the theoretical research of quadruped walking and the practical applications of parallel mechanisms.
NASA Astrophysics Data System (ADS)
Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang
2018-04-01
The edge-smoothed finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh-partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems on a graphics processing unit (GPU) using a special edge-smoothed triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM-based shell element formulations, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty-function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is carefully designed, and a GPU-based simulation system is developed using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
A compositional reservoir simulator on distributed memory parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rame, M.; Delshad, M.
1995-12-31
This paper presents the application of distributed-memory parallel computers to field-scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general-purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed-memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed-memory computing performance of the parallel simulator are presented for field-scale applications such as tracer floods and polymer floods. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
Li, Ying-Jun; Yang, Cong; Wang, Gui-Cong; Zhang, Hui; Cui, Huan-Yong; Zhang, Yong-Liang
2017-09-01
This paper presents a novel integrated piezoelectric six-dimensional force sensor which can realize dynamic measurement of multi-dimensional space loads. First, the composition of the sensor, the spatial layout of the force-sensitive components, and the measurement principle are analyzed and designed. Theoretical analysis indicates no inter-dimensional interference in the piezoelectric six-dimensional force sensor. Based on the actual working principle and deformation compatibility, this paper deduces the parallel load-sharing principle of the piezoelectric six-dimensional force sensor and identifies the main factors that affect the load-sharing ratio. The finite element model of the piezoelectric six-dimensional force sensor is established. In order to verify the load-sharing principle of the sensor, a load-sharing test device for the piezoelectric force sensor is designed and fabricated, and a load-sharing experimental platform is set up. The experimental results are in accordance with the theoretical analysis and simulation results. The experiments show that multi-dimensional, heavy force measurement can be realized by the parallel arrangement of the load-sharing ring and the force-sensitive element in the novel integrated piezoelectric six-dimensional force sensor, and that the ideal load-sharing effect can be achieved with appropriate size parameters. This paper provides important guidance for the design of force-measuring devices based on the load-sharing mode. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
New Parallel computing framework for radiation transport codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostin, M.A.; /Michigan State U., NSCL; Mokhov, N.V.
A new parallel computing framework has been developed for use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is largely independent of the radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations from a saved checkpoint file. The checkpoint facility can be used in single-process calculations as well as in the parallel regime. Several checkpoint files can be merged into one, thus combining the results of several calculations. The framework also corrects some of the known problems with scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and on networks of workstations, where interference from other users is possible.
Load Balancing Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pearce, Olga Tkachyshyn
2014-12-01
The largest supercomputers have millions of independent processors, and concurrency levels are rapidly increasing. For ideal efficiency, developers of the simulations that run on these machines must ensure that computational work is evenly balanced among processors. Assigning work evenly is challenging because many large modern parallel codes simulate the behavior of physical systems that evolve over time, so their workloads change over time. Furthermore, the cost of imbalanced load increases with scale because most large-scale scientific simulations today use a Single Program Multiple Data (SPMD) parallel programming model, and an increasing number of processors will wait for the slowest one at the synchronization points. To address load imbalance, many large-scale parallel applications use dynamic load balance algorithms to redistribute work evenly. The research objective of this dissertation is to develop methods to decide when and how to load balance the application, and to balance it effectively and affordably. We measure and evaluate the computational load of the application, and develop strategies to decide when and how to correct the imbalance. Depending on the simulation, a fast, local load balance algorithm may be suitable, or a more sophisticated and expensive algorithm may be required. We developed a model for comparing load balance algorithms for a specific state of the simulation, enabling selection of the balancing algorithm that will minimize overall runtime.
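The decision logic described (when to balance, and whether a cheap or an expensive balancer pays off) can be illustrated with a hedged sketch; the cost model below is an assumption for illustration, not the dissertation's model:

    # Rebalance only if the predicted time saved over the remaining steps
    # exceeds the cost of running the balancing algorithm itself.
    def should_rebalance(proc_loads, steps_remaining, balance_cost):
        t_step = max(proc_loads)                     # SPMD: slowest rank sets the pace
        t_ideal = sum(proc_loads) / len(proc_loads)  # perfectly balanced step time
        predicted_saving = (t_step - t_ideal) * steps_remaining
        return predicted_saving > balance_cost

    should_rebalance([1.0, 1.0, 2.0, 1.0], steps_remaining=100, balance_cost=5.0)  # True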
MPgrafic: A parallel MPI version of Grafic-1
NASA Astrophysics Data System (ADS)
Prunet, Simon; Pichon, Christophe
2013-04-01
MPgrafic is a parallel MPI version of Grafic-1 which can produce large cosmological initial conditions on a cluster without requiring shared memory. The real Fourier transforms are carried out in place using fftw while minimizing the amount of memory used (at the expense of performance), in the spirit of Grafic-1. The writing of the output file is also carried out in parallel. In addition to the technical parallelization, it provides three extensions over Grafic-1: it can produce power spectra with baryon wiggles (D. J. Eisenstein and W. Hu, Ap. J. 496); it has the optional ability to load a lower-resolution noise map corresponding to the low-frequency component, which will fix the larger-scale modes of the simulation (extra flag 0/1 at the end of the input process), in the spirit of Grafic-2; and it can be used in conjunction with constrfield, which generates initial-condition phases from a list of local constraints on density, tidal field, density gradient, and velocity.
Multisensory architectures for action-oriented perception
NASA Astrophysics Data System (ADS)
Alba, L.; Arena, P.; De Fiore, S.; Listán, J.; Patané, L.; Salem, A.; Scordino, G.; Webb, B.
2007-05-01
In order to solve the navigation problem of a mobile robot in an unstructured environment, a versatile sensory system and efficient locomotion control algorithms are necessary. In this paper an innovative sensory system for action-oriented perception applied to a legged robot is presented. An important problem we address is how to utilize a large variety and number of sensors while having a system that can operate in real time. Our solution is to use sensory systems that incorporate analog and parallel processing, inspired by biological systems, to reduce the required data exchange with the motor control layer. In particular, for the visual system we use the Eye-RIS v1.1 board made by Anafocus, which is based on a fully parallel mixed-signal array sensor-processor chip. The hearing sensor is inspired by the cricket hearing system and allows efficient localization of a specific sound source with a very simple analog circuit. Our robot utilizes additional sensors for touch, posture, load, distance, and heading, and thus requires customized and parallel processing for concurrent acquisition. Therefore, Field Programmable Gate Array (FPGA) based hardware was used to manage the multi-sensory acquisition and processing. This choice was made because FPGAs permit the implementation of customized digital logic blocks that can operate in parallel, allowing the sensors to be driven simultaneously. With this approach the proposed multi-sensory architecture can achieve real-time capabilities.
Scalable software architecture for on-line multi-camera video processing
NASA Astrophysics Data System (ADS)
Camplani, Massimo; Salgado, Luis
2011-03-01
In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees a good trade-off between computational power, scalability, and flexibility. The software system is modular and its main blocks are the Processing Units (PUs) and the Central Unit. The Central Unit works as a supervisor of the running PUs, and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application is presented. In this paper, as a case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions, such as number of cameras and image sizes. The results show that the software architecture scales well with the number of cameras and works readily with different image formats while respecting the real-time constraints. Moreover, the parallelization approach can be used to speed up the processing tasks with a low level of overhead.
NASA Astrophysics Data System (ADS)
Kobchenko, M.; Pluymakers, A.; Cordonnier, B.; Tairova, A.; Renard, F.
2017-12-01
Time-lapse imaging of fracture network development in organic-rich shales at elevated temperatures while kerogen is retorted allows characterizing the development of microfractures and the onset of primary migration. When the solid organic matter is transformed to hydrocarbons with lower molecular weight, the local pore pressure increases and drives the propagation of hydro-fractures sub-parallel to the shale lamination. On the scale of samples of several mm size, these fractures can be described as mode I opening, where fracture walls dilate in the direction of minimal compression. However, so far, experiments coupled to in-situ microtomography imaging have been performed on samples with no imposed load. Here, an external load was applied perpendicular to the sample laminations, and we show that this stress state slows down, but does not stop, the propagation of fractures along bedding. Conversely, microfractures also propagate sub-perpendicular to the shale lamination, creating a percolating network in three dimensions. To monitor this process we used a uniaxial compaction rig combined with in-situ heating from 50 to 500 °C, while capturing three-dimensional X-ray microtomography scans at a voxel resolution of 2.2 μm. Data were acquired at beamline ID19 of the European Synchrotron Radiation Facility. In total, ten time-resolved experiments were performed at different vertical loading conditions, with and without lateral passive confinement, and at different heating rates. At high external load the sample fails by symmetric bulging, while at lower external load the reaction-induced fracture network develops with microfractures both sub-parallel and sub-perpendicular to the bedding direction. In addition, the variation of experimental conditions allows decoupling the effects of the hydrocarbon decomposition reaction on the deformation process from the influence of thermal stress heating on the weakening and failure mode of immature shale.
Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy
2017-10-06
The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to GitHub, and (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h:mm) using one core, and in 1:04 (h:mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.
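Object-level parallelization of the kind benchmarked here maps naturally onto a worker pool. The following is a minimal sketch in Python (the engine itself is MATLAB), with extract_features a hypothetical stand-in for the feature-computation stage:

    from multiprocessing import Pool

    def extract_features(tumor):
        # Placeholder: compute 3D radiomics features for one segmented object.
        return {"id": tumor["id"], "volume": len(tumor["voxels"])}

    if __name__ == "__main__":
        tumors = [{"id": i, "voxels": [0] * (i + 1)} for i in range(108)]
        with Pool(processes=4) as pool:   # four cores, as in the benchmark
            results = pool.map(extract_features, tumors)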
Developing a Hadoop-based Middleware for Handling Multi-dimensional NetCDF
NASA Astrophysics Data System (ADS)
Li, Z.; Yang, C. P.; Schnase, J. L.; Duffy, D.; Lee, T. J.
2014-12-01
Climate observations and model simulations are collecting and generating vast amounts of climate data, and these data are ever-increasing and accumulating at a rapid pace. Effectively managing and analyzing these data is essential for climate change studies. Hadoop, a distributed storage and processing framework for large data sets, has attracted increasing attention for dealing with the Big Data challenge. The maturity of Infrastructure as a Service (IaaS) in cloud computing further accelerates the adoption of Hadoop for solving Big Data problems. However, Hadoop is designed to process unstructured data such as texts, documents, and web pages, and cannot effectively handle scientific data formats such as array-based NetCDF files and other binary formats. In this paper, we propose to build a Hadoop-based middleware for transparently handling big NetCDF data by 1) designing a distributed climate data storage mechanism based on a POSIX-enabled parallel file system to enable parallel big-data processing with MapReduce, as well as to support data access by other systems; 2) modifying the Hadoop framework to transparently process NetCDF data in parallel without sequencing or converting the data into other file formats, or loading them into HDFS; and 3) seamlessly integrating Hadoop, cloud computing, and climate data in a highly scalable and fault-tolerant framework.
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response
NASA Technical Reports Server (NTRS)
Abdi, Frank
1996-01-01
A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software, 'GENOA', is dedicated to parallel and high-speed analysis to perform probabilistic evaluation of the high-temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) utilization of the power of parallel processing and static/dynamic load-balancing optimization to make the complex simulation of structure, material, and processing of high-temperature composites affordable; (2) computational integration and synchronization of probabilistic mathematics, structural/material mechanics, and parallel computing; (3) implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, increasing convergence rates through high- and low-level processor assignment; (4) creation of the framework for a portable parallel architecture for machine-independent Multiple Instruction Multiple Data (MIMD), Single Instruction Multiple Data (SIMD), hybrid, and distributed-workstation types of computers; and (5) market evaluation. The results of the Phase-2 effort provide a good basis for continuation and warrant a Phase-3 government and industry partnership.
NASA Technical Reports Server (NTRS)
Sanger, Eugen
1932-01-01
In the present report the computation is actually carried through for the case of parallel spars of equal resistance in bending without direct loading, including plotting of the influence lines; for other cases the method of calculation is explained. The development of large size airplanes can be speeded up by accurate methods of calculation such as this.
Load Balancing Unstructured Adaptive Grids for CFD Problems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid
1996-01-01
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.
Real-time multi-mode neutron multiplicity counter
Rowland, Mark S; Alvarez, Raymond A
2013-02-26
Embodiments are directed to a digital data acquisition method that collects data regarding nuclear fission at high rates and performs real-time preprocessing of large volumes of data into directly useable forms for use in a system that performs non-destructive assaying of nuclear material and assemblies for mass and multiplication of special nuclear material (SNM). Pulses from a multi-detector array are fed in parallel to individual inputs that are tied to individual bits in a digital word. Data is collected by loading a word at the individual bit level in parallel, to reduce the latency associated with current shift-register systems. The word is read at regular intervals, all bits simultaneously, with no manipulation. The word is passed to a number of storage locations for subsequent processing, thereby removing the front-end problem of pulse pileup. The word is used simultaneously in several internal processing schemes that assemble the data in a number of more directly useable forms. The detector includes a multi-mode counter that executes a number of different count algorithms in parallel to determine different attributes of the count data.
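The bit-parallel acquisition idea can be sketched in a few lines (illustrative only; the patent describes a hardware implementation):

    # Each detector channel maps to one bit of a digital word; the word is
    # read at regular intervals, and the multiplicity in that interval is
    # simply the population count of the word.
    def pack_word(channel_fired):
        word = 0
        for bit, fired in enumerate(channel_fired):
            if fired:
                word |= 1 << bit
        return word

    word = pack_word([True, False, True, True])   # channels 0, 2, 3 fired
    multiplicity = bin(word).count("1")           # -> 3, with no pulse pileup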
Predicting Flows of Rarefied Gases
NASA Technical Reports Server (NTRS)
LeBeau, Gerald J.; Wilmoth, Richard G.
2005-01-01
DSMC Analysis Code (DAC) is a flexible, highly automated, easy-to-use computer program for predicting flows of rarefied gases -- especially flows of upper-atmospheric, propulsion, and vented gases impinging on spacecraft surfaces. DAC implements the direct simulation Monte Carlo (DSMC) method, which is widely recognized as standard for simulating flows at densities so low that the continuum-based equations of computational fluid dynamics are invalid. DAC enables users to model complex surface shapes and boundary conditions quickly and easily. The discretization of a flow field into computational grids is automated, thereby relieving the user of a traditionally time-consuming task while ensuring (1) appropriate refinement of grids throughout the computational domain, (2) determination of optimal settings for temporal discretization and other simulation parameters, and (3) satisfaction of the fundamental constraints of the method. In so doing, DAC ensures an accurate and efficient simulation. In addition, DAC can utilize parallel processing to reduce computation time. The domain decomposition needed for parallel processing is completely automated, and the software employs a dynamic load-balancing mechanism to ensure optimal parallel efficiency throughout the simulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kirka, Michael M.; Greeley, Duncan A.; Hawkins, Charles S.
2017-09-11
Here in this study, the impact of texture (columnar/equiaxed grain structure) and the influence of material orientation on the low cycle fatigue (LCF) behavior of hot isostatically pressed (HIP) and heat-treated Inconel 718 fabricated through electron beam melting (EBM) are investigated. Material was tested both parallel and perpendicular (transverse) to the build direction. In all instances, the EBM HIP and heat-treated Inconel 718 performed similarly to or exceeded the LCF life of wrought Inconel 718 plate and bar stock under fully reversed strain-controlled loading at 650 °C. Amongst the textures, the columnar grains oriented parallel to the build direction exhibited the highest life on average compared to the transverse columnar and equiaxed EBM material. Further, in relation to the reference wrought material, the parallel columnar-grain material exhibited a greater life. While a negligible life difference was observed in the equiaxed-grained material between the two orientations, a consistently lower accumulated inelastic strain was measured for the material loaded parallel to the build direction than for the transverse orientation. Failure of the parallel columnar material occurred in a transgranular manner with cracks emanating from the surface, whereas the transverse columnar material failed in an intergranular manner, with crack growth occurring through repeated rupture of oxide at the crack tip. Finally, in the case of the equiaxed material, an influence of material orientation was not observed on the failure mechanism, with crack propagation occurring through a combination of debonded/cracked carbides and void formation along twin boundaries, resulting in a mixture of intergranular and transgranular crack propagation.
Aprà, E; Kowalski, K
2016-03-08
In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.
1989-01-01
Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge-based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images, which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which saves considerable computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
King, Robert Dean; DeDoncker, Rik Wivina Anna Adelson
1998-01-01
A battery load leveling arrangement for an electrically powered system in which battery loading is subject to intermittent high current loading utilizes a passive energy storage device and a diode connected in series with the storage device to conduct current from the storage device to the load when current demand forces a drop in battery voltage. A current limiting circuit is connected in parallel with the diode for recharging the passive energy storage device. The current limiting circuit functions to limit the average magnitude of recharge current supplied to the storage device. Various forms of current limiting circuits are disclosed, including a PTC resistor coupled in parallel with a fixed resistor. The current limit circuit may also include an SCR for switching regenerative braking current to the device when the system is connected to power an electric motor.
Time Warp Operating System, Version 2.5.1
NASA Technical Reports Server (NTRS)
Bellenot, Steven F.; Gieselman, John S.; Hawley, Lawrence R.; Peterson, Judy; Presley, Matthew T.; Reiher, Peter L.; Springer, Paul L.; Tupman, John R.; Wedel, John J., Jr.; Wieland, Frederick P.;
1993-01-01
Time Warp Operating System, TWOS, is special purpose computer program designed to support parallel simulation of discrete events. Complete implementation of Time Warp software mechanism, which implements distributed protocol for virtual synchronization based on rollback of processes and annihilation of messages. Supports simulations and other computations in which both virtual time and dynamic load balancing are used. Program utilizes underlying resources of operating system. Written in C programming language.
Design of an Input-Parallel Output-Parallel LLC Resonant DC-DC Converter System for DC Microgrids
NASA Astrophysics Data System (ADS)
Juan, Y. L.; Chen, T. R.; Chang, H. M.; Wei, S. E.
2017-11-01
Compared with a centralized power system, a distributed modular power system is composed of several power modules with lower power capacity that together provide enough power capacity for the load demand. The current stress of the power components in each module can then be reduced, and the flexibility of system setup is also enhanced. However, the parallel-connected power modules in a conventional system are usually controlled to share the power flow equally, which results in lower efficiency under light-load conditions. In this study, a modular power conversion system for a DC microgrid is developed with 48 V DC low-voltage input and 380 V DC high-voltage output. In the developed system control strategy, the number of power modules enabled to share the power flow is decided according to the output power under low load demand. Finally, three 350 W power modules are constructed and parallel-connected to set up a modular power conversion system. The experimental results show that, compared with the conventional system, the efficiency of the developed power system under light-load conditions is greatly improved. The modular design of the power system can also decrease the ratio of power loss to system capacity.
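A hedged sketch of the module-enabling policy (the 350 W rating and three-module count come from the abstract; the decision rule itself is an illustrative assumption):

    import math

    # Enable only as many parallel modules as the load demand requires, so
    # each enabled module operates closer to its efficient loading point.
    def modules_to_enable(load_w, module_rating_w=350.0, n_modules=3):
        needed = max(1, math.ceil(load_w / module_rating_w))
        return min(needed, n_modules)

    modules_to_enable(120.0)   # light load: 1 module carries it alone
    modules_to_enable(800.0)   # heavy load: all 3 modules share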
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-09-12
Mcqueuer is a simple tool that allows anyone, from researchers to experienced developers, to create multi-node/multi-core jobs by simply creating a file with a list of commands. Users combine tasks, which would otherwise each be their own job on the cluster, into a single file that is given to Mcqueuer. Mcqueuer then does the heavy lifting required to process the tasks in parallel in a single multi-node job. In addition, Mcqueuer provides load balancing, which frees the user from having to worry about complex memory and CPU considerations and lets them focus on the processing itself.
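The command-file model can be sketched as follows (a toy single-host version; the file name, pool size, and use of a thread pool are assumptions, not Mcqueuer's implementation):

    import subprocess
    from multiprocessing.pool import ThreadPool

    def run(cmd):
        # Each line of the task file is an independent command.
        return cmd, subprocess.call(cmd, shell=True)

    with open("tasks.txt") as f:
        commands = [line.strip() for line in f if line.strip()]

    # Workers pull the next command as they finish, giving simple load balance.
    with ThreadPool(processes=8) as pool:
        results = pool.map(run, commands)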
Bintivanou, Aimilia; Pissiotis, Argirios; Michalakis, Konstantinos
2017-04-01
Parallel labiolingual walls and the preservation of the cingulum in anterior tooth preparations have been advocated. However, their contribution to retention and resistance form has not been evaluated. The purpose of this in vitro study was to evaluate the retention and resistance failure loads of 2 preparation designs for maxillary anterior teeth. Forty metal restorations were fabricated and paired with 40 cobalt-chromium prepared tooth analogs. Twenty of the specimens had parallel buccolingual walls at the cervical part (group PBLW; the control group), whereas the remaining 20 had converging buccolingual walls (group CBLW; the experimental group). The restorations were cemented to the tooth analogs with a resin-modified glass ionomer luting agent. Ten specimens from each group were subjected to tensile loading with a universal testing machine; the rest were subjected to compression loading until failure. Descriptive statistics and the independent t test (α=.05) were used to compare the failure loads of the tested groups. The independent t test revealed statistically significant differences between the tested groups in tensile loading (P<.001) and in compressive loading (P<.001). The PBLW group presented a higher tensile failure load than the CBLW group. In contrast, the PBLW group presented a smaller compressive failure load than the CBLW group. Parallelism of the buccolingual axial walls in anterior maxillary teeth increased the retention form but decreased the resistance form. Copyright © 2016 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.
Towards a Better Distributed Framework for Learning Big Data
2017-06-14
This work aimed at solving issues in distributed machine learning. The PI's team proposed...communication load. Finally, the team proposed the parallel least-squares policy iteration (parallel LSPI) to parallelize reinforcement policy learning.
NASA Technical Reports Server (NTRS)
Mccormick, S.; Quinlan, D.
1989-01-01
The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids (global and local) to provide adaptive resolution and fast solution of PDEs. Like all such methods, it offers parallelism by using possibly many disconnected patches per level, but is hindered by the need to handle these levels sequentially. The finest levels must therefore wait for processing to be essentially completed on all the coarser ones. A recently developed asynchronous version of FAC, called AFAC, completely eliminates this bottleneck to parallelism. This paper describes timing results for AFAC, coupled with a simple load balancing scheme, applied to the solution of elliptic PDEs on an Intel iPSC hypercube. These tests include performance of certain processes necessary in adaptive methods, including moving grids and changing refinement. A companion paper reports on numerical and analytical results for estimating convergence factors of AFAC applied to very large scale examples.
NASA Technical Reports Server (NTRS)
Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos
1996-01-01
An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modifications to the original parallel AGCM code, aimed at improving its numerical efficiency, reducing its interprocessor communication cost, and improving its load balance, are discussed, along with issues affecting single-node code performance.
Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate
NASA Technical Reports Server (NTRS)
Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel
1994-01-01
This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
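One-dimensional chain partitioning of the kind found to work best here can be sketched as a greedy split of an ordered weight sequence into contiguous chunks (an illustration of the general idea; the paper's exact algorithm may differ):

    def chain_partition(weights, n_parts):
        # Split ordered cell weights into contiguous chunks near the ideal size.
        target = sum(weights) / n_parts
        parts, current, acc = [], [], 0.0
        for w in weights:
            if current and acc + w > target and len(parts) < n_parts - 1:
                parts.append(current)
                current, acc = [], 0.0
            current.append(w)
            acc += w
        parts.append(current)
        return parts

    chain_partition([5, 1, 1, 4, 2, 2, 3], 3)   # -> [[5, 1], [1, 4, 2], [2, 2, 3]]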
Distributed process manager for an engineering network computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gait, J.
1987-08-01
MP is a manager for systems of cooperating processes in a local area network of engineering workstations. MP supports transparent continuation by maintaining multiple copies of each process on different workstations. Computational bandwidth is optimized by executing processes in parallel on different workstations. Responsiveness is high because workstations compete among themselves to respond to requests. The technique is to select a master from among a set of replicates of a process by a competitive election between the copies. Migration of the master when a fault occurs, or when response slows down, is effected by inducing the election of a new master. Competitive response stabilizes system behavior under load, so MP exhibits real-time behavior.
A comparison of queueing, cluster and distributed computing systems
NASA Technical Reports Server (NTRS)
Kaplan, Joseph A.; Nelson, Michael L.
1993-01-01
Using workstation clusters for distributed computing has become popular with the proliferation of inexpensive, powerful workstations. Workstation clusters offer both a cost-effective alternative to batch processing and an easy entry into parallel computing. However, a number of workstations on a network does not by itself constitute a cluster; cluster management software is necessary to harness the collective computing power. A variety of cluster management and queuing systems are compared: Distributed Queueing System (DQS), Condor, Load Leveler, Load Balancer, Load Sharing Facility (LSF - formerly Utopia), Distributed Job Manager (DJM), Computing in Distributed Networked Environments (CODINE), and NQS/Exec. The systems differ in their design philosophy and implementation. Based on published reports on the different systems and conversations with the systems' developers and vendors, a comparison of the systems is made on the integral issues of clustered computing.
NASA Astrophysics Data System (ADS)
Cheng, Jian-Long; Yang, Sheng-Qi; Chen, Kui; Ma, Dan; Li, Feng-Yuan; Wang, Li-Ming
2017-12-01
In this paper, uniaxial compression tests were carried out on a series of composite rock specimens with different dip angles, which were made from two types of rock-like material with different strengths. The acoustic emission technique was used to monitor the acoustic signal characteristics of the composite rock specimens during the entire loading process. At the same time, an optical non-contact 3D digital image correlation technique was used to study the evolution of the axial strain field and the maximal strain field before and after the peak strength at different stress levels during the loading process. The effect of bedding plane inclination on the deformation and strength during uniaxial loading was analyzed. The methods of solving the elastic constants of the hard and weak rock were described. The damage evolution process, deformation and failure mechanism, and failure mode during uniaxial loading were fully determined. The experimental results show that the θ = 0°-45° specimens had obvious plastic deformation during loading, and the brittleness of the θ = 60°-90° specimens gradually increased during the loading process. When the anisotropy angle θ increased from 0° to 90°, the peak strength, peak strain, and apparent elastic modulus all decreased initially and then increased. The failure mode of the composite rock specimens during uniaxial loading can be divided into three categories: tensile fracture across the discontinuities (θ = 0°-30°), sliding failure along the discontinuities (θ = 45°-75°), and tensile splitting along the discontinuities (θ = 90°). The axial strains of the weak and hard rock layers in the composite rock specimen during the loading process differed significantly in the θ = 0°-45° specimens and were almost the same in the θ = 60°-90° specimens. As for the strain localization highlighted in the maximum principal strain field, in the θ = 0°-30° specimens it appeared in the rock matrix approximately parallel to the loading direction, while in the θ = 45°-90° specimens it appeared at the interface between the hard and weak rock layers.
Load and Pi control flux through the branched kinetic cycle of myosin V.
Kad, Neil M; Trybus, Kathleen M; Warshaw, David M
2008-06-20
Myosin V is a processive actin-based motor protein that takes multiple 36-nm steps to deliver intracellular cargo to its destination. In the laser trap, applied load slows myosin V heavy meromyosin stepping and increases the probability of backsteps. In the presence of 40 mM phosphate (Pi), both forward and backward steps become less load-dependent. From these data, we infer that Pi release commits myosin V to undergo a highly load-dependent transition from a state in which ADP is bound to both heads and its lead head is trapped in a pre-powerstroke conformation. Increasing the residence time in this state by applying load increases the probability of backstepping or detachment. The kinetics of detachment indicate that myosin V can detach from actin at two distinct points in the cycle, one of which is turned off by the presence of Pi. We propose a branched kinetic model to explain these data. Our model includes Pi release prior to the most load-dependent step in the cycle, implying that Pi release and load both act as checkpoints that control the flux through two parallel pathways.
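The load dependence inferred here is commonly modeled with Bell's equation (a standard assumption in single-molecule mechanics, not a formula taken from this paper), in which an applied force F rescales a transition rate k over a characteristic distance δ:

    k(F) = k_0 · exp(−F·δ / k_B·T)

with k_B the Boltzmann constant and T the absolute temperature; a load that opposes the transition (F·δ > 0 in this sign convention) slows it, which is one way to rationalize slower stepping and more frequent backsteps under load.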
A simple hyperbolic model for communication in parallel processing environments
NASA Technical Reports Server (NTRS)
Stoica, Ion; Sultan, Florin; Keyes, David
1994-01-01
We introduce a model for communication costs in parallel processing environments, called the 'hyperbolic model,' which generalizes two-parameter dedicated-link models in an analytically simple way. Dedicated interprocessor links parameterized by a latency and a transfer rate that are independent of load are assumed by many existing communication models; such models are unrealistic for workstation networks. The communication system is modeled as a directed communication graph in which terminal nodes represent the application processes that initiate the sending and receiving of the information, and in which internal nodes, called communication blocks (CBs), reflect the layered structure of the underlying communication architecture. The direction of graph edges specifies the flow of the information carried through messages. Each CB is characterized by a two-parameter hyperbolic function of the message size that represents the service time needed for processing the message. The parameters are evaluated in the limits of very large and very small messages. Rules are given for reducing a communication graph consisting of many CBs to an equivalent two-parameter form, while maintaining an approximation for the service time that is exact in both the large and small limits. The model is validated on a dedicated Ethernet network of workstations by experiments with communication subprograms arising in scientific applications, for which a tight fit of the model predictions with actual measurements of the communication and synchronization time between end processes is demonstrated. The model is then used to evaluate the performance of two simple parallel scientific applications from partial differential equations: domain decomposition and time-parallel multigrid. In an appropriate limit, we also show the compatibility of the hyperbolic model with the recently proposed LogP model.
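As a toy illustration of the reduction rules (assuming, for this sketch only, a simple latency-plus-size/rate service time per communication block and store-and-forward composition; the hyperbolic form in the paper is more general):

    # Collapse a chain of communication blocks into one equivalent stage.
    # Latencies add (exact in the small-message limit); rates combine
    # harmonically (exact in the large-message limit).
    def series_reduce(stages):
        latency = sum(l for l, _ in stages)
        rate = 1.0 / sum(1.0 / r for _, r in stages)
        return latency, rate

    series_reduce([(1e-4, 1e7), (5e-5, 5e6)])   # -> (1.5e-4, ~3.33e6)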
FALCON: A distributed scheduler for MIMD architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grimshaw, A.S.; Vivas, V.E. Jr.
1991-01-01
This paper describes FALCON (Fully Automatic Load COordinator for Networks), the scheduler for the Mentat parallel processing system. FALCON has a modular structure and is designed for systems that use a task scheduling mechanism. FALCON is distributed, stable, supports system heterogeneities, and employs a sender-initiated adaptive load sharing policy with static task assignment. FALCON is parameterizable and is implemented in Mentat, a working distributed system. We present the design and implementation of FALCON as well as a brief introduction to those features of the Mentat run-time system that influence FALCON. Performance measures under different scheduler configurations are also presented and analyzed with respect to the system parameters. 36 refs., 8 figs.
Community Detection on the GPU
DOE Office of Scientific and Technical Information (OSTI.GOV)
Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh
We present and evaluate a new GPU algorithm based on the Louvain method for community detection. Our algorithm is the first for this problem that parallelizes the access to individual edges. In this way we can fine-tune the load balance when processing networks with nodes of highly varying degrees. This is achieved by scaling the number of threads assigned to each node according to its degree. Extensive experiments show that we obtain speedups up to a factor of 270 compared to the sequential algorithm. The algorithm consistently outperforms other recent shared-memory implementations and is only one order of magnitude slower than the current fastest parallel Louvain method running on a Blue Gene/Q supercomputer using more than 500K threads.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thornquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE, and transient modes using standard analog (DAE) and/or device (PDE) models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load balancing and iterative solvers.
A comparison of parallel and diverging screw angles in the stability of locked plate constructs.
Wähnert, D; Windolf, M; Brianza, S; Rothstock, S; Radtke, R; Brighenti, V; Schwieger, K
2011-09-01
We investigated the static and cyclical strength of parallel and angulated locking plate screws using rigid polyurethane foam (0.32 g/cm³) and bovine cancellous bone blocks. Custom-made stainless steel plates with two conically threaded screw holes with different angulations (parallel, 10° and 20° divergent) and 5 mm self-tapping locking screws underwent pull-out and cyclical pull-and-bending tests. The bovine cancellous blocks were subjected only to static pull-out testing. We also performed finite element analysis of the static pull-out test for the parallel and 20° configurations. In both the foam model and the bovine cancellous bone, we found the significantly highest pull-out force for the parallel constructs. In the finite element analysis there was 47% more damage in the 20° divergent constructs than in the parallel configuration. Under cyclical loading, the mean number of cycles to failure was significantly higher for the parallel group, followed by the 10° and 20° divergent configurations. In our laboratory setting we clearly showed the biomechanical disadvantage of a diverging locking-screw angle under static and cyclical loading.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Zhang, Yingchen; Muljadi, Eduard
In this paper, a short-term load forecasting approach for network reconfiguration is proposed in a parallel manner. Specifically, a support vector regression (SVR) based short-term load forecasting approach is designed to provide an accurate load prediction and to benefit the network reconfiguration. Because of the nonconvexity of the three-phase balanced optimal power flow, a second-order cone program (SOCP) based approach is used to relax the optimal power flow problem. Then, the alternating direction method of multipliers (ADMM) is used to compute the optimal power flow in a distributed manner. Considering the limited number of switches and the increasing computation capability, the proposed network reconfiguration is solved in a parallel way. The numerical results demonstrate the feasibility and effectiveness of the proposed approach.
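A hedged sketch of the SVR forecasting stage (the window length, kernel, and hyperparameters below are illustrative choices, not the paper's settings):

    import numpy as np
    from sklearn.svm import SVR

    # Stand-in hourly load history; a real forecaster would also use
    # calendar and weather features.
    load = 5.0 + np.sin(np.linspace(0.0, 20.0, 200))
    window = 24
    X = np.array([load[i:i + window] for i in range(len(load) - window)])
    y = load[window:]

    # Fit on all but the last window, then predict the next hour's load.
    model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:-1], y[:-1])
    next_load = model.predict(X[-1:])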
Microchannel cross load array with dense parallel input
Swierkowski, Stefan P.
2004-04-06
An architecture or layout for microchannel arrays using T or Cross (+) loading for electrophoresis or other injection and separation chemistries that are performed in microfluidic configurations. This architecture enables a very dense layout of arrays of functionally identical shaped channels, and it also solves the problem of simultaneously enabling efficient parallel shapes and biasing of the input wells, waste wells, and bias wells at the input end of the separation columns. One T-load architecture uses circular holes with common rows, but not columns, which allows the flow paths for each channel to be identical in shape, using multiple mirror-image pieces. Another T-load architecture enables the access-hole array to be formed on a biaxial, collinear grid suitable for EDM micromachining (square holes), with common rows and columns.
NASA Technical Reports Server (NTRS)
Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele
2004-01-01
In this paper we describe the parallelization of the multi-zone versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms, and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.
Parallelized reliability estimation of reconfigurable computer networks
NASA Technical Reports Server (NTRS)
Nicol, David M.; Das, Subhendu; Palumbo, Dan
1990-01-01
A parallelized system, ASSURE, for computing the reliability of embedded avionics flight control systems which are able to reconfigure themselves in the event of failure is described. ASSURE accepts a grammar that describes a reliability semi-Markov state-space. From this it creates a parallel program that simultaneously generates and analyzes the state-space, placing upper and lower bounds on the probability of system failure. ASSURE is implemented on a 32-node Intel iPSC/860, and has achieved high processor efficiencies on real problems. Through a combination of improved algorithms, exploitation of parallelism, and use of an advanced microprocessor architecture, ASSURE has reduced the execution time on substantial problems by a factor of one thousand over previous workstation implementations. Furthermore, ASSURE's parallel execution rate on the iPSC/860 is an order of magnitude faster than its serial execution rate on a Cray-2 supercomputer. While dynamic load balancing is necessary for ASSURE's good performance, it is needed only infrequently; the particular method of load balancing used does not substantially affect performance.
Chen, Ai-Zheng; Wang, Guang-Ya; Wang, Shi-Bin; Li, Li; Liu, Yuan-Gang; Zhao, Chen
2012-01-01
Background The aim of this study was to improve the drug loading, encapsulation efficiency, and sustained-release properties of supercritical CO2-based drug-loaded polymer carriers via a process of suspension-enhanced dispersion by supercritical CO2 (SpEDS), which is an advanced version of solution-enhanced dispersion by supercritical CO2 (SEDS). Methods Methotrexate nanoparticles were successfully microencapsulated into poly (L-lactide)-poly(ethylene glycol)-poly(L-lactide) (PLLA-PEG-PLLA) by SpEDS. Methotrexate nanoparticles were first prepared by SEDS, then suspended in PLLA-PEG-PLLA solution, and finally microencapsulated into PLLA-PEG-PLLA via SpEDS, where an “injector” was utilized in the suspension delivery system. Results After microencapsulation, the composite methotrexate (MTX)-PLLA-PEG-PLLA microspheres obtained had a mean particle size of 545 nm, drug loading of 13.7%, and an encapsulation efficiency of 39.2%. After an initial burst release, with around 65% of the total methotrexate being released in the first 3 hours, the MTX-PLLA-PEG-PLLA microspheres released methotrexate in a sustained manner, with 85% of the total methotrexate dose released within 23 hours and nearly 100% within 144 hours. Conclusion Compared with a parallel study of the coprecipitation process, microencapsulation using SpEDS offered greater potential to manufacture drug-loaded polymer microspheres for a drug delivery system. PMID:22787397
Risse, Sarah; Hohenstein, Sven; Kliegl, Reinhold; Engbert, Ralf
2014-01-01
Eye-movement experiments suggest that the perceptual span during reading is larger than the fixated word, asymmetric around the fixation position, and shrinks in size contingent on the foveal processing load. We used the SWIFT model of eye-movement control during reading to test these hypotheses and their implications under the assumption of graded parallel processing of all words inside the perceptual span. Specifically, we simulated reading in the boundary paradigm and analysed the effects of denying the model to have valid preview of a parafoveal word n + 2 two words to the right of fixation. Optimizing the model parameters for the valid preview condition only, we obtained span parameters with remarkably realistic estimates conforming to the empirical findings on the size of the perceptual span. More importantly, the SWIFT model generated parafoveal processing up to word n + 2 without fitting the model to such preview effects. Our results suggest that asymmetry and dynamic modulation are plausible properties of the perceptual span in a parallel word-processing model such as SWIFT. Moreover, they seem to guide the flexible distribution of processing resources during reading between foveal and parafoveal words. PMID:24771996
Atalar, Ata C; Tunalı, Onur; Erşen, Ali; Kapıcıoğlu, Mehmet; Sağlam, Yavuz; Demirhan, Mehmet S
2017-01-01
In intraarticular distal humerus fractures, internal fixation with double plates is the gold-standard treatment. However, the optimal plate configuration is not clear in the literature. The aim of this study was to compare the biomechanical stability of parallel and orthogonal anatomical locking plate systems for intraarticular distal humerus fractures in artificial humerus models. An intraarticular distal humerus fracture (AO 13-C2) with a 5 mm metaphyseal defect was created in sixteen artificial humerus models. Models were fixed with either orthogonal or parallel plating systems with locking screws (Acumed elbow plating systems). Both systems were tested for stiffness under axial compression, varus, valgus, and anterior and posterior bending loads. Plastic deformation after cyclic loading in posterior bending and load to failure in posterior bending were then tested, and the failure mechanisms of all samples were observed. Stiffness values in every direction were not significantly different between the orthogonal and parallel plating groups. There was no statistical difference between the two groups in plastic deformation values (0.31 mm vs 0.29 mm) or in load-to-failure tests in posterior bending (372.4 N vs 379.7 N). In the orthogonal plating system most failures occurred due to proximal shaft fracture, whereas in the parallel plating system failure occurred due to shifting of the most distal screw in the proximal fragment. Our study showed that both plating systems had similar biomechanical stability when anatomic plates with distal locking screws were used for intraarticular distal humerus fractures in artificial humerus models. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gooding, Thomas M.
Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications link over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.; Joy, K. I.
2013-07-01
In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time O(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory.
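To make the O(log(N)) pairwise idea concrete, the following minimal sketch repeatedly averages scalar work estimates between processors paired along hypercube dimensions; the pairing scheme and the scalar load model are illustrative assumptions, not the authors' implementation, which migrates actual particles.

```python
# Sketch of iterated processor-pair-wise load balancing (hypercube pairing).
# After log2(N) rounds, every processor holds approximately the mean load.
def pairwise_balance(loads):
    n = len(loads)                      # must be a power of two for this sketch
    assert n & (n - 1) == 0
    dim = n.bit_length() - 1
    for d in range(dim):                # one round per hypercube dimension
        for p in range(n):
            partner = p ^ (1 << d)      # pair processors differing in bit d
            if p < partner:
                avg = (loads[p] + loads[partner]) / 2.0
                loads[p] = loads[partner] = avg
    return loads

print(pairwise_balance([8.0, 0.0, 4.0, 4.0, 1.0, 7.0, 2.0, 6.0]))
# -> every entry converges to the global mean, 4.0
```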
Development of parallel algorithms for electrical power management in space applications
NASA Technical Reports Server (NTRS)
Berry, Frederick C.
1989-01-01
The application of parallel techniques for electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used along with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a load flow analysis can be performed. The load flow analysis is performed on these partitioned elements by using the Newton-Raphson load flow method. These independent local problems produce results for voltage and power which can then be passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine whether any correction is needed on the local problems. The coordinator problem is also solved by an iterative method much like the local problem; the iterative method for the coordination problem is also the Newton-Raphson method. Therefore, each iteration at the coordination level results in new values for the local problems, and the local problems must be solved again along with the coordinator problem until convergence conditions are met.
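As a reminder of the kernel underlying both the local and coordinator problems, here is a generic Newton-Raphson iteration on a small nonlinear system; a real load flow study would assemble the residual and Jacobian from the network admittance matrix, so the toy equations below are purely illustrative.

```python
# Minimal Newton-Raphson sketch of the kind used for load flow:
# iterate x <- x - J(x)^-1 f(x) until the correction is small.
import numpy as np

def newton_raphson(f, jac, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(jac(x), -f(x))   # Newton correction
        x += dx
        if np.linalg.norm(dx) < tol:          # converged
            break
    return x

f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
jac = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
print(newton_raphson(f, jac, [1.0, 0.5]))     # -> [sqrt(2), sqrt(2)]
```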
Advanced Electric Distribution, Switching, and Conversion Technology for Power Control
NASA Technical Reports Server (NTRS)
Soltis, James V.
1998-01-01
The Electrical Power Control Unit currently under development by Sundstrand Aerospace for use on the Fluids Combustion Facility of the International Space Station is the precursor of modular power distribution and conversion concepts for future spacecraft and aircraft applications. This unit combines modular current-limiting flexible remote power controllers and paralleled power converters into one package. Each unit includes three 1-kW, current-limiting power converter modules designed for a variable-ratio load sharing capability. The flexible remote power controllers can be used in parallel to match load requirements and can be programmed for an initial ON or OFF state on powerup. The unit contains an integral cold plate. The modularity and hybridization of the Electrical Power Control Unit set the course for future spacecraft electrical power systems, both large and small. In such systems, the basic hybridized converter and flexible remote power controller building blocks could be configured to match power distribution and conversion capabilities to load requirements. In addition, the flexible remote power controllers could be configured in assemblies to feed multiple individual loads and could be used in parallel to meet the specific current requirements of each of those loads. Ultimately, the Electrical Power Control Unit design concept could evolve to a common switch module hybrid, or family of hybrids, for both converter and switchgear applications. By assembling hybrids of a common current rating and voltage class in parallel, researchers could readily adapt these units for multiple applications. The Electrical Power Control Unit concept has the potential to be scaled to larger and smaller ratings for both small and large spacecraft and for aircraft where high-power-density remote power controllers or power converters are required and a common replacement part is desired for multiples of a base current rating.
NASA Astrophysics Data System (ADS)
Lyan, Oleg; Jankunas, Valdas; Guseinoviene, Eleonora; Pašilis, Aleksas; Senulis, Audrius; Knolis, Audrius; Kurt, Erol
2018-02-01
In this study, a permanent magnet synchronous generator (PMSG) topology with compensated-reactance windings in a parallel rod configuration is proposed to reduce the armature reactance X_L and to achieve higher PMSG efficiency. The PMSG was designed using an iron-cored bifilar coil topology to overcome problems of market-dominant rotary-type generators, often a comparatively high armature reactance X_L, which is usually larger than the armature resistance R_a. The proposed topology therefore partially compensates, and thereby reduces, the PMSG reactance. The study was performed using finite element method (FEM) analysis and experimental investigation. FEM analysis was used to investigate magnetic field flux distribution and density in the PMSG. Experimental analyses of the PMSG's no-load losses and electromotive force versus frequency (i.e., speed) were performed, and the relation of terminal voltage, power output, and efficiency to load current at different frequencies was evaluated. The reactance of the PMSG has a low value and a linear relation with operating frequency. The low reactance gives a small variation of efficiency (from 90% to 95%) over a wide range of load (from 3 A to 10 A) and operating frequency (from 44 Hz to 114 Hz). The comparison of PMSG characteristics with parallel and series winding connections showed insignificant power variation. The results show that compensated-reactance windings in a parallel rod configuration provide lower reactance in the PMSG design and, therefore, higher efficiency under wider load and frequency variation.
Repartitioning Strategies for Massively Parallel Simulation of Reacting Flow
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Zheng, Angen; Givi, Peyman; Labrinidis, Alexandros; Chrysanthis, Panos
2015-11-01
The majority of parallel CFD simulators partition the domain into equal regions and assign the calculations for a particular region to a unique processor. This type of domain decomposition is vital to the efficiency of the solver. However, as the simulation develops, the workload among the partitions often becomes uneven (e.g., through adaptive mesh refinement or chemically reacting regions), and a new partitioning should be considered. The process of repartitioning adjusts the current partition to evenly distribute the load again. We compare two repartitioning tools: Zoltan, an architecture-agnostic graph repartitioner developed at the Sandia National Laboratories; and Paragon, an architecture-aware graph repartitioner developed at the University of Pittsburgh. The comparative assessment is conducted via simulation of the Taylor-Green vortex flow with chemical reaction.
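A hedged sketch of the decision such simulators face: repartition only when the measured imbalance is large enough that the projected saving over the remaining steps outweighs the repartitioning cost. The threshold and cost model here are illustrative assumptions, not taken from the paper.

```python
# Illustrative trigger for repartitioning: repartition when measured load
# imbalance (max/mean) exceeds a threshold and the projected saving over
# the coming steps outweighs the repartitioning cost (assumed inputs).
def should_repartition(loads, steps_ahead, repartition_cost, threshold=1.10):
    mean = sum(loads) / len(loads)
    imbalance = max(loads) / mean            # 1.0 means perfectly balanced
    if imbalance <= threshold:
        return False
    # time wasted per step is roughly (max - mean); compare with the cost
    projected_saving = (max(loads) - mean) * steps_ahead
    return projected_saving > repartition_cost

print(should_repartition([1.0, 1.4, 0.9, 0.7], steps_ahead=200,
                         repartition_cost=25.0))   # -> True
```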
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji
A parallel computing framework has been developed for use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran 77, Fortran 90, or C. The module is largely independent of the radiation transport code it is used with, and is connected to the code by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA, and MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations from a saved checkpoint file. The checkpoint facility can be used in single-process calculations as well as in the parallel regime. The framework corrects some of the known problems with scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where interference from other users is possible.
NASA Astrophysics Data System (ADS)
Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.
2011-10-01
A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models are now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
S-HARP: A parallel dynamic spectral partitioner
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sohn, A.; Simon, H.
1998-01-01
Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP: fine-grain loop-level parallelism and coarse-grain recursive parallelism. The parallel partitioner has been implemented in Message Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64-processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15-fold speedup on 64 processors while ParaMeTiS 1.0 gives only a few-fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.
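The speed of S-HARP comes in part from inertial partitioning. The sketch below shows the basic inertial-bisection step (project vertices onto the principal inertia axis and split at the median); the spectral-quality refinement and both levels of parallelism described in the abstract are omitted.

```python
# Sketch of an inertial-bisection step: project vertex coordinates onto
# the principal inertia axis and split the set at the median projection.
import numpy as np

def inertial_bisect(coords):
    pts = np.asarray(coords, dtype=float)
    centered = pts - pts.mean(axis=0)
    # principal axis = eigenvector of the largest eigenvalue of the
    # covariance (inertia) matrix
    _, vecs = np.linalg.eigh(centered.T @ centered)
    axis = vecs[:, -1]
    proj = centered @ axis
    median = np.median(proj)
    return proj <= median                  # boolean mask: part 0 vs part 1

mask = inertial_bisect([(0, 0), (1, 0.1), (2, 0), (3, 0.1)])
print(mask)   # splits the elongated point set across its long axis
```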
Scan line graphics generation on the massively parallel processor
NASA Technical Reports Server (NTRS)
Dorband, John E.
1988-01-01
Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. Performing pixel value calculations, facilitating load balancing across the processors, and applying the results to the Z buffer efficiently in parallel require special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
Applying graph partitioning methods in measurement-based dynamic load balancing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatele, Abhinav; Fourestier, Sebastien; Menon, Harshitha
Load imbalance leads to an increasing waste of resources as an application is scaled to more and more processors. Achieving the best parallel efficiency for a program requires optimal load balancing, which is an NP-hard problem. However, finding near-optimal solutions to this problem for complex computational science and engineering applications is becoming increasingly important. Charm++, a migratable-objects based programming model, provides a measurement-based dynamic load balancing framework. This framework instruments and then migrates over-decomposed objects to balance computational load and communication at runtime. This paper explores the use of graph partitioning algorithms, traditionally used for partitioning physical domains/meshes, for measurement-based dynamic load balancing of parallel applications. In particular, we present repartitioning methods developed in a graph partitioning toolbox called SCOTCH that consider the previous mapping to minimize migration costs. We also discuss a new imbalance reduction algorithm for graphs with irregular load distributions. We compare several load balancing algorithms using microbenchmarks on Intrepid and Ranger and evaluate the effect of communication, number of cores, and number of objects on the benefit achieved from load balancing. New algorithms developed in SCOTCH lead to better performance compared to the METIS partitioners for several cases, both in terms of application execution time and a smaller number of objects migrated.
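A minimal sketch of the measurement-based idea, assuming measured per-object loads and a greedy largest-first placement; production strategies such as those in Charm++ and SCOTCH also account for communication graphs and migration cost.

```python
# Greedy rebalance: assign the heaviest measured object to the currently
# least-loaded processor, repeatedly (communication/migration cost ignored).
import heapq

def greedy_rebalance(object_loads, num_procs):
    heap = [(0.0, p) for p in range(num_procs)]   # (load, proc)
    heapq.heapify(heap)
    assignment = {}
    for obj, load in sorted(object_loads.items(), key=lambda kv: -kv[1]):
        proc_load, p = heapq.heappop(heap)        # least-loaded processor
        assignment[obj] = p
        heapq.heappush(heap, (proc_load + load, p))
    return assignment

print(greedy_rebalance({'a': 5.0, 'b': 4.0, 'c': 3.0, 'd': 3.0}, 2))
# -> {'a': 0, 'b': 1, 'c': 1, 'd': 0}: processor loads 8.0 and 7.0
```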
Characterization of wastewater treatment by two microbial fuel cells in continuous flow operation.
Kubota, Keiichi; Watanabe, Tomohide; Yamaguchi, Takashi; Syutsubo, Kazuaki
2016-01-01
Two serially connected single-chamber microbial fuel cells (MFCs) were applied to the treatment of diluted molasses wastewater in a continuous operation mode. In addition, the effect of series and parallel connection between the anodes and the cathode on power generation was investigated experimentally. The two serially connected MFC process achieved 79.8% chemical oxygen demand removal and 11.6% Coulombic efficiency when the hydraulic retention time of the whole process was 26 h. The power densities were 0.54, 0.34, and 0.40 W m⁻³ when the electrodes were in individual, serial, and parallel connection modes, respectively. A high open-circuit voltage was obtained in the serial connection. Power density decreased at low organic loading rates (OLRs) due to the shortage of organic matter, while power generation efficiency tended to decrease as a result of enhanced methane fermentation at high OLRs. Therefore, high power density and efficiency can be achieved by using a suitable OLR range.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines: 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short-range force calculation on hybrid high performance machines, a new approach for dynamic load balancing of work between CPU and accelerator cores, and the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.
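One way to picture the dynamic CPU/accelerator balancing described above is a feedback rule that shifts the work fraction toward whichever side finished earlier. The update rule and fake timings below are illustrative assumptions, not the scheme implemented in LAMMPS.

```python
# Adjust the fraction of work given to the GPU so that measured CPU and
# GPU times equalize; 'relax' damps oscillations between iterations.
def update_gpu_fraction(frac, t_gpu, t_cpu, relax=0.5):
    # balanced split: rates f/t_gpu and (1-f)/t_cpu made proportional
    ideal = frac * t_cpu / (frac * t_cpu + (1.0 - frac) * t_gpu)
    return frac + relax * (ideal - frac)

frac = 0.5
for t_gpu, t_cpu in [(1.0, 3.0), (1.4, 2.2), (1.7, 1.9)]:  # fake timings
    frac = update_gpu_fraction(frac, t_gpu, t_cpu)
    print(f"gpu fraction -> {frac:.3f}")
```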
A design procedure for the phase-controlled parallel-loaded resonant inverter
NASA Technical Reports Server (NTRS)
King, Roger J.
1989-01-01
High-frequency-link power conversion and distribution based on a resonant inverter (RI) has recently been proposed. The design of several topologies is reviewed, and a simple approximate design procedure is developed for the phase-controlled parallel-loaded RI. This design procedure seeks to ensure the benefits of resonant conversion and is verified by data from a laboratory 2.5-kVA, 20-kHz converter. A simple phasor analysis is introduced as a useful approximation for design purposes, with the load treated as a linear impedance (or an ac current sink). Predictable worst-case ratings are also obtained for each component of the resonant tank circuit and the inverter switches. For a given load VA requirement, below-resonance operation is found to result in a significantly lower tank VA requirement. Under transient conditions such as a load short-circuit, a reversal of the expected commutation sequence is possible.
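A small numerical illustration of the phasor approximation for a parallel-loaded tank (series inductor feeding a capacitor in parallel with a linear load): the component values below are chosen only to place resonance near 20 kHz and are not from the paper.

```python
# Phasor-analysis sketch: |Vo/Vi| of a series-L tank driving C parallel R.
import math

def gain(f, L, C, R):
    w = 2 * math.pi * f
    zl = 1j * w * L                      # series tank inductor
    zpar = R / (1 + 1j * w * R * C)      # C in parallel with load R
    return abs(zpar / (zl + zpar))

L, C, R = 100e-6, 0.633e-6, 10.0         # f0 = 1/(2*pi*sqrt(LC)) ~ 20 kHz
for f in (10e3, 20e3, 30e3):
    print(f"{f/1e3:.0f} kHz: |Vo/Vi| = {gain(f, L, C, R):.2f}")
```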
Scalable and balanced dynamic hybrid data assimilation
NASA Astrophysics Data System (ADS)
Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa
2017-04-01
Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation, and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble-based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter (EKF). Ensemble members are re-sampled from a new approximation of that Gaussian distribution every time a new set of observations is processed, which makes VEnKF a dynamic assimilation method. After this, a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother (VEnKS). In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble, but now using past iterations as surrogate observations, until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. However, this can be avoided by isolating the forecast model completely from the minimization process: the latter is implemented as a wrapper code whose only link to the model is calling for many totally independent model runs, each of them implemented as a parallel model run itself. The only bottleneck in the process is the gathering and scattering of initial and final model state snapshots before and after the parallel runs, which requires a very efficient and low-latency communication network. However, the volume of data communicated is small, and the intervening minimization steps are only 3D-Var, which means their computational load is negligible compared with the fully parallel model runs. We present example results of the scalable VEnKF with the 4D lake and shallow-sea model COHERENS, assimilating simultaneously continuous in situ measurements at a single point and infrequent satellite images that cover a whole lake.
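The re-sampling step that makes VEnKF dynamic can be pictured in a few lines: draw a fresh ensemble from the current Gaussian approximation, then propagate each member with an independent model run. Dimensions and covariances below are illustrative, not from the COHERENS setup.

```python
# Draw a fresh ensemble from the current Gaussian approximation N(mean, cov).
import numpy as np

rng = np.random.default_rng(0)

def resample_ensemble(mean, cov, n_members):
    # each column is one ensemble member; the members are then propagated
    # by fully independent (hence embarrassingly parallel) model runs
    return rng.multivariate_normal(mean, cov, size=n_members).T

mean = np.array([1.0, 0.0, -0.5])
cov = np.diag([0.1, 0.2, 0.05])
ens = resample_ensemble(mean, cov, n_members=8)
print(ens.shape)        # -> (3, 8): state dimension x ensemble size
```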
An efficient dynamic load balancing algorithm
NASA Astrophysics Data System (ADS)
Lagaros, Nikos D.
2014-01-01
In engineering problems, randomness and uncertainties are inherent. Robust design procedures, formulated in the framework of multi-objective optimization, have been proposed in order to take these sources of randomness and uncertainty into account. Such design procedures require orders of magnitude more computational effort than conventional analysis or optimum design processes, since a very large number of finite element analyses must be performed. There is therefore an imperative need to exploit the capabilities of computing resources in order to deal with this kind of problem. In particular, parallel computing can be implemented at the level of metaheuristic optimization, by exploiting the physical parallelization feature of the nondominated sorting evolution strategies method, as well as at the level of the repeated structural analyses required for assessing the behavioural constraints and for calculating the objective functions. In this study an efficient dynamic load balancing algorithm for optimum exploitation of available computing resources is proposed and, without loss of generality, applied to computing the desired Pareto front. In such problems, the computation of the complete Pareto front with feasible designs only constitutes a very challenging task. The proposed algorithm achieves nearly linear speedup, with efficiencies approaching 100% relative to the sequential procedure.
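In the spirit of the proposed algorithm, the sketch below dynamically schedules independent structural analyses onto a pool of workers so that idle processors always pick up the next pending task; the analysis function is a stand-in, and this is not the author's implementation.

```python
# Dynamic scheduling of many independent analyses onto a worker pool.
from concurrent.futures import ProcessPoolExecutor
import math

def run_analysis(design_id):
    # stand-in for one expensive structural (finite element) analysis
    return design_id, sum(math.sin(i) for i in range(200_000))

if __name__ == "__main__":
    designs = range(32)
    with ProcessPoolExecutor(max_workers=4) as pool:
        for design_id, result in pool.map(run_analysis, designs):
            print(design_id, f"{result:.4f}")
```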
The optimization of peptide cargo bound to MHC class I molecules by the peptide-loading complex.
Elliott, Tim; Williams, Anthony
2005-10-01
Major histocompatibility complex (MHC) class I complexes present peptides from both self and foreign intracellular proteins on the surface of most nucleated cells. The assembled heterotrimeric complexes consist of a polymorphic glycosylated heavy chain, non-polymorphic β2-microglobulin, and a peptide of typically nine amino acids in length. Assembly of the class I complexes occurs in the endoplasmic reticulum and is assisted by a number of chaperone molecules. A multimolecular unit termed the peptide-loading complex (PLC) is integral to this process. The PLC contains a peptide transporter (transporter associated with antigen processing), a thiol oxidoreductase (ERp57), a glycoprotein chaperone (calreticulin), and tapasin, a class I-specific chaperone. We suggest that class I assembly involves a process of optimization whereby the peptide cargo of the complex is edited by the PLC. Furthermore, this selective peptide loading is biased toward peptides that have a longer off-rate from the assembled complex. We suggest that tapasin is the key chaperone that directs this action of the PLC, with secondary contributions from calreticulin and possibly ERp57. We provide a framework model for how this may operate at the molecular level and draw parallels with the proposed mechanism of action of human leukocyte antigen-DM for MHC class II complex optimization.
Biomechanical Comparison of Parallel and Crossed Suture Repair for Longitudinal Meniscus Tears.
Milchteim, Charles; Branch, Eric A; Maughon, Ty; Hughey, Jay; Anz, Adam W
2016-04-01
Longitudinal meniscus tears are commonly encountered in clinical practice. Meniscus repair devices have been previously tested and presented; however, prior studies have not evaluated repair construct designs head to head. This study compared a new-generation meniscus repair device, SpeedCinch, with a similar established device, Fast-Fix 360, and a parallel repair construct with a crossed construct. Both devices utilize self-adjusting No. 2-0 ultra-high molecular weight polyethylene (UHMWPE) suture and 2 polyether ether ketone (PEEK) anchors. It was hypothesized that crossed suture repair constructs would have higher failure loads and stiffness than simple parallel constructs, and that the newer repair device would exhibit performance similar to the established device. Controlled laboratory study. Sutures were placed in an open fashion into the body and posterior horn regions of the medial and lateral menisci in 16 cadaveric knees. Evaluation of 2 repair devices and 2 repair constructs created 4 groups: 2 parallel vertical sutures created with the Fast-Fix 360 (2PFF), 2 crossed vertical sutures created with the Fast-Fix 360 (2XFF), 2 parallel vertical sutures created with the SpeedCinch (2PSC), and 2 crossed vertical sutures created with the SpeedCinch (2XSC). After open placement of the repair construct, each meniscus was explanted and tested to failure on a uniaxial material testing machine. All data were checked for normality of distribution, and 1-way analysis of variance by ranks was chosen to evaluate the statistical significance of maximum failure load and stiffness between groups. Statistical significance was defined as P < .05. The mean maximum failure loads ± 95% CI (range) were 89.6 ± 16.3 N (125.7-47.8 N) for 2PFF, 72.1 ± 11.7 N (103.4-47.6 N) for 2XFF, 71.9 ± 15.5 N (109.4-41.3 N) for 2PSC, and 79.5 ± 25.4 N (119.1-30.9 N) for 2XSC. Interconstruct comparison revealed no statistical difference among the 4 constructs regarding maximum failure loads (P = .49). Stiffness values were also similar, with no statistical difference on comparison (P = .28). Both devices in the current study had similar failure load and stiffness whether 2 parallel vertical or 2 crossed sutures were tested in cadaveric human menisci, and simple parallel vertical sutures performed similarly to crossed suture patterns at the time of implantation.
PLUM: Parallel Load Balancing for Unstructured Adaptive Meshes. Degree awarded by Colorado Univ.
NASA Technical Reports Server (NTRS)
Oliker, Leonid
1998-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for computing large-scale problems that require grid modifications to efficiently resolve solution features. By locally refining and coarsening the mesh to capture physical phenomena of interest, such procedures make standard computational methods more cost effective. Unfortunately, an efficient parallel implementation of these adaptive methods is rather difficult to achieve, primarily due to the load imbalance created by the dynamically-changing nonuniform grid. This requires significant communication at runtime, leading to idle processors and adversely affecting the total execution time. Nonetheless, it is generally thought that unstructured adaptive-grid techniques will constitute a significant fraction of future high-performance supercomputing. Various dynamic load balancing methods have been reported to date; however, most of them either lack a global view of loads across processors or do not apply their techniques to realistic large-scale applications.
Unstructured Adaptive Grid Computations on an Array of SMPs
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.
1996-01-01
Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. In this paper, we have presented such a dynamic load balancing framework, called JOVE. Results on a four-POWERnode POWER CHALLENGE array demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGE array, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits an 'array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantages over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array-of-SMPs architecture.
PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Saini, Subhash (Technical Monitor)
1998-01-01
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper presents the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.
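The remapping decision PLUM's redistribution model supports can be summarized in a few lines: accept a new mapping only if the predicted per-iteration gain, accumulated over the remaining iterations, exceeds the predicted data-movement cost. The inputs below are illustrative assumptions, not PLUM's actual cost model.

```python
# Accept a remap only when predicted gain offsets predicted movement cost.
def accept_remap(cur_loads, new_loads, iters_left, predicted_remap_cost):
    # with synchronous iterations, wall time per step follows the max load
    gain_per_iter = max(cur_loads) - max(new_loads)
    return gain_per_iter * iters_left > predicted_remap_cost

print(accept_remap(cur_loads=[9.0, 5.0, 4.0], new_loads=[6.1, 6.0, 5.9],
                   iters_left=50, predicted_remap_cost=90.0))  # -> True
```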
NASA Technical Reports Server (NTRS)
Levy, Samuel; Krupen, Philip
1943-01-01
The von Karman equations for flat plates are solved beyond the buckling load up to edge strains equal to eight times the buckling strain, for the extreme case of rigid clamping along the edges parallel to the load. Deflections, bending stresses, and membrane stresses are given as a function of end compressive load. The theoretical values of effective width are compared with the values derived for simple support along the edges parallel to the load. The increase in effective width due to rigid clamping drops from about 20 percent near the buckling strain to about 8 percent at an edge strain equal to eight times the buckling strain. Experimental values of effective width in the elastic range reported in NACA Technical Note No. 684 lie between the theoretical curves for the extremes of simple support and rigid clamping.
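For reference, the coupled von Kármán plate equations being solved can be written in a standard form (here w is the deflection, F the Airy stress function, h the thickness, D the flexural rigidity, and E Young's modulus; lateral pressure is omitted):

```latex
D\,\nabla^4 w = h\left(
    \frac{\partial^2 F}{\partial y^2}\frac{\partial^2 w}{\partial x^2}
  - 2\frac{\partial^2 F}{\partial x\,\partial y}\frac{\partial^2 w}{\partial x\,\partial y}
  + \frac{\partial^2 F}{\partial x^2}\frac{\partial^2 w}{\partial y^2}\right),
\qquad
\nabla^4 F = E\left[\left(\frac{\partial^2 w}{\partial x\,\partial y}\right)^2
  - \frac{\partial^2 w}{\partial x^2}\frac{\partial^2 w}{\partial y^2}\right].
```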
Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid
1999-01-01
The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the viewpoint of portability on various multiprocessor platforms. We address this problem by developing PLUM, an automatic and architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Experiments on an SP2 and an Origin2000 demonstrate the portability of our approach, which achieves superb load balance at the cost of minimal extra overhead.
Examining the architecture of cellular computing through a comparative study with a computer
Wang, Degeng; Gribskov, Michael
2005-01-01
The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software–hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's ‘hardware’ equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the ‘bandwidth’ of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed. PMID:16849179
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Lesoinne, Michel
1993-01-01
Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
Recent Progress on the Parallel Implementation of Moving-Body Overset Grid Schemes
NASA Technical Reports Server (NTRS)
Wissink, Andrew; Allen, Edwin (Technical Monitor)
1998-01-01
Viscous calculations about geometrically complex bodies in which there is relative motion between component parts are among the most computationally demanding problems facing CFD researchers today. This presentation documents results from the first two years of a CHSSI-funded effort within the U.S. Army AFDD to develop scalable dynamic overset grid methods for unsteady viscous calculations with moving-body problems. The first part of the presentation will focus on results from OVERFLOW-D1, a parallelized moving-body overset grid scheme that employs traditional Chimera methodology. The two processes that dominate the cost of such problems are the flow solution on each component and the intergrid connectivity solution. Parallel implementations of the OVERFLOW flow solver and DCF3D connectivity software are coupled with a proposed two-part static-dynamic load balancing scheme and tested on the IBM SP and Cray T3E multiprocessors. The second part of the presentation will cover recent results from OVERFLOW-D2, a new flow solver that employs Cartesian grids with various levels of refinement, facilitating solution adaption. A study of the parallel performance of the scheme on large distributed-memory multiprocessor computer architectures will be reported.
A compact submicrosecond, high current generator
NASA Astrophysics Data System (ADS)
Kovalchuk, B. M.; Kharlov, A. V.; Zorin, V. B.; Zherlitsyn, A. A.
2009-08-01
A pulsed current generator was developed for experiments with current-carrying pulsed plasma. The main parts of the generator are the capacitor bank, low-inductance current-driving lines, and the central load section. The generator consists of four identical sections connected in parallel to one load. The capacitor bank is assembled from 24 capacitor blocks (100 kV, 80 nF) connected in parallel; it stores 9.6 kJ at 100 kV charging voltage. Each capacitor block incorporates a multigap spark switch, which commutates through six parallel channels. The switches operate in dry air at atmospheric pressure. The generator was tested with an inductive load and a liner load. With a 17.5 nH inductive load and 100 kV charging voltage it provides a current amplitude of 650 kA with a 390 ns rise time, with 0.6 Ω damping resistors in the discharge circuit of each capacitor block. The net generator inductance without a load was optimized to be as low as 15 nH, which results in an extremely low generator impedance (~0.08 Ω). This ensures effective energy coupling to a low-impedance load such as a Z pinch. The generator operates reliably without any adjustments over the 70-100 kV range of charging voltage. Jitter in the delay between the output pulse and the triggering pulse is less than 5 ns at 70-100 kV charging voltage. Operation and handling are very simple, because no oil or purified gases are required. The generator has dimensions of 5.24×1.2×0.18 m³ and a total weight of about 1400 kg, making it a simple, robust, and cost-effective apparatus.
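The quoted figures are mutually consistent with a simple series-RLC estimate, as the back-of-envelope check below shows (total inductance taken as generator plus load; the undamped peak exceeds the measured 650 kA because the 0.6 Ω damping resistors are ignored):

```python
# Cross-check of the quoted figures with a simple series-RLC model,
# assuming total C = 24 x 80 nF and L = 15 nH (generator) + 17.5 nH (load).
import math

C = 24 * 80e-9          # 1.92 uF total bank capacitance
V = 100e3               # charging voltage
L = 15e-9 + 17.5e-9     # generator + load inductance

energy = 0.5 * C * V**2                    # -> 9.6 kJ, matching the text
i_peak = V * math.sqrt(C / L)              # undamped peak, ~770 kA
t_rise = (math.pi / 2) * math.sqrt(L * C)  # quarter period, ~390 ns

print(f"{energy/1e3:.1f} kJ, {i_peak/1e3:.0f} kA (undamped), "
      f"{t_rise*1e9:.0f} ns rise")
```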
Cognitive demands and the relationship between age and workload in apron control.
Müller, Andreas; Petru, Raluca; Angerer, Peter
2011-01-01
Apron controllers (ACs) determine the taxiways for aircraft entering the apron area until they reach their parking positions and vice versa. The aims of this study were to identify age-sensitive job requirements of apron control (Study 1), and to investigate the relationship between age of ACs and their workload (Study 2). Study 1: There were 14 experienced ACs who assessed the job requirements of apron control with the Fleishman-Job Analyses Survey. Additionally, during one shift, the number of parallel processed traffic data sets (indicating memory-load) and the number of delivered radio messages (indicating processing speed requirements) were assessed. Study 2: There were 30 ACs (age: 23-51 yr) who volunteered for trials during late shifts at an international airport. ACs assessed their subjective workload (NASA-Task Load Index) at four times during the shift and carried out an attention test (d2) before and after the shift. Moreover, their heart rate was assessed during the shift and in a reference period. Study 1: Results indicate that apron control requires especially high levels of memory-load and processing speed. Study 2: Hierarchical regression analyses revealed a U-shaped relationship between age and subjective workload (β = 0.59) as well as heart rate (β = 0.33). Up to the age of about 35-37 yr, workload and heart rate decreased with age, but afterwards the relationship became positive. There was no association between chronological age and attention performance. There is a need for age adequate job design in apron control that should especially aim at the reduction of memory-load and processing speed.
Quality factor concept in piezoceramic transformer performance description.
Mezheritsky, Alex V
2006-02-01
A new general approach to piezoceramic transformer (PT) performance description, based on the quality factor concept, is proposed. The system quality factor, material elastic anisotropy, and coupling factors of the input and output sections of an electrically excited and electrically loaded PT fully characterize its resonance and near-resonance behavior. The PT efficiency, transformation ratio, and input and output power were analytically analyzed and simulated as functions of the load and frequency for the simplest classical Langevin-type and Rosen-type PT designs. A new formulation of the electrical input impedance allows one to separate the power consumed by the PT from the power transferred into the load. The system PT quality factor takes into account losses in each of the PT "input-output-load" functional components. Loading changes the PT input electrical impedance in such a way that the minimum series impedance increases while the maximum parallel impedance decreases. The quality-factor ratio between the fully loaded and unloaded PT states is one of the best measures of PT dynamic performance--practically, the lower the ratio, the better the PT efficiency. A simple and effective method for determining the loaded PT quality factor is proposed. A piezoceramic with low piezoelectric anisotropy was found to be required to provide maximum PT efficiency and a correspondingly higher voltage gain. Limitations on the PT output voltage and power, caused by nonlinear effects in piezoceramics, were established.
DC switching regulated power supply for driving an inductive load
Dyer, George R.
1986-01-01
A power supply for driving an inductive load current from a dc power supply through a regulator circuit including a bridge arrangement of diodes and switching transistors controlled by a servo controller which regulates switching in response to the load current to maintain a selected load current. First and second opposite legs of the bridge are formed by first and second parallel-connected transistor arrays, respectively, while the third and fourth legs of the bridge are formed by appropriately connected first and second parallel-connected diode arrays, respectively. The regulator may be operated in three "stages" or modes: (1) For current runup in the load, both first and second transistor switch arrays are turned "on" and current is supplied to the load through both transistor arrays. (2) When load current reaches the desired level, the first switch is turned "off", and load current "flywheels" through the second switch array and the fourth-leg diode array connecting the second switch array in series with the load. Current is maintained by alternating between modes 1 and 2 at a suitable duty cycle and switching rate set by the controller. (3) Rapid current rundown is accomplished by turning both switch arrays "off", allowing load current to be dumped back into the source through the third and fourth diode arrays connecting the source in series opposition with the load to recover energy from the inductive load. The three operating states are controlled automatically by the controller.
DC switching regulated power supply for driving an inductive load
Dyer, G.R.
1983-11-29
A dc switching regulated power supply for driving an inductive load is provided. The regulator basic circuit is a bridge arrangement of diodes and transistors. First and second opposite legs of the bridge are formed by first and second parallel-connected transistor arrays, respectively, while the third and fourth legs of the bridge are formed by appropriately connected first and second parallel connected diode arrays, respectively. A dc power supply is connected to the input of the bridge and the output is connected to the load. A servo controller is provided to control the switching rate of the transistors to maintain a desired current to the load. The regulator may be operated in three stages or modes: (1) for current runup in the load, both first and second transistor switch arrays are turned on and current is supplied to the load through both transistor arrays. (2) When load current reaches the desired level, the first switch is turned off, and load current flywheels through the second switch array and the fourth leg diode array connecting the second switch array in series with the load. Current is maintained by alternating between modes 1 and 2 at a suitable duty cycle and switching rate set by the controller. (3) Rapid current rundown is accomplished by turning both switch arrays off, allowing load current to be dumped back into the source through the third and fourth diode arrays connecting the source in series opposition with the load to recover energy from the inductive load.
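The three operating modes in both records reduce to a small hysteretic state machine; the sketch below is an illustrative rendering of that logic (the band width and signal names are assumptions, not from the patents).

```python
# Three modes: runup (both arrays on), flywheel (one array on), and
# rundown (both off, current dumped back into the source).
def regulator_mode(i_load, i_set, band=0.05, shutdown=False):
    if shutdown:
        return "RUNDOWN"        # both arrays off: energy returned to source
    if i_load < i_set * (1 - band):
        return "RUNUP"          # both arrays on: source drives the load
    return "FLYWHEEL"           # one array on: current circulates in load

for i in (0.0, 0.9, 1.02):
    print(i, regulator_mode(i, i_set=1.0))
print("off", regulator_mode(1.0, 1.0, shutdown=True))
```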
Multitasking the three-dimensional transport code TORT on CRAY platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azmy, Y.Y.; Barnett, D.A.; Burre, C.A.
1996-04-01
The multitasking options in the three-dimensional neutral particle transport code TORT, originally implemented for Cray's CTSS operating system, are revived and extended to run on Cray Y/MP and C90 computers using the UNICOS operating system. These include two coarse-grained domain decompositions: across octants, and across directions within an octant, termed Octant Parallel (OP) and Direction Parallel (DP), respectively. Parallel performance of the DP is significantly enhanced by increasing the task grain size and reducing load imbalance via dynamic scheduling of the discrete angles among the participating tasks. Substantial wall-clock speedup factors, approaching 4.5 using 8 tasks, have been measured in a time-sharing environment, and generally depend on the test problem specifications, number of tasks, and machine loading during execution.
NASA Astrophysics Data System (ADS)
Stepanova, L. V.
2017-12-01
Atomistic simulations of the central crack growth process in an infinite plane medium under mixed-mode loading using Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a classical molecular dynamics code, are performed. The inter-atomic potential used in this investigation is the Embedded Atom Method (EAM) potential. Plane specimens with an initial central crack are subjected to mixed-mode loadings. The simulation cell contains 400,000 atoms. The crack propagation direction angles under different values of the mixity parameter in a wide range of values from pure tensile loading to pure shear loading in a wide range of temperatures (from 0.1 K to 800 K) are obtained and analyzed. It is shown that the crack propagation direction angles obtained by molecular dynamics coincide with the crack propagation direction angles given by the multi-parameter fracture criteria based on the strain energy density and the multi-parameter description of the crack-tip fields. The multi-parameter fracture criteria are based on the multi-parameter stress field description taking into account the higher order terms of the Williams series expansion of the crack tip fields.
NASA Technical Reports Server (NTRS)
Merticaru, V.
1974-01-01
An original mathematical model is proposed to derive equations for calculation of gear noise. These equations permit the acoustic pressure level to be determined as a function of load. Application of this method to three parallel gears is reported. The logical calculation scheme is given, as well as the results obtained.
Memory under pressure: secondary-task effects on contextual cueing of visual search.
Annac, Efsun; Manginelli, Angela A; Pollmann, Stefan; Shi, Zhuanghua; Müller, Hermann J; Geyer, Thomas
2013-11-04
Repeated display configurations improve visual search. Recently, the question has arisen whether this contextual cueing effect (Chun & Jiang, 1998) is itself mediated by attention, both in terms of selectivity and processing resources deployed. While it is accepted that selective attention modulates contextual cueing (Jiang & Leung, 2005), there is an ongoing debate whether the cueing effect is affected by a secondary working memory (WM) task, specifically at which stage WM influences the cueing effect: the acquisition of configural associations (e.g., Travis, Mattingley, & Dux, 2013) versus the expression of learned associations (e.g., Manginelli, Langer, Klose, & Pollmann, 2013). The present study re-investigated this issue. Observers performed a visual search in combination with a spatial WM task. The latter was applied on either early or late search trials--so as to examine whether WM load hampers the acquisition of or retrieval from contextual memory. Additionally, the WM and search tasks were performed either temporally in parallel or in succession--so as to permit the effects of spatial WM load to be dissociated from those of executive load. The secondary WM task was found to affect cueing in late, but not early, experimental trials--though only when the search and WM tasks were performed in parallel. This pattern suggests that contextual cueing involves a spatial WM resource, with spatial WM providing a workspace linking the current search array with configural long-term memory; as a result, occupying this workspace by a secondary WM task hampers the expression of learned configural associations.
A general parallel sparse-blocked matrix multiply for linear scaling SCF theory
NASA Astrophysics Data System (ADS)
Challacombe, Matt
2000-06-01
A general approach to the parallel sparse-blocked matrix-matrix multiply is developed in the context of linear scaling self-consistent-field (SCF) theory. The data-parallel message passing method uses non-blocking communication to overlap computation and communication. The space filling curve heuristic is used to achieve data locality for sparse matrix elements that decay with “separation”. Load balance is achieved by solving the bin packing problem for blocks with variable size. With this new method as the kernel, parallel performance of the simplified density matrix minimization (SDMM) for solution of the SCF equations is investigated for RHF/6-31G** water clusters and RHF/3-21G Estane globules. Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for (H2O)200 with 95 Origin 2000 processors. Scalability is found to be limited by load imbalance, which increases with decreasing granularity, due primarily to the inhomogeneous distribution of variable block sizes.
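The core of a sparse-blocked multiply can be sketched in a few lines: store only nonzero blocks and skip block products whose operands are absent. The serial kernel below shows just that idea; the paper's contribution is distributing these block products over processors with space-filling-curve ordering and solving a bin-packing problem for load balance.

```python
# Sparse-blocked matrix multiply: matrices stored as dicts of dense blocks
# keyed by (block-row, block-col); absent blocks are treated as zero.
import numpy as np

def sparse_block_matmul(A, B, nblocks):
    C = {}
    for (i, k), Ablk in A.items():          # only stored (nonzero) blocks
        for j in range(nblocks):
            Bblk = B.get((k, j))
            if Bblk is None:
                continue                     # sparsity: skip absent blocks
            C[(i, j)] = C.get((i, j), 0) + Ablk @ Bblk
    return C

bs = 2
A = {(0, 0): np.eye(bs), (1, 1): 2 * np.eye(bs)}
B = {(0, 1): np.ones((bs, bs)), (1, 1): np.eye(bs)}
print(sparse_block_matmul(A, B, nblocks=2))  # blocks (0,1) and (1,1) only
```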
The development of self-expanding peripheral stent with ion-modified surface layer
NASA Astrophysics Data System (ADS)
Lotkov, Alexander I.; Kashin, Oleg A.; Kudryashov, Andrey N.; Krukovskii, Konstantin V.; Kuznetsov, Vladimir M.; Borisov, Dmitry P.; Kretov, Evgenii I.
2016-11-01
This work investigates the chemical composition of the surface layers of self-expanding nickel-titanium (NiTi) stents and their functional and mechanical properties after plasma immersion treatment with silicon (Si) ions. It is established that the treatment forms a silicon-doped layer about 80 nm thick on the inner and outer surfaces of the stents. The formation of the doped layer does not impair the functional properties of the stent: at human body temperature, the stent fully restores its shape after the deforming load is removed. Loading curves were obtained for the stents during compression between parallel plates. The results support the conclusion that Si-doped stents are promising for the treatment of peripheral vascular disease, although related studies on laboratory animals are still required.
NASA Astrophysics Data System (ADS)
Rambalakos, Andreas
Current federal aviation regulations in the United States and around the world mandate that aircraft structures meet damage tolerance requirements throughout the service life. These requirements imply that the damaged aircraft structure must maintain adequate residual strength in order to sustain its integrity, which is accomplished by a continuous inspection program. The multifold objective of this research is to develop a methodology based on a direct Monte Carlo simulation process and to assess the reliability of aircraft structures. Initially, the structure is modeled as a parallel system with active redundancy, comprised of elements with uncorrelated (statistically independent) strengths and subjected to an equal load distribution. Closed-form expressions for the system capacity cumulative distribution function (CDF) are developed by extending the current expression for the capacity CDF of a parallel system comprised of three elements to a parallel system of up to six elements. These newly developed expressions are used to check the accuracy of the implementation of a Monte Carlo simulation algorithm that determines the probability of failure of a parallel system comprised of an arbitrary number of statistically independent elements. The second objective of this work is to compute the probability of failure of a fuselage skin lap joint under static load conditions through a Monte Carlo simulation scheme, by utilizing the residual strength of the fasteners subjected to various initial load distributions and then to a new unequal load distribution resulting from subsequent sequential fastener failures. The final and main objective of this thesis is to present a methodology for computing the resulting gradual deterioration of the reliability of an aircraft structural component by employing a direct Monte Carlo simulation approach. The uncertainties associated with the time to crack initiation, the probability of crack detection, the exponent in the crack propagation rate (Paris equation), and the yield strength of the elements are considered in the analytical model. The structural component is assumed to consist of a prescribed number of elements. This Monte Carlo simulation methodology is used to determine the required non-periodic inspections so that the reliability of the structural component does not fall below a prescribed minimum level. A sensitivity analysis is conducted to determine the effect of three key parameters on the specification of the non-periodic inspection intervals: namely, a parameter associated with the time to crack initiation, the applied nominal stress fluctuation, and the minimum acceptable reliability level.
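A minimal sketch of the direct Monte Carlo scheme for a parallel system with active redundancy and equal load sharing (the strength distribution and load level are illustrative; the thesis additionally models crack initiation, detection probability, and crack growth):

```python
# Direct Monte Carlo for a parallel system with equal load sharing:
# elements fail sequentially as the load of failed members is
# redistributed to the survivors (a fiber-bundle-style model).
import random

def system_fails(total_load, strengths):
    alive = sorted(strengths)
    while alive:
        share = total_load / len(alive)      # equal load redistribution
        if alive[0] >= share:
            return False                     # weakest survivor holds: ok
        alive.pop(0)                         # weakest fails; redistribute
    return True

def failure_probability(total_load, n_elem, trials=20_000):
    fails = 0
    for _ in range(trials):
        strengths = [random.gauss(1.0, 0.15) for _ in range(n_elem)]
        fails += system_fails(total_load, strengths)
    return fails / trials

random.seed(1)
print(failure_probability(total_load=4.5, n_elem=6))
```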
NASA Astrophysics Data System (ADS)
Stepanova, Larisa; Bronnikov, Sergej
2018-03-01
The crack growth directional angles in an isotropic linear elastic plane with a central crack under mixed-mode loading conditions are found for the full range of the mixity parameter. Two fracture criteria of traditional linear fracture mechanics (maximum tangential stress and minimum strain energy density criteria) are used. Atomistic simulations of the central crack growth process in an infinite plane medium under mixed-mode loading are performed using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a classical molecular dynamics code. The inter-atomic potential used in this investigation is an Embedded Atom Method (EAM) potential. The plane specimens with an initial central crack were subjected to mixed-mode loadings. The simulation cell contains 400,000 atoms. The crack propagation direction angles under different values of the mixity parameter, over a wide range of values from pure tensile loading to pure shear loading and over a wide range of temperatures (from 0.1 K to 800 K), are obtained and analyzed. It is shown that the crack propagation direction angles obtained by the molecular dynamics method coincide with those given by the multi-parameter fracture criteria based on the strain energy density and the multi-parameter description of the crack-tip fields.
Vibration energy harvesting using a piezoelectric circular diaphragm array.
Wang, Wei; Yang, Tongqing; Chen, Xurui; Yao, Xi
2012-09-01
This paper presents a method for harvesting electric energy from mechanical vibration using a mechanically excited piezoelectric circular membrane array. The piezoelectric circular diaphragm array consists of four plates with series and parallel connection, and the electrical characteristics of the array are examined under dynamic conditions. With an optimal load resistor of 160 kΩ, an output power of 28 mW was generated from the array in series connection at 150 Hz under a prestress of 0.8 N and a vibration acceleration of 9.8 m/s², whereas a maximal output power of 27 mW can be obtained from the array in parallel connection through a resistive load of 11 kΩ under the same frequency, prestress, and acceleration conditions. The results show that using a piezoelectric circular diaphragm array can significantly increase the energy output compared with the use of a single plate. By choosing an appropriate connection pattern (series or parallel connections) among the plates, the equivalent impedance of the energy harvesting devices can be tailored to meet the matched load of different applications for maximal power output.
Pakarinen, O; Kaparaju, P; Rintala, J
2011-10-01
The possibility of shifting a methanogenic process towards hydrogen production by changing the process parameters, viz. organic loading rate (OLR) and hydraulic retention time (HRT), was evaluated. At first, two parallel semi-continuously fed continuously stirred tank reactors (CSTR) were operated as methanogenic reactors (M1 and M2) for 78 days. Results showed that a methane yield of 198-218 L/kg volatile solids fed (VS_fed) was obtained when fed with grass silage at an OLR of 2 kg VS/m³/d and an HRT of 30 days. After 78 days of operation, hydrogen production was induced in M2 by increasing the OLR from 2 to 10 kg VS/m³/d and shortening the HRT from 30 to 6 days. The highest H₂ yield of 42 L/kg VS_fed was obtained with a maximum H₂ content of 24%. The present results thus demonstrate that a methanogenic process can be shifted towards hydrogen production by increasing the OLR and decreasing the HRT.
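For orientation, the two operating points above are linked by the CSTR relations HRT = V/Q and OLR = feed VS concentration / HRT. A quick sketch (the reactor volume is an assumption; this abstract does not state it):

```python
# Quick arithmetic linking the operating points above via the CSTR relations
# HRT = V/Q and OLR = feed_vs/HRT. The 8 L reactor volume is an assumption
# for illustration; this abstract does not give the reactor size.
V = 0.008                       # m³, assumed effective reactor volume
for OLR, HRT in [(2.0, 30.0), (10.0, 6.0)]:   # kg VS/m³/d, days
    Q = V / HRT                 # implied feed flow, m³/d
    feed_vs = OLR * HRT         # implied feed VS concentration, kg VS/m³
    print(f"OLR={OLR:>4}, HRT={HRT:>4} -> Q={Q*1000:.2f} L/d, feed VS={feed_vs:.0f} kg/m³")
# Both operating points imply the same 60 kg VS/m³ feed: the shift to hydrogen
# came from feeding five times faster, not from concentrating the substrate.
```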
An Analysis of Performance Enhancement Techniques for Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)
2002-01-01
The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
NASA Technical Reports Server (NTRS)
Radloff, H. D., II; Hyer, M. W.; Nemeth, M. P.
1994-01-01
The focus of this work is the buckling response of symmetrically laminated composite plates having a planform area in the shape of an isosceles trapezoid. The loading is assumed to be in-plane and applied perpendicular to the parallel ends of the plate. The tapered edges of the plate are assumed to have simply supported boundary conditions, while the parallel ends are assumed to have either simply supported or clamped boundary conditions. A semi-analytic closed-form solution based on energy principles and the Trefftz stability criterion is derived, and solutions are obtained using the Rayleigh-Ritz method. Intrinsic in this solution is a simplified prebuckling analysis which approximates the in-plane force resultant distributions by the forms Nx=P/W(x) and Ny=Nxy=0, where P is the applied load and W(x) is the plate width which, for the trapezoidal planform, varies linearly with the lengthwise coordinate x. The out-of-plane displacement is approximated by a double trigonometric series. This analysis is posed in terms of four nondimensional parameters representing orthotropic and anisotropic material properties, and two nondimensional parameters representing geometric properties. For comparison purposes, a number of specific plate geometry, ply orientation, and stacking sequence combinations are investigated using the general-purpose finite element code ABAQUS. Comparison of buckling coefficients calculated using the semi-analytical model and the finite element model shows agreement within 5 percent in general, and within 15 percent for the worst cases. In order to verify both the finite element and semi-analytical analyses, buckling loads are measured for graphite/epoxy plates having a wide range of plate geometries and stacking sequences. Test fixtures, the instrumentation system, and the experimental technique are described. Experimental results for the buckling load, the buckled mode shape, and the prebuckling plate stiffness are presented and show good agreement with the analytical results regarding the buckling load and the prebuckling plate stiffness. However, the experimental results show that for some cases the analysis underpredicts the number of half-waves in the buckled mode shape. In the context of the definitions of taper ratio and aspect ratio used in this study, it is concluded that the buckling load always increases as taper ratio increases for a given aspect ratio for plates having simply supported boundary conditions on the parallel ends. There are combinations of plate geometry and ply stacking sequences, however, that reverse this trend for plates having clamped boundary conditions on the parallel ends, such that an increase in the taper ratio causes a decrease in the buckling load. The clamped boundary conditions on the parallel ends of the plate are shown to increase the buckling load compared to simply supported boundary conditions. Also, anisotropy (the D16 and D26 terms) is shown to decrease the buckling load and skew the buckled mode shape for both the simply supported and clamped boundary conditions.
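A minimal numerical sketch of the simplified prebuckling assumption above, Nx(x) = P/W(x) with a linearly varying width; the dimensions and load below are illustrative placeholders, not values from the study:

```python
# Sketch of the simplified prebuckling distribution assumed in the abstract:
# Nx(x) = P/W(x), with the width W(x) varying linearly between the parallel
# ends of the trapezoid. Dimensions and load are illustrative placeholders.
import numpy as np

P, L = 1000.0, 0.5             # applied load (N) and plate length (m)
W0, W1 = 0.30, 0.15            # widths of the two parallel ends (m)

x = np.linspace(0.0, L, 6)
W = W0 + (W1 - W0) * x / L     # linear width variation along the plate
Nx = P / W                     # in-plane force resultant (N/m); Ny = Nxy = 0
for xi, ni in zip(x, Nx):
    print(f"x = {xi:.2f} m : Nx = {ni:7.1f} N/m")
```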
Emms, David M; Covshoff, Sarah; Hibberd, Julian M; Kelly, Steven
2016-07-01
C4 photosynthesis is considered one of the most remarkable examples of evolutionary convergence in eukaryotes. However, it is unknown whether the evolution of C4 photosynthesis required the evolution of new genes. Genome-wide gene-tree species-tree reconciliation of seven monocot species that span two origins of C4 photosynthesis revealed that there was significant parallelism in the duplication and retention of genes coincident with the evolution of C4 photosynthesis in these lineages. Specifically, 21 orthologous genes were duplicated and retained independently in parallel at both C4 origins. Analysis of this gene cohort revealed that the set of parallel duplicated and retained genes is enriched for genes that are preferentially expressed in bundle sheath cells, the cell type in which photosynthesis was activated during C4 evolution. Furthermore, functional analysis of the cohort of parallel duplicated genes identified SWEET-13 as a potential key transporter in the evolution of C4 photosynthesis in grasses, and provides new insight into the mechanism of phloem loading in these C4 species. Key words: C4 photosynthesis, gene duplication, gene families, parallel evolution.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu
2012-10-01
We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physicochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel fractional-step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes (a) are partially asynchronous on each fractional-step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems, and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.
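To make the fractional-step idea concrete, here is a toy serial sketch under strong simplifying assumptions: a 1D adsorption/desorption lattice with site-local rates (k_ads and k_des are made-up values), evolved by exact Gillespie KMC inside blocks over alternating even/odd half-steps, a Lie-Trotter splitting of the generator. In this toy the rates ignore neighbors, so the splitting is trivially exact; the paper treats the general interacting case, and within each half-step the blocks are independent and could run in parallel:

```python
# Toy fractional-step KMC: decompose a 1D adsorption/desorption lattice into
# blocks, then alternate exact Gillespie sub-steps over even and odd blocks
# (a Lie-Trotter splitting of the generator). Rates are placeholder values.
import numpy as np

rng = np.random.default_rng(1)
L, B = 64, 8                    # lattice sites, block size
sigma = np.zeros(L, dtype=int)  # 0 = empty, 1 = occupied
k_ads, k_des = 1.0, 0.5         # assumed adsorption/desorption rates

def kmc_block(sigma, lo, hi, dt):
    """Run exact KMC on sites [lo, hi) for a time window dt."""
    t = 0.0
    while True:
        rates = np.where(sigma[lo:hi] == 0, k_ads, k_des)
        total = rates.sum()
        t += rng.exponential(1.0 / total)   # next-event waiting time
        if t > dt:
            return
        i = lo + rng.choice(hi - lo, p=rates / total)
        sigma[i] ^= 1                       # flip the chosen site

dt = 0.1
for step in range(100):
    for parity in (0, 1):                   # even blocks, then odd blocks
        for b in range(parity, L // B, 2):
            kmc_block(sigma, b * B, (b + 1) * B, dt)
print("coverage:", sigma.mean())            # ~ k_ads/(k_ads+k_des) = 2/3
```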
FPGA-based prototype storage system with phase change memory
NASA Astrophysics Data System (ADS)
Li, Gezi; Chen, Xiaogang; Chen, Bomy; Li, Shunfen; Zhou, Mi; Han, Wenbing; Song, Zhitang
2016-10-01
With the ever-increasing amount of data being stored via social media, mobile telephony base stations, network devices, etc., database systems face severe bandwidth bottlenecks when moving vast amounts of data from storage to the processing nodes. At the same time, Storage Class Memory (SCM) technologies such as Phase Change Memory (PCM) with unique features like fast read access, high density, non-volatility, byte-addressability, positive response to increasing temperature, superior scalability, and zero standby leakage have changed the landscape of modern computing and storage systems. In such a scenario, we present a storage system called FLEET which can off-load partial or whole SQL queries to the storage engine from the CPU. FLEET uses an FPGA rather than conventional CPUs to implement the off-load engine due to its highly parallel nature. We have implemented an initial prototype of FLEET with PCM-based storage. The results demonstrate that significant performance and CPU utilization gains can be achieved by pushing selected query processing components inside PCM-based storage.
Dynamics of bubble collapse under vessel confinement in 2D hydrodynamic experiments
NASA Astrophysics Data System (ADS)
Shpuntova, Galina; Austin, Joanna
2013-11-01
One trauma mechanism in biomedical treatment techniques based on the application of cumulative pressure pulses generated either externally (as in shock-wave lithotripsy) or internally (by laser-induced plasma) is the collapse of voids. However, prediction of void-collapse driven tissue damage is a challenging problem, involving complex and dynamic thermomechanical processes in a heterogeneous material. We carry out a series of model experiments to investigate the hydrodynamic processes of voids collapsing under dynamic loading in configurations designed to model cavitation with vessel confinement. The baseline case of void collapse near a single interface is also examined. Thin sheets of tissue-surrogate polymer materials with varying acoustic impedance are used to create one or two parallel material interfaces near the void. Shadowgraph photography and two-color, single-frame particle image velocimetry quantify bubble collapse dynamics including jetting, interface dynamics and penetration, and the response of the surrounding material. Research supported by NSF Award #0954769, ``CAREER: Dynamics and damage of void collapse in biological materials under stress wave loading.''
Assessment of modification factors for a row of bolts or timber connectors
Thomas Lee Wilkinson
1980-01-01
When bolts or timber connectors are used in a row, with load applied parallel to the row, load will be unequally distributed among the fasteners. This study assessed methods of predicting this unequal load distribution, looked at how joint variables can affect the distribution, and compared the predictions with data existing in the literature. Presently used design...
Freitas, S; Walz, A; Merkle, H P; Gander, B
2003-01-01
The potential of a static micromixer for the production of protein-loaded biodegradable polymeric microspheres by a modified solvent extraction process was examined. The mixer consists of an array of microchannels and features a simple set-up, occupies very little space, lacks moving parts, and offers simple control of the microsphere size. Scale-up from lab bench to industrial production is easily feasible through parallel installation of a sufficient number of micromixers ('number-up'). Poly(lactic-co-glycolic acid) microspheres loaded with a model protein, bovine serum albumin (BSA), were prepared. The influence of various process and formulation parameters on the characteristics of the microspheres was examined with special focus on particle size distribution. Microspheres with monomodal size distributions having mean diameters of 5-30 µm were produced with excellent reproducibility. Particle size distributions were largely unaffected by polymer solution concentration, polymer type, and nominal BSA load, but depended on the polymer solvent. Moreover, particle mean diameters could be varied over a considerable range by modulating the flow rates of the mixed fluids. BSA encapsulation efficiencies were mostly in the region of 75-85% and product yields ranged from 90 to 100%. Because of its simple set-up and its suitability for continuous production, static micromixing is suggested for the automated and aseptic production of protein-loaded microspheres.
The Mercury System: Embedding Computation into Disk Drives
2004-08-20
enabling technologies to build extremely fast data search engines. We do this by moving the search closer to the data, and performing it in hardware... the engine searches in parallel across a disk or disk surface. 2. System Parallelism: searching is off-loaded to search engines and the main processor can...
Scalable Domain Decomposed Monte Carlo Particle Transport
NASA Astrophysics Data System (ADS)
O'Brien, Matthew Joseph
In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation. The main algorithms we consider are:
• Domain decomposition of constructive solid geometry: enables extremely large calculations in which the background geometry is too large to fit in the memory of a single computational node.
• Load balancing: keeps the workload per processor as even as possible so the calculation runs efficiently.
• Global particle find: if particles are on the wrong processor, globally resolve their locations to the correct processor based on particle coordinate and background domain (a toy sketch follows this abstract).
• Visualizing constructive solid geometry, sourcing particles, deciding that particle streaming communication is completed, and spatial redecomposition.
These algorithms are some of the most important parallel algorithms required for domain decomposed Monte Carlo particle transport. We demonstrate that our previous algorithms were not scalable, prove that our new algorithms are scalable, and run some of the algorithms up to 2 million MPI processes on the Sequoia supercomputer.
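A hedged sketch of the "global particle find" step, under simplifying assumptions: a uniform 1-D slab decomposition and mpi4py for the exchange. The dissertation's geometry-aware lookup and streaming logic are more involved than this:

```python
# Toy "global particle find": map each particle's coordinate to the rank that
# owns the enclosing spatial domain, then route particles home with alltoall.
# Run with, e.g.: mpirun -n 4 python find.py   (requires mpi4py and numpy)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def owner(x):
    """Rank owning coordinate x under a uniform decomposition of [0, 1)."""
    return min(int(x * size), size - 1)

# Each rank starts with particles scattered over the whole global domain.
particles = np.random.default_rng(rank).uniform(0.0, 1.0, 1000)

# Bin particles by destination rank, then route them with an all-to-all.
dest = np.minimum((particles * size).astype(int), size - 1)
outgoing = [particles[dest == r].tolist() for r in range(size)]
incoming = comm.alltoall(outgoing)
mine = [x for sub in incoming for x in sub]
assert all(owner(x) == rank for x in mine)
print(f"rank {rank}: now owns {len(mine)} particles")
```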
NASA Astrophysics Data System (ADS)
Gutzwiller, David; Gontier, Mathieu; Demeulenaere, Alain
2014-11-01
Multi-block structured solvers hold many advantages over their unstructured counterparts, such as a smaller memory footprint and efficient serial performance. Historically, multi-block structured solvers have not been easily adapted for use in a High Performance Computing (HPC) environment, and the recent trend towards hybrid GPU/CPU architectures has further complicated the situation. This paper will elaborate on developments and innovations applied to the NUMECA FINE/Turbo solver that have allowed near-linear scalability with real-world problems on over 250 hybrid CPU/GPU cluster nodes. Discussion will focus on the implementation of virtual partitioning and load balancing algorithms using a novel meta-block concept. This implementation is transparent to the user, allowing all pre- and post-processing steps to be performed using a simple, unpartitioned grid topology. Additional discussion will elaborate on developments that have improved parallel performance, including fully parallel I/O with the ADIOS API and the GPU porting of the computationally heavy CPUBooster convergence acceleration module.
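The virtual-partitioning idea can be caricatured as a bin-packing problem: split costly blocks into smaller pieces ("meta-blocks"), then greedily assign them to the least-loaded node. A minimal sketch with placeholder costs, not NUMECA's actual algorithm:

```python
# Longest-processing-time-first greedy packing of weighted (meta-)blocks onto
# nodes: each block goes to the currently least-loaded node. Costs are made up.
import heapq

def lpt_assign(block_costs, n_nodes):
    """Assign weighted blocks to nodes, always to the least-loaded node."""
    heap = [(0.0, node, []) for node in range(n_nodes)]
    for cost in sorted(block_costs, reverse=True):   # biggest blocks first
        load, node, blocks = heapq.heappop(heap)
        blocks.append(cost)
        heapq.heappush(heap, (load + cost, node, blocks))
    return sorted(heap, key=lambda t: t[1])

blocks = [90, 70, 40, 40, 30, 20, 20, 10]   # per-block work estimates
for load, node, assigned in lpt_assign(blocks, 3):
    print(f"node {node}: load={load} blocks={assigned}")
```

In practice the quality of this kind of balancing improves when oversized blocks are first split, which is exactly what the meta-block concept provides.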
NASA Astrophysics Data System (ADS)
Zhilenkov, A. A.; Kapitonov, A. A.
2017-10-01
It is known that many of today's ships and vessels have a shaft generator as part of their power plants. Modern automatic control systems used in the world's fleet do not enable their shaft generators to operate in parallel with the main diesel generators for long-term sustenance of the total load of the ship network. On the other hand, according to our calculations and experiments, a shaft generator operated in parallel with the main power plant saves at least 10% of fuel while making the power system of the ship more efficient, reliable, and eco-friendly. The fouling and corrosion of the propeller, as well as the weather conditions of navigation, affect its modulus of resistance, which changes the free component of the transient processes of the shaft generator's voltage and frequency. While the shaft generator and the diesel generator of the ship power plant are paralleled, an angle emerges between their EMFs, resulting in equalizing currents between them. The alternating torque in the drive-shaft-line-propeller system causes torsional fluctuations of the ship shaft line. To compensate for the effect of destabilizing factors and torsional fluctuations of the shaft line on the dynamic characteristics of the transient process that alters the RPM of the main engine, sliding-mode controls can be used. To synthesize such a control, one has to evaluate the effect of the destabilizing factors.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1997-01-01
Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Intel Paragon, IBM SP2, and SGI Origin2000, have successfully delivered high-performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is: 1) develop highly accurate parallel numerical algorithms; 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms; 3) incorporate newly developed algorithms into actual simulation packages. The work plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, and (2) using a compact scheme to gain high-order accuracy in the numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
Internet traffic load balancing using dynamic hashing with flow volume
NASA Astrophysics Data System (ADS)
Jo, Ju-Yeon; Kim, Yoohwan; Chao, H. Jonathan; Merat, Francis L.
2002-07-01
Sending IP packets over multiple parallel links is in extensive use in today's Internet, and its use is growing due to its scalability, reliability, and cost-effectiveness. To maximize the efficiency of parallel links, load balancing is necessary among the links, but it may cause the problem of packet reordering. Since packet reordering impairs TCP performance, it is important to reduce the amount of reordering. Hashing offers a simple solution to keep the packet order by sending a flow over a unique link, but static hashing does not guarantee an even distribution of the traffic amount among the links, which could lead to packet loss under heavy load. Dynamic hashing offers some degree of load balancing but suffers from load fluctuations and excessive packet reordering. To overcome these shortcomings, we have enhanced the dynamic hashing algorithm to utilize the flow volume information in order to reassign only the appropriate flows. This new method, called dynamic hashing with flow volume (DHFV), eliminates unnecessary flow reassignments of small flows and achieves load balancing very quickly without load fluctuation by accurately predicting the amount of transferred load between the links. In this paper we provide the general framework of DHFV and address the challenges in implementing it. We then introduce two algorithms of DHFV with different flow selection strategies and show their performance through simulation.
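A toy sketch of the general idea (static hashing plus volume-aware reassignment of a few large flows). The rebalance trigger and the flow-selection policy below are invented for illustration and are not DHFV's actual rules:

```python
# Hash flows to parallel links; reassign only a large flow from the busiest
# to the idlest link, so most flows keep their link and packet order.
import hashlib
from collections import defaultdict

N_LINKS = 4
override = {}                       # flow_id -> link, for reassigned flows
link_bytes = defaultdict(int)       # bytes carried per link

def link_for(flow_id):
    if flow_id in override:
        return override[flow_id]
    h = hashlib.sha1(flow_id.encode()).digest()
    return h[0] % N_LINKS           # static hash keeps a flow on one link

def on_packet(flow_id, nbytes):
    link_bytes[link_for(flow_id)] += nbytes

def rebalance(flow_volumes):
    """Move the single largest flow from the busiest to the idlest link."""
    busiest = max(range(N_LINKS), key=lambda l: link_bytes[l])
    idlest = min(range(N_LINKS), key=lambda l: link_bytes[l])
    candidates = [f for f in flow_volumes if link_for(f) == busiest]
    if candidates and busiest != idlest:
        big = max(candidates, key=flow_volumes.get)
        override[big] = idlest      # reassign one large flow, not many small

flows = {f"flow{i}": (i + 1) * 1000 for i in range(20)}
for f, vol in flows.items():
    on_packet(f, vol)
rebalance(flows)
print(dict(link_bytes), override)
```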
NASA Astrophysics Data System (ADS)
Maruo, Shoji; Sugiyama, Kenji; Daicho, Yuya; Monri, Kensaku
2014-03-01
A three-dimensional (3-D) molding process using a master polymer mold produced by microstereolithography has been developed for the production of piezoelectric ceramic elements. In this method, ceramic slurry is injected into a 3-D polymer mold via a centrifugal casting process. The polymer master mold is thermally decomposed so that complex 3-D piezoelectric ceramic elements can be produced. As an example of 3-D piezoelectric ceramic elements, we produced a spiral piezoelectric element that can convert multidirectional loads into a voltage. It was confirmed that a prototype of the spiral piezoelectric element could generate a voltage by applying a load in both parallel and lateral directions in relation to the helical axis. A power output of 123 pW was obtained by applying the maximum load of 2.8 N at 2 Hz along the helical axis. In addition, to improve the performance of power generation, we utilized a two-step sintering process to obtain dense piezoelectric elements. As a result, we obtained a sintered body with a relative density of 92.8%. The piezoelectric constant d31 of the sintered body attained -40.0 pC/N. Furthermore, we analyzed the open-circuit voltage of the spiral piezoelectric element using COMSOL Multiphysics. As a result, it was found that using patterned electrodes matched to the surface potential distribution of the spiral piezoelectric element could provide an output voltage 20 times larger than that of uniform electrodes.
NASA Technical Reports Server (NTRS)
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.
2013-01-01
The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTM architecture wraps access to data with thread management to move threads to the home processor for that data, so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to processor IDs where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.
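The map-reduce-with-global-queue idea can be caricatured with ordinary Python threads: workers pull map tasks from one shared queue, so faster threads naturally take more work, and a reduce step combines the results. MTTM's thread migration and cache homing are not modeled here, and the squaring workload is a placeholder:

```python
# Worker threads draw jobs from a single global queue (self-balancing "map"),
# then a final pass combines the results ("reduce").
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:             # sentinel: no more work for this thread
            return
        results.put(item * item)     # the "map" step (placeholder workload)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for n in range(100):
    tasks.put(n)
for _ in threads:
    tasks.put(None)                  # one sentinel per worker
for t in threads:
    t.join()

total = sum(results.get() for _ in range(100))   # the "reduce" step
print(total)                         # sum of squares 0..99 = 328350
```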
CERN data services for LHC computing
NASA Astrophysics Data System (ADS)
Espinal, X.; Bocchi, E.; Chan, B.; Fiorot, A.; Iven, J.; Lo Presti, G.; Lopez, J.; Gonzalez, H.; Lamanna, M.; Mascetti, L.; Moscicki, J.; Pace, A.; Peters, A.; Ponce, S.; Rousseau, H.; van der Ster, D.
2017-10-01
Dependability, resilience, adaptability, and efficiency: growing requirements call for tailored storage services and novel solutions. Unprecedented volumes of data coming from the broad number of experiments at CERN need to be quickly available in a highly scalable way for large-scale processing and data distribution, while in parallel they are routed to tape for long-term archival. These activities are critical for the success of HEP experiments. Nowadays we operate at high incoming throughput (14 GB/s during the 2015 LHC Pb-Pb run and 11 PB in July 2016) and with concurrent complex production workloads. In parallel, our systems provide the platform for the continuous user- and experiment-driven workloads for large-scale data analysis, including end-user access and sharing. The storage services at CERN cover the needs of our community: EOS and CASTOR as large-scale storage; CERNBox for end-user access and sharing; Ceph as the data back-end for the CERN OpenStack infrastructure, NFS services, and S3 functionality; AFS for legacy distributed-file-system services. In this paper we summarise the experience in supporting LHC experiments and the transition of our infrastructure from static monolithic systems to flexible components providing a more coherent environment with pluggable protocols, tuneable QoS, sharing capabilities, and fine-grained ACL management, while continuing to guarantee dependable and robust services.
Reactor Dosimetry Applications Using RAPTOR-M3G: A New Parallel 3-D Radiation Transport Code
NASA Astrophysics Data System (ADS)
Longoni, Gianluca; Anderson, Stanwood L.
2009-08-01
The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper discusses the development of RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based on domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained; Section 3 addresses the parallel performance of the code; and Section 4 concludes this paper with final remarks and future work.
NASA Astrophysics Data System (ADS)
Kim, Jin Seok; Hur, Min Young; Kim, Chang Ho; Kim, Ho Jun; Lee, Hae June
2018-03-01
A two-dimensional parallelized particle-in-cell simulation has been developed to simulate a capacitively coupled plasma reactor. Parallelization using graphics processing units is applied to resolve the heavy computational load. It is found that step-ionization plays an important role at intermediate gas pressures of a few Torr. Without step-ionization, the average electron density decreases while the effective electron temperature increases with increasing gas pressure at a fixed power. With step-ionization, however, the average electron density increases while the effective electron temperature decreases with increasing gas pressure. The cases with step-ionization agree well with the trend of the experimental measurements. The electron energy distribution functions show that the population of electrons with intermediate energies from 4.2 to 12 eV is relaxed by step-ionization. It was also observed that, with the step-ionization process included, the power consumed by the electrons increases with gas pressure, while the power consumed by the ions decreases with gas pressure.
Atoche, Alejandro Castillo; Castillo, Javier Vázquez
2012-01-01
A high-speed dual super-systolic core for reconstructive signal processing (SP) operations consists of a double parallel systolic array (SA) machine in which each processing element of the array is also conceptualized as another SA in a bit-level fashion. In this study, we addressed the design of a high-speed dual super-systolic array (SSA) core for the enhancement/reconstruction of remote sensing (RS) imaging of radar/synthetic aperture radar (SAR) sensor systems. The selected reconstructive SP algorithms are efficiently transformed into their parallel representation and then mapped onto an efficient high performance embedded computing (HPEC) architecture on reconfigurable Xilinx field programmable gate array (FPGA) platforms. As an implementation test case, the proposed approach was aggregated in a HW/SW co-design scheme in order to solve the nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) from a remotely sensed scene. We show how such a dual SSA core drastically reduces the computational load of complex RS regularization techniques, achieving the required real-time operational mode.
Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak
2001-01-01
The Information Power Grid (IPG) concept developed by NASA is aimed to provide a metacomputing platform for large-scale distributed computations, by hiding the intricacies of a highly heterogeneous environment and yet maintaining adequate security. In this paper, we propose a latency-tolerant partitioning scheme that dynamically balances processor workloads on the IPG, and minimizes data movement and runtime communication. By simulating an unsteady adaptive mesh application on a wide area network, we study the performance of our load balancer under the Globus environment. The number of IPG nodes, the number of processors per node, and the interconnect speeds are parameterized to derive conditions under which the IPG would be suitable for parallel distributed processing of such applications. Experimental results demonstrate that effective solutions are achieved when the IPG nodes are connected by a high-speed asynchronous interconnection network.
Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak
2003-01-01
The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement strategies on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Details of a sophisticated graph partitioning technique for grid grouping are also provided. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
Shifts in information processing level: the speed theory of intelligence revisited.
Sircar, S S
2000-06-01
A hypothesis is proposed here to reconcile the inconsistencies observed in the IQ-P3 latency relation. The hypothesis stems from the observation that task-induced increase in P3 latency correlates positively with IQ scores. It is hypothesised that: (a) there are several parallel information processing pathways of varying complexity which are associated with the generation of P3 waves of varying latencies; (b) with increasing workload, there is a shift in the 'information processing level' through progressive recruitment of more complex polysynaptic pathways with greater processing power and inhibition of the oligosynaptic pathways; (c) high-IQ subjects have a greater reserve of higher level processing pathways; (d) a given 'task-load' imposes a greater 'mental workload' in subjects with lower IQ than in those with higher IQ. According to this hypothesis, a meaningful comparison of the P3 correlates of IQ is possible only when the information processing level is pushed to its limits.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamblin, T; de Supinski, B R; Schulz, M
Good load balance is crucial on very large parallel systems, but the most sophisticated algorithms introduce dynamic imbalances through adaptation in domain decomposition or use of adaptive solvers. To observe and diagnose imbalance, developers need system-wide, temporally-ordered measurements from full-scale runs. This potentially requires data collection from multiple code regions on all processors over the entire execution. Doing this instrumentation naively can, in combination with the application itself, exceed available I/O bandwidth and storage capacity, and can induce severe behavioral perturbations. We present and evaluate a novel technique for scalable, low-error load balance measurement. This uses a parallel wavelet transform and other parallel encoding methods. We show that our technique collects and reconstructs system-wide measurements with low error. Compression time scales sublinearly with system size and data volume is several orders of magnitude smaller than the raw data. The overhead is low enough for online use in a production environment.
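The core signal-processing trick can be sketched in a few lines: wavelet-transform each processor's load trace and keep only the largest coefficients. This toy uses a serial orthonormal Haar transform on a synthetic trace; the paper's parallel transform, encoding, and error control are not reproduced here:

```python
# Compress a load trace with a Haar wavelet: keep the 16 largest of 256
# coefficients, reconstruct, and measure the error. Trace is synthetic.
import numpy as np

def haar(x):
    """In-place-style iterative orthonormal Haar transform (len power of 2)."""
    out = x.astype(float).copy()
    n = len(out)
    while n > 1:
        half = n // 2
        avg = (out[0:n:2] + out[1:n:2]) / np.sqrt(2)
        dif = (out[0:n:2] - out[1:n:2]) / np.sqrt(2)
        out[:half], out[half:n] = avg, dif
        n = half
    return out

def ihaar(c):
    """Inverse of haar()."""
    c = c.copy()
    n = 1
    while n < len(c):
        avg, dif = c[:n].copy(), c[n:2*n].copy()
        c[0:2*n:2] = (avg + dif) / np.sqrt(2)
        c[1:2*n:2] = (avg - dif) / np.sqrt(2)
        n *= 2
    return c

rng = np.random.default_rng(0)
load = 100 + 10 * np.sin(np.linspace(0, 6, 256)) + rng.normal(0, 1, 256)
coeff = haar(load)
keep = np.argsort(np.abs(coeff))[-16:]     # retain the 16 largest coefficients
sparse = np.zeros_like(coeff)
sparse[keep] = coeff[keep]
err = np.abs(ihaar(sparse) - load).mean() / load.mean()
print(f"16x compression, mean relative error {err:.3%}")
```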
Feedbacks Between Surface Processes and Tectonics at Rifted Margins: a Numerical Approach
NASA Astrophysics Data System (ADS)
Andres-Martinez, M.; Perez-Gussinye, M.; Morgan, J. P.; Armitage, J. J.
2014-12-01
Mantle dynamics drives the rifting of the continents, and the consequent crustal processes shape the topography of the rifted margins. Surface processes modify the topography by eroding positive reliefs and depositing sediment in the basins. This lateral displacement of masses implies a change in the loads during rifting, affecting the architecture of the resulting margins. Furthermore, thermal insulation due to sediments could potentially have an impact on the rheologies, which are known to be among the most influential parameters controlling the deformation style at continental margins. In order to understand the feedback between these processes we have developed a numerical geodynamic model based on MILAMIN. Our model consists of a 2D Lagrangian triangular mesh for which velocities, displacements, pressures, and temperatures are calculated at each time step. The model is visco-elastic and includes a free-surface stabilization algorithm, strain weakening, and an erosion/sedimentation algorithm. Sediment loads and temperatures on the sediments are taken into account when solving velocities and temperatures for the whole model. Although surface processes are strongly three-dimensional, we have chosen to study a 2D section parallel to the extension as a first approach. Results show that strain localizes further where sedimentation occurs. This is due to the extra load of the sediments exerting a gravitational force over the topography. We also observed angular unconformities in the sediments due to the rotation of crustal blocks associated with normal faults. In order to illustrate the feedbacks between surface and inner processes we will show a series of models calculated with different rheologies and extension velocities, with and without erosion/sedimentation. We will then discuss to what extent thermal insulation due to sedimentation and increased stresses due to sediment loading affect the geometry and distribution of faulting, the rheology of the lower crust, and consequently margin architecture.
Analysis of series resonant converter with series-parallel connection
NASA Astrophysics Data System (ADS)
Lin, Bor-Ren; Huang, Chien-Lan
2011-02-01
In this study, a parallel inductor-inductor-capacitor (LLC) resonant converter, series-connected on the primary side and parallel-connected on the secondary side, is presented for server power supply systems. Based on series resonant behaviour, the power metal-oxide-semiconductor field-effect transistors are turned on at zero-voltage switching and the rectifier diodes are turned off at zero-current switching. Thus, the switching losses in the power semiconductors are reduced. In the proposed converter, the primary windings of the two LLC converters are connected in series, so the two converters carry the same primary current, ensuring that they supply balanced load currents. On the output side, the two LLC converters are connected in parallel to share the load current and to reduce the current stress on the secondary windings and the rectifier diodes. In this article, the principle of operation, steady-state analysis, and design considerations of the proposed converter are provided and discussed. Experiments with a laboratory prototype with a 24 V/21 A output for a server power supply were performed to verify the effectiveness of the proposed converter.
Extending substructure based iterative solvers to multiple load and repeated analyses
NASA Technical Reports Server (NTRS)
Farhat, Charbel
1993-01-01
Direct solvers currently dominate commercial finite element structural software, but do not scale well in the fine granularity regime targeted by emerging parallel processors. Substructure based iterative solvers--often called also domain decomposition algorithms--lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well-suited for these problems because after the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. On the other hand, iterative solvers in general are ill-suited for these problems because they often must restart from scratch for every different right hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time-step beyond time-step 15 is solved in a single iteration and consumes 1.0 second on a 32 processor iPSC-860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
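A minimal sketch of one way to reuse work across right-hand sides, under synthetic assumptions (a generic SPD matrix and a family of related load vectors): project each new RHS onto the span of previously computed solutions, then polish with plain CG. This illustrates the flavor only; the paper's K-orthogonal subspace machinery inside a domain decomposition framework is substantially richer:

```python
# Seed-projected conjugate gradients for multiple related right-hand sides.
import numpy as np

def cg(A, b, x0, tol=1e-10, maxit=1000):
    """Plain conjugate gradient from initial guess x0."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    it = 0
    while np.linalg.norm(r) > tol * np.linalg.norm(b) and it < maxit:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        it += 1
    return x, it

rng = np.random.default_rng(0)
n = 200
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # synthetic SPD "stiffness" matrix
b0 = rng.standard_normal(n)

seeds = []                                   # previously computed solutions
for k in range(5):
    b = b0 + 0.05 * rng.standard_normal(n)   # a family of related load cases
    if seeds:
        S = np.column_stack(seeds)
        # Best initial guess in span(S): minimizes the A-norm of the error.
        x0 = S @ np.linalg.solve(S.T @ A @ S, S.T @ b)
    else:
        x0 = np.zeros(n)
    x, iters = cg(A, b, x0)
    seeds.append(x)
    print(f"rhs {k}: CG iterations = {iters}")   # later solves need fewer
```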
Lensch, D; Schaum, C; Cornel, P
2016-01-01
Many digesters in Germany are not operated at full capacity; this offers the opportunity for co-digestion. Within this research the potentials and limits of a flexible and adapted sludge treatment are examined, with a focus on the digestion process with added food waste as co-substrate. In parallel, energy data from a municipal wastewater treatment plant (WWTP) are analysed, and lab-scale semi-continuous and batch digestion tests are conducted. Within the digestion tests, the ratio of sewage sludge to co-substrate was varied. The final methane yields show the high potential of food waste: the higher the amount of food waste, the higher the final yield. However, the conversion rates directly after charging show better results when charging 10% food waste instead of 20%. Finally, these results are merged with the energy data from the WWTP. As an illustration, the load required to cover base loads as well as peak loads for typical daily variations of the plant's energy demand is calculated. It was found that 735 m³ of raw sludge and 73 m³ of a mixture of raw sludge and food waste are required to cover 100% of the base load and 95% of the peak load.
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment
NASA Astrophysics Data System (ADS)
Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.
2013-12-01
Dust storms have serious negative impacts on the environment, human health, and assets. Continuing global climate change has increased the frequency and intensity of dust storms in the past decades. To better understand and predict the distribution, intensity, and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust), and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The development and application of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data- and computing-intensive process. Normally, a simulation of a single dust storm event may take several hours or days to run. This seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, the task allocation method is the key factor, which may impact the feasibility of the parallelization. The allocation algorithm needs to carefully leverage the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method. Specifically: 1) In order to get optimized solutions, a quadratic programming based modeling method is proposed. This algorithm performs well with a small number of computing tasks; however, its efficiency decreases significantly as the subdomain number and computing node number increase. 2) To compensate for this performance decrease on large-scale tasks, a K-Means clustering based algorithm is introduced. Instead of seeking optimal solutions, this method can get relatively good feasible solutions within acceptable time; however, it may introduce imbalanced communication among nodes or node-isolated subdomains. This research shows that both algorithms have their own strengths and weaknesses for task allocation. A combination of the two algorithms is under study to obtain better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
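As an illustration of the K-Means variant, the sketch below clusters subdomain centers on a regular grid into node groups with plain Lloyd's algorithm. The grid size and node count are invented, and the imbalance visible in the cluster sizes is exactly the weakness noted above:

```python
# Cluster subdomain centers so geographically adjacent subdomains land on the
# same node, keeping most halo exchanges node-local. Sizes are placeholders.
import numpy as np

rng = np.random.default_rng(0)
centers = np.array([(i, j) for i in range(16) for j in range(16)], float)
K = 8                                           # computing nodes

# Plain Lloyd's algorithm (K-Means).
means = centers[rng.choice(len(centers), K, replace=False)]
for _ in range(50):
    d = np.linalg.norm(centers[:, None, :] - means[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    means = np.array([centers[assign == k].mean(axis=0)
                      if (assign == k).any() else means[k]
                      for k in range(K)])       # keep old mean if cluster empties

sizes = np.bincount(assign, minlength=K)
print("subdomains per node:", sizes)            # note: clusters can be uneven
```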
Solli, Linn; Bergersen, Ove; Sørheim, Roald; Briseid, Tormod
2014-08-01
This study examined the effects of an increased load of nitrogen-rich organic material on anaerobic digestion and methane production. Co-digestion of fish waste silage (FWS) and cow manure (CM) was studied in two parallel laboratory-scale (8L effective volume) semi-continuous stirred tank reactors (designated R1 and R2). A reactor fed with CM only (R0) was used as control. The reactors were operated in the mesophilic range (37°C) with a hydraulic retention time of 30 days, and the entire experiment lasted for 450 days. The rate of organic loading was raised by increasing the content of FWS in the feed stock. During the experiment, the amount (volume%) of FWS was increased stepwise in the following order: 3% - 6% - 13% - 16%, and 19%. Measurements of methane production, and analysis of volatile fatty acids, ammonium and pH in the effluents were carried out. The highest methane production from co-digestion of FWS and CM was 0.400 L CH4 gVS(-1), obtained during the period with loading of 16% FWS in R2. Compared to anaerobic digestion of CM only, the methane production was increased by 100% at most, when FWS was added to the feed stock. The biogas processes failed in R1 and R2 during the periods, with loadings of 16% and 19% FWS, respectively. In both reactors, the biogas processes failed due to overloading and accumulation of ammonia and volatile fatty acids. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Hasanov, Alemdar; Kawano, Alexandre
2016-05-01
Two types of inverse source problems of identifying asynchronously distributed spatial loads governed by the Euler-Bernoulli beam equation $\rho(x)w_{tt} + \mu(x)w_{t} + (EI(x)w_{xx})_{xx} - T_r w_{xx} = \sum_{m=1}^{M} g_m(t)f_m(x)$, $(x,t)\in \Omega_T := (0,l)\times(0,T)$, with hinged-clamped ends ($w(0,t)=w_{xx}(0,t)=0$, $w(l,t)=w_x(l,t)=0$, $t\in(0,T)$), are studied. Here $g_m(t)$ are linearly independent functions describing an asynchronous temporal loading, and $f_m(x)$ are the spatial load distributions. In the first identification problem the values $\nu_k(t)$, $k=\overline{1,K}$, of the deflection $w(x,t)$ are assumed to be known, as measured output data, in a neighbourhood of the finite set of points $P := \{x_k \in (0,l),\ k=\overline{1,K}\} \subset (0,l)$, corresponding to the internal points of a continuous beam, for all $t\in(0,T)$. In the second identification problem the values $\theta_k(t)$, $k=\overline{1,K}$, of the slope $w_x(x,t)$ are assumed to be known, as measured output data, in a neighbourhood of the same set of points $P$ for all $t\in(0,T)$. These inverse source problems are defined subsequently as the problems ISP1 and ISP2. The general purpose of this study is to develop mathematical concepts and tools capable of providing effective numerical algorithms for the numerical solution of the considered class of inverse problems. Note that both measured output data $\nu_k(t)$ and $\theta_k(t)$ contain random noise. In the first part of the study we prove that each measured output data $\nu_k(t)$ and $\theta_k(t)$, $k=\overline{1,K}$, can uniquely determine the unknown functions $f_m \in H^{-1}((0,l))$, $m=\overline{1,M}$. In the second part of the study we introduce the input-output operators $\mathcal{K}_d : L^2(0,T) \mapsto L^2(0,T)$, $(\mathcal{K}_d f)(t) := w(x,t;f)$, $x \in P$, $f(x) := (f_1(x),\ldots,f_M(x))$, and $\mathcal{K}_s : L^2(0,T) \mapsto L^2(0,T)$, $(\mathcal{K}_s f)(t) := w_x(x,t;f)$, $x \in P$, corresponding to the problems ISP1 and ISP2, and then reformulate these problems as the operator equations $\mathcal{K}_d f = \nu$ and $\mathcal{K}_s f = \theta$, where $\nu(t) := (\nu_1(t),\ldots,\nu_K(t))$ and $\theta(t) := (\theta_1(t),\ldots,\theta_K(t))$. Since both measured output data contain random noise, we use the most prominent regularisation method, Tikhonov regularisation, introducing the regularised cost functionals $J_1^\alpha(f) := \frac{1}{2}\|\mathcal{K}_d f - \nu\|^2_{L^2(0,T)} + \frac{\alpha}{2}\|f\|^2_{L^2(0,T)}$ and $J_2^\alpha(f) := \frac{1}{2}\|\mathcal{K}_s f - \theta\|^2_{L^2(0,T)} + \frac{\alpha}{2}\|f\|^2_{L^2(0,T)}$. Using a priori estimates for the weak solution of the direct problem and the Tikhonov regularisation method combined with the adjoint problem approach, we prove that the Fréchet gradients $J_1'(f)$ and $J_2'(f)$ of both cost functionals can explicitly be derived via the corresponding weak solutions of adjoint problems and the known temporal loads $g_m(t)$. Moreover, we show that these gradients are Lipschitz continuous, which allows the use of convergent gradient-type iteration algorithms. Two applications of the proposed theory are presented. It is shown that solvability results for inverse source problems related to the synchronous loading case, with a single interior measured datum, are special cases of the obtained results for asynchronously distributed spatial load cases.
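A small numerical caricature of the regularised reconstruction, with a random matrix standing in for the discretized input-output operator (the real operator comes from the beam model above): the Tikhonov gradient is $J'(f) = K^{T}(Kf-\nu) + \alpha f$, and a fixed-step gradient descent recovers the spatial load:

```python
# Tikhonov-regularised recovery of a spatial load from noisy output data.
# The matrix K is a synthetic stand-in for the beam's input-output operator.
import numpy as np

rng = np.random.default_rng(0)
m, n = 400, 60
K = rng.standard_normal((m, n)) / np.sqrt(m)     # placeholder forward operator
f_true = np.sin(np.linspace(0, np.pi, n))        # spatial load to recover
nu = K @ f_true + 0.01 * rng.standard_normal(m)  # noisy measured output

alpha = 1e-3
f = np.zeros(n)
step = 1.0 / (np.linalg.norm(K, 2) ** 2 + alpha) # safe step: 1/Lipschitz const
for _ in range(1000):
    grad = K.T @ (K @ f - nu) + alpha * f        # gradient of J_alpha(f)
    f -= step * grad

print("relative error:", np.linalg.norm(f - f_true) / np.linalg.norm(f_true))
```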
Control and protection system for paralleled modular static inverter-converter systems
NASA Technical Reports Server (NTRS)
Birchenough, A. G.; Gourash, F.
1973-01-01
A control and protection system was developed for use with a paralleled 2.5-kWe-per-module static inverter-converter system. The control and protection system senses internal and external fault parameters such as voltage, frequency, current, and paralleling current unbalance. A logic system controls contactors to isolate defective power conditioners or loads. The system sequences contactor operation to automatically control parallel operation, startup, and fault isolation. Transient overload protection and fault checking sequences are included. The operation and performance of a control and protection system, with detailed circuit descriptions, are presented.
NASA Astrophysics Data System (ADS)
Yan, Beichuan; Regueiro, Richard A.
2018-02-01
A three-dimensional (3D) DEM code for simulating complex-shaped granular particles is parallelized using the message-passing interface (MPI). The concepts of link-block, ghost/border layer, and migration layer are put forward for the design of the parallel algorithm, and a theoretical function for 3D DEM scalability and memory usage is derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as: minimizing communication overhead, maintaining dynamic load balance, handling particle migrations across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, and eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and has been tested on up to 2048 compute nodes simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code, including speedup, efficiency, scalability, and granularity across five orders of magnitude of simulation scale (number of particles), are provided and demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong-scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value in both laboratory studies on micromechanical properties of granular materials and many realistic engineering applications involving granular materials.
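To make the ghost/border-layer and migration-layer concepts concrete, here is a minimal mpi4py sketch of a one-dimensional block decomposition, not the paper's C++ implementation; the particle model, ring neighbour indexing, and layer thickness are simplifying assumptions:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size  # ring indexing for brevity

rng = np.random.default_rng(rank)
x = rank + rng.random(1000)      # this rank owns the block [rank, rank+1)
BORDER = 0.05                    # ghost/border-layer thickness

for step in range(10):
    # Toy particle motion, clamped so particles stay inside the global domain.
    x = np.clip(x + rng.normal(scale=0.01, size=x.size), 0.0, size - 1e-9)
    # Border layer: ship near-border particles so neighbours hold ghost copies
    # (a contact search would use the ghosts; omitted here).
    ghosts_r = comm.sendrecv(x[x < rank + BORDER], dest=left, source=right)
    ghosts_l = comm.sendrecv(x[x > rank + 1 - BORDER], dest=right, source=left)
    # Migration layer: particles that crossed a block border change owner.
    out_right, out_left = x[x >= rank + 1], x[x < rank]
    x = x[(x >= rank) & (x < rank + 1)]
    x = np.concatenate([x,
                        comm.sendrecv(out_right, dest=right, source=left),
                        comm.sendrecv(out_left, dest=left, source=right)])
```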
14 CFR 25.507 - Reversed braking.
Code of Federal Regulations, 2010 CFR
2010-01-01
... must be in a three point static ground attitude. Horizontal reactions parallel to the ground and... must be equal to 0.55 times the vertical load at each wheel or to the load developed by 1.2 times the... ground reactions must pass through the center of gravity of the airplane. ...
14 CFR 25.507 - Reversed braking.
Code of Federal Regulations, 2014 CFR
2014-01-01
... must be in a three point static ground attitude. Horizontal reactions parallel to the ground and... must be equal to 0.55 times the vertical load at each wheel or to the load developed by 1.2 times the... ground reactions must pass through the center of gravity of the airplane. ...
14 CFR 25.507 - Reversed braking.
Code of Federal Regulations, 2011 CFR
2011-01-01
... must be in a three point static ground attitude. Horizontal reactions parallel to the ground and... must be equal to 0.55 times the vertical load at each wheel or to the load developed by 1.2 times the... ground reactions must pass through the center of gravity of the airplane. ...
14 CFR 25.507 - Reversed braking.
Code of Federal Regulations, 2012 CFR
2012-01-01
... must be in a three point static ground attitude. Horizontal reactions parallel to the ground and... must be equal to 0.55 times the vertical load at each wheel or to the load developed by 1.2 times the... ground reactions must pass through the center of gravity of the airplane. ...
14 CFR 25.507 - Reversed braking.
Code of Federal Regulations, 2013 CFR
2013-01-01
... must be in a three point static ground attitude. Horizontal reactions parallel to the ground and... must be equal to 0.55 times the vertical load at each wheel or to the load developed by 1.2 times the... ground reactions must pass through the center of gravity of the airplane. ...
Load Balancing in Stochastic Networks: Algorithms, Analysis, and Game Theory
2014-04-16
The classic randomized load balancing model is the so-called supermarket model, which describes a system in which customers arrive to a service center with n parallel servers according... Keywords: mean-field limits, supermarket model, thresholds, game, randomized load balancing
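The supermarket model is straightforward to simulate; the sketch below (ours, not the report's) implements the power-of-d-choices policy in Python, where each arrival samples d queues uniformly at random and joins the shortest:

```python
import random

def supermarket_sim(n=100, d=2, arrival_rate=0.9, n_events=200_000, seed=1):
    """Discrete-event sketch: Poisson arrivals at total rate lambda*n, unit-rate
    exponential services; each arrival joins the shortest of d sampled queues."""
    rng = random.Random(seed)
    queues = [0] * n
    t = 0.0
    for _ in range(n_events):
        total_rate = arrival_rate * n + sum(1 for q in queues if q > 0)
        t += rng.expovariate(total_rate)
        if rng.random() < arrival_rate * n / total_rate:        # arrival event
            choice = min(rng.sample(range(n), d), key=lambda i: queues[i])
            queues[choice] += 1
        else:                                                   # departure event
            busy = [i for i, q in enumerate(queues) if q > 0]
            queues[rng.choice(busy)] -= 1
    return sum(queues) / n

print("mean queue length, d=1:", supermarket_sim(d=1))
print("mean queue length, d=2:", supermarket_sim(d=2))  # markedly shorter queues
```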
A high-performance spatial database based approach for pathology imaging algorithm evaluation
Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.
2013-01-01
Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high-performance computation capacity via a parallel data management infrastructure, with parallel data loading and spatial indexing optimizations. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison, where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation, where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared-nothing parallel database architecture, which distributes data homogeneously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high-performance parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data.
Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provides a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
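The comparison queries reduce to spatial predicates such as intersection area and overlap ratio between boundaries. A minimal Python stand-in for the platform's SQL spatial extensions, using the shapely library with hypothetical polygons:

```python
from shapely.geometry import Polygon

# Hypothetical boundaries: one from an algorithm, one from a human annotator.
algo = Polygon([(0, 0), (10, 0), (10, 8), (0, 8)])
human = Polygon([(2, 1), (12, 1), (12, 9), (2, 9)])

inter = algo.intersection(human).area
union = algo.union(human).area
jaccard = inter / union          # overlap ratio used to compare result sets
print(f"Jaccard overlap: {jaccard:.3f}")
```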
NASA Astrophysics Data System (ADS)
Wang, Miaomiao; Tan, Chengxuan; Meng, Jing; Yang, Baicun; Li, Yuan
2017-08-01
Characterizing the cracking mode and its evolution in shale formations is important, as fracture networks are a key element in shale gas exploitation. In this study we determine the crack modes and their evolution in anisotropic shale under cyclic loading using the acoustic emission (AE) parameter-analysis method based on the average frequency and the RA (rise-time/amplitude) value. Shale specimens with bedding-plane orientations parallel and perpendicular to the axial loading direction were subjected to loading cycles with increasing peak values until failure occurred. When the loading was parallel to the bedding plane, most of the cracks at failure were shear cracks, while tensile cracks were dominant in the specimens loaded normal to the bedding direction. The evolution of the crack mode observed in the loading-unloading sequences (excluding the first cycle) can be divided into three stages: (I) few or no cracks (AE events) form, as a result of the Kaiser effect; (II) tensile and shear cracks increase steadily in nearly equal proportions; (III) both crack types increase abruptly, with more cracks forming in one mode than in the other. As the dominant crack motion is influenced by the bedding, the failure mechanism is discussed based on the evolution of the different crack modes. Our conclusions can improve understanding of the formation mechanism of fracture networks in the field.
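The AE parameter-analysis method classifies each hit by its RA value and average frequency, with high average frequency at low RA conventionally read as tensile cracking and the reverse as shear. A schematic Python sketch; the threshold slope k and the units are illustrative assumptions, not the study's calibration:

```python
def classify_crack(rise_time_us, amplitude_mV, counts, duration_us, k=1.0):
    """Classify an AE hit as tensile or shear from RA and average frequency.
    The dividing line AF = k * RA is a conventional, case-specific choice."""
    ra = rise_time_us / amplitude_mV     # RA value, microseconds per millivolt
    af = counts / duration_us            # average frequency, counts per microsecond
    return "tensile" if af > k * ra else "shear"

print(classify_crack(rise_time_us=5.0, amplitude_mV=2.0, counts=40, duration_us=100.0))
```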
Operation of high power converters in parallel
NASA Technical Reports Server (NTRS)
Decker, D. K.; Inouye, L. Y.
1993-01-01
High power converters that are used in space power subsystems are limited in power handling capability due to component and thermal limitations. For applications, such as Space Station Freedom, where multi-kilowatts of power must be delivered to user loads, parallel operation of converters becomes an attractive option when considering overall power subsystem topologies. TRW developed three different unequal power sharing approaches for parallel operation of converters. These approaches, known as droop, master-slave, and proportional adjustment, are discussed and test results are presented.
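Of the three approaches, droop is the simplest to illustrate: each converter's output-voltage set-point falls linearly with its output current, so paralleled units settle at a common bus voltage with the load divided by the droop slopes. A hedged numerical sketch with hypothetical values, not TRW's design parameters:

```python
import numpy as np

# Two paralleled converters with droop characteristics V_i = V0_i - R_i * I_i.
V0 = np.array([28.5, 28.4])   # no-load voltage set-points, volts (hypothetical)
Rd = np.array([0.05, 0.10])   # droop slopes, ohms (hypothetical)
I_load = 60.0                 # total load current, amps

# At the shared bus voltage Vb, the currents must sum to the load:
#   sum_i (V0_i - Vb) / R_i = I_load  =>  solve for Vb.
Vb = (np.sum(V0 / Rd) - I_load) / np.sum(1 / Rd)
I = (V0 - Vb) / Rd            # unequal sharing set by the droop slopes
print(f"bus voltage {Vb:.2f} V, currents {I.round(1)} A (sum {I.sum():.0f} A)")
```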
The Data Transfer Kit: A geometric rendezvous-based tool for multiphysics data transfer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Slattery, S. R.; Wilson, P. P. H.; Pawlowski, R. P.
2013-07-01
The Data Transfer Kit (DTK) is a software library designed to provide parallel data transfer services for arbitrary physics components based on the concept of geometric rendezvous. The rendezvous algorithm provides a means to geometrically correlate two geometric domains that may be arbitrarily decomposed in a parallel simulation. By repartitioning both domains such that they have the same geometric domain on each parallel process, efficient and load-balanced search operations and data transfer can be performed at a desirable algorithmic time complexity with low communication overhead relative to other types of mapping algorithms. With the increased development efforts in multiphysics simulation and other multiple mesh and geometry problems, generating parallel topology maps for transferring fields and other data between geometric domains is a common operation. The algorithms used to generate parallel topology maps based on the concept of geometric rendezvous as implemented in DTK are described with an example using a conjugate heat transfer calculation and thermal coupling with a neutronics code. In addition, we provide the results of initial scaling studies performed on the Jaguar Cray XK6 system at Oak Ridge National Laboratory for a worst-case-scenario problem in terms of algorithmic complexity, which show good scaling on O(1 × 10^4) cores for topology map generation and excellent scaling on O(1 × 10^5) cores for the data transfer operation with meshes of O(1 × 10^9) elements.
Parallel Analysis with Unidimensional Binary Data
ERIC Educational Resources Information Center
Weng, Li-Jen; Cheng, Chung-Ping
2005-01-01
The present simulation investigated the performance of parallel analysis for unidimensional binary data. Single-factor models with 8 and 20 indicators were examined, and sample size (50, 100, 200, 500, and 1,000), factor loading (.45, .70, and .90), response ratio on two categories (50/50, 60/40, 70/30, 80/20, and 90/10), and types of correlation…
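Parallel analysis retains components whose observed eigenvalues exceed those obtained from random data of the same dimensions. A minimal Python sketch of that criterion, using Pearson correlations on binary data (one of the variants such studies examine); the retention rule details are our assumptions:

```python
import numpy as np

def parallel_analysis(X, n_sims=100, percentile=95, seed=0):
    """Horn's parallel analysis: retain leading components whose observed
    eigenvalues exceed the chosen percentile of random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand = np.empty((n_sims, p))
    for s in range(n_sims):
        R = rng.integers(0, 2, size=(n, p))           # random binary data
        rand[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(R, rowvar=False)))[::-1]
    thresh = np.percentile(rand, percentile, axis=0)
    k = 0
    while k < p and obs[k] > thresh[k]:               # stop at first failure
        k += 1
    return k

X = (np.random.default_rng(1).random((200, 8)) < 0.5).astype(int)
print(parallel_analysis(X))   # pure-noise data: typically 0 factors retained
```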
Concurrent processing simulation of the space station
NASA Technical Reports Server (NTRS)
Gluck, R.; Hale, A. L.; Sunkel, John W.
1989-01-01
The development of a new capability for the time-domain simulation of multibody dynamic systems and its application to the study of large-angle rotational maneuvers of the Space Station is described. The effort was divided into three sequential tasks, each of which required significant advancement of the state of the art: (1) the development of an explicit mathematical model, via symbol manipulation, of a flexible multibody dynamic system; (2) the development of a methodology for balancing the computational load of an explicit mathematical model for concurrent processing; and (3) the implementation and successful simulation of the above on a prototype Custom Architectured Parallel Processing System (CAPPS) containing eight processors. The throughput rate achieved by the CAPPS, operating at only 70 percent efficiency, was 3.9 times greater than that obtained sequentially by the IBM 3090 supercomputer simulating the same problem. More significantly, analysis of the results leads to the conclusion that the relative cost effectiveness of concurrent vs. sequential digital computation will grow substantially as the computational load is increased. This is a welcome development in an era when very complex and cumbersome mathematical models of large space vehicles must be used as substitutes for full-scale testing, which has become impractical.
NASA Astrophysics Data System (ADS)
Ma, Guowei; Zhang, Junfei; Wang, Li; Li, Zhijian; Sun, Junbo
2018-07-01
3D concrete printing is an innovative and promising construction method that has been rapidly gaining ground in recent years. This technique extrudes premixed concrete materials through a nozzle to build structural components layer upon layer without formwork. The build-up process of depositing filaments or layers intrinsically produces laminated structures and creates weak joints between adjacent layers. It is therefore important to characterize the mechanical response of 3D-printed components to various applied loads and how their performance differs from that of mould-cast components. In this study, a self-developed 3D printing system was used to fabricate concrete samples. Three-point bending tests and direct double-shear tests were carried out to investigate the mechanical properties of the 3D-printed prisms. Anisotropic behavior was probed by loading in different directions. Meanwhile, piezoelectric lead zirconate titanate (PZT) transducers were used to monitor the damage evolution of the printed samples during loading, based on the electromechanical impedance method. Test results demonstrate that tensile stresses perpendicular to the weak interfaces formed between filaments were more prone to induce cracks than those parallel to the interfaces. Damage to the concrete resulted in a decrease in frequency and a change in amplitude in the conductance spectrum acquired by the mounted PZT patches. The admittance signatures showed a clear gradation of the examined damage levels of printed prisms exposed to the applied loadings.
Stress-Strain Properties of SIFCON in Uniaxial Compression and Tension
1988-08-01
direction act as contacting beams whereas fibers aligned parallel to the loading direction act as individual columns. The combination of fiber-to-fiber... applicable to the study of SIFCON. These include such topics as the influence of strain rate on composite behavior, cyclic loading response, fiber-to-matrix... the specimen are shown in Figure 17. The grips consisted of self-clamping steel plates and a universal joint connection to the loading machine which
Dynamic Response of Acrylonitrile Butadiene Styrene Under Impact Loading (Open Access)
2016-03-16
of contraction and expansion was observed as the impact load was applied. This multistage deformation behavior may be attributable to the ring formed... ABS fabricated by FDM. Results of the experimental characterization show that rasters formed parallel to the loading direction fabricated in the... formed using a solid ABS block to determine the mechanical property at various strain rates (Fig. 1). Through the analysis of the solid ABS, a linear
Parallel Computing Strategies for Irregular Algorithms
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Hunter, Eric J.; Titze, Ingo R.
2012-01-01
Objectives To quantify the recovery of voice following a 2-hour vocal loading exercise (oral reading). Methods 86 adult participants tracked their voice recovery using short vocal tasks and perceptual ratings after an initial vocal loading exercise and for the following two days. Results Short-term recovery was apparent with 90% recovery within 4-6 hours and full recovery at 12-18 hours. Recovery was shown to be similar to a dermal wound healing trajectory. Conclusions The new recovery trajectory highlighted by the vocal loading exercise in the current study is called a vocal recovery trajectory. By comparing vocal fatigue to dermal wound healing, this trajectory is parallel to a chronic wound healing trajectory (as opposed to an acute wound healing trajectory). This parallel suggests that vocal fatigue from the daily use of the voice could be treated as a chronic wound, with the healing and repair mechanisms in a state of constant repair. In addition, there is likely a vocal fatigue threshold at which point the level of tissue damage would shift the chronic healing trajectory to an acute healing trajectory. PMID:19663377
Simulation and control of a 20 kHz spacecraft power system
NASA Technical Reports Server (NTRS)
Wasynczuk, O.; Krause, P. C.
1988-01-01
A detailed computer representation of four Mapham inverters connected in a series, parallel arrangement has been implemented. System performance is illustrated by computer traces for the four Mapham inverters connected to a Litz cable with parallel resistance and dc receiver loads at the receiving end of the transmission cable. Methods of voltage control and load sharing between the inverters are demonstrated. Also, the detailed computer representation is used to design and to demonstrate the advantages of a feed-forward voltage control strategy. It is illustrated that with a computer simulation of this type, the performance and control of spacecraft power systems may be investigated with relative ease and facility.
Fiber-optically sensorized composite wing
NASA Astrophysics Data System (ADS)
Costa, Joannes M.; Black, Richard J.; Moslehi, Behzad; Oblea, Levy; Patel, Rona; Sotoudeh, Vahid; Abouzeida, Essam; Quinones, Vladimir; Gowayed, Yasser; Soobramaney, Paul; Flowers, George
2014-04-01
Electromagnetic interference (EMI) immune and light-weight, fiber-optic sensor based Structural Health Monitoring (SHM) will find increasing application in aerospace structures ranging from aircraft wings to jet engine vanes. Intelligent Fiber Optic Systems Corporation (IFOS) has been developing multi-functional fiber Bragg grating (FBG) sensor systems including parallel processing FBG interrogators combined with advanced signal processing for SHM, structural state sensing and load monitoring applications. This paper reports work with Auburn University on embedding and testing FBG sensor arrays in a quarter scale model of a T38 composite wing. The wing was designed and manufactured using fabric reinforced polymer matrix composites. FBG sensors were embedded under the top layer of the composite. Their positions were chosen based on strain maps determined by finite element analysis. Static and dynamic testing confirmed expected response from the FBGs. The demonstrated technology has the potential to be further developed into an autonomous onboard system to perform load monitoring, SHM and Non-Destructive Evaluation (NDE) of composite aerospace structures (wings and rotorcraft blades). This platform technology could also be applied to flight testing of morphing and aero-elastic control surfaces.
Model-Based Reasoning in Humans Becomes Automatic with Training.
Economides, Marcos; Kurth-Nelson, Zeb; Lübbert, Annika; Guitart-Masip, Marc; Dolan, Raymond J
2015-09-01
Model-based and model-free reinforcement learning (RL) have been suggested as algorithmic realizations of goal-directed and habitual action strategies. Model-based RL is more flexible than model-free but requires sophisticated calculations using a learnt model of the world. This has led model-based RL to be identified with slow, deliberative processing, and model-free RL with fast, automatic processing. In support of this distinction, it has recently been shown that model-based reasoning is impaired by placing subjects under cognitive load--a hallmark of non-automaticity. Here, using the same task, we show that cognitive load does not impair model-based reasoning if subjects receive prior training on the task. This finding is replicated across two studies and a variety of analysis methods. Thus, task familiarity permits use of model-based reasoning in parallel with other cognitive demands. The ability to deploy model-based reasoning in an automatic, parallelizable fashion has widespread theoretical implications, particularly for the learning and execution of complex behaviors. It also suggests a range of important failure modes in psychiatric disorders.
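The model-based/model-free distinction is algorithmic at heart: the former plans through a learnt transition model, the latter reads cached action values. A toy Python sketch of the contrast, with an entirely hypothetical MDP, not the task used in the study:

```python
import numpy as np

# Toy MDP: the model-based agent plans through a learnt model (costly,
# deliberative); the model-free agent reads a cached Q-table (cheap, automatic).
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # learnt transitions
R = rng.random((n_states, n_actions))                             # learnt rewards

V = np.zeros(n_states)
for _ in range(100):                    # value iteration over the learnt model
    V = np.max(R + gamma * P @ V, axis=1)

def model_based_action(s):
    """Deliberative: one-step lookahead through the model for every action."""
    return int(np.argmax(R[s] + gamma * P[s] @ V))

Q = rng.random((n_states, n_actions))   # habitual cache, e.g. from TD learning
def model_free_action(s):
    """Automatic: a single cached lookup."""
    return int(np.argmax(Q[s]))

print(model_based_action(0), model_free_action(0))
```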
NASA Astrophysics Data System (ADS)
Podesto, B.; Lapointe, A.; Larose, G.; Robichaud, Y.; Vaillancourt, C.
1981-03-01
The design and construction of a Real-Time Digital Data Acquisition System (RTDDAS), used in substations for on-site recording and preprocessing of load response data, are described. The gathered data can be partially processed on site to compute the apparent, active, and reactive powers, voltage and current rms values, and instantaneous values of phase voltages and currents. On-site processing capability is provided for rapid monitoring of the field data to ensure that the test setup is suitable. Production analysis of field data is accomplished off-line on a central computer from data recorded on a dual-density (800/1600) magnetic tape which is IBM-compatible. Parallel channels of data can be recorded at a variable rate from 480 to 9000 samples per second per channel. The RTDDAS is housed in a 9.1 m (30-ft) trailer which is shielded from electromagnetic interference and protected by isolators from switching surges. Information pertaining to installation, software operation, and maintenance is presented.
Stability of tapered and parallel-walled dental implants: A systematic review and meta-analysis.
Atieh, Momen A; Alsabeeha, Nabeel; Duncan, Warwick J
2018-05-15
Clinical trials have suggested that dental implants with a tapered configuration have improved stability at placement, allowing immediate placement and/or loading. The aim of this systematic review and meta-analysis was to evaluate the implant stability of tapered dental implants compared to standard parallel-walled dental implants. Applying the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement, randomized controlled trials (RCTs) were searched for in electronic databases and complemented by hand searching. The risk of bias was assessed using the Cochrane Collaboration's Risk of Bias tool and data were analyzed using statistical software. A total of 1199 studies were identified, of which, five trials were included with 336 dental implants in 303 participants. Overall meta-analysis showed that tapered dental implants had higher implant stability values than parallel-walled dental implants at insertion and 8 weeks but the difference was not statistically significant. Tapered dental implants had significantly less marginal bone loss compared to parallel-walled dental implants. No significant differences in implant failure rate were found between tapered and parallel-walled dental implants. There is limited evidence to demonstrate the effectiveness of tapered dental implants in achieving greater implant stability compared to parallel-walled dental implants. Superior short-term results in maintaining peri-implant marginal bone with tapered dental implants are possible. Further properly designed RCTs are required to endorse the supposed advantages of tapered dental implants in immediate loading protocol and other complex clinical scenarios. © 2018 Wiley Periodicals, Inc.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2014-10-01
Progress on the full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulating MHD instabilities with low, intermediate, and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse-iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
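Inverse iteration, the second component being parallelized, repeatedly solves a shifted linear system to amplify the eigenvector nearest the shift. A serial Python sketch of the kernel (the MARS version distributes the matrix construction and these solves, which is not reproduced here; the test matrix is a toy):

```python
import numpy as np

def inverse_iteration(A, sigma, n_iter=100, tol=1e-10, seed=0):
    """Eigenpair of symmetric A nearest the shift sigma (serial kernel)."""
    n = A.shape[0]
    x = np.random.default_rng(seed).normal(size=n)
    x /= np.linalg.norm(x)
    M = A - sigma * np.eye(n)       # a real code would factorize M once
    for _ in range(n_iter):
        y = np.linalg.solve(M, x)   # the step a parallel code distributes
        y /= np.linalg.norm(y)
        if min(np.linalg.norm(y - x), np.linalg.norm(y + x)) < tol:
            x = y
            break
        x = y
    return x @ A @ x, x             # Rayleigh quotient and eigenvector

A = np.diag([1.0, 2.5, 4.0]) + 0.1  # small symmetric toy matrix
lam, vec = inverse_iteration(A, sigma=2.4)
print(round(lam, 6))                # eigenvalue closest to the shift 2.4
```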
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Efficient Kill-Save Ratios Ease Up the Cognitive Demands on Counterintuitive Moral Utilitarianism.
Trémolière, Bastien; Bonnefon, Jean-François
2014-07-01
The dual-process model of moral judgment postulates that utilitarian responses to moral dilemmas (e.g., accepting to kill one to save five) are demanding of cognitive resources. Here we show that utilitarian responses can become effortless, even when they involve killing someone, as long as the kill-save ratio is efficient (e.g., 1 is killed to save 500). In Experiment 1, participants responded to moral dilemmas featuring different kill-save ratios under high or low cognitive load. In Experiments 2 and 3, participants responded at their own pace or under time pressure. Efficient kill-save ratios promoted utilitarian responding and neutered the effect of load or time pressure. We discuss whether this effect is more easily explained by a parallel-activation model or by a default-interventionist model. © 2014 by the Society for Personality and Social Psychology, Inc.
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
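As a concrete instance of the data-decomposition and load-balancing concepts the article surveys, the sketch below (ours, not the article's) splits an image into tiles and lets worker processes pull them from a shared queue, so faster workers naturally render more tiles before final assembly:

```python
from multiprocessing import Pool
import numpy as np

WIDTH, HEIGHT, TILE = 512, 512, 64

def render_tile(tile_origin):
    """Stand-in renderer: shade each pixel from its coordinates."""
    x0, y0 = tile_origin
    ys, xs = np.mgrid[y0:y0 + TILE, x0:x0 + TILE]
    return tile_origin, ((xs ^ ys) % 256).astype(np.uint8)

if __name__ == "__main__":
    tiles = [(x, y) for y in range(0, HEIGHT, TILE) for x in range(0, WIDTH, TILE)]
    image = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)
    with Pool() as pool:
        # imap_unordered acts as a work queue: idle workers grab the next tile,
        # which balances load when tiles have uneven rendering costs.
        for (x0, y0), block in pool.imap_unordered(render_tile, tiles):
            image[y0:y0 + TILE, x0:x0 + TILE] = block   # image assembly
```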
Sözen, S; Çokgör, E U; Başaran, S Teksoy; Aysel, M; Akarsubaşı, A; Ergal, I; Kurt, H; Pala-Ozkok, I; Orhon, D
2014-05-01
The study investigated the effect of high substrate loading on substrate utilization kinetics and on the composition of the microbial community in a superfast submerged membrane bioreactor. The submerged MBR was sequentially fed with a substrate mixture and acetate; its performance was monitored at steady state, at extremely low sludge age values of 2.0, 1.0, and 0.5 d, all adjusted to a single hydraulic retention time of 8.0 h. Each MBR run was repeated when substrate feeding was increased from 200 mg COD/L to 1000 mg COD/L. Substrate utilization kinetics shifted to significantly lower levels when the MBR was adjusted to higher substrate loadings. Molecular analysis of the biomass revealed that the variable process kinetics could be correlated with parallel changes in the composition of the microbial community, mainly through a replacement mechanism in which newer species, better adapted to the new growth conditions, substituted for others washed out of the system. Copyright © 2014 Elsevier Ltd. All rights reserved.
Gianico, A; Braguglia, C M; Cesarini, R; Mininni, G
2013-09-01
The performance of thermophilic digestion of waste activated sludge, either untreated or thermally pretreated, was evaluated through semi-continuous tests carried out at organic loading rates in the range of 1-3.7 kg VS/m(3)d. Although the thermal pretreatment at T=134 °C proved effective in solubilizing organic matter, no significant gain in organics degradation was observed. However, the digestion of pretreated sludge showed significant soluble COD removal (more than 55%), whereas no removal occurred in the control reactors. The lower the initial sludge biodegradability, the greater the observed efficiency gain from pretreated digestion, particularly in terms of higher biogas and methane production rates with respect to the parallel untreated sludge digestion. The heat balance of the combined thermal hydrolysis/thermophilic digestion process, applied to full-scale scenarios, showed positive values for direct combustion of methane. In the case of combined heat and power generation, attractive electric energy recoveries were obtained, with a positive heat balance at high load. Copyright © 2013. Published by Elsevier Ltd.
Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P
2014-10-30
Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including the use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of the integral calculation using X10's work-stealing runtime, and report performance results for a long-range HF energy calculation of a large molecule with a high-quality basis set running on up to 1024 cores of a high-performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
Karpievitch, Yuliya V; Almeida, Jonas S
2006-01-01
Background Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources than are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. Results mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. Conclusion Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet. PMID:16539707
de Obaldia, Enrique Escobar; Jeong, Chanhue; Grunenfelder, Lessa Kay; Kisailus, David; Zavattieri, Pablo
2015-08-01
Many biomineralized organisms have evolved highly oriented nanostructures to perform specific functions. One key example is the abrasion-resistant rod-like microstructure found in the radular teeth of Chitons (Cryptochiton stelleri), a large mollusk. The teeth consist of a soft core and a hard shell that is abrasion resistant under extreme mechanical loads with which they are subjected during the scraping process. Such remarkable mechanical properties are achieved through a hierarchical arrangement of nanostructured magnetite rods surrounded with α-chitin. We present a combined biomimetic approach in which designs were analyzed with additive manufacturing, experiments, analytical and computational models to gain insights into the abrasion resistance and toughness of rod-like microstructures. Staggered configurations of hard hexagonal rods surrounded by thin weak interfacial material were printed, and mechanically characterized with a cube-corner indenter. Experimental results demonstrate a higher contact resistance and stiffness for the staggered alignments compared to randomly distributed fibrous materials. Moreover, we reveal an optimal rod aspect ratio that lead to an increase in the site-specific properties measured by indentation. Anisotropy has a significant effect (up to 50%) on the Young's modulus in directions parallel and perpendicular to the longitudinal axis of the rods, and 30% on hardness and fracture toughness. Optical microscopy suggests that energy is dissipated in the form of median cracks when the load is parallel to the rods and lateral cracks when the load is perpendicular to the rods. Computational models suggest that inelastic deformation of the rods at early stages of indentation can vary the resistance to penetration. As such, we found that the mechanical behavior of the system is influenced by interfacial shear strain which influences the lateral load transfer and therefore the spread of damage. This new methodology can help to elucidate the evolutionary designs of biomineralized microstructures and understand the tolerance to fracture and damage of chiton radular teeth. Copyright © 2015 Elsevier Ltd. All rights reserved.
Li, Kangkang; Yu, Hai; Feron, Paul; Tade, Moses; Wardhaugh, Leigh
2015-08-18
Using a rate-based model, we assessed the technical feasibility and energy performance of an advanced aqueous-ammonia-based postcombustion capture process integrated with a coal-fired power station. The capture process consists of three identical process trains in parallel, each containing a CO2 capture unit, an NH3 recycling unit, a water separation unit, and a CO2 compressor. A sensitivity study of important parameters, such as NH3 concentration, lean CO2 loading, and stripper pressure, was performed to minimize the energy consumption involved in the CO2 capture process. Process modifications of the rich-split process and the interheating process were investigated to further reduce the solvent regeneration energy. The integrated capture system was then evaluated in terms of the mass balance and the energy consumption of each unit. The results show that our advanced ammonia process is technically feasible and energy-competitive, with a low net power-plant efficiency penalty of 7.7%.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Zuwei; Zhao, Haibo, E-mail: klinsmannzhb@163.com; Zheng, Chuguang
2015-01-15
This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining a Markov jump model, a weighted majorant kernel, and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of the particle size distribution with low statistical noise over the full size range and, as far as possible, to reduce the number of time loopings. Here three coagulation rules are highlighted, and it is found that constructing an appropriate coagulation rule provides a route to a compromise between the accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates used for acceptance-rejection processes by single-looping over all particles, and meanwhile the mean time-step of a coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly, to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are processed in parallel by the many cores of a GPU, which can execute massively threaded data-parallel tasks to obtain a remarkable speedup ratio (compared with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10,000 simulation particles per cell). These accelerating approaches to PBMC are demonstrated in a physically realistic Brownian coagulation case. The computational accuracy is validated against the benchmark solution of the discrete-sectional method. The simulation results show that the comprehensive approach attains a very favorable improvement in cost without sacrificing computational accuracy.
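The majorant-kernel acceptance-rejection step can be sketched compactly: draw a waiting time from the majorant rate, sample a candidate pair, and accept it with probability equal to the true kernel over the majorant. A simplified single-cell Python sketch; the kernel, weights, and population are illustrative, and the GPU multi-cell machinery is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel(v_i, v_j):
    """Toy coagulation kernel; a Brownian kernel would replace this."""
    return (v_i ** (1 / 3) + v_j ** (1 / 3)) ** 2

def coagulation_step(v, t):
    """One majorant-based acceptance-rejection event for particle volumes v."""
    n = v.size
    k_max = kernel(v.max(), v.max())          # majorant bounds all pair rates
    rate_max = 0.5 * n * (n - 1) * k_max
    t += rng.exponential(1.0 / rate_max)      # waiting time from majorant rate
    i, j = rng.choice(n, size=2, replace=False)
    if rng.random() < kernel(v[i], v[j]) / k_max:   # accept: true / majorant
        v[i] += v[j]                          # merge particle j into i
        v = np.delete(v, j)
    return v, t

v, t = rng.uniform(0.5, 1.5, size=1000), 0.0
while v.size > 100:
    v, t = coagulation_step(v, t)
print(f"{v.size} particles remain at t = {t:.3g}")
```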
NASA Technical Reports Server (NTRS)
Kimnach, Greg L.; Lebron, Ramon C.; Fox, David A.
1999-01-01
The John H. Glenn Research Center at Lewis Field (GRC) in Cleveland, OH and the Sundstrand Corporation in Rockford, IL have designed and developed an Engineering Model (EM) Electrical Power Control Unit (EPCU) for the Fluids Combustion Facility (FCF) experiments to be flown on the International Space Station (ISS). The EPCU will be used as the power interface to the ISS power distribution system for the FCF's space experiments' test and telemetry hardware. Furthermore, it is proposed to be the common power interface for all experiments. The EPCU is a three-kilowatt 120 Vdc-to-28 Vdc converter utilizing three independent Power Converter Units (PCUs), each rated at 1 kWe (36 Adc @ 28 Vdc), which are paralleled and synchronized. Each converter may be fed from one of two ISS power channels. The 28 Vdc loads are connected to the EPCU output via 48 solid-state and current-limiting switches, rated at 4 Adc each. These switches may be paralleled to supply any given load up to the 108 Adc normal operational limit of the paralleled converters. The EPCU was designed in this manner to maximize allocated-power utilization, to shed loads autonomously, to provide fault tolerance, and to provide a flexible power converter and control module to meet various ISS load demands. Tests of the EPCU in the Power Systems Facility testbed at GRC reveal that the overall converted-power efficiency is approximately 89% with a nominal input voltage of 120 Vdc and a total load in the range of 40% to 110% of the rated 28 Vdc load. (The PCUs alone have an efficiency of approximately 94.5%.) Furthermore, the EM unit passed all flight-qualification-level (and beyond) vibration tests, passed ISS EMI (conducted, radiated, and susceptibility) requirements, successfully operated for extended periods in a thermal/vacuum chamber, was integrated with a proto-flight experiment, and passed all stability and functional requirements.
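The quoted ratings are self-consistent and easy to verify: 1 kW at 28 Vdc is roughly 36 A per PCU, three paralleled PCUs give the 108 Adc limit, and the 4 Adc switches can be ganged for larger loads. A small sanity-check sketch:

```python
P_module, V_out = 1000.0, 28.0        # one PCU: 1 kW at 28 Vdc
I_module = P_module / V_out           # ~35.7 A, quoted as 36 Adc
I_total = 3 * 36                      # three paralleled PCUs -> 108 Adc limit
switches_needed = -(-I_total // 4)    # 4 Adc switches, ceiling division
print(I_module, I_total, switches_needed)  # 35.7 A, 108 A, 27 of the 48 switches
```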
Parallel Execution of Functional Mock-up Units in Buildings Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ozmen, Ozgur; Nutaro, James J.; New, Joshua Ryan
2016-06-30
A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.
Fine-grained parallel RNAalifold algorithm for RNA secondary structure prediction on FPGA
Xia, Fei; Dou, Yong; Zhou, Xingming; Yang, Xuejun; Xu, Jiaqing; Zhang, Yang
2009-01-01
Background In the field of RNA secondary structure prediction, the RNAalifold algorithm is one of the most popular methods using free energy minimization. However, general-purpose computers, including parallel computers or multi-core computers, exhibit parallel efficiency of no more than 50%. Field Programmable Gate Array (FPGA) chips provide a new approach to accelerate RNAalifold by exploiting fine-grained custom design. Results RNAalifold shows complicated data dependences, in which the dependence distance is variable and the dependence direction is also across two dimensions. We propose a systolic array structure including one master Processing Element (PE) and multiple slave PEs for fine-grain hardware implementation on FPGA. We exploit data reuse schemes to reduce the need to load energy matrices from external memory. We also propose several methods to reduce the energy table parameter size by 80%. Conclusion To our knowledge, our implementation with 16 PEs is the only FPGA accelerator implementing the complete RNAalifold algorithm. The experimental results show a factor of 12.2 speedup over the RNAalifold (Vienna Package 1.6.5) software for a group of aligned RNA sequences of 2981 residues running on a Personal Computer (PC) platform with a Pentium 4 2.6 GHz CPU. PMID:19208138
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
NASA Astrophysics Data System (ADS)
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called the HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of layered parallel software libraries that allow image processing applications to share common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on Amdahl's law for symmetric multicore chips showed the potential for high performance scalability of the HPC 3D-MIP platform when a larger number of cores is available.
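The scalability analysis referred to is the standard Amdahl bound, speedup(n) = 1/((1 - p) + p/n) for parallel fraction p on n cores. A short sketch reproducing the flavour of that analysis; the parallel fraction used below is a hypothetical value, not the paper's measurement:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores with parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)

# A 12x speedup on 12 cores requires p = 1 (perfectly parallel work);
# with p = 0.95 the bound on 12 cores is only ~7.7x.
for n in (2, 4, 8, 12, 64):
    print(n, round(amdahl_speedup(0.95, n), 2))
```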
NASA Astrophysics Data System (ADS)
Cho, In Ho
For the last few decades, we have obtained tremendous insight into the underlying microscopic mechanisms of degrading quasi-brittle materials from persistent efforts in laboratories, and at the same time we have seen unprecedented evolution in computational technology such as massively parallel computers. Thus, the time is ripe to embark on a novel approach to settle unanswered questions, especially for the earthquake engineering community, by harmoniously combining microphysical mechanisms with advanced parallel computing technology. To begin with, it should be stressed that we placed a great deal of emphasis on preserving the clear meaning and physical counterparts of all the microscopic material models proposed herein, tied to the belief that the more physical mechanisms we incorporate, the better the prediction we can obtain. We began by reviewing representative microscopic analysis methodologies, selecting the "fixed-type" multidirectional smeared crack model as the base framework for nonlinear quasi-brittle materials, since it is widely believed to best retain the physical nature of actual cracks. Microscopic stress functions are proposed, by integrating well-received existing models, to update normal stresses on the crack surfaces (three orthogonal surfaces are allowed to initiate herein) under cyclic loading. Unlike the normal stress update, special attention had to be paid to the shear stress update on the crack surfaces, due primarily to the well-known pathological nature of the fixed-type smeared crack model---spurious large stress transfer over the open crack under nonproportional loading. In hopes of exploiting a physical mechanism to resolve this deleterious behavior of the fixed crack model, a tribology-inspired three-dimensional (3d) interlocking mechanism is proposed. Following the main trend of tribology (i.e., the science and engineering of interacting surfaces), we introduced a base fabric of solid particles in a soft matrix to explain realistic interlocking over rough crack surfaces, with a Gaussian distribution feeding random particle sizes to the entire domain. Validation against a well-documented rough crack experiment reveals promising accuracy of the proposed 3d interlocking model. A consumed-energy-based damage model is proposed for the weak correlation between the normal and shear stresses on the crack surfaces, and also for describing the nature of irrecoverable damage. Since the evaluation of the consumed energy is directly linked to the microscopic deformation, which can be efficiently tracked on the crack surfaces, the proposed damage model is believed to provide a more physical interpretation than existing damage mechanics, which fundamentally stems from mathematical derivation with few physical counterparts. Another novel point of the present work lies in the topological-transition-based "smart" steel bar model, notably with an evolving compressive buckling length. We present a systematic framework of information flow between the key ingredients of composite materials (i.e., a steel bar and its surrounding concrete elements). The proposed smart steel model can incorporate smooth transition during reversal loading, tensile rupture, early buckling after reversal from excessive tensile loading, and even compressive buckling. In particular, the buckling length is made to evolve according to the damage states of the surrounding elements of each bar, while all other dominant models leave the length unchanged.
What lies behind all the aforementioned novel attempts is, of course, the problem-optimized parallel platform. In fact, parallel computing in our field has been restricted to monotonic shock or blast loading with explicit algorithms, which are characteristically easy to parallelize. In the present study, efficient parallelization strategies are proposed for a highly demanding implicit nonlinear finite element analysis (FEA) program for real-scale reinforced concrete (RC) structures under cyclic loading. A quantitative comparison of state-of-the-art parallel strategies, in terms of factorization, was carried out, leading to a problem-optimized solver that successfully embraces the penalty method and the banded nature of the system. In particular, the penalty method employed imparts considerable smoothness to the global response, which yields a practical superiority of the parallel triangular system solver over other advanced solvers such as the parallel preconditioned conjugate gradient method. Other salient issues in parallelization are also addressed. The established parallel platform offers unprecedented access to simulations of real-scale structures, giving new understanding of the physics-based mechanisms adopted and of probabilistic randomness at the entire-system level. In particular, the platform enables bold simulations of real-scale RC structures exposed to cyclic loading---an H-shaped wall system and a 4-story T-shaped wall system. The simulations show the desired capability of accurately predicting global force-displacement responses, post-peak softening behavior, and compressive buckling of longitudinal steel bars. It is fascinating to see that the intrinsic randomness of the 3d interlocking model appears to cause "localized" damage in the real-scale structures, which is consistent with observations reported in different fields such as granular media. Equipped with accuracy, stability, and scalability as demonstrated so far, the parallel platform is believed to serve as fertile ground for introducing further physical mechanisms into various research fields as well as the earthquake engineering community. In the near future, it can be further expanded to run in concert with established FEA programs such as FRAME3d or OPENSEES. Following the central notion of "multiscale" analysis, actual infrastructure exposed to extreme natural hazards can be successfully tackled by this next-generation analysis tool---the harmonious union of the parallel platform and a general FEA program. At the same time, any type of experiment can easily be conducted in this "virtual laboratory."
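The banded, penalty-stiffened systems described above are precisely the case where a banded direct solve is attractive. A serial Python sketch using SciPy's banded Cholesky solver; the matrix is a toy one-dimensional stiffness analogue and the penalty magnitude is an illustrative assumption, not the dissertation's solver:

```python
import numpy as np
from scipy.linalg import solveh_banded

n = 1000
# Upper banded storage: row 0 = superdiagonal, row 1 = main diagonal.
ab = np.zeros((2, n))
ab[0, 1:] = -1.0     # superdiagonal of a 1-D stiffness-like tridiagonal matrix
ab[1, :] = 2.0       # main diagonal
ab[1, 0] += 1e8      # penalty term pinning the first DOF (illustrative value)

b = np.ones(n)                # load vector
u = solveh_banded(ab, b)      # banded Cholesky: O(n * b^2) instead of O(n^3)
print(u[0], u[-1])            # u[0] ~ 0: the penalty enforces the constraint
```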
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable fast turn-around times, which are often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth, due to the massive amount of data that needs to be processed. In this work, we have developed a software platform designed to support high-performance 3D medical image processing for a wide range of applications on increasingly available and affordable commodity computing systems: multi-core, cluster, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10-fold performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
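As a rough illustration of the block-volume idea (a sketch under assumed conventions, not the platform's actual API; the block size and the thresholding kernel are placeholders), the following Python fragment tiles a 3D volume into blocks and processes them with a thread pool:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def block_slices(shape, block):
    """Yield slice tuples that tile a 3D volume into blocks of at most
    `block` voxels per axis (edge blocks are smaller, i.e., size-adaptive)."""
    for origin in product(*(range(0, s, block) for s in shape)):
        yield tuple(slice(o, min(o + block, s)) for o, s in zip(origin, shape))

def process_block(volume, out, sl):
    # Placeholder per-block kernel: a simple intensity threshold.
    out[sl] = volume[sl] > 0.5

volume = np.random.rand(128, 128, 128).astype(np.float32)
out = np.empty(volume.shape, dtype=bool)
with ThreadPoolExecutor() as pool:
    # Each block is an independent task, so the pool can balance the load.
    list(pool.map(lambda sl: process_block(volume, out, sl),
                  block_slices(volume.shape, block=32)))
print(out.sum())
```

Because blocks are independent, the same decomposition distributes naturally across cluster nodes as well as local cores.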
Static versus dynamic fracturing in shallow carbonate fault zones
NASA Astrophysics Data System (ADS)
Fondriest, M.; Doan, M. L.; Aben, F. M.; Fusseis, F.; Mitchell, T. M.; Di Toro, G.
2015-12-01
Moderate to large earthquakes often nucleate within and propagate through carbonates in the shallow crust; several recent field and experimental studies have therefore aimed to constrain earthquake-related deformation processes within carbonate fault rocks. In particular, the occurrence of thick belts (10-100s m) of low-strain fault-related breccias (average size of rock fragments >1 cm), which is relatively common within carbonate damage zones, was generally interpreted as resulting from the quasi-static growth of fault zones rather than from the cumulative effect of multiple earthquake ruptures. Here we report the occurrence of up to hundreds of meters thick belts of intensely fragmented dolostones along the major transpressive Foiana Fault Zone (Italian Southern Alps), which was exhumed from <2 km depth. Such dolostones are reduced into fragments ranging from a few centimeters down to a few millimeters in size, with ultrafine-grained layers in proximity to the principal slip zones. Preservation of the original bedding indicates a lack of significant shear strain in the fragmented dolostones, which seem to have been shattered in situ. To investigate the origin of the in-situ shattered rocks, the host dolostones were deformed in uniaxial compression both under quasi-static loading (strain rate ~10⁻³ s⁻¹) and dynamic loading (strain rate >50 s⁻¹). Dolostones deformed up to failure under low strain rate were affected by single to multiple discrete (i.e., not interconnected) extensional fractures sub-parallel to the loading direction. Dolostones deformed under high strain rate were shattered above a strain rate threshold of ~200 s⁻¹ (strain >1.2%), while at lower strain rates they were split into a few fragments or remained macroscopically intact. Experimentally shattered dolostones were reduced into a non-cohesive material with most rock fragments a few millimeters in size and elongated parallel to the loading direction. Fracture networks were investigated by X-ray microtomography, showing that low- and high-strain-rate damage patterns differ, with the latter being similar to that of natural in-situ shattered dolostones. In-situ shattered dolostones are thus interpreted as the product of off-fault dynamic stress wave loading and can potentially be used to constrain coseismic energy release in fault zones.
Multi-Modulator for Bandwidth-Efficient Communication
NASA Technical Reports Server (NTRS)
Gray, Andrew; Lee, Dennis; Lay, Norman; Cheetham, Craig; Fong, Wai; Yeh, Pen-Shu; King, Robin; Ghuman, Parminder; Hoy, Scott; Fisher, Dave
2009-01-01
A modulator circuit board has recently been developed to be used in conjunction with a vector modulator to generate any of a large number of modulations for bandwidth-efficient radio transmission of digital data signals at rates that can exceed 100 Mb/s. The modulations include quadrature phase-shift keying (QPSK), offset quadrature phase-shift keying (OQPSK), Gaussian minimum-shift keying (GMSK), and octonary phase-shift keying (8PSK) with square-root raised-cosine pulse shaping. The figure is a greatly simplified block diagram showing the relationship between the modulator board and the rest of the transmitter. The role of the modulator board is to encode the incoming data stream and to shape the resulting pulses, which are fed as inputs to the vector modulator. The combination of encoding and pulse shaping in a given application is chosen to maximize the bandwidth efficiency. The modulator board includes gallium arsenide serial-to-parallel converters at its input end. A complementary metal oxide/semiconductor (CMOS) field-programmable gate array (FPGA) performs the coding and modulation computations and utilizes parallel processing in doing so. The results of the parallel computation are combined and converted to pulse waveforms by use of gallium arsenide parallel-to-serial converters integrated with digital-to-analog converters. Without changing the hardware, one can configure the modulator to produce any of the designed combinations of coding and modulation by loading the appropriate bit configuration file into the FPGA.
Computer Science Techniques Applied to Parallel Atomistic Simulation
NASA Astrophysics Data System (ADS)
Nakano, Aiichiro
1998-03-01
Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, such studies require large system sizes and long simulated times. We have developed: i) space-time multiresolution schemes; ii) a fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) a multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed, in which students can obtain a Ph.D. in Physics & Astronomy and an M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
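One of the listed techniques, spacefilling-curve-based data layout, can be illustrated with a short sketch (not the authors' code; the bit width and the toy cell list are arbitrary assumptions): interleaving coordinate bits yields a Morton (Z-order) key, and sorting cells by this key keeps spatially nearby data contiguous for partitioning and parallel I/O.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of integer cell coordinates (x, y, z) into a
    single Z-order (Morton) key; nearby cells in 3D map to nearby keys."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

# Sort cells along the curve, then give each processor a contiguous chunk;
# the curve's locality keeps each chunk spatially compact on disk and in memory.
cells = [(1, 2, 3), (1, 2, 4), (7, 0, 0), (2, 2, 3)]
print(sorted(cells, key=lambda c: morton3d(*c)))
```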
NASA Astrophysics Data System (ADS)
Dettmer, J.; Quijano, J. E.; Dosso, S. E.; Holland, C. W.; Mandolesi, E.
2016-12-01
Geophysical seabed properties are important for the detection and classification of unexploded ordnance. However, current surveying methods such as vertical seismic profiling, coring, or inversion are of limited use when surveying large areas with high spatial sampling density. We consider surveys based on a source and receiver array towed by an autonomous vehicle, which produce large volumes of seabed reflectivity data that contain unprecedented and detailed seabed information. The data are analyzed with a particle filter, which requires efficient reflection-coefficient computation, efficient inversion algorithms and efficient use of computer resources. The filter quantifies the information content of multiple sequential data sets by considering results from previous data along the survey track to inform the importance sampling at the current point. Challenges arise from environmental changes along the track, where the number of sediment layers and their properties change. This is addressed by a trans-dimensional model in the filter which allows layering complexity to change along a track. Efficiency is improved by likelihood tempering of various particle subsets and including exchange moves (parallel tempering). The filter is implemented on a hybrid computer that combines central processing units (CPUs) and graphics processing units (GPUs) to exploit three levels of parallelism: (1) fine-grained parallel computation of spherical reflection coefficients with a GPU implementation of Levin integration; (2) updating particles by concurrent CPU processes which exchange information using automatic load balancing (coarse-grained parallelism); (3) overlapping CPU-GPU communication (a major bottleneck) with GPU computation by staggering CPU access to the multiple GPUs. The algorithm is applied to spherical reflection coefficients for data sets along a 14-km track on the Malta Plateau, Mediterranean Sea. We demonstrate substantial efficiency gains over previous methods. [This research was supported in part by the U.S. Dept. of Defense, through the Strategic Environmental Research and Development Program (SERDP).]
NASA Astrophysics Data System (ADS)
Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto
2012-11-01
In recent decades, finite element (FE) techniques have been extensively used for predicting the effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the objective of reducing these limitations, we improved an existing freely available FE code for the computation of the effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C and subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performance and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm was validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near-linear speed-up when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.
Processing Device for High-Speed Execution of an Xrisc Computer Program
NASA Technical Reports Server (NTRS)
Ng, Tak-Kwong (Inventor); Mills, Carl S. (Inventor)
2016-01-01
A processing device for high-speed execution of a computer program is provided. A memory module may store one or more computer programs. A sequencer may select one of the computer programs and control execution of the selected program. A register module may store intermediate values associated with a current calculation set, a set of output values associated with a previous calculation set, and a set of input values associated with a subsequent calculation set. An external interface may receive the set of input values from a computing device and provide the set of output values to the computing device. A computation interface may provide a set of operands for computation during processing of the current calculation set. The set of input values is loaded into the register and the set of output values is unloaded from the register in parallel with processing of the current calculation set.
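The overlap of register loading and unloading with computation described above is essentially double buffering; a minimal software analogue in Python (illustrative only; the batch source and compute kernel are placeholders, not anything from the patent) looks like this:

```python
import threading, queue

def pipeline(batches, compute):
    """Software analogue of the parallel load/unload described above:
    while batch i is being processed, batch i+1 is fetched in the
    background by a loader thread (double buffering)."""
    loaded = queue.Queue(maxsize=1)  # holds the one pre-loaded next batch

    def loader():
        for b in batches:            # stands in for the external interface
            loaded.put(b)
        loaded.put(None)             # sentinel: no more input

    threading.Thread(target=loader, daemon=True).start()
    results = []
    while (batch := loaded.get()) is not None:
        results.append(compute(batch))  # the "current calculation set"
    return results

print(pipeline(range(5), compute=lambda x: x * x))
```

When load time and compute time are comparable, this overlap roughly halves the wall-clock time per batch.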
DOE Office of Scientific and Technical Information (OSTI.GOV)
Solli, Linn, E-mail: linn.solli@bioforsk.no; Bergersen, Ove; Sørheim, Roald
2014-08-15
Highlights: • New results from continuous anaerobic co-digestion of fish waste silage (FWS) and cow manure (CM). • Co-digestion of FWS and CM has a high biogas potential. • Optimal mixing ratio of FWS/CM is 13–16/87–84 volume%. • High input of FWS leads to accumulation of NH₄⁺ and VFAs and process failure. - Abstract: This study examined the effects of an increased load of nitrogen-rich organic material on anaerobic digestion and methane production. Co-digestion of fish waste silage (FWS) and cow manure (CM) was studied in two parallel laboratory-scale (8 L effective volume) semi-continuous stirred tank reactors (designated R1 and R2). A reactor fed with CM only (R0) was used as control. The reactors were operated in the mesophilic range (37 °C) with a hydraulic retention time of 30 days, and the entire experiment lasted for 450 days. The rate of organic loading was raised by increasing the content of FWS in the feed stock. During the experiment, the amount (volume%) of FWS was increased stepwise in the following order: 3% – 6% – 13% – 16% – 19%. Measurements of methane production and analyses of volatile fatty acids, ammonium and pH in the effluents were carried out. The highest methane production from co-digestion of FWS and CM was 0.400 L CH₄ gVS⁻¹, obtained during the period with a loading of 16% FWS in R2. Compared to anaerobic digestion of CM only, methane production increased by up to 100% when FWS was added to the feed stock. The biogas processes failed in R1 and R2 during the periods with loadings of 16% and 19% FWS, respectively. In both reactors, the biogas processes failed due to overloading and accumulation of ammonia and volatile fatty acids.
GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation.
Hess, Berk; Kutzner, Carsten; van der Spoel, David; Lindahl, Erik
2008-03-01
Molecular simulation is an extremely useful, but computationally very expensive, tool for studies of chemical and biomolecular systems. Here, we present a new implementation of our molecular simulation toolkit GROMACS which now both achieves extremely high performance on single processors, from algorithmic optimizations and hand-coded routines, and simultaneously scales very well on parallel machines. The code encompasses a minimal-communication domain decomposition algorithm, full dynamic load balancing, a state-of-the-art parallel constraint solver, and efficient virtual site algorithms that allow removal of hydrogen atom degrees of freedom to enable integration time steps up to 5 fs for atomistic simulations also in parallel. To improve the scaling properties of the common particle mesh Ewald electrostatics algorithm, we have in addition used a Multiple-Program, Multiple-Data approach, with separate node domains responsible for direct and reciprocal space interactions. Not only does this combination of algorithms enable extremely long simulations of large systems, but it also provides that simulation performance on quite modest numbers of standard cluster nodes.
Partitioning in parallel processing of production systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oflazer, K.
1987-01-01
This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain a speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity, each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.
Parallel processing considerations for image recognition tasks
NASA Astrophysics Data System (ADS)
Simske, Steven J.
2011-01-01
Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows (as diverse as optical character recognition [OCR], document classification and barcode reading) to parallel pipelines. This can substantially decrease the time to completion for the document tasks. For this approach, each parallel pipeline generally performs a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
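A minimal sketch of parallel processing by image region in the map-reduce style described above (illustrative Python; the tile size, threshold, and per-tile detector are placeholder assumptions, not from the paper):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def tiles(img, th, tw):
    """Split an image into rectangular regions (the 'map' inputs)."""
    H, W = img.shape
    return [img[r:r + th, c:c + tw] for r in range(0, H, th)
                                    for c in range(0, W, tw)]

def count_candidates(tile, thresh=200):
    """Placeholder per-region detector (e.g., candidate feature pixels)."""
    return int((tile > thresh).sum())

img = (np.random.rand(1024, 1024) * 255).astype(np.uint8)
with ThreadPoolExecutor() as pool:
    partials = pool.map(count_candidates, tiles(img, 256, 256))  # map step
print(sum(partials))                                             # reduce step
```

The same tiling works for any region-wise detector whose partial results combine with an associative reduction such as a sum or a list merge.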
Optimization under uncertainty of parallel nonlinear energy sinks
NASA Astrophysics Data System (ADS)
Boroson, Ethan; Missoum, Samy; Mattei, Pierre-Olivier; Vergez, Christophe
2017-04-01
Nonlinear Energy Sinks (NESs) are a promising technique for passively reducing the amplitude of vibrations. Through nonlinear stiffness properties, a NES is able to passively and irreversibly absorb energy. Unlike the traditional Tuned Mass Damper (TMD), NESs do not require a specific tuning and absorb energy over a wider range of frequencies. Nevertheless, they are still only efficient over a limited range of excitations. In order to mitigate this limitation and maximize the efficiency range, this work investigates the optimization of multiple NESs configured in parallel. It is well known that the efficiency of a NES is extremely sensitive to small perturbations in loading conditions or design parameters. In fact, the efficiency of a NES has been shown to be nearly discontinuous in the neighborhood of its activation threshold. For this reason, uncertainties must be taken into account in the design optimization of NESs. In addition, the discontinuities require a specific treatment during the optimization process. In this work, the objective of the optimization is to maximize the expected value of the efficiency of NESs in parallel. The optimization algorithm is able to tackle design variables with uncertainty (e.g., nonlinear stiffness coefficients) as well as aleatory variables such as the initial velocity of the main system. The optimal design of several parallel NES configurations for maximum mean efficiency is investigated. Specifically, NES nonlinear stiffness properties, considered random design variables, are optimized for cases with 1, 2, 3, 4, 5, and 10 NESs in parallel. The distributions of efficiency for the optimal parallel configurations are compared to distributions of efficiencies of non-optimized NESs. It is observed that the optimization enables a sharp increase in the mean value of efficiency while reducing the corresponding variance, thus leading to more robust NES designs.
Pruthi, Varun; Talwar, Sangeeta; Nawal, Ruchika Roongta; Pruthi, Preeti Jain; Choudhary, Sarika; Yadav, Seema
2018-01-01
The aim of this study was to evaluate the retention and fracture resistance of different fibre posts. Ninety extracted human permanent maxillary central incisors were used in this study. For the retention evaluation, post space preparation was done in all root canals after obturation, and posts were cemented in three groups. Later, the posts were grasped and pulled out of the roots with the help of a three-jaw chuck at a cross-head speed of 5 mm/min. The force required to dislodge each post was recorded in newtons. To evaluate the fracture behavior of the posts, artificial root canals were drilled into aluminium blocks and posts were cemented. The load required to fracture each post was recorded in newtons. The results of the present study show that the mean retention values for the Fibrekleer Parallel post were significantly greater than those for the Synca Double tapered post and the Bioloren Tapered post. The mean retention values of the double tapered post and the tapered post were not statistically different. The Synca Double tapered post had the highest mean load to fracture, and this value was significantly higher than those of the FibreKleer Parallel and Bioloren Tapered posts. The mean fracture resistance values of the parallel and tapered posts were not statistically different. This study showed parallel posts to have better retention than tapered and double tapered posts. Regarding fracture resistance, double tapered posts were found to be better than parallel and tapered posts.
Kilic, Arzu; Sahinkaya, Erkan; Cinar, Ozer
2014-01-01
Kinetics of the sulphur-limestone autotrophic denitrification process in batch assays and the impact of the sulphur/limestone ratio on process performance in long-term operated packed-bed bioreactors were evaluated. The specific nitrate and nitrite reduction rates increased almost linearly with increasing initial nitrate and nitrite concentrations, respectively. Process performance was evaluated in three parallel packed-bed bioreactors filled with different sulphur/limestone ratios (1:1, 2:1 and 3:1, v/v). The performance of the bioreactors was studied under varying nitrate loadings (0.05–0.80 g NO₃⁻-N L⁻¹ d⁻¹) and hydraulic retention times (3–12 h). The maximum nitrate reduction rate of 0.66 g L⁻¹ d⁻¹ was observed at the loading rate of 0.80 g NO₃⁻-N L⁻¹ d⁻¹ in the reactor with a sulphur/limestone ratio of 3:1. Throughout the study, nitrite concentrations remained quite low (i.e., below 0.5 mg NO₂⁻-N L⁻¹). The reactor performance increased in the order of sulphur/limestone ratios of 3:1, 2:1 and 1:1. Denaturing gradient gel electrophoresis analysis of 16S rRNA genes showed quite stable communities in the reactors, with the presence of species related to Methylovirgula ligni, Sulfurimonas autotrophica, Sulfurovum lithotrophicum and Thiobacillus aquaesulis.
Energy to the Edge (E2E) U.S. Army Rapid Equipping Force
2014-03-21
generators, parallel multiple sources, prioritize loads, and balance loads. Smart grids are based on complex algorithms and controls. 3. Reduce...stations are not able to be serviced by prime power because of their location in the middle of a very active airfield and fueling a system that consists
Treshow, M.
1960-08-16
A device for loading and unloading fuel rods into and from a reactor tank through an access hole includes parallel links carrying a gripper. These links enable the gripper to go through the access hole and then to be moved laterally from the axis of the access hole to the various locations of the fuel rods in the reactor tank.
An Unsolved Electric Circuit: A Common Misconception
ERIC Educational Resources Information Center
Harsha, N. R. Sree; Sreedevi, A.; Prakash, Anupama
2015-01-01
Despite a number of theories in circuit analysis, little is known about the behaviour of ideal equal voltage sources in parallel, connected across a resistive load. We neither have any theory that can predict the voltage source that provides the load current, nor is there any method to test it experimentally. In a series of experiments performed…
Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing
NASA Astrophysics Data System (ADS)
Adnan; Sato, Mitsuhisa
Lazy task creation is an efficient method of overcoming the grain-size overhead problem in parallel computing. Work stealing is an effective load balancing strategy for parallel computing. In this paper, we present dynamic work stealing strategies within a lazy task creation technique for efficient fine-grain task scheduling. The basic idea is to control load balancing granularity depending on the number of task parents in a stack. The dynamic-length strategy of work stealing uses run-time information, namely the load of the victim, to determine the number of tasks that a thief is allowed to steal. We compare it with the bottommost-first work stealing strategy used in StackThreads/MP, and with the fixed-length strategy of work stealing, where a thief requests to steal a fixed number of tasks, as well as with other multithreaded frameworks such as Cilk and the OpenMP task implementation. The experiments show that the dynamic-length strategy of work stealing performs well on irregular workloads such as the UTS benchmarks, as well as on regular workloads such as Fibonacci, Strassen's matrix multiplication, FFT, and sparse LU factorization. The dynamic-length strategy works better than the fixed-length strategy because it is more flexible; it can avoid load imbalance due to overstealing.
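The dynamic-length policy can be illustrated with a toy single-threaded simulation (not the paper's implementation; the steal-half-capped rule and all constants are assumptions made for the sketch):

```python
import random
from collections import deque

def steal_count(victim_load, max_steal=8):
    """Dynamic-length policy: steal an amount that grows with the victim's
    current load (here half of it, capped), instead of a fixed number."""
    return min(victim_load // 2, max_steal)

def simulate(n_workers=4, n_tasks=100, seed=1):
    rng = random.Random(seed)
    deques = [deque() for _ in range(n_workers)]
    deques[0].extend(range(n_tasks))   # all work starts on worker 0
    done = 0
    while done < n_tasks:
        for w, dq in enumerate(deques):
            if dq:
                dq.pop()               # owner works from its own end
                done += 1
            else:                      # idle worker becomes a thief
                victim = rng.randrange(n_workers)
                for _ in range(steal_count(len(deques[victim]))):
                    dq.append(deques[victim].popleft())  # steal the other end
    return done

print(simulate())
```

Scaling the steal size with the victim's load is what prevents overstealing: a thief never empties a victim that is itself nearly out of work.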
Adaptive mesh refinement and load balancing based on multi-level block-structured Cartesian mesh
NASA Astrophysics Data System (ADS)
Misaka, Takashi; Sasaki, Daisuke; Obayashi, Shigeru
2017-11-01
We developed a framework for a distributed-memory parallel computer that enables dynamic data management for adaptive mesh refinement and load balancing. We employed the simple data structure of the building cube method (BCM), where a computational domain is divided into multi-level cubic domains and each cube has the same number of grid points inside, realising a multi-level block-structured Cartesian mesh. Solution-adaptive mesh refinement, which works efficiently with the help of the dynamic load balancing, was implemented by dividing cubes based on mesh refinement criteria. The framework was investigated with the Laplace equation in terms of adaptive mesh refinement, load balancing and parallel efficiency. It was then applied to the incompressible Navier-Stokes equations to simulate a turbulent flow around a sphere. We considered wall-adaptive cube refinement, where the non-dimensional wall distance y+ near the sphere is used as the criterion for mesh refinement. The results showed that the load imbalance due to y+ adaptive mesh refinement was corrected by the present approach. To utilise the BCM framework more effectively, we also tested cube-wise algorithm switching, where explicit and implicit time integration schemes are switched depending on the local Courant-Friedrichs-Lewy (CFL) condition in each cube.
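A toy sketch of the two BCM ingredients discussed above, cube refinement followed by rebalancing (illustrative Python only; real BCM cubes carry grids and per-cube costs, and the refinement criterion here is arbitrary):

```python
def refine(cubes, needs_refinement):
    """Replace each flagged cube by its eight children (an octree-style
    split); a cube here is just (level, index)."""
    out = []
    for level, idx in cubes:
        if needs_refinement((level, idx)):
            out.extend((level + 1, 8 * idx + child) for child in range(8))
        else:
            out.append((level, idx))
    return out

def balance(cubes, n_ranks):
    """Rebalance after refinement: cut the cube list into n_ranks
    contiguous chunks of roughly equal cube count."""
    per_rank = -(-len(cubes) // n_ranks)  # ceiling division
    return [cubes[i:i + per_rank] for i in range(0, len(cubes), per_rank)]

cubes = [(0, i) for i in range(16)]
cubes = refine(cubes, needs_refinement=lambda c: c[1] % 5 == 0)  # toy criterion
print([len(part) for part in balance(cubes, n_ranks=4)])
```

Because every cube holds the same number of grid points, balancing cube counts is a reasonable proxy for balancing work, which is what makes the BCM data structure convenient for dynamic load balancing.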
NASA Technical Reports Server (NTRS)
Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.
Numerical computation of solar neutrino flux attenuated by the MSW mechanism
NASA Astrophysics Data System (ADS)
Kim, Jai Sam; Chae, Yoon Sang; Kim, Jung Dae
1999-07-01
We compute the survival probability of an electron neutrino in its flight through the solar core experiencing the Mikheyev-Smirnov-Wolfenstein effect, with all three neutrino species considered. We adopted a hybrid method that uses an accurate approximation formula in the non-resonance region and numerical integration in the non-adiabatic resonance region. The key to our algorithm is to use importance sampling for the neutrino creation energy and position, and to find the optimum radii at which to start and stop the numerical integration. We further developed a parallel algorithm for a message-passing parallel computer. Using the idea of a job token, we developed a dynamic load balancing mechanism that is effective under any irregular load distribution.
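The job-token idea, in which an idle worker fetches the next job as soon as it finishes its current one, is essentially self-scheduling from a shared queue; a minimal Python analogue (illustrative; the job list and the placeholder integrand are assumptions, not the paper's code) might look like:

```python
from multiprocessing import Pool

def survival_probability(job):
    """Placeholder for integrating one neutrino trajectory; the true cost
    varies strongly with creation energy and position."""
    energy, radius = job
    return (energy, radius, 0.5)  # dummy result

jobs = [(e, r) for e in range(1, 21) for r in (0.05, 0.10, 0.15)]

if __name__ == "__main__":
    with Pool(4) as pool:
        # imap_unordered hands out one "job token" at a time: a worker
        # requests the next token as soon as it finishes, so irregular
        # per-job costs are balanced automatically.
        results = list(pool.imap_unordered(survival_probability, jobs))
    print(len(results))
```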
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajbhandari, Samyam; NIkam, Akshay; Lai, Pai-Wei
Tensor contractions represent the most compute-intensive core kernels in ab initio computational quantum chemistry and nuclear physics. Symmetries in these tensor contractions make them difficult to load balance and scale to large distributed systems. In this paper, we develop an efficient and scalable algorithm to contract symmetric tensors. We introduce a novel approach that avoids data redistribution in contracting symmetric tensors while also avoiding redundant storage and maintaining load balance. We present experimental results on two parallel supercomputers for several symmetric contractions that appear in the CCSD quantum chemistry method. We also present a novel approach to tensor redistribution that can take advantage of parallel hyperplanes when the initial distribution has replicated dimensions, and use collective broadcast when the final distribution has replicated dimensions, making the algorithm very efficient.
Evaluating SPLASH-2 Applications Using MapReduce
NASA Astrophysics Data System (ADS)
Zhu, Shengkai; Xiao, Zhiwei; Chen, Haibo; Chen, Rong; Zhang, Weihua; Zang, Binyu
MapReduce has become prevalent for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance and load balancing from programmers, MapReduce significantly simplifies programming on large clusters. Owing to these features, researchers have also explored the use of MapReduce in other application domains, such as machine learning, textual retrieval and statistical translation, among others.
Optimum parallel step-sector bearing lubricated with an incompressible fluid
NASA Technical Reports Server (NTRS)
Hamrock, B. J.
1983-01-01
The dimensionless parameters normally associated with a step sector thrust bearing are the film thickness ratio, the dimensionless step location, the number of sectors, the radius ratio, and the angular extent of the lubrication feed groove. The optimum number of sectors and the optimum parallel step configuration for a step sector thrust bearing, considering load capacity or stiffness and assuming an incompressible fluid, are presented.
Characterizing parallel file-access patterns on a large-scale multiprocessor
NASA Technical Reports Server (NTRS)
Purakayastha, Apratim; Ellis, Carla Schlatter; Kotz, David; Nieuwejaar, Nils; Best, Michael
1994-01-01
Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload, yet so far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests, predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs, appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.
2016-05-11
the phases of the system load and ground, so to size the voltage divider appropriately Vsys is set equal to the maximum phase-to-ground voltage. The...civilian and military systems is increasing due to technological improvements in power conversion and changing requirements in system loads. The development...of high-power pulsed loads on naval platforms, such as the Laser Weapon System (LaWS) and the electromagnetic railgun, calls for the ability to
Dynamic load balancing algorithm for molecular dynamics based on Voronoi cells domain decompositions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fattebert, J.-L.; Richards, D.F.; Glosli, J.N.
2012-12-01
We present a new algorithm for automatic parallel load balancing in classical molecular dynamics. It assumes a spatial domain decomposition of particles into Voronoi cells. It is a gradient method which attempts to minimize a cost function by displacing Voronoi sites associated with each processor/sub-domain along steepest descent directions. Excellent load balance has been obtained for quasi-2D and 3D practical applications, with up to 440·10⁶ particles on 65,536 MPI tasks.
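A heuristic, loosely Voronoi-based rebalancing step in the spirit of this algorithm (a sketch, not the published method: here sites simply drift toward more heavily loaded neighbors, whereas the paper derives displacements from a cost-function gradient; all constants are arbitrary):

```python
import numpy as np

def loads(sites, particles):
    """Particle count per cell under nearest-site (Voronoi) assignment."""
    d = np.linalg.norm(particles[:, None, :] - sites[None, :, :], axis=2)
    return np.bincount(d.argmin(axis=1), minlength=len(sites))

def rebalance_step(sites, particles, step=0.05):
    """Heuristic descent step: pull each site toward more heavily loaded
    neighbors so that overloaded cells shrink and underloaded ones grow."""
    L = loads(sites, particles).astype(float)
    new = sites.copy()
    for i in range(len(sites)):
        diff = sites - sites[i]                 # vectors to the other sites
        dist = np.linalg.norm(diff, axis=1) + 1e-9
        w = (L - L[i]) / dist                   # signed load imbalance
        new[i] += step * (w[:, None] * diff / dist[:, None]).sum(axis=0)
    return new

rng = np.random.default_rng(0)
sites, particles = rng.random((8, 3)), rng.random((4000, 3))
for _ in range(20):
    sites = rebalance_step(sites, particles)
print(loads(sites, particles))
```

Moving a site into a heavily loaded neighbor's territory transfers particles to the mover's cell, so repeated small steps drive the per-cell counts toward equality.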
Microfluidic squeezing for intracellular antigen loading in polyclonal B-cells as cellular vaccines
NASA Astrophysics Data System (ADS)
Lee Szeto, Gregory; van Egeren, Debra; Worku, Hermoon; Sharei, Armon; Alejandro, Brian; Park, Clara; Frew, Kirubel; Brefo, Mavis; Mao, Shirley; Heimann, Megan; Langer, Robert; Jensen, Klavs; Irvine, Darrell J.
2015-05-01
B-cells are promising candidate autologous antigen-presenting cells (APCs) to prime antigen-specific T-cells both in vitro and in vivo. However, to date, a significant barrier to utilizing B-cells as APCs is their low capacity for non-specific antigen uptake compared to “professional” APCs such as dendritic cells. Here we utilize a microfluidic device that employs many parallel channels to pass single cells through narrow constrictions at high throughput. This microscale “cell squeezing” process creates transient pores in the plasma membrane, enabling intracellular delivery of whole proteins from the surrounding medium into B-cells via mechano-poration. We demonstrate that both resting and activated B-cells process and present antigens delivered via mechano-poration exclusively to antigen-specific CD8+ T-cells, and not CD4+ T-cells. Squeezed B-cells primed and expanded large numbers of effector CD8+ T-cells in vitro that produced effector cytokines critical to cytolytic function, including granzyme B and interferon-γ. Finally, antigen-loaded B-cells were also able to prime antigen-specific CD8+ T-cells in vivo when adoptively transferred into mice. Altogether, these data demonstrate crucial proof-of-concept for mechano-poration as an enabling technology for B-cell antigen loading, priming of antigen-specific CD8+ T-cells, and decoupling of antigen uptake from B-cell activation.
Szilágyi, N; Kovács, R; Kenyeres, I; Csikor, Zs
2013-01-01
Biofilm development in a fixed-bed biofilm reactor system performing municipal wastewater treatment was monitored, with the aim of collecting colonization and maximum biofilm mass data usable in engineering practice for process design purposes. An initial 6-month experimental period was selected, during which biofilm formation and reactor performance were monitored. The results were analyzed by two methods: for simple, steady-state process design purposes, the maximum biofilm mass on carriers versus influent load and a time constant of biofilm growth were determined, whereas for design approaches using dynamic models, a simple biofilm mass prediction model including attachment and detachment mechanisms was selected and fitted to the experimental data. According to a detailed statistical analysis, the collected data did not allow us to determine both the time constant of biofilm growth and the maximum biofilm mass on carriers at the same time. The observed maximum biofilm mass could be determined with a reasonable error and ranged between 438 gTS/m² and 843 gTS/m² of carrier surface, depending on influent load and hydrodynamic conditions. A parallel analysis of the attachment-detachment model showed that the experimental data set allowed us to determine the attachment rate coefficient, which was in the range of 0.05–0.4 m d⁻¹ depending on influent load and hydrodynamic conditions.
Voronoi Tessellation for reducing the processing time of correlation functions
NASA Astrophysics Data System (ADS)
Cárdenas-Montes, Miguel; Sevilla-Noarbe, Ignacio
2018-01-01
The increase of data volume in cosmology is motivating the search for new solutions to the difficulties associated with long processing times and the precision of calculations. This is especially true for several relevant statistics of the galaxy distribution of the Large Scale Structure of the Universe, namely the two- and three-point angular correlation functions, whose processing time has grown critically with the size of the data sample. Beyond parallel implementations to overcome the barrier of processing time, space partitioning algorithms are necessary to reduce the computational load. These can restrict the elements involved in the correlation function estimation to those that can potentially contribute to the final result. In this work, Voronoi Tessellation is used to reduce the processing time of the two-point and three-point angular correlation functions. The results of this proof of concept show a significant reduction of the processing time when preprocessing the galaxy positions with Voronoi Tessellation.
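The load-reduction idea, restricting pair evaluation to elements that can actually contribute, can be sketched with a uniform grid instead of a Voronoi tessellation (an illustrative simplification of the paper's approach; the catalogue size and separation threshold are arbitrary):

```python
import numpy as np
from collections import defaultdict
from itertools import product

def pair_count(points, r_max):
    """Count pairs closer than r_max using a uniform grid of cell size
    r_max, so each point is tested only against its own and adjacent
    cells instead of against the whole catalogue."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple((p // r_max).astype(int))].append(i)
    pairs = 0
    for key, members in grid.items():
        for off in product((-1, 0, 1), repeat=points.shape[1]):
            neighbor = grid.get(tuple(k + o for k, o in zip(key, off)), ())
            for i in members:
                for j in neighbor:
                    if i < j and np.linalg.norm(points[i] - points[j]) < r_max:
                        pairs += 1   # the i < j guard counts each pair once
    return pairs

pts = np.random.default_rng(1).random((2000, 2))
print(pair_count(pts, r_max=0.05))
```

For small separation thresholds this turns the quadratic all-pairs scan into work roughly proportional to the number of points times the typical cell occupancy.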
An Optimized Control for LLC Resonant Converter with Wide Load Range
NASA Astrophysics Data System (ADS)
Xi, Xia; Qian, Qinsong
2017-05-01
This paper presents an optimized control scheme that allows LLC resonant converters to operate over a wider load range with good closed-loop performance. The proposed control employs two paralleled digital compensators to guarantee good closed-loop performance over a wide load range during the steady state, while an optimized trajectory control takes over to change the gate-driving signals immediately at load transients. Finally, the proposed control has been implemented and tested on a 150 W, 200 kHz, 400 V/24 V LLC resonant converter, and the results validate the proposed method.
NASA Technical Reports Server (NTRS)
Smith, Damon C. (Inventor)
2005-01-01
An exercise device 10 is particularly well suited for use in low-gravity environments, and includes a frame 12 with a plurality of resistance elements 30, 82 supported in parallel on the frame. A load transfer member 20 is moveable relative to the frame for transferring the applied force to the free end of each captured resistance element. A load selection template 14 is removably secured to the load transfer member, and a plurality of capture mechanisms engage the free ends of corresponding resistance elements. The force applying mechanism 53 may be a handle, harness or other user interface for applying a force to move the load transfer member.
Izquierdo, M; González-Badillo, J J; Häkkinen, K; Ibáñez, J; Kraemer, W J; Altadill, A; Eslava, J; Gorostiaga, E M
2006-09-01
The purpose of this study was to examine the effect of different loads on repetition speed during single sets of repetitions to failure in the bench press and parallel squat. Thirty-six physically active men performed a 1-repetition maximum test in the bench press (1RM(BP)) and half squat position (1RM(HS)), and then performed maximal power-output continuous repetition sets to failure, in random order every 10 days, with submaximal loads (60%, 65%, 70%, and 75% of 1RM) during bench press and parallel squat. The average velocity of each repetition was recorded by linking a rotary encoder to the end part of the bar. The values of 1RM(BP) and 1RM(HS) were 91 ± 17 and 200 ± 20 kg, respectively. The number of repetitions performed for a given percentage of 1RM was significantly higher (p < 0.001) in the half squat than in the bench press. Average repetition velocity decreased at a greater rate in the bench press than in the parallel squat. The significant reductions in average repetition velocity (expressed as a percentage of the average velocity achieved during the initial repetition) were observed at a higher percentage of the total number of repetitions performed in the parallel squat (48-69%) than in the bench press (34-40%). The major finding of this study was that, for a given muscle action (bench press or parallel squat), the pattern of reduction in the relative average velocity achieved during each repetition and the relative number of repetitions performed was the same for all percentages of 1RM tested. However, relative average velocity decreased at a greater rate in the bench press than in the parallel squat. This indicates that in the bench press the significant reductions in average repetition velocity occurred when the number of repetitions exceeded about one third (34%) of the total number performed, whereas in the parallel squat it was nearly one half (48%). Conceptually, this means that for a given exercise (bench press or squat) and percentage of maximal dynamic strength (1RM), the pattern of velocity decrease can be predicted over a set of repetitions, so that a minimum repetition threshold to ensure maximal speed performance can be determined.
Breaking Barriers to Low-Cost Modular Inverter Production & Use
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bogdan Borowy; Leo Casey; Jerry Foshage
2005-05-31
The goal of this cost-share contract is to advance key technologies to reduce size, weight and cost while enhancing the performance and reliability of a modular inverter product for Distributed Energy Resources (DER). Efforts address technology development to meet the technical needs of the DER market: protection, isolation, reliability, and quality. Program activities build on SatCon Technology Corporation inverter experience (e.g., AIPM, Starsine, PowerGate) for photovoltaic, fuel cell, and energy storage applications. Efforts focused on four technical areas: capacitors, cooling, voltage sensing, and control of parallel inverters. Capacitor efforts developed a hybrid capacitor approach for conditioning SatCon's AIPM unit supply voltages by incorporating several types and sizes to store energy and filter at high, medium and low frequencies while minimizing parasitics (ESR and ESL). Cooling efforts converted the liquid-cooled AIPM module to an air-cooled unit using augmented-fin, impingement-flow cooling. Voltage sensing efforts successfully modified the existing AIPM sensor board to allow several application-dependent configurations and to enable voltage sensor galvanic isolation. Parallel inverter control efforts realized a reliable technique to control individual inverters, connected in a parallel configuration, without a communication link. Individual inverter currents, AC and DC, were balanced in the paralleled modules by introducing a delay to the individual PWM gate pulses. The load current sharing is robust and independent of load type (i.e., linear and nonlinear, resistive and/or inductive). It is a simple yet powerful method for paralleling individual devices that dramatically improves the reliability and fault tolerance of parallel inverter power systems. A patent application has been made based on this control technology.
NASA Astrophysics Data System (ADS)
Gan, Chee Kwan; Challacombe, Matt
2003-05-01
Recently, early onset linear scaling computation of the exchange-correlation matrix has been achieved using hierarchical cubature [J. Chem. Phys. 113, 10037 (2000)]. Hierarchical cubature differs from other methods in that the integration grid is adaptive and purely Cartesian, which allows for a straightforward domain decomposition in parallel computations; the volume enclosing the entire grid may be simply divided into a number of nonoverlapping boxes. In our data parallel approach, each box requires only a fraction of the total density to perform the necessary numerical integrations due to the finite extent of Gaussian-orbital basis sets. This inherent data locality may be exploited to reduce communications between processors as well as to avoid memory and copy overheads associated with data replication. Although the hierarchical cubature grid is Cartesian, naive boxing leads to irregular work loads due to strong spatial variations of the grid and the electron density. In this paper we describe equal time partitioning, which employs time measurement of the smallest sub-volumes (corresponding to the primitive cubature rule) to load balance grid-work for the next self-consistent-field iteration. After start-up from a heuristic center of mass partitioning, equal time partitioning exploits smooth variation of the density and grid between iterations to achieve load balance. With the 3-21G basis set and a medium quality grid, equal time partitioning applied to taxol (62 heavy atoms) attained a speedup of 61 out of 64 processors, while for a 110 molecule water cluster at standard density it achieved a speedup of 113 out of 128. The efficiency of equal time partitioning applied to hierarchical cubature improves as the grid work per processor increases. With a fine grid and the 6-311G(df,p) basis set, calculations on the 26 atom molecule α-pinene achieved a parallel efficiency better than 99% with 64 processors. For more coarse grained calculations, superlinear speedups are found to result from reduced computational complexity associated with data parallelism.
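Equal time partitioning can be approximated by a classic greedy heuristic: reuse each box's measured time from the previous self-consistent-field iteration and always hand the next-largest box to the least-loaded processor. The sketch below is illustrative only and is not the paper's algorithm, which partitions spatially contiguous Cartesian volumes; the box names and timings are invented.

```python
import heapq

def equal_time_partition(box_times, n_procs):
    """Longest-processing-time greedy: using each box's measured time from
    the previous iteration, always hand the next-largest box to the
    currently least-loaded processor."""
    heap = [(0.0, p, []) for p in range(n_procs)]  # (total time, rank, boxes)
    heapq.heapify(heap)
    for box, t in sorted(box_times.items(), key=lambda kv: -kv[1]):
        total, p, boxes = heapq.heappop(heap)
        boxes.append(box)
        heapq.heappush(heap, (total + t, p, boxes))
    return sorted(heap, key=lambda entry: entry[1])

# Measured times (seconds) from the previous iteration, one per box.
times = {f"box{i}": 0.1 * ((i * 7) % 13 + 1) for i in range(40)}
for total, proc, boxes in equal_time_partition(times, n_procs=4):
    print(proc, round(total, 2), len(boxes))
```

As in the paper, the scheme relies on the density and grid varying smoothly between iterations, so last iteration's timings remain a good predictor of the next iteration's costs.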
Capabilities of Fully Parallelized MHD Stability Code MARS
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2016-10-01
Results of the full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. A parallel version of MARS, named PMARS, has recently been developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iteration algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating the steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2015-11-01
Progress on the full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Results of the MARS parallelization and of the development of a new fixed-boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-09-13
..., measured parallel to the centerline. \\2\\ Subchapters E (Load Lines), F (Marine Engineering), J (Electrical Engineering), N (Dangerous Cargoes), S (Subdivision and Stability), and W (Lifesaving Appliances and...
Mapping a battlefield simulation onto message-passing parallel architectures
NASA Technical Reports Server (NTRS)
Nicol, David M.
1987-01-01
Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors, the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with its regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.
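A minimal version of such a locality-preserving mapping (illustrative Python; the greedy fair-share rule and the toy load profile are assumptions, not the paper's algorithm):

```python
def assign_regions(region_loads, n_procs):
    """Greedy contiguous partition: each processor takes consecutive
    regions until it reaches its fair share of the load still remaining,
    preserving spatial locality (more regions than processors)."""
    assignment, start = [], 0
    remaining = float(sum(region_loads))
    n = len(region_loads)
    for p in range(n_procs):
        target = remaining / (n_procs - p)
        acc, end = 0.0, start
        while end < n and (acc < target or p == n_procs - 1):
            acc += region_loads[end]   # last processor takes all leftovers
            end += 1
        assignment.append(list(range(start, end)))
        remaining -= acc
        start = end
    return assignment

# Toy workload: activity concentrated in the middle of the battlefield.
print(assign_regions([1, 1, 2, 5, 9, 9, 5, 2, 1, 1, 1, 1], n_procs=4))
```

Keeping each processor's regions contiguous limits cross-processor interactions to partition boundaries, which is the locality the paper exploits.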
2012-01-01
Background: As Next-Generation Sequencing data becomes available, existing hardware environments do not provide sufficient storage space and computational power to store and process the data due to its enormous size. This is, and will remain, a frequent problem encountered every day by researchers working on genetic data. Some options are available for compressing and storing such data, such as general-purpose compression software and the PBAT/PLINK binary format. However, these currently available methods either do not offer sufficient compression rates, or require a great amount of CPU time for decompression and loading every time the data is accessed. Results: Here, we propose a novel and simple algorithm for storing such sequencing data. We show that the compression factor of the algorithm ranges from 16 to several hundred, which potentially allows SNP data of hundreds of gigabytes to be stored in hundreds of megabytes. We provide a C++ implementation of the algorithm, which supports direct loading and parallel loading of the compressed format without requiring extra time for decompression. By applying the algorithm to simulated and real datasets, we show that the algorithm gives a greater compression rate than commonly used compression methods, and that the data-loading process takes less time. Also, the C++ library provides direct data-retrieving functions, which allow the compressed information to be easily accessed by other C++ programs. Conclusions: The SpeedGene algorithm enables the storage and analysis of next generation sequencing data in current hardware environments, making system upgrades unnecessary. PMID:22591016
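Although the paper's exact coding scheme is not reproduced here, the flavor of such directly loadable storage can be shown with a generic 2-bit genotype packing (an illustrative sketch; SpeedGene's actual formats differ and adapt to the data):

```python
import numpy as np

SHIFTS = np.array([0, 2, 4, 6], np.uint8)

def pack_genotypes(g):
    """Pack genotypes coded 0/1/2 (3 = missing) into 2 bits each, four
    genotypes per byte, so the matrix loads directly without decompression."""
    g = np.asarray(g, dtype=np.uint8)
    pad = (-len(g)) % 4
    g = np.concatenate([g, np.full(pad, 3, np.uint8)])  # pad with 'missing'
    return np.bitwise_or.reduce(g.reshape(-1, 4) << SHIFTS, axis=1)

def unpack_genotypes(packed, n):
    """Recover the first n genotypes from the packed byte array."""
    return ((packed[:, None] >> SHIFTS) & 3).reshape(-1)[:n]

g = np.random.default_rng(0).integers(0, 3, 25)
p = pack_genotypes(g)
assert np.array_equal(unpack_genotypes(p, len(g)), g)
print(len(g), "genotypes ->", p.nbytes, "bytes")
```

Relative to one text character plus separator per genotype, fixed 2-bit coding alone gives roughly a factor-of-8 to factor-of-16 reduction, and it can be memory-mapped and sliced in place, which is what makes direct and parallel loading cheap.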
20 kA PFN capacitor bank with solid-state switching. [pulse forming network for plasma studies
NASA Technical Reports Server (NTRS)
Posta, S. J.; Michels, C. J.
1973-01-01
A compact high-current pulse-forming network capacitor bank using paralleled silicon controlled rectifiers as switches is described. The maximum charging voltage of the bank is 1 kV and the maximum load current is 20 kA. The necessary switch equalization criteria and performance with a dummy load and an arc plasma generator are described.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahn, Tae-Hyuk; Sandu, Adrian; Watson, Layne T.
2015-08-01
Ensembles of simulations are employed to estimate the statistics of possible future states of a system, and are widely used in important applications such as climate change and biological modeling. Ensembles of runs can naturally be executed in parallel. However, when the CPU times of individual simulations vary considerably, a simple strategy of assigning an equal number of tasks per processor can lead to serious work imbalances and low parallel efficiency. This paper presents a new probabilistic framework to analyze the performance of dynamic load balancing algorithms for ensembles of simulations where many tasks are mapped onto each processor, and where the individual compute times vary considerably among tasks. Four load balancing strategies are discussed: most-dividing, all-redistribution, random-polling, and neighbor-redistribution. Simulation results with a stochastic budding yeast cell cycle model are consistent with the theoretical analysis. It is especially significant that there is a provable global decrease in load imbalance for the local rebalancing algorithms, which matters given the scalability concerns for the global rebalancing algorithms. The overall simulation time is reduced by up to 25%, and the total processor idle time by 85%.
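A toy round-based simulation of one of the four strategies, random polling (illustrative only; real analyses are event-driven, and the steal-half rule and task-cost distribution are assumptions made for the sketch):

```python
import random

def random_polling(task_times, n_procs, seed=0):
    """Toy round-based simulation: tasks are dealt out statically; an idle
    processor polls one random victim per round and takes half its queue.
    Returns per-processor busy time, whose spread shows the final balance."""
    rng = random.Random(seed)
    queues = [list(task_times[p::n_procs]) for p in range(n_procs)]
    busy = [0.0] * n_procs
    while any(queues):
        for p in range(n_procs):
            if queues[p]:
                busy[p] += queues[p].pop()     # run one task
            else:
                v = rng.randrange(n_procs)     # poll a random victim
                k = len(queues[v]) // 2
                queues[p], queues[v] = queues[v][:k], queues[v][k:]
    return busy

rng = random.Random(1)
tasks = [rng.expovariate(1.0) for _ in range(200)]  # highly variable costs
print([round(b, 1) for b in random_polling(tasks, n_procs=4)])
```

Local, pairwise exchanges like this scale better than global redistribution because no step requires a synchronized view of every processor's queue.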
Watkins, C Edward
2012-09-01
In a way not done before, Tracey, Bludworth, and Glidden-Tracey ("Are there parallel processes in psychotherapy supervision: An empirical examination," Psychotherapy, 2011, advance online publication, doi:10.1037/a0026246) have shown us that parallel process in psychotherapy supervision can indeed be rigorously and meaningfully researched, and their groundbreaking investigation provides a nice prototype for future supervision studies to emulate. In what follows, I offer a brief complementary comment to Tracey et al., addressing one matter that seems to be a potentially important conceptual and empirical parallel process consideration: When is a parallel just a parallel? PsycINFO Database Record (c) 2012 APA, all rights reserved.
Seeing the forest for the trees: Networked workstations as a parallel processing computer
NASA Technical Reports Server (NTRS)
Breen, J. O.; Meleedy, D. M.
1992-01-01
Unlike traditional 'serial' processing computers, in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units, thereby performing several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.
A status of the Turbine Technology Team activities
NASA Technical Reports Server (NTRS)
Griffin, Lisa W.
1992-01-01
The recent activities of the Turbine Technology Team of the Consortium for Computational Fluid Dynamics (CFD) Application in Propulsion Technology are presented. The team consists of members from the government, industry, and universities. The goal of this team is to demonstrate the benefits to the turbine design process attainable through the application of CFD. This goal is to be achieved by enhancing and validating turbine design tools for improved loading and flowfield definition and loss prediction, and by transferring the advanced technology to the turbine design process. In order to demonstrate the advantages of using CFD early in the design phase, the Space Transportation Main Engine (STME) turbines for the National Launch System (NLS) were chosen on which to focus the team's efforts. The Turbine Team activities run parallel to the STME design work.
Parallel Processing at the High School Level.
ERIC Educational Resources Information Center
Sheary, Kathryn Anne
This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…
Kawchuk, Gregory N; Carrasco, Alejandro; Beecher, Grayson; Goertzen, Darrell; Prasad, Narasimha
2010-10-15
Serial dissection of porcine motion segments during robotic control of vertebral kinematics. To identify which spinal tissues are loaded in response to manual therapy (manipulation and mobilization) and to what magnitude. Various theoretical constructs attempt to explain how manual therapies load specific spinal tissues. By using a parallel robot to control vertebral kinematics during serial dissection, it is possible to quantify the loads experienced by discrete spinal tissues undergoing common therapeutic procedures such as manual therapy. In 9 porcine cadavers, manual therapy was provided to L3 and the kinematic response of L3-L4 recorded. The exact kinematic trajectory experienced by L3-L4 in response to manual therapy was then replayed to the isolated segment by a parallel robot equipped with a 6-axis load cell. Discrete spinal tissues were then removed and the kinematic pathway replayed. The changes in forces and moments following tissue removal were considered to be those applied to that specific tissue by manual therapy. In this study, both manual therapies affected spinal tissues. The intervertebral disc experienced the greatest forces and moments arising from both manipulation and mobilization. This study is the first to identify which tissues are loaded in response to manual therapy. The observation that manual therapy loads some tissues to a much greater magnitude than others offers a possible explanation for its modest treatment effect; only conditions involving these tissues may be influenced by manual therapy. Future studies are planned to determine if manual therapy can be altered to target (or avoid) specific spinal tissues.
NASA Astrophysics Data System (ADS)
Benini, Luca
2017-06-01
The "internet of everything" envisions trillions of connected objects loaded with high-bandwidth sensors requiring massive amounts of local signal processing, fusion, pattern extraction and classification. From the computational viewpoint, the challenge is formidable and can be addressed only by pushing computing fabrics toward massive parallelism and brain-like energy efficiency levels. CMOS technology can still take us a long way toward this goal, but technology scaling is losing steam. Energy efficiency improvement will increasingly hinge on architecture, circuits, design techniques such as heterogeneous 3D integration, mixed-signal preprocessing, event-based approximate computing and non-Von-Neumann architectures for scalable acceleration.
Simplified Parallel Domain Traversal
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erickson III, David J
2011-01-01
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO2 and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.
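The MapReduce-inspired programming model the abstract alludes to can be sketched in a few lines. This toy version (my own, not DStep's API; in the real system the map phase runs on many cores and the shuffle is the two-tiered asynchronous communication layer) shows the shape of a keyed map/shuffle/reduce over a decomposed domain:

    from collections import defaultdict

    def dstep_like(blocks, map_fn, reduce_fn):
        """Toy map/shuffle/reduce over domain blocks."""
        shuffled = defaultdict(list)
        for block in blocks:                 # each block maps independently
            for key, value in map_fn(block):
                shuffled[key].append(value)  # shuffle: group by key
        return {k: reduce_fn(vs) for k, vs in shuffled.items()}

    # Example: mean value per region across domain blocks.
    blocks = [[("regionA", 1.0), ("regionB", 3.0)], [("regionA", 2.0)]]
    means = dstep_like(blocks, lambda b: b, lambda vs: sum(vs) / len(vs))
    print(means)  # {'regionA': 1.5, 'regionB': 3.0}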
NASA Astrophysics Data System (ADS)
Singh, Santosh Kumar; Ghatak Choudhuri, Sumit
2018-05-01
Parallel connection of UPS inverters to enhance power rating is a widely accepted practice. Inter-modular circulating currents appear when multiple inverter modules are connected in parallel to supply a variable critical load. Interfacing of modules henceforth requires an intensive design using a proper control strategy. The potential of human-intuitive Fuzzy Logic (FL) control with an imprecise system model is well known and can thus be utilised in parallel-connected UPS systems. A conventional FL controller is computationally intensive, especially with a higher number of input variables. This paper proposes the application of Hierarchical Fuzzy Logic control to parallel-connected multi-modular inverter systems, reducing the computational burden on the processor for a given switching frequency. Simulated results in the MATLAB environment and experimental verification using a Texas TMS320F2812 DSP are included to demonstrate the feasibility of the proposed control scheme.
An algorithm of discovering signatures from DNA databases on a computer cluster.
Lee, Hsiao Ping; Sheu, Tzu-Fang
2014-10-05
Signatures are short sequences that are unique within a database, not similar to any other sequence in it, and can therefore be used as the basis for identifying different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entire database to be loaded into memory, restricting the amount of data they can process and leaving them unable to handle databases with large amounts of data. Those algorithms also use sequential models and have slower discovery speeds, meaning that their efficiency can be improved. In this research, we introduce a divide-and-conquer strategy into signature discovery and propose a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms, which are unable to process large databases, and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases, such as the human whole-genome EST database, that the existing algorithms could not. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large-database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
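To make the divide-and-conquer idea concrete, here is a heavily simplified sketch (my own, not the published DDCSD code): the database is streamed chunk by chunk so only per-chunk data plus a k-mer count table is ever in memory, and a k-mer occurring in exactly one sequence is kept as a candidate signature. In the real algorithm the chunks would be mined on separate cluster nodes and the counts merged.

    from collections import Counter

    def kmers(seq, k):
        """Distinct k-mers of one sequence (set, so each counts once)."""
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def signatures(database, k):
        counts = Counter()
        for chunk in database:           # chunks could be mined in parallel
            for seq in chunk:
                counts.update(kmers(seq, k))
        # A k-mer seen in exactly one sequence is a candidate signature.
        return {m for m, c in counts.items() if c == 1}

    db = [["ACGTACGT", "TTGGAACC"], ["ACGTTTTT"]]
    print(sorted(signatures(db, 5)))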
NASA Technical Reports Server (NTRS)
Eidson, T. M.; Erlebacher, G.
1994-01-01
While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem - a periodic tridiagonal solver - are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamics simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solutions for multiple right-hand sides (RHS) of the system of equations. Each RHS solution is independent and thus can be computed in parallel. Thus a Gaussian elimination type algorithm can be used in a parallel computation, and the more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data is moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy leaves the data in place and chains the algorithm across processor boundaries. This usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm is shown to directly transform a sequential Gaussian elimination type algorithm into the parallel chained, load-balanced algorithm.
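Why the transpose strategy parallelizes trivially is visible in a plain Thomas-algorithm sketch: each RHS column is an independent solve over the same coefficients, so each processor can simply take a subset of columns. This is an illustration under my own assumptions, not the paper's code, and it shows the non-periodic solve only (the periodic correction, e.g. via Sherman-Morrison, is omitted):

    # Thomas algorithm for one tridiagonal system; a = sub-, b = main-,
    # c = super-diagonal, d = right-hand side.
    def thomas(a, b, c, d):
        n = len(b)
        cp, dp = [0.0] * n, [0.0] * n
        cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
        for i in range(1, n):                       # forward elimination
            m = b[i] - a[i] * cp[i - 1]
            cp[i] = c[i] / m
            dp[i] = (d[i] - a[i] * dp[i - 1]) / m
        x = [0.0] * n
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):              # back substitution
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x

    a, b, c = [0, 1, 1], [4, 4, 4], [1, 1, 0]
    rhs_set = [[5, 6, 5], [1, 0, 1]]                # independent RHS vectors
    solutions = [thomas(a, b, c, d) for d in rhs_set]  # parallelizable loop
    print(solutions[0])  # [1.0, 1.0, 1.0]

The distributed (chained) strategy instead pipelines the forward-elimination and back-substitution sweeps themselves across processors, which is what the load-balancing construction in the abstract addresses.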
Machine Learning Based Online Performance Prediction for Runtime Parallelization and Task Scheduling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, J; Ma, X; Singh, K
2008-10-09
With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as well as developing next-generation software requires assistance from hardware, compilers and runtime systems to exploit parallelism transparently within applications. These systems must decompose applications into tasks that can be executed in parallel and then schedule those tasks to minimize load imbalance. However, many systems lack a priori knowledge about the execution time of all tasks to perform effective load balancing with low scheduling overhead. In this paper, we approach this fundamental problem using machine learning techniques first to generate performance models for all tasks and then applying those models to perform automatic performance prediction across program executions. We also extend an existing scheduling algorithm to use generated task cost estimates for online task partitioning and scheduling. We implement the above techniques in the pR framework, which transparently parallelizes scripts in the popular R language, and evaluate their performance and overhead with both a real-world application and a large number of synthetic representative test scripts. Our experimental results show that our proposed approach significantly improves task partitioning and scheduling, with maximum improvements of 21.8%, 40.3% and 22.1% and average improvements of 15.9%, 16.9% and 4.2% for LMM (a real R application) and synthetic test cases with independent and dependent tasks, respectively.
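Once per-task cost estimates exist, they can drive a classical scheduling heuristic. The sketch below (illustrative; the paper extends its own scheduler, and the costs here are given rather than learned) uses longest-processing-time (LPT) assignment with a heap of processor loads, which is one standard way predicted costs translate into balanced partitions:

    import heapq

    def lpt_schedule(predicted_costs, n_procs):
        """Assign tasks (dict name -> estimated cost) to processors."""
        heap = [(0.0, p) for p in range(n_procs)]   # (current load, processor)
        heapq.heapify(heap)
        assignment = {}
        for task in sorted(predicted_costs, key=predicted_costs.get,
                           reverse=True):           # heaviest first
            load, p = heapq.heappop(heap)           # least-loaded processor
            assignment[task] = p
            heapq.heappush(heap, (load + predicted_costs[task], p))
        return assignment

    est = {"t1": 9.0, "t2": 4.5, "t3": 4.4, "t4": 1.0}
    print(lpt_schedule(est, 2))   # the heavy task ends up alone: {'t1': 0, ...}

The quality of the resulting schedule degrades gracefully with prediction error, which is why approximate learned models are good enough in practice.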
Parafoveal load of word N+1 modulates preprocessing effectiveness of word N+2 in Chinese reading.
Yan, Ming; Kliegl, Reinhold; Shu, Hua; Pan, Jinger; Zhou, Xiaolin
2010-12-01
Preview benefits (PBs) from two words to the right of the fixated one (i.e., word N + 2) and associated parafoveal-on-foveal effects are critical for proposals of distributed lexical processing during reading. This experiment examined parafoveal processing during reading of Chinese sentences, using a boundary manipulation of N + 2-word preview with low- and high-frequency words N + 1. The main findings were (a) an identity PB for word N + 2 that was (b) primarily observed when word N + 1 was of high frequency (i.e., an interaction between frequency of word N + 1 and PB for word N + 2), and (c) a parafoveal-on-foveal frequency effect of word N + 1 for fixation durations on word N. We discuss implications for theories of serial attention shifts and parallel distributed processing of words during reading.
Guo, Jing; Wang, Xiao-Yu; Li, Xue-Sheng; Sun, Hai-Yang; Liu, Lin; Li, Hong-Bo
2016-02-01
To evaluate the effect of different designs of marginal preparation on stress distribution in the mandibular premolar restored with an endocrown using the three-dimensional finite element method. Four models with different designs of marginal preparation, including a flat margin, 90° shoulder, 135° shoulder and chamfer shoulder, were established to imitate a mandibular first premolar restored with an endocrown. A load of 100 N was applied to the intersection of the long axis and the occlusal surface, either parallel to or at an angle of 45° to the long axis of the tooth. The maximum values of Von Mises stress and the stress distribution around the cervical region of the abutment and the endocrown with different designs of marginal preparation were analyzed. The load parallel to the long axis of the tooth caused obvious stress concentration in the lingual portions of both the cervical region of the tooth tissue and the restoration. The stress distribution characteristics in the cervical region of the models with a flat margin and a 90° shoulder were more uniform than those in the models with a 135° shoulder and a chamfer shoulder. Loading at 45° to the long axis caused stress concentration mainly on the buccal portion of the cervical region, and the model with a flat margin showed the most favorable stress distribution patterns, with a greater maximum Von Mises stress under this circumstance than under parallel loading. Irrespective of the loading direction, the stress value was lowest in the flat margin model, where the stress value in the cervical region of the endocrown was greater than that in the counterpart of the tooth tissue. The stress level on the enamel was higher than that on the nearby dentin in the flat margin model. From the stress distribution point of view, endocrowns with a flat margin, followed by a 90° shoulder, are recommended.
A parallel algorithm for multi-level logic synthesis using the transduction method. M.S. Thesis
NASA Technical Reports Server (NTRS)
Lim, Chieng-Fai
1991-01-01
The Transduction Method has been shown to be a powerful tool in the optimization of multilevel networks. Many tools such as the SYLON synthesis system (X90), (CM89), (LM90) have been developed based on this method. A parallel implementation is presented of SYLON-XTRANS (XM89) on an eight processor Encore Multimax shared memory multiprocessor. It minimizes multilevel networks consisting of simple gates through parallel pruning, gate substitution, gate merging, generalized gate substitution, and gate input reduction. This implementation, called Parallel TRANSduction (PTRANS), also uses partitioning to break large circuits up and performs inter- and intra-partition dynamic load balancing. With this, good speedups and high processor efficiencies are achievable without sacrificing the resulting circuit quality.
Performance analysis of parallel branch and bound search with the hypercube architecture
NASA Technical Reports Server (NTRS)
Mraz, Richard T.
1987-01-01
With the availability of commercial parallel computers, researchers are examining new classes of problems which might benefit from parallel computing. This paper presents results of an investigation of the class of search intensive problems. The specific problem discussed is the Least-Cost Branch and Bound search method of deadline job scheduling. The object-oriented design methodology was used to map the problem into a parallel solution. While the initial design was good for a prototype, the best performance resulted from fine-tuning the algorithm for a specific computer. The experiments analyze the computation time, the speedup over a VAX 11/785, and the load balance of the problem when using a loosely coupled multiprocessor system based on the hypercube architecture.
Wang, Peihong; Du, Hejun
2015-07-01
Zinc oxide (ZnO) thin film piezoelectric microelectromechanical systems (MEMS) based vibration energy harvesters with two different designs are presented. These harvesters consist of a silicon cantilever, a silicon proof mass, and a ZnO piezoelectric layer. Design I has a large ZnO piezoelectric element and Design II has two smaller and equally sized ZnO piezoelectric elements; however, the total area of ZnO thin film in the two designs is equal. The ZnO thin film is deposited by means of the radio-frequency magnetron sputtering method and is characterized by means of XRD and SEM techniques. These ZnO energy harvesters are fabricated using MEMS micromachining. The natural frequencies of the fabricated ZnO energy harvesters are simulated and tested. The test results show that these two energy harvesters with different designs have almost the same natural frequency. Then, the output performance of the different ZnO energy harvesters is tested in detail. The effects of series connection and parallel connection of two ZnO elements on the load voltage and power are also analyzed. The experimental results show that the energy harvester with two ZnO piezoelectric elements in parallel connection in Design II has higher load voltage and higher load power than the fabricated energy harvesters with other designs. Its load voltage is 2.06 V under a load resistance of 1 MΩ and its maximal load power is 1.25 μW under a load resistance of 0.6 MΩ, when it is excited by an external vibration with a frequency of 1300.1 Hz and an acceleration of 10 m/s². By contrast, the load voltage of the energy harvester of Design I is 1.77 V under 1 MΩ resistance and its maximal load power is 0.98 μW under 0.38 MΩ load resistance when it is excited by the same vibration.
Peloquin, John M; Elliott, Dawn M
2016-04-01
Cracks in fibrous soft tissue, such as intervertebral disc annulus fibrosus and knee meniscus, cause pain and compromise joint mechanics. A crack concentrates stress at its tip, making further failure and crack extension (fracture) more likely. Ex vivo mechanical testing is an important tool for studying the loading conditions required for crack extension, but prior work has shown that it is difficult to reproduce crack extension. Most prior work used edge crack specimens in uniaxial tension, with the crack 90° to the edge of the specimen. This configuration does not necessarily represent the loading conditions that cause in vivo crack extension. To find a potentially better choice for experiments aiming to reproduce crack extension, we used finite element analysis to compare, in factorial combination, (1) center crack vs. edge crack location, (2) biaxial vs. uniaxial loading, and (3) crack-fiber angles ranging from 0° to 90°. The simulated material was annulus fibrosus fibrocartilage with a single fiber family. We hypothesized that one of the simulated test cases would produce a stronger stress concentration than the commonly used uniaxially loaded 90° crack-fiber angle edge crack case. Stress concentrations were compared between cases in terms of fiber-parallel stress (representing risk of fiber rupture), fiber-perpendicular stress (representing risk of matrix rupture), and fiber shear stress (representing risk of fiber sliding). Fiber-perpendicular stress and fiber shear stress concentrations were greatest in edge crack specimens (of any crack-fiber angle) and center crack specimens with a 90° crack-fiber angle. However, unless the crack is parallel to the fiber direction, these stress components alone are insufficient to cause crack opening and extension. Fiber-parallel stress concentrations were greatest in center crack specimens with a 45° crack-fiber angle, either biaxially or uniaxially loaded. We therefore recommend that the 45° center crack case be tried in future experiments intended to study crack extension by fiber rupture. Copyright © 2015 Elsevier Ltd. All rights reserved.
Quantifying the debonding of inclusions through tomography and computational homology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Wei-Yang; Johnson, George C.; Mota, Alejandro
2010-09-01
This report describes a Laboratory Directed Research and Development (LDRD) project on the use of synchrotron-radiation computed tomography (SRCT) data to determine the conditions and mechanisms that lead to void nucleation in rolled alloys. The Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL) has provided SRCT data for a few specimens of 7075-T7351 aluminum plate (widely used for aerospace applications) stretched to failure, loaded in directions perpendicular and parallel to the rolling direction. The resolution of the SRCT data is 900 nm, which allows elucidation of the mechanisms governing void growth and coalescence. This resolution is not fine enough, however, for nucleation. We propose the use of statistics and image processing techniques to obtain sub-resolution scale information from these data, and thus determine where in the specimen and when during the loading program nucleation occurs and the mechanisms that lead to it. Quantitative analysis of the tomography data, however, leads to the conclusion that the reconstruction process compromises the information obtained from the scans. Alternate, more powerful reconstruction algorithms are needed to address this problem, but those fall beyond the scope of this project.
Short-term Power Load Forecasting Based on Balanced KNN
NASA Astrophysics Data System (ADS)
Lv, Xianlong; Cheng, Xingong; YanShuang; Tang, Yan-mei
2018-03-01
To improve the accuracy of load forecasting, a short-term load forecasting model based on the balanced KNN algorithm is proposed. According to the load characteristics, the massive historical power-load data are divided into scenes by the K-means algorithm. In view of unbalanced load scenes, the balanced KNN algorithm is proposed to classify scenes accurately. The local weighted linear regression algorithm is used to fit and predict the load. Adopting the Apache Hadoop programming framework for cloud computing, the proposed algorithm model is parallelized and improved to enhance its ability to deal with massive, high-dimensional data. The household electricity consumption data of a residential district are analyzed on a 23-node cloud computing cluster, and experimental results show that the load forecasting accuracy and execution time of the proposed model are better than those of the traditional forecasting algorithm.
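The "balanced" part addresses a well-known KNN failure mode: when one scene class dominates the historical data, plain majority voting swamps rare scenes. One common balancing device, sketched below, is to weight each neighbor's vote inversely by its class frequency; this is an illustrative assumption on my part, not necessarily the paper's exact weighting scheme:

    from collections import Counter
    import math

    def balanced_knn(train, labels, x, k):
        """Classify x by k nearest neighbors with class-frequency weighting."""
        freq = Counter(labels)
        nearest = sorted(range(len(train)),
                         key=lambda i: math.dist(train[i], x))[:k]
        votes = Counter()
        for i in nearest:
            votes[labels[i]] += 1.0 / freq[labels[i]]   # rare classes count more
        return votes.most_common(1)[0][0]

    train = [(0.0,), (0.1,), (0.2,), (1.0,)]
    labels = ["weekday", "weekday", "weekday", "holiday"]
    print(balanced_knn(train, labels, (0.9,), k=3))   # -> 'holiday'

With plain KNN the two "weekday" neighbors would outvote the single closer "holiday" point; the frequency weighting reverses that.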
Thermally determining flow and/or heat load distribution in parallel paths
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chainer, Timothy J.; Iyengar, Madhusudan K.; Parida, Pritish R.
2016-12-13
A method including obtaining calibration data for at least one sub-component in a heat transfer assembly, wherein the calibration data comprises at least one indication of coolant flow rate through the sub-component for a given surface temperature delta of the sub-component and a given heat load into said sub-component, determining a measured heat load into the sub-component, determining a measured surface temperature delta of the sub-component, and determining a coolant flow distribution in a first flow path comprising the sub-component from the calibration data according to the measured heat load and the measured surface temperature delta of the sub-component.
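Stripped of patent language, the method is a calibration-table lookup: at a known heat load, a hotter surface implies less coolant flow, so interpolating measured delta-T against calibration points yields the flow. The sketch below uses entirely hypothetical calibration numbers and a single linear interpolation to show the shape of the computation:

    # Hypothetical calibration table: heat load (W) -> [(delta_T K, flow l/min)]
    calibration = {
        100.0: [(5.0, 2.0), (10.0, 1.0)],
        200.0: [(8.0, 2.5), (16.0, 1.2)],
    }

    def flow_from_delta_t(heat_load, delta_t):
        """Infer coolant flow by interpolating calibration points
        taken at the same heat load."""
        pts = sorted(calibration[heat_load])
        (t0, f0), (t1, f1) = pts[0], pts[-1]
        w = (delta_t - t0) / (t1 - t0)      # linear interpolation weight
        return f0 + w * (f1 - f0)

    print(flow_from_delta_t(100.0, 7.5))    # -> 1.5 l/min

Repeating this per sub-component gives the flow distribution across the parallel paths without inserting flow meters into each path.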
The source of dual-task limitations: Serial or parallel processing of multiple response selections?
Marois, René
2014-01-01
Although it is generally recognized that the concurrent performance of two tasks incurs costs, the sources of these dual-task costs remain controversial. The serial bottleneck model suggests that serial postponement of task performance in dual-task conditions results from a central stage of response selection that can only process one task at a time. Cognitive-control models, by contrast, propose that multiple response selections can proceed in parallel, but that serial processing of task performance is predominantly adopted because its processing efficiency is higher than that of parallel processing. In the present study, we empirically tested this proposition by examining whether parallel processing would occur when it was more efficient and financially rewarded. The results indicated that even when parallel processing was more efficient and was incentivized by financial reward, participants still failed to process tasks in parallel. We conclude that central information processing is limited by a serial bottleneck. PMID:23864266
Funabashi, Martha; Nougarou, François; Descarreaux, Martin; Prasad, Narasimha; Kawchuk, Greg
In order to define the relation between spinal manipulative therapy (SMT) input parameters and the distribution of load within spinal tissues, the aim of this study was to determine the influence of force magnitude and application site when SMT is applied to cadaveric spines. In 10 porcine cadavers, a servo-controlled linear actuator motor provided a standardized SMT simulation using 3 different force magnitudes (100N, 300N, and 500N) to 2 different cutaneous locations: L3/L4 facet joint (FJ), and L4 transverse processes (TVP). Vertebral kinematics were tracked optically using indwelling bone pins, the motion segment removed and mounted in a parallel robot equipped with a 6-axis load cell. The kinematics of each SMT application were replicated robotically. Serial dissection of spinal structures was conducted to quantify loading characteristics of discrete spinal tissues. Forces experienced by the L3/L4 segment and spinal structures during SMT replication were recorded and analyzed. Spinal manipulative therapy force magnitude and application site parameters influenced spinal tissues loading. A significant main effect (P < .05) of force magnitude was observed on the loads experienced by the intact specimen and supra- and interspinous ligaments. The main effect of application site was also significant (P < .05), influencing the loading of the intact specimen and facet joints, capsules, and ligamentum flavum (P < .05). Spinal manipulative therapy input parameters of force magnitude and application site significantly influence the distribution of forces within spinal tissues. By controlling these SMT parameters, clinical outcomes may potentially be manipulated. Copyright © 2017. Published by Elsevier Inc.
Analytical study of pressure balancing in gas film seals
NASA Technical Reports Server (NTRS)
Zuk, J.
1973-01-01
The load factor is investigated for subsonic and choked flow conditions, laminar and turbulent flows, and various seal entrance conditions. Both parallel sealing surfaces and surfaces with small linear deformation were investigated. The load factor for subsonic flow depends strongly on pressure ratio; under choked flow conditions, however, the load factor is found to depend more strongly on film thickness and flow entrance conditions than on pressure ratio. The importance of generating hydrodynamic forces to keep the seal balanced under severe and multipoint operation is also discussed.
NASA Astrophysics Data System (ADS)
Penner, Joyce E.; Andronova, Natalia; Oehmke, Robert C.; Brown, Jonathan; Stout, Quentin F.; Jablonowski, Christiane; van Leer, Bram; Powell, Kenneth G.; Herzog, Michael
2007-07-01
One of the most important advances needed in global climate models is the development of atmospheric General Circulation Models (GCMs) that can reliably treat convection. Such GCMs require high resolution in local convectively active regions, both in the horizontal and vertical directions. During previous research we have developed an Adaptive Mesh Refinement (AMR) dynamical core that can adapt its grid resolution horizontally. Our approach utilizes a finite volume numerical representation of the partial differential equations with floating Lagrangian vertical coordinates and requires resolving dynamical processes on small spatial scales. For the latter it uses a newly developed general-purpose library, which facilitates 3D block-structured AMR on spherical grids. The library manages neighbor information as the blocks adapt, and handles the parallel communication and load balancing, freeing the user to concentrate on the scientific modeling aspects of their code. In particular, this library defines and manages adaptive blocks on the sphere, provides user interfaces for interpolation routines and supports the communication and load-balancing aspects for parallel applications. We have successfully tested the library in a 2-D (longitude-latitude) implementation. During the past year, we have extended the library to treat adaptive mesh refinement in the vertical direction. Preliminary results are discussed. This research project is characterized by an interdisciplinary approach involving atmospheric science, computer science and mathematical/numerical aspects. The work is done in close collaboration between the Atmospheric Science, Computer Science and Aerospace Engineering Departments at the University of Michigan and NOAA GFDL.
Parallel Activation in Bilingual Phonological Processing
ERIC Educational Resources Information Center
Lee, Su-Yeon
2011-01-01
In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…
Parallel deterministic neutronics with AMR in 3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clouse, C.; Ferguson, J.; Hendrickson, C.
1997-12-31
AMTRAN, a three-dimensional Sn neutronics code with adaptive mesh refinement (AMR), has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block-refined AMR is used with linear finite element representations for the fluxes, which allows for a straightforward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.
Database Reorganization in Parallel Disk Arrays with I/O Service Stealing
NASA Technical Reports Server (NTRS)
Zabback, Peter; Onyuksel, Ibrahim; Scheuermann, Peter; Weikum, Gerhard
1996-01-01
We present a model for data reorganization in parallel disk systems that is geared towards load balancing in an environment with periodic access patterns. Data reorganization is performed by disk cooling, i.e. migrating files or extents from the hottest disks to the coldest ones. We develop an approximate queueing model for determining the effective arrival rates of cooling requests and discuss its use in assessing the costs versus benefits of cooling.
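The disk-cooling policy itself is simple to state; the paper's contribution is the queueing model that decides when a migration pays off. The sketch below (my own, omitting that cost/benefit analysis) greedily moves the hottest extent that still shrinks the load gap between the hottest and coldest disks:

    def cool(disks, threshold=0.1):
        """disks: list of dicts mapping extent name -> heat (access rate)."""
        def load(d):
            return sum(d.values())
        while True:
            hot = max(disks, key=load)
            cold = min(disks, key=load)
            gap = load(hot) - load(cold)
            if gap <= threshold:
                return disks
            # migrate the hottest extent that still reduces the gap
            movable = [e for e, h in hot.items() if h < gap]
            if not movable:
                return disks
            extent = max(movable, key=hot.get)
            cold[extent] = hot.pop(extent)

    disks = [{"a": 5.0, "b": 3.0}, {"c": 1.0}]
    print(cool(disks))   # heat ends up split roughly evenly

Restricting moves to extents smaller than the current gap guarantees each migration strictly reduces the imbalance, so the loop terminates.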
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2015-04-01
PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
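The processor-distribution rule described above is concrete enough to sketch. The following illustrative Python version (the function name and the even-split policy are my assumptions, not PM code) divides processors among tasks when tasks are few, and shares processors round-robin when tasks outnumber them:

    def distribute(n_tasks, processors):
        """Return one processor subset per task, following the
        fewer-tasks-than-processors / more-tasks-than-processors rule."""
        P = len(processors)
        if n_tasks <= P:
            # divide processors among tasks as evenly as possible
            base, extra = divmod(P, n_tasks)
            out, i = [], 0
            for t in range(n_tasks):
                size = base + (1 if t < extra else 0)
                out.append(processors[i:i + size])
                i += size
            return out
        # more tasks than processors: tasks share processors round-robin
        return [[processors[t % P]] for t in range(n_tasks)]

    print(distribute(3, list(range(8))))   # [[0, 1, 2], [3, 4, 5], [6, 7]]
    print(distribute(5, [0, 1]))           # [[0], [1], [0], [1], [0]]

Nested parallel statements then recursively apply the same rule to the processor subset owned by each task, which is what keeps the parallel structure deducible directly from the code.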
Design of a real-time wind turbine simulator using a custom parallel architecture
NASA Technical Reports Server (NTRS)
Hoffman, John A.; Gluck, R.; Sridhar, S.
1995-01-01
The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CUs) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CUs are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an I/O operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CUs are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CUs can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors, which usually have a throughput limit because of rigid bus architecture.
Temporal responses of coastal hypoxia to nutrient loading and physical controls
NASA Astrophysics Data System (ADS)
Kemp, W. M.; Testa, J. M.; Conley, D. J.; Gilbert, D.; Hagy, J. D.
2009-12-01
The incidence and intensity of hypoxic waters in coastal aquatic ecosystems has been expanding in recent decades coincident with eutrophication of the coastal zone. Worldwide, there is strong interest in reducing the size and duration of hypoxia in coastal waters, because hypoxia causes negative effects for many organisms and ecosystem processes. Although strategies to reduce hypoxia by decreasing nutrient loading are predicated on the assumption that this action would reverse eutrophication, recent analyses of historical data from European and North American coastal systems suggest little evidence for simple linear response trajectories. We review published parallel time-series data on hypoxia and loading rates for inorganic nutrients and labile organic matter to analyze trajectories of oxygen (O2) response to nutrient loading. We also assess existing knowledge of physical and ecological factors regulating O2 in coastal marine waters to facilitate analysis of hypoxia responses to reductions in nutrient (and/or organic matter) inputs. Of the 24 systems identified where concurrent time series of loading and O2 were available, half displayed relatively clear and direct recoveries following remediation. We explored in detail 5 well-studied systems that have exhibited complex, non-linear responses to variations in loading, including apparent "regime shifts". A summary of these analyses suggests that O2 conditions improved rapidly and linearly in systems where remediation focused on organic inputs from sewage treatment plants, which were the primary drivers of hypoxia. In larger more open systems where diffuse nutrient loads are more important in fueling O2 depletion and where climatic influences are pronounced, responses to remediation tended to follow non-linear trends that may include hysteresis and time-lags. Improved understanding of hypoxia remediation requires that future studies use comparative approaches and consider multiple regulating factors. These analyses should consider: (1) the dominant temporal scales of the hypoxia, (2) the relative contributions of inorganic and organic nutrients, (3) the influence of shifts in climatic and oceanographic processes, and (4) the roles of feedback interactions whereby O2-sensitive biogeochemistry, trophic interactions, and habitat conditions influence the nutrient and algal dynamics that regulate O2 levels.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.
1997-12-01
Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to another, and performance often falls short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool, which enables application programmers to specify at a high level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables efficiently combining parallel storage access routines and sequential image processing operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP-specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP-specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
Yang, Chen; Li, Bingyi; Chen, Liang; Wei, Chunpeng; Xie, Yizhuang; Chen, He; Yu, Wenyue
2017-06-24
With the development of satellite load technology and very large scale integrated (VLSI) circuit technology, onboard real-time synthetic aperture radar (SAR) imaging systems have become a solution for allowing rapid response to disasters. A key goal of the onboard SAR imaging system design is to achieve high real-time processing performance with severe size, weight, and power consumption constraints. In this paper, we analyse the computational burden of the commonly used chirp scaling (CS) SAR imaging algorithm. To reduce the system hardware cost, we propose a partial fixed-point processing scheme. The fast Fourier transform (FFT), which is the most computation-sensitive operation in the CS algorithm, is processed with fixed-point arithmetic, while other operations are processed with single precision floating-point. With the proposed fixed-point processing error propagation model, the fixed-point processing word length is determined. The fidelity and accuracy relative to conventional ground-based software processors is verified by evaluating both the point target imaging quality and the actual scene imaging quality. As a proof of concept, a field-programmable gate array-application-specific integrated circuit (FPGA-ASIC) hybrid heterogeneous parallel accelerating architecture is designed and realized. The customized fixed-point FFT is implemented using the 130 nm complementary metal oxide semiconductor (CMOS) technology as a co-processor of the Xilinx xc6vlx760t FPGA. A single processing board requires 12 s and consumes 21 W to focus a 50-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384. PMID:28672813
The Goddard Space Flight Center Program to develop parallel image processing systems
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1972-01-01
Parallel image processing, defined as image processing in which all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered as parallel image processing techniques.
NASA Astrophysics Data System (ADS)
Han, Qiguo; Zhu, Kai; Shi, Wenming; Wu, Kuayu; Chen, Kai
2018-02-01
In order to solve the problem of low-voltage ride-through (LVRT) for the variable-frequency drives (VFDs) of major auxiliary equipment in thermal power plants, a scheme in which a supercapacitor is paralleled in the DC link of the VFD is put forward; furthermore, two solutions, direct parallel support and voltage-boost parallel support by the supercapacitor, are proposed. The capacitor values for the relevant motor loads are calculated according to the law of energy conservation, and they are verified by Matlab simulation. Finally, a test prototype is set up, and the test results prove the feasibility of the proposed schemes.
Parallel Harmony Search Based Distributed Energy Resource Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ceylan, Oguzhan; Liu, Guodong; Tomsovic, Kevin
2015-01-01
This paper presents a harmony search based parallel optimization algorithm to minimize voltage deviations in three phase unbalanced electrical distribution systems and to maximize active power outputs of distributed energy resources (DR). The main contribution is to reduce the adverse impacts on voltage profile during a day as photovoltaics (PVs) output or electrical vehicles (EVs) charging changes throughout a day. The IEEE 123-bus distribution test system is modified by adding DRs and EVs under different load profiles. The simulation results show that by using parallel computing techniques, heuristic methods may be used as an alternative optimization tool in electrical power distribution systems operation.
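For readers unfamiliar with harmony search, the generic single-threaded loop is compact enough to show. This is a textbook-style sketch on a toy objective, standing in for voltage-deviation minimization; it is not the paper's parallel three-phase formulation, and the parameter names (hms, hmcr, par) follow common harmony-search usage:

    import random

    def harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3, iters=2000):
        """Minimize f over box bounds with a basic harmony search."""
        random.seed(0)
        mem = [[random.uniform(*b) for b in bounds] for _ in range(hms)]
        for _ in range(iters):
            new = []
            for j, (lo, hi) in enumerate(bounds):
                if random.random() < hmcr:             # draw from memory
                    x = random.choice(mem)[j]
                    if random.random() < par:          # pitch adjustment
                        x += random.uniform(-1, 1) * 0.01 * (hi - lo)
                else:                                  # random consideration
                    x = random.uniform(lo, hi)
                new.append(min(max(x, lo), hi))
            worst = max(range(hms), key=lambda i: f(mem[i]))
            if f(new) < f(mem[worst]):                 # replace worst harmony
                mem[worst] = new
        return min(mem, key=f)

    print(harmony_search(lambda v: sum(x * x for x in v), [(-5, 5)] * 3))

The improvisation steps are independent, which is what makes the method amenable to the parallel evaluation the paper exploits.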
Implicit schemes and parallel computing in unstructured grid CFD
NASA Technical Reports Server (NTRS)
Venkatakrishnam, V.
1995-01-01
The development of implicit schemes for obtaining steady state solutions to the Euler and Navier-Stokes equations on unstructured grids is outlined. Applications are presented that compare the convergence characteristics of various implicit methods. The development of explicit and implicit schemes to compute unsteady flows on unstructured grids is then discussed, followed by the issues involved in parallelizing finite volume schemes on unstructured meshes in an MIMD (multiple instruction/multiple data stream) fashion. Techniques for partitioning unstructured grids among processors and for extracting parallelism in explicit and implicit solvers are discussed. Finally, some dynamic load balancing ideas, which are useful in adaptive transient computations, are presented.
NASA Technical Reports Server (NTRS)
Lee, Jong-Won; Allen, David H.
1990-01-01
A continuous fiber composite is modelled by a two-element composite cylinder in order to predict the elastoplastic response of the composite under a monotonically increasing tensile loading parallel to fibers. The fibers and matrix are assumed to be elastic-perfectly plastic materials obeying Hill's and Tresca's yield criteria, respectively. Here, the composite behavior when the fibers yield prior to the matrix is investigated.
Parallel grid library for rapid and flexible simulation development
NASA Astrophysics Data System (ADS)
Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.
2013-04-01
We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time, allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously, allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests, depending on the problem and hardware. In the version of dccrg presented here, part of the mesh metadata is replicated between MPI processes, reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg.
Catalogue identifier: AEOM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Lesser General Public License version 3
No. of lines in distributed program, including test data, etc.: 54975
No. of bytes in distributed program, including test data, etc.: 974015
Distribution format: tar.gz
Programming language: C++
Computer: PC, cluster, supercomputer
Operating system: POSIX; the code has been parallelized using MPI and tested with 1-32768 processes
RAM: 10 MB-10 GB per process
Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20
External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4]
Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and load balancing.
Solution method: The simulation grid is represented by an adjacency list (graph) with vertices stored into a hash table and edges into contiguous arrays. The Message Passing Interface standard is used for parallelization. Cell data is given as a template parameter when instantiating the grid.
Restrictions: Logically cartesian grid.
Running time: Running time depends on the hardware, problem and the solution method. Small problems can be solved in under a minute and very large problems can take weeks. The examples and tests provided with the package take less than about one minute using default options. In the version of dccrg presented here the speed of adaptive mesh refinement is at most of the order of 10⁶ total created cells per second.
[1] http://www.mpi-forum.org/
[2] http://www.boost.org/
[3] K. Devine, E. Boman, R. Heaphy, B. Hendrickson, C. Vaughan, Zoltan data management services for parallel dynamic applications, Comput. Sci. Eng. 4 (2002) 90-97. http://dx.doi.org/10.1109/5992.988653
[4] https://gitorious.org/sfc++
GATECloud.net: a platform for large-scale, open-source text processing on the cloud.
Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina
2013-01-28
Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
Heat transfer optimization for air-mist cooling between a stack of parallel plates
NASA Astrophysics Data System (ADS)
Issa, Roy J.
2010-06-01
A theoretical model is developed to predict the upper-limit heat transfer between a stack of parallel plates subject to multiphase cooling by air-mist flow. The model predicts the optimal separation distance between the plates based on the development of the boundary layers for small and large separation distances, and for dilute mist conditions. Simulation results show that the optimal separation distance depends strongly on the liquid-to-air mass flow rate loading ratio, reaching a limit at a critical loading. For these dilute spray conditions, complete evaporation of the droplets takes place. Simulation results also show that the optimal separation distance decreases as the mist flow rate increases. The proposed theoretical model should lead to a better understanding of the design of fin spacing in heat exchangers where multiphase spray cooling is used.
Performance Analysis and Portability of the PLUM Load Balancing System
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.
1998-01-01
The ability to dynamically adapt an unstructured mesh is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult. To address this problem, we have developed PLUM, an automatic portable framework for performing adaptive numerical computations in a message-passing environment. PLUM requires that all data be globally redistributed after each mesh adaption to achieve load balance. We present an algorithm for minimizing this remapping overhead by guaranteeing an optimal processor reassignment. We also show that the data redistribution cost can be significantly reduced by applying our heuristic processor reassignment algorithm to the default mapping of the parallel partitioner. Portability is examined by comparing performance on an SP2, an Origin2000, and a T3E. Results show that PLUM can be successfully ported to different platforms without any code modifications.
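The intuition behind the reassignment step is that after repartitioning, a new partition should land on whichever processor already holds most of its data. The sketch below is a simple greedy matching over an overlap matrix, offered only as an illustration of the objective; PLUM's published algorithm guarantees an optimal assignment, which greedy matching does not in general:

    def reassign(overlap):
        """overlap[i][j] = data of new partition i already resident on
        processor j. Greedily match partitions to processors to keep
        as much data in place as possible."""
        pairs = sorted(((overlap[i][j], i, j)
                        for i in range(len(overlap))
                        for j in range(len(overlap[i]))), reverse=True)
        taken_i, taken_j, mapping = set(), set(), {}
        for _, i, j in pairs:
            if i not in taken_i and j not in taken_j:
                mapping[i] = j
                taken_i.add(i)
                taken_j.add(j)
        return mapping

    # Identity mapping would move 11 units of data; this mapping moves 3.
    overlap = [[1, 6], [5, 2]]
    print(reassign(overlap))   # {0: 1, 1: 0}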
Validation of Shear Wave Elastography in Skeletal Muscle
Eby, Sarah F.; Song, Pengfei; Chen, Shigao; Chen, Qingshan; Greenleaf, James F.; An, Kai-Nan
2013-01-01
Skeletal muscle is a very dynamic tissue, thus accurate quantification of skeletal muscle stiffness throughout its functional range is crucial to improve the physical functioning and independence following pathology. Shear wave elastography (SWE) is an ultrasound-based technique that characterizes tissue mechanical properties based on the propagation of remotely induced shear waves. The objective of this study is to validate SWE throughout the functional range of motion of skeletal muscle for three ultrasound transducer orientations. We hypothesized that combining traditional materials testing (MTS) techniques with SWE measurements will show increased stiffness measures with increasing tensile load, and will correlate well with each other for trials in which the transducer is parallel to underlying muscle fibers. To evaluate this hypothesis, we monitored the deformation throughout tensile loading of four porcine brachialis whole-muscle tissue specimens, while simultaneously making SWE measurements of the same specimen. We used regression to examine the correlation between Young's modulus from MTS and shear modulus from SWE for each of the transducer orientations. We applied a generalized linear model to account for repeated testing. Model parameters were estimated via generalized estimating equations. The regression coefficient was 0.1944, with a 95% confidence interval of (0.1463 – 0.2425) for parallel transducer trials. Shear waves did not propagate well for both the 45° and perpendicular transducer orientations. Both parallel SWE and MTS showed increased stiffness with increasing tensile load. This study provides the necessary first step for additional studies that can evaluate the distribution of stiffness throughout muscle. PMID:23953670
Internal viscoelastic loading in cat papillary muscle.
Chiu, Y L; Ballou, E W; Ford, L E
1982-01-01
The passive mechanical properties of myocardium were defined by measuring force responses to rapid length ramps applied to unstimulated cat papillary muscles. The immediate force changes following these ramps recovered partially to their initial value, suggesting a series combination of viscous element and spring. Because the stretched muscle can bear force at rest, the viscous element must be in parallel with an additional spring. The instantaneous extension-force curves measured at different lengths were nonlinear, and could be made to superimpose by a simple horizontal shift. This finding suggests that the same spring was being measured at each length, and that this spring was in series with both the viscous element and its parallel spring (Voigt configuration), so that the parallel spring is held nearly rigid by the viscous element during rapid steps. The series spring in the passive muscle could account for most of the series elastic recoil in the active muscle, suggesting that the same spring is in series with both the contractile elements and the viscous element. It is postulated that the viscous element might be coupled to the contractile elements by a compliance, so that the load imposed on the contractile elements by the passive structures is viscoelastic rather than purely viscous. Such a viscoelastic load would give the muscle a length-independent, early diastolic restoring force. The possibility is discussed that the length-independent restoring force would allow some of the energy liberated during active shortening to be stored and released during relaxation. PMID:7171707
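The mechanical arrangement described above, a series spring feeding a Voigt element, can be summarized in a linearized sketch. The symbols k_s (series spring), k_p (parallel spring), and eta (viscous element) are our notation, and the muscle's actual springs are nonlinear:

```latex
% Force relaxation of a standard linear solid after a length step \Delta L:
% instantaneously the dashpot is rigid, so the series spring takes all the
% stretch; at long times the dashpot bears no load and the springs act in series.
F(0) = k_s \,\Delta L, \qquad
F(\infty) = \frac{k_s k_p}{k_s + k_p}\,\Delta L, \qquad
F(t) = F(\infty) + \bigl[F(0) - F(\infty)\bigr]\, e^{-t/\tau}, \quad
\tau = \frac{\eta}{k_s + k_p}.
```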
Thread concept for automatic task parallelization in image analysis
NASA Astrophysics Data System (ADS)
Lueckenhaus, Maximilian; Eckstein, Wolfgang
1998-09-01
Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.
Studies in optical parallel processing. [All optical and electro-optic approaches]
NASA Technical Reports Server (NTRS)
Lee, S. H.
1978-01-01
Threshold and A/D devices for converting a gray scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic devices (OPAL) were studied as an approach to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out, on each image element of these rows, in the IOC substrate, and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing, which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.
Accelerating semantic graph databases on commodity clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Castellana, Vito G.; Haglin, David J.
We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL to C++ compiler, a library of parallel graph methods and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over a commodity cluster. We present preliminary results for the compiler and for the runtime.
State-plane analysis of parallel resonant converter
NASA Technical Reports Server (NTRS)
Oruganti, R.; Lee, F. C.
1985-01-01
A method for analyzing the complex operation of a parallel resonant converter is developed, utilizing graphical state-plane techniques. The comprehensive mode analysis uncovers, for the first time, the presence of other complex modes besides the continuous conduction mode and the discontinuous conduction mode and determines their theoretical boundaries. Based on the insight gained from the analysis, a novel, high-frequency resonant buck converter is proposed. The voltage conversion ratio of the new converter is almost independent of load.
Benchmarking Ada tasking on tightly coupled multiprocessor architectures
NASA Technical Reports Server (NTRS)
Collard, Philippe; Goforth, Andre; Marquardt, Matthew
1989-01-01
The development of benchmarks and performance measures for parallel Ada tasking is reported with emphasis on the macroscopic behavior of the benchmark across a set of load parameters. The application chosen for the study was the NASREM model for telerobot control, relevant to many NASA missions. The results of the study demonstrate the potential of parallel Ada in accomplishing the task of developing a control system for a system such as the Flight Telerobotic Servicer using the NASREM framework.
Parallel workflow tools to facilitate human brain MRI post-processing
Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang
2015-01-01
Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043
Cooperative storage of shared files in a parallel computing system with dynamic block size
Bent, John M.; Faibish, Sorin; Grider, Gary
2015-11-10
Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).
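The block-size rule quoted above (total data divided by process count) is simple to sketch. The helper below is a hypothetical Python illustration of the planning step, not the patented implementation:

```python
# Sketch of the dynamic block-size rule described above: each process
# targets a block of (total data) / (number of processes) and exchanges its
# surplus or deficit with other processes before issuing one aligned write.
def plan_exchange(local_sizes):
    total = sum(local_sizes)
    block = total // len(local_sizes)   # dynamically determined block size
    # positive surplus -> data to send; negative -> data to receive
    return block, [size - block for size in local_sizes]

block, surplus = plan_exchange([700, 300, 500, 500])
print(block, surplus)  # 500 [200, -200, 0, 0]
```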
Nuclear fuel management optimization using genetic algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeChaine, M.D.; Feltus, M.A.
1995-07-01
The code independent genetic algorithm reactor optimization (CIGARO) system has been developed to optimize nuclear reactor loading patterns. It uses genetic algorithms (GAs) and a code-independent interface, so any reactor physics code (e.g., CASMO-3/SIMULATE-3) can be used to evaluate the loading patterns. The system is compared to other GA-based loading pattern optimizers. Tests were carried out to maximize the beginning-of-cycle k_eff for a pressurized water reactor core loading with a penalty function to limit power peaking. The CIGARO system performed well, increasing the k_eff after lowering the peak power. Tests of a prototype parallel evaluation method showed the potential for a significant speedup.
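A minimal sketch of the kind of penalized fitness the abstract describes: reward beginning-of-cycle k_eff and penalize power peaking above a limit. The weight and limit values are invented for illustration; a real evaluation would call a reactor physics code such as CASMO-3/SIMULATE-3.

```python
# Hedged sketch of a penalized GA fitness for loading patterns.
# peak_limit and weight are illustrative values, not CIGARO's.
def fitness(k_eff, peak_power, peak_limit=1.5, weight=10.0):
    return k_eff - weight * max(0.0, peak_power - peak_limit)

print(fitness(1.12, 1.42))  # no peaking penalty: 1.12
print(fitness(1.15, 1.62))  # peaking penalized: 1.15 - 10*0.12 = -0.05
```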
The effect of model uncertainty on some optimal routing problems
NASA Technical Reports Server (NTRS)
Mohanty, Bibhu; Cassandras, Christos G.
1991-01-01
The effect of model uncertainties on optimal routing in a system of parallel queues is examined. The uncertainty arises in modeling the service time distribution for the customers (jobs, packets) to be served. For a Poisson arrival process and Bernoulli routing, the optimal mean system delay generally depends on the variance of this distribution. However, as the input traffic load approaches the system capacity the optimal routing assignment and corresponding mean system delay are shown to converge to a variance-invariant point. The implications of these results are examined in the context of gradient-based routing algorithms. An example of a model-independent algorithm using online gradient estimation is also included.
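The variance dependence mentioned above is visible in a standard M/G/1 decomposition, which is consistent with, though not quoted from, the abstract. Under Bernoulli routing with probabilities p_i, queue i sees a Poisson stream of rate lambda p_i, and the Pollaczek-Khinchine formula gives its mean waiting time:

```latex
% Mean waiting time at queue i under Bernoulli routing (M/G/1 queue):
W_i = \frac{\lambda p_i \, \mathbb{E}[S^2]}{2\,\bigl(1 - \lambda p_i\,\mathbb{E}[S]\bigr)},
\qquad \mathbb{E}[S^2] = \operatorname{Var}(S) + \mathbb{E}[S]^2 .
% The optimal split \{p_i\} therefore generally depends on Var(S); as the
% load approaches capacity the denominator dominates, and the optimal
% assignment converges to a variance-invariant point, as the abstract states.
```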
Crack Front Segmentation and Facet Coarsening in Mixed-Mode Fracture
NASA Astrophysics Data System (ADS)
Chen, Chih-Hung; Cambonie, Tristan; Lazarus, Veronique; Nicoli, Matteo; Pons, Antonio J.; Karma, Alain
2015-12-01
A planar crack generically segments into an array of "daughter cracks" shaped as tilted facets when loaded with both a tensile stress normal to the crack plane (mode I) and a shear stress parallel to the crack front (mode III). We investigate facet propagation and coarsening using in situ microscopy observations of fracture surfaces at different stages of quasistatic mixed-mode crack propagation and phase-field simulations. The results demonstrate that the bifurcation from a propagating planar crack front to a segmented one is strongly subcritical, reconciling previous theoretical predictions of linear stability analysis with experimental observations. They further show that facet coarsening is a self-similar process driven by a spatial period-doubling instability of facet arrays.
Efficient multitasking: parallel versus serial processing of multiple tasks
Fischer, Rico; Plessow, Franziska
2015-01-01
In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling. PMID:26441742
Collins, Anne G E; Albrecht, Matthew A; Waltz, James A; Gold, James M; Frank, Michael J
2017-09-15
When studying learning, researchers directly observe only the participants' choices, which are often assumed to arise from a unitary learning process. However, a number of separable systems, such as working memory (WM) and reinforcement learning (RL), contribute simultaneously to human learning. Identifying each system's contributions is essential for mapping the neural substrates contributing in parallel to behavior; computational modeling can help to design tasks that allow such a separable identification of processes and infer their contributions in individuals. We present a new experimental protocol that separately identifies the contributions of RL and WM to learning, is sensitive to parametric variations in both, and allows us to investigate whether the processes interact. In experiments 1 and 2, we tested this protocol with healthy young adults (n = 29 and n = 52, respectively). In experiment 3, we used it to investigate learning deficits in medicated individuals with schizophrenia (n = 49 patients, n = 32 control subjects). Experiments 1 and 2 established WM and RL contributions to learning, as evidenced by parametric modulations of choice by load and delay and reward history, respectively. They also showed interactions between WM and RL, where RL was enhanced under high WM load. Moreover, we observed a cost of mental effort when controlling for reinforcement history: participants preferred stimuli they encountered under low WM load. Experiment 3 revealed selective deficits in WM contributions and preserved RL value learning in individuals with schizophrenia compared with control subjects. Computational approaches allow us to disentangle contributions of multiple systems to learning and, consequently, to further our understanding of psychiatric diseases. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
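A toy sketch of the modeling idea: a choice policy that mixes a fast, capacity-limited WM module with an incremental RL module, where reliance on WM shrinks with memory load. Parameter names and values below are ours, not the paper's.

```python
# Illustrative mixture of WM and RL contributions to choice (our sketch,
# inspired by the protocol above; not the authors' model code).
import numpy as np

def softmax(q, beta=8.0):
    e = np.exp(beta * (q - q.max()))   # subtract max for numerical stability
    return e / e.sum()

def choice_prob(q_rl, q_wm, set_size, capacity=3.0, w0=0.8):
    w = w0 * min(1.0, capacity / set_size)   # WM reliance drops with load
    return w * softmax(q_wm) + (1 - w) * softmax(q_rl)

print(choice_prob(np.array([0.5, 0.2]), np.array([1.0, 0.0]), set_size=6))
```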
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Simon, Horst D.
1996-01-01
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
Efficient Interconnection Schemes for VLSI and Parallel Computation
1989-08-01
Definition: Let R be a routing network. A set S of wires in R is a (directed) cut if it partitions the network into two sets of processors A and B such that every path from a processor in A to a processor in B contains a wire in S. The capacity cap(S) is the number of wires in the cut. For a set of messages M, define the load load(M, S) of M on a cut S to be the number of messages in M from a processor in A to a processor in B. The load factor relates these two quantities, as sketched below.
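The excerpt above breaks off at the load factor. The standard definition from the interconnection-network literature, supplied here as an assumption since the original text is truncated, is:

```latex
% Load factor of a message set M: the worst ratio of load to capacity over
% all cuts. It lower-bounds delivery time, since the load(M,S) messages
% crossing S must share its cap(S) wires.
\lambda(M) \;=\; \max_{S \text{ a cut}} \; \frac{\mathrm{load}(M, S)}{\mathrm{cap}(S)} .
```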
NASA Astrophysics Data System (ADS)
Li, Gaohua; Fu, Xiang; Wang, Fuxin
2017-10-01
The low-dissipation high-order accurate hybrid up-winding/central scheme based on fifth-order weighted essentially non-oscillatory (WENO) and sixth-order central schemes, along with the Spalart-Allmaras (SA)-based delayed detached eddy simulation (DDES) turbulence model, and the flow feature-based adaptive mesh refinement (AMR), are implemented into a dual-mesh overset grid infrastructure with parallel computing capabilities, for the purpose of simulating vortex-dominated unsteady detached wake flows with high spatial resolutions. The overset grid assembly (OGA) process based on collection detection theory and an implicit hole-cutting algorithm achieves an automatic coupling of the near-body and off-body solvers, and a trial-and-error method is used for obtaining a globally balanced load distribution among the composed multiple codes. The results for flows over a high-Reynolds-number cylinder and a two-bladed helicopter rotor show that the combination of the high-order hybrid scheme, an advanced turbulence model, and overset adaptive mesh refinement can effectively enhance the spatial resolution for the simulation of turbulent wake eddies.
Abbs, J H; Gracco, V L
1984-04-01
The contribution of ascending afferents to the control of speech movement was evaluated by applying unanticipated loads to the lower lip during the generation of combined upper lip-lower lip speech gestures. To eliminate potential contamination due to anticipation or adaptation, loads were applied randomly on only 10-15% of the trials. Physical characteristics of the perturbations were within the normal range of forces and movements involved in natural lip actions for speech. Compensatory responses in multiple facial muscles and lip movements were observed the first time a load was introduced, and achievement of the multimovement speech goals was never disrupted by these perturbations. Muscle responses were seen in the lower lip muscles, implicating corrective, feedback processes. Additionally, compensatory responses to these lower lip loads were also observed in the independently controlled muscles of the upper lip, reflecting the parallel operation of open-loop, sensorimotor mechanisms. Compensatory responses from both the upper and lower lip muscles were observed with small (1 mm) as well as large (15 mm) perturbations. The latencies of these compensatory responses were not discernible by conventional ensemble averaging. Moreover, responses at latencies of lower brain stem-mediated reflexes (i.e., 10-18 ms) were not apparent with inspection of individual records. Response latencies were determined on individual loaded trials through the use of a computer algorithm that took into account the variability of electromyograms (EMG) among the control trials. These latency measures confirmed the absence of brain stem-mediated responses and yielded response latencies that ranged from 22 to 75 ms. Response latencies appeared to be influenced by the time relation between load onset and the initiation of muscle activation. Examination of muscle activity changes for individual loaded trials revealed complementary variations in the magnitude of responses among multiple muscles contributing to a movement compensation. These observations may have implications for limb movement control if multimovement speech gestures are considered analogous to a limb action requiring coordinated movements around multiple joints. In this context, these speech motor control data might be interpreted to suggest that for complex movements, both corrective feedback and open-loop predictive processes are operating, with the latter involved in the control of coordination among multiple movement subcomponents.
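The single-trial latency algorithm is described only qualitatively above; the sketch below shows one plausible reading, flagging a response when the loaded trial's rectified EMG exceeds a band built from control-trial variability for a sustained run of samples. The threshold multiplier, run length, and sampling rate are invented.

```python
# Hedged sketch of single-trial EMG response-latency detection: a loaded
# trial must exceed the control-trial mean by k standard deviations for a
# sustained run of samples. Parameters are illustrative, not the study's.
import numpy as np

def response_latency(trial, controls, fs=1000, k=3.0, min_run=10):
    mu = controls.mean(axis=0)        # mean control EMG per sample
    sd = controls.std(axis=0)         # control-trial variability per sample
    above = trial > mu + k * sd       # samples exceeding the control band
    run = 0
    for i, flag in enumerate(above):  # require a sustained crossing
        run = run + 1 if flag else 0
        if run >= min_run:
            return 1000.0 * (i - min_run + 1) / fs  # latency in ms
    return None                       # no compensatory response detected
```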
Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton
2014-11-11
We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonication, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
An implementation of a tree code on a SIMD, parallel computer
NASA Technical Reports Server (NTRS)
Olson, Kevin M.; Dorband, John E.
1994-01-01
We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,536 particles. We also simulate the formation of structure in an expanding model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
NASA Astrophysics Data System (ADS)
Kononenko, Oleksiy; Adolphsen, Chris; Li, Zenghai; Ng, Cho-Kuen; Rivetta, Claudio
2017-10-01
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we present the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. The simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.
NASA Astrophysics Data System (ADS)
Huang, Bo; Zhai, Yutao; Liu, Shaojun; Mao, Xiaodong
2018-03-01
Selective laser melting (SLM) is a promising way for the fabrication of complex reduced activation ferritic/martensitic steel components. The microstructure of the SLM built China low activation martensitic (CLAM) steel plates was observed and analyzed. The hardness, Charpy impact and tensile testing of the specimens in different orientations were performed at room temperature. The results showed that the difference in the mechanical properties was related to the anisotropy in microstructure. The planar unmelted porosity in the interface of the adjacent layers induced an opening/tensile failure mode when the tensile samples parallel to the build direction were tested, whereas the samples perpendicular to the build direction fractured in the shear mode with the grains being sheared at a slant angle. Moreover, the impact absorbed energy (IAE) of all impact specimens was significantly lower than that of the wrought CLAM steel, and the IAE of the samples perpendicular to the build direction was higher than that of the samples parallel to the build direction. The impact fracture surfaces revealed that the load parallel to the build layers caused laminated tearing among the layers, and the load perpendicular to the layers induced intergranular fracture across the layers.
Roller-gear drives for robotic manipulators design, fabrication and test
NASA Technical Reports Server (NTRS)
Anderson, William J.; Shipitalo, William
1991-01-01
Two single axis planetary roller-gear drives and a two axis roller-gear drive with dual inputs were designed for use as robotic transmissions. Each of the single axis drives is a two planet row, four planet arrangement with spur gears and compressively loaded cylindrical rollers acting in parallel. The two axis drive employs bevel gears and cone rollers acting in parallel. The rollers serve a dual function: they remove backlash from the system, and they transmit torque when the gears are not fully engaged.
Cryogenic parallel, single phase flows: an analytical approach
NASA Astrophysics Data System (ADS)
Eichhorn, R.
2017-02-01
Managing the cryogenic flows inside a state-of-the-art accelerator cryomodule has become a demanding endeavour: In order to build highly efficient modules, all heat transfers are usually intercepted at various temperatures. For a multi-cavity module, operated at 1.8 K, this requires intercepts at 4 K and at 80 K at different locations with sometimes strongly varying heat loads, which for simplicity are operated in parallel. This contribution will describe an analytical approach, based on optimization theories.
Shahinpoor, Mohsen
1995-01-01
A device for electromagnetically accelerating projectiles. The invention features two parallel conducting circular plates, a plurality of electrode connections to both upper and lower plates, a support base, and a projectile magazine. A projectile is spring-loaded into a firing position concentrically located between the parallel plates. A voltage source is applied to the plates to cause current to flow in directions defined by selectable, discrete electrode connections on both upper and lower plates. Repulsive Lorentz forces are generated to eject the projectile in a 360 degree range of fire.
Unni, Anirudh; Ihme, Klas; Jipp, Meike; Rieger, Jochem W.
2017-01-01
Cognitive overload or underload results in a decrease in human performance which may result in fatal incidents while driving. We envision that driver assistive systems which adapt their functionality to the driver’s cognitive state could be a promising approach to reduce road accidents due to human errors. This research attempts to predict variations of cognitive working memory load levels in a natural driving scenario with multiple parallel tasks and to reveal predictive brain areas. We used a modified version of the n-back task to induce five different working memory load levels (from 0-back up to 4-back) forcing the participants to continuously update, memorize, and recall the previous ‘n’ speed sequences and adjust their speed accordingly while they drove for approximately 60 min on a highway with concurrent traffic in a virtual reality driving simulator. We measured brain activation using multichannel whole head, high density functional near-infrared spectroscopy (fNIRS) and predicted working memory load level from the fNIRS data by combining multivariate lasso regression and cross-validation. This allowed us to predict variations in working memory load in a continuous time-resolved manner with mean Pearson correlations between induced and predicted working memory load over 15 participants of 0.61 [standard error (SE) 0.04] and a maximum of 0.8. Restricting the analysis to prefrontal sensors placed over the forehead reduced the mean correlation to 0.38 (SE 0.04), indicating additional information gained through whole head coverage. Moreover, working memory load predictions derived from peripheral heart rate parameters achieved much lower correlations (mean 0.21, SE 0.1). Importantly, whole head fNIRS sampling revealed increasing brain activation in bilateral inferior frontal and bilateral temporo-occipital brain areas with increasing working memory load levels suggesting that these areas are specifically involved in workload-related processing. PMID:28424602
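A hedged sketch of the decoding pipeline: lasso regression with cross-validation predicting the induced load level from multichannel features. The data here are synthetic and the feature extraction is omitted, so this only illustrates the statistical machinery named in the abstract, not the study's actual pipeline.

```python
# Sketch: predict working-memory load from multichannel features with
# cross-validated lasso regression. Data are synthetic (one channel is made
# artificially load-sensitive); channel counts and seeds are arbitrary.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 64))      # 300 samples x 64 fNIRS channels
load = rng.integers(0, 5, 300)          # induced n-back level (0..4)
X[:, 0] = X[:, 0] + 0.8 * load          # toy load-sensitive channel

pred = cross_val_predict(LassoCV(cv=5), X, load, cv=5)
print(np.corrcoef(pred, load)[0, 1])    # cf. the paper's mean r = 0.61
```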
A PIPO Boost Converter with Low Ripple and Medium Current Application
NASA Astrophysics Data System (ADS)
Bandri, S.; Sofian, A.; Ismail, F.
2018-04-01
This paper proposes a Parallel Input Parallel Output (PIPO) boost converter to increase the power capability of the converter and reduce the inductor currents. The proposed technique distributes the current across n parallel inductors and switching components. Four parallel boost converters are implemented with an input voltage of 20.5 Vdc to generate an output voltage of 28.8 Vdc. The PIPO boost converter applies phase-shift pulse width modulation, which is compared with a conventional PIPO boost converter that uses an identical pulse for every switching component. The reduction in current ripple shows the advantage of the phase-shifted PIPO boost converter over the conventional one. Various loads and duty cycles are simulated and analyzed to verify the performance of the PIPO boost converter. Finally, the imbalance of the inductor currents is verified over four duty-cycle regions below 0.6.
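As a plausibility check (our arithmetic, not the paper's), the ideal continuous-conduction boost relation ties the stated voltages to a duty cycle inside the sub-0.6 region studied:

```latex
% Ideal CCM boost conversion ratio and the implied duty cycle:
\frac{V_{out}}{V_{in}} = \frac{1}{1 - D}
\;\;\Rightarrow\;\;
D = 1 - \frac{V_{in}}{V_{out}} = 1 - \frac{20.5}{28.8} \approx 0.29 .
% With N phases interleaved by 360^\circ/N (phase-shift PWM), the phase
% inductor ripples partially cancel at the shared node, which is the ripple
% advantage claimed over driving all N phases with an identical pulse.
```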
Strategies for Large Scale Implementation of a Multiscale, Multiprocess Integrated Hydrologic Model
NASA Astrophysics Data System (ADS)
Kumar, M.; Duffy, C.
2006-05-01
Distributed models simulate hydrologic state variables in space and time while taking into account the heterogeneities in terrain, surface and subsurface properties, and meteorological forcings. The computational cost and complexity associated with these models increase with their tendency to accurately simulate the large number of interacting physical processes at fine spatio-temporal resolution in a large basin. A hydrologic model run on a coarse spatial discretization of the watershed with a limited number of physical processes requires a smaller computational load, but this negatively affects the accuracy of model results and restricts physical realization of the problem. So it is imperative to have an integrated modeling strategy (a) which can be universally applied at various scales in order to study the tradeoffs between computational complexity (determined by spatio-temporal resolution), accuracy and predictive uncertainty in relation to various approximations of physical processes, (b) which can be applied at adaptively different spatial scales in the same domain by taking into account the local heterogeneity of topography and hydrogeologic variables, and (c) which is flexible enough to incorporate different numbers and approximations of process equations depending on model purpose and computational constraints. An efficient implementation of this strategy becomes all the more important for the Great Salt Lake river basin, which is relatively large (~89000 sq. km) and complex in terms of hydrologic and geomorphic conditions. Also, the types and time scales of hydrologic processes which are dominant in different parts of the basin differ. Part of the snowmelt runoff generated in the Uinta Mountains infiltrates and contributes as base flow to the Great Salt Lake over a time scale of decades to centuries. The adaptive strategy helps capture the steep topographic and climatic gradient along the Wasatch front. Here we present the aforesaid modeling strategy along with an associated hydrologic modeling framework which facilitates a seamless, computationally efficient and accurate integration of the process model with the data model. The flexibility of this framework leads to implementation of multiscale, multiresolution, adaptive refinement/de-refinement and nested modeling simulations with the least computational burden. However, performing these simulations and the related calibration of these models over a large basin at higher spatio-temporal resolutions is computationally intensive and requires increasing computing power. With the advent of parallel processing architectures, high computing performance can be achieved by parallelization of the existing serial integrated-hydrologic-model code. This translates to running the same model simulation on a network of a large number of processors, thereby reducing the time needed to obtain a solution. The paper also discusses the implementation of the integrated model on parallel processors, including the mapping of the problem onto a multi-processor environment, methods to incorporate coupling between hydrologic processes using interprocessor communication models, the model data structure, and parallel numerical algorithms to obtain high performance.
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arumugam, Kamesh
Efficient parallel implementations of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis are challenging. This requires exploiting the data parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of these applications employ irregular algorithms which exhibit data-dependent control-flow and irregular memory accesses. Furthermore, these applications are often iterative with dependency between steps, thus making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particle beam dynamics is one such application where the distribution of work and the memory access pattern at each time step are irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control-flow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-flow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particle beam dynamics as a motivating example throughout the dissertation to present our new approach, though it should be equally applicable to a wide range of irregular applications. The machine learning approach presented here uses predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improve the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance.
We used these optimization techniques and anticipation strategy to design a cache-aware, memory efficient parallel algorithm to address the irregularities in the parallel implementation of charged particle beam dynamics simulation on different HPC architectures. Experimental results using a diverse mix of HPC architectures show that our approach of using an anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and improving resource utilization.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, David Robert; Fensin, Saryu Jindal; Dippo, Olivia
Here, we present a study on the spall strength of additive manufactured (AM) Ti-6Al-4V. Samples were obtained from two pieces of selective laser melted (SLM, a powder bed fusion technique) Ti-6Al-4V such that the response to dynamic tensile loading could be investigated as a function of the orientation between the build layers and the loading direction. A sample of wrought bar-stock Ti-6Al-4V was also tested to act as a baseline representing the traditionally manufactured material response. A single-stage light gas-gun was used to launch a thin flyer plate into the samples, generating a region of intense tensile stress on a plane normal to the impact direction. The rear free surface velocity time history of each sample was recorded with laser-based velocimetry to allow the spall strength to be calculated. The samples were also soft recovered to enable post-mortem characterization of the spall damage evolution. Results showed that when the tensile load was applied normal to the interfaces between the build layers caused by the SLM fabrication process the spall strength was drastically reduced, dropping to 60% of that of the wrought material. However, when loaded parallel to the AM build layer interfaces the spall strength was found to remain at 95% of the wrought control, suggesting that when loading normal to the AM layer interfaces, void nucleation is facilitated more readily due to weaknesses along these boundaries. Quasi-static testing of the same sample orientations revealed a much lower degree of anisotropy, demonstrating the importance of rate-dependent studies for damage evolution in AM materials.
Parallel operation of NH3 screw compressors - the optimum way
NASA Astrophysics Data System (ADS)
Pijnenburg, B.; Ritmann, J.
2015-08-01
Using several smaller industrial NH3 screw compressors operating in parallel appears to be the optimum way to fulfil demands for maximum part-load efficiency, increased redundancy and other frequently requested features in today's industrial refrigeration industry. Parallel operation, arranged in an optimum way, can be selected to secure continuous operation and can in most applications be configured to ensure lower overall operating costs. New compressors are developed to meet requirements for flexibility in operation and are controlled in an intelligent way. The intelligent control system keeps all external demands in focus while striving always to offer the lowest possible absorbed power, including in future scenarios with connection to a smart grid.
ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms
NASA Astrophysics Data System (ADS)
Pilla, Laércio L.; Bozzetti, Tiago C.; Castro, Márcio; Navaux, Philippe O. A.; Méhaut, Jean-François
2015-10-01
Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that enables the variation of a vast range of parameters that affect performance. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose a specific algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use in this paper with an evaluation of Charm++ periodic load balancers that stresses their characteristics.
Electronic system for high power load control. [solar arrays
NASA Technical Reports Server (NTRS)
Miller, E. L. (Inventor)
1980-01-01
Parallel current paths are divided into two groups, with control devices in the current paths of one group each having a current limiting resistor, and the control devices in the other group each having no limiting resistor, so that when the control devices of the second group are turned fully on, a short circuit is achieved by the arrangement of parallel current paths. Separate but coordinated control signals are provided to turn on the control devices of the first group and increase their conduction toward saturation as a function of control input, and when fully on, or shortly before, to turn on the control devices of the second group and increase their conduction toward saturation as a function of the control input as that input continues to increase. Electronic means may be used to generate signals. The system may be used for I-V characteristic measurements of solar arrays as well as for other load control purposes.
Characterizing output bottlenecks in a supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Bing; Chase, Jeffrey; Dillow, David A
2012-01-01
Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
Anderson, Patrick L; Mahoney, Arthur W; Webster, Robert J
2017-07-01
This paper examines shape sensing for a new class of surgical robot that consists of parallel flexible structures that can be reconfigured inside the human body. Known as CRISP robots, these devices provide access to the human body through needle-sized entry points, yet can be configured into truss-like structures capable of dexterous movement and large force application. They can also be reconfigured as needed during a surgical procedure. Since CRISP robots are elastic, they will deform when subjected to external forces or other perturbations. In this paper, we explore how to combine sensor information with mechanics-based models for CRISP robots to estimate their shapes under applied loads. The end result is a shape sensing framework for CRISP robots that will enable future research on control under applied loads, autonomous motion, force sensing, and other robot behaviors.
NASA Technical Reports Server (NTRS)
French, K. W., Jr.
1985-01-01
The flexibility of the PHOENICS computational fluid dynamics package was assessed along two general avenues: parallel modeling and analog modeling. In parallel modeling the dependent and independent variables retain their identity within some scaling factors, even though the boundary conditions and especially the constitutive relations do not correspond to any realistic fluid dynamic situation. PHOENICS was used to generate a CFD model that should exhibit the physical anomalies of a granular medium and permit reasonable similarity with boundary conditions typical of membrane or porous piston loading. A considerable portion of the study was spent prying into the existing code with a prejudice toward rate-type behavior and disarming any inherent fluid behavior. The final stages of the study were directed at the more specific problem of multiaxis loading of a cylindrical geometry with a concern for the appearance of bulging and cross-slab shear failure modes.
Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing
NASA Technical Reports Server (NTRS)
Sterling, T. L.; Zima, H. P.
2002-01-01
Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing. The paper concludes with a discussion of related research activities and an outlook to future work.
A high-speed linear algebra library with automatic parallelism
NASA Technical Reports Server (NTRS)
Boucher, Michael L.
1994-01-01
Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
NASA Astrophysics Data System (ADS)
Rerucha, Simon; Sarbort, Martin; Hola, Miroslava; Cizek, Martin; Hucl, Vaclav; Cip, Ondrej; Lazar, Josef
2016-12-01
The homodyne detection with only a single detector represents a promising approach in interferometric applications which enables a significant reduction of the optical system complexity while preserving the fundamental resolution and dynamic range of single frequency laser interferometers. We present the design, implementation and analysis of algorithmic methods for computational processing of the single-detector interference signal based on parallel pipelined processing suitable for real time implementation on a programmable hardware platform (e.g. the FPGA - Field Programmable Gate Arrays or the SoC - System on Chip). The algorithmic methods incorporate (a) the single detector signal (sine) scaling, filtering, demodulation and mixing necessary for the reconstruction of the second (cosine) quadrature signal, followed by a conic section projection in the Cartesian plane, and (b) the phase unwrapping together with the goniometric and linear transformations needed for scale linearization and periodic error correction. The digital computing scheme was designed for bandwidths up to tens of megahertz, which would allow measuring displacements at velocities around half a metre per second. The algorithmic methods were tested in real-time operation with a PC-based reference implementation that exploited the advantage of pipelined processing by balancing the computational load among multiple processor cores. The results indicate that the algorithmic methods are suitable for a wide range of applications [3] and that they bring fringe-counting interferometry closer to industrial applications due to its optical setup simplicity and robustness, computational stability, scalability and cost-effectiveness.
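Once both quadratures are available, the back end reduces to a goniometric transformation plus unwrapping. The sketch below assumes a double-pass He-Ne interferometer (scale factor lambda/4*pi), which is our assumption rather than a detail given in the abstract, and omits the single-detector quadrature reconstruction itself.

```python
# Sketch of the final processing stage: given reconstructed quadratures
# I (cosine) and Q (sine), recover unwrapped phase and displacement.
import numpy as np

def displacement(i_sig, q_sig, wavelength_nm=633.0):
    phase = np.unwrap(np.arctan2(q_sig, i_sig))   # goniometric + unwrapping
    return phase * wavelength_nm / (4 * np.pi)    # nm, assuming double pass

t = np.linspace(0, 1, 2000)
motion = 400 * t                                  # 400 nm of linear motion
phi = 4 * np.pi * motion / 633.0
print(displacement(np.cos(phi), np.sin(phi))[-1]) # ~400 nm recovered
```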
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.
Tam, Wing-Kin; Yang, Zhi
2018-05-01
Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts, depending on the algorithm. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing focused only on a few rudimentary algorithms, were not well-optimized and often did not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing, and this work provides one that can offer significant speedup in processing signals from large-scale recordings of up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
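The parallel peak detection mentioned above maps each sample to an independent test and then compacts the surviving indices. A hedged numpy analogue of that map-plus-compact pattern (not NPE's actual GPU kernels or its MATLAB interface):

import numpy as np

def detect_peaks(x, threshold):
    # Map: every interior sample is tested independently, GPU-style.
    x = np.asarray(x, dtype=float)
    mid = x[1:-1]
    is_peak = (mid > x[:-2]) & (mid >= x[2:]) & (mid > threshold)
    # Compact: gather the indices of the lanes that survived the test.
    return np.flatnonzero(is_peak) + 1

sig = np.array([0, 0, 1, 5, 1, 0, 2, 6, 2, 0], dtype=float)
print(detect_peaks(sig, threshold=3.0))  # -> [3 7]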
Time Series Discord Detection in Medical Data using a Parallel Relational Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woodbridge, Diane; Rintoul, Mark Daniel; Wilson, Andrew T.
Recent advances in sensor technology have made continuous real-time health monitoring available in both hospital and non-hospital settings. Since high-frequency medical sensors generate huge volumes of data, storing and processing continuous medical data is an emerging big data area. In particular, detecting anomalies in real time is important for patient emergency detection and prevention. A time series discord indicates a subsequence that has the maximum difference to the rest of the time series subsequences, meaning that it has abnormal or unusual data trends. In this study, we implemented two versions of time series discord detection algorithms on a high performance parallel database management system (DBMS) and applied them to 240 Hz waveform data collected from 9,723 patients. The initial brute force version of the discord detection algorithm takes each possible subsequence and calculates a distance to the nearest non-self match to find the biggest discords in the time series. For the heuristic version of the algorithm, a combination of an array and a trie structure was applied to order time series data for enhancing time efficiency. The study results showed efficient data loading, decoding and discord searches in a large amount of data, benefiting from the time series discord detection algorithm and the architectural characteristics of the parallel DBMS, including data compression, data pipelining, and task scheduling.
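The brute-force variant is simple enough to state compactly. The sketch below is a plain-Python rendition under assumed conventions (Euclidean distance, window length m, self-matches excluded within m positions); the trie-ordered heuristic and the parallel DBMS machinery are not reproduced.

import numpy as np

def brute_force_discord(ts, m):
    # Top discord of window length m: the subsequence whose distance to
    # its nearest non-self match is largest.
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    windows = np.stack([ts[i:i + m] for i in range(n)])
    best_idx, best_score = -1, -np.inf
    for i in range(n):
        d = np.linalg.norm(windows - windows[i], axis=1)
        d[max(0, i - m + 1):i + m] = np.inf  # exclude trivial self-matches
        nearest = d.min()                    # distance to nearest non-self match
        if nearest > best_score:
            best_idx, best_score = i, nearest
    return best_idx, best_score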
Anatomically constrained neural network models for the categorization of facial expression
NASA Astrophysics Data System (ADS)
McMenamin, Brenton W.; Assadi, Amir H.
2004-12-01
The ability to recognize facial expression in humans is performed with the amygdala which uses parallel processing streams to identify the expressions quickly and accurately. Additionally, it is possible that a feedback mechanism may play a role in this process as well. Implementing a model with similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms for which varied expressions are a source for error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.
Crosetto, D.B.
1996-12-31
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
Crosetto, Dario B.
1996-01-01
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Dynamic load balance scheme for the DSMC algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jin; Geng, Xiangren; Jiang, Dingwu
The direct simulation Monte Carlo (DSMC) algorithm, devised by Bird, has been applied to a wide range of rarefied flow problems over the past 40 years. While DSMC is suitable for parallel implementation on powerful multi-processor architectures, it also introduces a large load imbalance across the processor array, even for small examples. The load imposed on a processor by a DSMC calculation is determined to a large extent by the total number of simulator particles upon it. Since most flows are impulsively started with an initial distribution of particles that is quite different from the steady state, the total number of simulator particles changes dramatically, and a load balance based upon an initial distribution of particles will break down as the steady state of the flow is reached. The load imbalance and huge computational cost of DSMC have limited its application to rarefied or simple transitional flows. In this paper, by taking advantage of METIS, a software package for partitioning unstructured graphs, and taking the total number of simulator particles in each cell as the weight information, repartitioning based upon the principle that each processor handles approximately an equal total of simulator particles has been achieved. The computation must pause several times to renew the total number of simulator particles in each processor and repartition the whole domain again. Thus the load balance across the processor array holds for the duration of the computation, and the parallel efficiency can be improved effectively. The benchmark solution of a cylinder submerged in hypersonic flow has been simulated numerically. In addition, hypersonic flow past a complex wing-body configuration has also been simulated. The results show that, for both cases, the computational time can be reduced by about 50%.
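The essence of the repartitioning step is to even out per-processor particle totals. As a hedged stand-in for METIS, which additionally respects mesh connectivity to limit communication, a greedy weight-balancing sketch:

import heapq

def repartition(cell_particle_counts, n_procs):
    # Longest-processing-time heuristic: place the heaviest cells first,
    # always on the currently least-loaded processor.
    heap = [(0, p) for p in range(n_procs)]   # (particle total, processor id)
    heapq.heapify(heap)
    assignment = {}
    order = sorted(range(len(cell_particle_counts)),
                   key=lambda c: -cell_particle_counts[c])
    for cell in order:
        load, proc = heapq.heappop(heap)
        assignment[cell] = proc
        heapq.heappush(heap, (load + cell_particle_counts[cell], proc))
    return assignment

Rerunning such a step whenever the particle totals drift, as the abstract describes, keeps the processor loads near-equal as the flow approaches steady state.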
Super and parallel computers and their impact on civil engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamat, M.P.
1986-01-01
This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.
NASA Technical Reports Server (NTRS)
Hsia, T. C.; Lu, G. Z.; Han, W. H.
1987-01-01
In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. Parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
Performance evaluation of canny edge detection on a tiled multicore architecture
NASA Astrophysics Data System (ADS)
Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald
2011-01-01
In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP, and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that, for the same number of threads, the programmer-implemented domain decomposition exhibits higher speed-ups than the compiler-managed, loop-level parallelism implemented with OpenMP.
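The domain-decomposition strategy can be shown in miniature. This hedged Python sketch uses multiprocessing in place of Tile64 tiles and gradient-magnitude thresholding in place of full Canny; each horizontal strip carries a one-pixel halo so it can be processed independently (invoke it under an if __name__ == "__main__": guard).

import numpy as np
from multiprocessing import Pool

def edge_block(args):
    # Each strip, padded with a one-pixel halo, is processed independently.
    block, thresh = args
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    return (mag > thresh)[1:-1, 1:-1]        # drop the halo

def edges_domain_decomposed(img, thresh=30.0, n_workers=4):
    pad = np.pad(img, 1, mode="edge")
    bounds = np.linspace(1, img.shape[0] + 1, n_workers + 1, dtype=int)
    strips = [(pad[lo - 1:hi + 1], thresh)   # halo rows included
              for lo, hi in zip(bounds[:-1], bounds[1:])]
    with Pool(n_workers) as pool:
        return np.vstack(pool.map(edge_block, strips))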
Use of parallel computing in mass processing of laser data
NASA Astrophysics Data System (ADS)
Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.
2015-12-01
The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.
NASA Technical Reports Server (NTRS)
Nagano, S. (Inventor)
1979-01-01
A module failure isolation circuit is described which senses and averages the collector current of each paralleled inverter power transistor and compares the collector current of each power transistor with the average collector current of all power transistors to determine when the sensed collector current of a power transistor in any one inverter falls below a predetermined ratio of the average collector current. The module associated with any transistor that fails to maintain a current level above the predetermined ratio of the average collector current is then shut off. A separate circuit detects when there is no load, or a light load, to inhibit operation of the isolation circuit during no-load or light-load conditions.
Ouellet, Jean A.; Richards, Corey; Sardar, Zeeshan M.; Giannitsios, Demetri; Noiseux, Nicholas; Strydom, Willem S.; Reindl, Rudy; Jarzem, Peter; Arlet, Vincent; Steffen, Thomas
2013-01-01
The ideal treatment for unstable thoracolumbar fractures remains controversial with posterior reduction and stabilization, anterior reduction and stabilization, combined posterior and anterior reduction and stabilization, and even nonoperative management advocated. Short segment posterior osteosynthesis of these fractures has less comorbidities compared with the other operative approaches but settles into kyphosis over time. Biomechanical comparison of the divergent bridge construct versus the parallel tension band construct was performed for anteriorly destabilized T11–L1 spine segments using three different models: (1) finite element analysis (FEA), (2) a synthetic model, and (3) a human cadaveric model. Outcomes measured were construct stiffness and ultimate failure load. Our objective was to determine if the divergent pedicle screw bridge construct would provide more resistance to kyphotic deforming forces. All three modalities showed greater stiffness with the divergent bridge construct. The FEA calculated a stiffness of 21.6 N/m for the tension band construct versus 34.1 N/m for the divergent bridge construct. The synthetic model resulted in a mean stiffness of 17.3 N/m for parallel tension band versus 20.6 N/m for the divergent bridge (p = 0.03), whereas the cadaveric model had an average stiffness of 15.2 N/m in the parallel tension band compared with 18.4 N/m for the divergent bridge (p = 0.02). Ultimate failure load with the cadaveric model was found to be 622 N for the divergent bridge construct versus 419 N (p = 0.15) for the parallel tension band construct. This study confirms our clinical experience that the short posterior divergent bridge construct provides greater stiffness for the management of unstable thoracolumbar fractures. PMID:24436856
Microencapsulation of puerarin nanoparticles by poly(l-lactide) in a supercritical CO(2) process.
Chen, Ai-Zheng; Li, Yi; Chau, Foo-Tim; Lau, Tsui-Yan; Hu, Jun-Yan; Zhao, Zheng; Mok, Daniel Kam-Wah
2009-10-01
Puerarin nanoparticles were first prepared by solution-enhanced dispersion by supercritical CO(2) (SEDS) and then successfully microencapsulated by poly(l-lactide) (PLLA) in a modified SEDS process. By adding an organic non-solvent, an initial puerarin solution with a higher degree of saturation and lower concentration was obtained and applied in the SEDS process. The resulting puerarin nanoparticles were then suspended in PLLA solution and microencapsulated by PLLA in a modified SEDS process, where an 'injector' was employed in the particle suspension delivery system. The puerarin nanoparticles exhibited a good spherical shape, a smooth surface and a narrow particle size distribution with a mean particle size of 188 nm. After microencapsulation, the puerarin-PLLA microparticles had a mean size of 675 nm, a drug load of 23.6% and an encapsulation efficiency of 39.4%; after a burst release at the first stage, the drug was released in a sustained process. Compared with the parallel study of a co-precipitation process, this microencapsulation process is a much more promising technique for preparing a drug-polymer carrier for a drug delivery system, especially for protein drugs.
Effect of ion-neutral collisions on the evolution of kinetic Alfvén waves in plasmas
NASA Astrophysics Data System (ADS)
Goyal, R.; Sharma, R. P.
2018-03-01
This paper studies the effect of ion-neutral collisions on the propagation of kinetic Alfvén waves (KAWs) in inhomogeneous magnetized plasma. The inhomogeneity in the plasma imposed by background density in a direction transverse as well as parallel to the ambient magnetic field plays a vital role in the localization process. The mass loading of ions takes place due to their collisions with neutral fluid leading to the damping of the KAWs. Numerical analysis of linear KAWs in inhomogeneous magnetized plasma is done for a fixed finite frequency taking into consideration the ion-neutral collisions. There is a prominent effect of collisional damping on the wave localization, wave magnetic field, and frequency spectrum. A semi-analytical technique has been employed to study the magnetic field amplitude decay process and the effect of wave frequency in the range of ion cyclotron frequency on the propagation of waves leading to damping.
Li, Bingyi; Chen, Liang; Yu, Wenyue; Xie, Yizhuang; Bian, Mingming; Zhang, Qingjun; Pang, Long
2018-01-01
With the development of satellite payload technology and very large-scale integrated (VLSI) circuit technology, on-board real-time synthetic aperture radar (SAR) imaging systems have facilitated rapid response to disasters. A key goal of the on-board SAR imaging system design is to achieve high real-time processing performance under severe size, weight, and power consumption constraints. This paper presents a multi-node prototype system for real-time SAR imaging processing. We decompose the commonly used chirp scaling (CS) SAR imaging algorithm into two parts according to their computing features. The linearization and logic-memory optimum allocation methods are adopted to realize the nonlinear part in a reconfigurable structure, and the two-part bandwidth balance method is used to realize the linear part. Thus, floating-point SAR imaging processing can be integrated into a single Field Programmable Gate Array (FPGA) chip instead of relying on distributed technologies. A single processing node requires 10.6 s and consumes 17 W to focus 25-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384. The design methodology of the multi-FPGA parallel accelerating system under the real-time principle is introduced. As a proof of concept, a prototype with four processing nodes and one master node is implemented using a Xilinx xc6vlx315t FPGA. The weight and volume of a single machine are 10 kg and 32 cm × 24 cm × 20 cm, respectively, and the power consumption is under 100 W. The real-time performance of the proposed design is demonstrated on Chinese Gaofen-3 stripmap continuous imaging. PMID:29495637
Commercial absorption chiller models for evaluation of control strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koeppel, E.A.; Klein, S.A.; Mitchell, J.W.
1995-08-01
A steady-state computer simulation model of a direct fired double-effect water-lithium bromide absorption chiller in the parallel-flow configuration was developed from first principles. Unknown model parameters such as heat transfer coefficients were determined by matching the model's calculated state points and coefficient of performance (COP) against nominal full-load operating data and COPs obtained from a manufacturer's catalog. The model compares favorably with the manufacturer's performance ratings for varying water circuit (chilled and cooling) temperatures at full load conditions and for chiller part-load performance. The model was used (1) to investigate the effect of varying the water circuit flow rates with the chiller load and (2) to optimize chiller part-load performance with respect to the distribution and flow of the weak solution.
Effect of laser shock processing on fatigue life of 2205 duplex stainless steel notched specimens
NASA Astrophysics Data System (ADS)
Vázquez Jiménez, César A.; Gómez Rosas, Gilberto; Rubio González, Carlos; Granados Alejo, Vignaud; Hereñú, Silvina
2017-12-01
The effect of laser shock processing (LSP) on the high-cycle fatigue behavior of 2205 duplex stainless steel (DSS) notched samples was investigated. Sweep directions parallel (LSP 1) and perpendicular (LSP 2) to the rolling direction were used in order to examine the sensitivity of LSP to the manufacturing process, since this steel presents significant anisotropy. A pulsed Nd:YAG laser operating at 10 Hz frequency and 1064 nm wavelength was utilized. The LSP configuration was the water jet mode without protective coating. Notched specimens 4 mm thick were treated on both sides, and then fatigue loading was applied with R = 0.1. The results showed that the LSP 2 condition induces higher compressive residual stresses as well as a higher fatigue life than the LSP 1 condition. By applying the LSP 2 condition, an enhancement of fatigue life of up to 402% is reported. In addition, the microhardness profiles showed different depths of the hardened layer for each direction, consistent with the anisotropy observed.
Stricker, Anne-Emmanuelle; Barrie, Ashley; Maas, Carol L A; Fernandes, William; Lishman, Lori
2009-03-01
A full-scale demonstration of an integrated fixed-film activated sludge (IFFAS) process with floating carriers has been conducted in Ontario, Canada, since August 2003. In this study, data collected on-site from July 2005 to December 2006 are analyzed and compared with the performance of a conventional activated sludge train operated in parallel. Both trains received similar loadings and maintained comparable mixed liquor concentrations; however, the IFFAS had 50% more biomass when the attached growth was considered. In the winter, the conventional train operated at the critical solids retention time (SRT) and had fluctuating partial nitrification. The IFFAS nitrified more consistently and had a doubled average capacity. In the summer, the suspended SRT was less limiting, and the benefit of IFFAS for nitrification was marginal. The lessons learned from the operational requirements and challenges of the IFFAS process (air flow, carrier management, and seasonal foaming) are discussed, and design recommendations are proposed for whole plant retrofit.
Biostratinomic processes for the development of mud-cast logs in Carboniferous and Holocene swamps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gastaldo, R.A.; Demko, T.M.; Liu, Yuejin
1989-08-01
Prostrate trees are common features of fossil forest litters, and are frequently preserved as mud-casts. Specimens of Carboniferous mud-cast trees and a mud-filled incipient cast of a Holocene Taxodium have been investigated to determine the biostratinomic processes responsible for their formation. These processes are complex. Hollowing of tree trunks may take place during life or by degradation after death. Once the trunk has fallen, the hollow cavity is supported by surrounding wood and/or bark tissues and acts as a conduit for sediment-laden waters. Leaf litter may be preserved on bedding surfaces. The infilling sequence of horizontal, parallel-bedded, fine-grained sediment is deposited from suspended load during multiple overbank flooding events. These results differ from experimentally produced pith casts in which the sediment grain size is of fine sand. In Holocene specimens, alluvial mud within the log may provide a substrate for infaunal invertebrates. No evidence of infaunal burrowing in Carboniferous analogues exists.
Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units
USDA-ARS?s Scientific Manuscript database
This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
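Since the abstract names the Parallel Cyclic Reduction solver, a serial numpy transcription of the generic PCR recurrence may be useful; on a GPU, each index i in the inner loop is an independent thread, and the stride s doubles for about log2(n) steps. This is an illustrative sketch, not the CCHE2D CUDA Fortran code, and it assumes a well-conditioned (e.g. diagonally dominant) system with a[0] = c[-1] = 0.

import numpy as np

def pcr_solve(a, b, c, d):
    # Tridiagonal solve by Parallel Cyclic Reduction: a is the sub-diagonal,
    # b the diagonal, c the super-diagonal, d the right-hand side.
    a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
    n, s = len(b), 1
    while s < n:
        an, bn, cn, dn = a.copy(), b.copy(), c.copy(), d.copy()
        for i in range(n):                    # each i is one GPU thread
            lo, hi = i - s, i + s
            al = -a[i] / b[lo] if lo >= 0 else 0.0
            ga = -c[i] / b[hi] if hi < n else 0.0
            bn[i] = b[i] + (al * c[lo] if lo >= 0 else 0.0) \
                         + (ga * a[hi] if hi < n else 0.0)
            dn[i] = d[i] + (al * d[lo] if lo >= 0 else 0.0) \
                         + (ga * d[hi] if hi < n else 0.0)
            an[i] = al * a[lo] if lo >= 0 else 0.0
            cn[i] = ga * c[hi] if hi < n else 0.0
        a, b, c, d = an, bn, cn, dn
        s *= 2
    return d / b                              # equations are now decoupled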
NASA Astrophysics Data System (ADS)
Wang, Yaping; Lin, Shunjiang; Yang, Zhibin
2017-05-01
In the traditional three-phase power flow calculation for low voltage distribution networks, the load model is described as constant power. Since this model cannot reflect the characteristics of actual loads, the result of the traditional calculation always differs from the actual situation. In this paper, a load model in which dynamic loads, represented by air conditioners, are connected in parallel with static loads, represented by lighting, is used to describe the characteristics of residential loads, and a three-phase power flow calculation model is proposed. The power flow calculation model includes the power balance equations of the three phases (A, B, C), the current balance equations of phase 0, and the torque balance equations of the induction motors in the air conditioners. An alternating iterative algorithm between the induction motor torque balance equations and the nodal balance equations is then proposed to solve the three-phase power flow model. This method is applied to an actual low voltage distribution network of residential loads, and calculations for three different operating states of the air conditioners demonstrate the effectiveness of the proposed model and algorithm.
Efficient parallelization for AMR MHD multiphysics calculations; implementation in AstroBEAR
NASA Astrophysics Data System (ADS)
Carroll-Nellenback, Jonathan J.; Shroyer, Brandon; Frank, Adam; Ding, Chen
2013-03-01
Current adaptive mesh refinement (AMR) simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch or grid based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level with preference going to the finer level grids. This allows for global load balancing instead of level by level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. While we see improvements of up to 30% on deep simulations run on a few cores, the speedup is typically more modest (5-20%) for larger scale simulations. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors. Using this distributed approach we are able to get reasonable scaling efficiency (>80%) out to 12288 cores and up to 8 levels of AMR - independent of the use of threading.
Live Load Response of Short Span Bridges with Parallam(R) Decks
DOT National Transportation Integrated Search
2007-01-01
Structural Composite Lumber (SCL) is reconstituted with high grade presorted veneers to enhance properties including higher and more uniform strength and stiffness than conventional lumber. Parallel Strand Lumber (PSL) is mainly constituted of wood s...
Choe, Leila H; Lee, Kelvin H
2003-10-01
We investigate one approach to assess the quantitative variability in two-dimensional gel electrophoresis (2-DE) separations based on gel-to-gel variability, sample preparation variability, sample load differences, and the effect of automation on image analysis. We observe that 95% of spots present in three out of four replicate gels exhibit less than a 0.52 coefficient of variation (CV) in fluorescent stain intensity (% volume) for a single sample run on multiple gels. When four parallel sample preparations are performed, this value increases to 0.57. We do not observe any significant change in quantitative value for an increase or decrease in sample load of 30% when using appropriate image analysis variables. Increasing use of automation, while necessary in modern 2-DE experiments, does change the observed level of quantitative and qualitative variability among replicate gels. The number of spots that change qualitatively for a single sample run in parallel varies from a CV = 0.03 for fully manual analysis to CV = 0.20 for a fully automated analysis. We present a systematic method by which a single laboratory can measure gel-to-gel variability using only three gel runs.
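The replicate-CV computation can be made concrete. A hedged numpy sketch (the array layout and the 3-of-4 presence criterion are assumptions about the analysis, not the authors' code):

import numpy as np

def spot_cvs(intensity, min_present=3):
    # intensity: (n_gels, n_spots) array of % volume; np.nan marks a spot
    # not detected on that gel. Score only spots present on at least
    # min_present gels, mirroring the 3-of-4 criterion in the abstract.
    present = np.sum(~np.isnan(intensity), axis=0) >= min_present
    mean = np.nanmean(intensity[:, present], axis=0)
    std = np.nanstd(intensity[:, present], axis=0, ddof=1)
    return std / mean

gels = np.array([[10.0, 5.0, np.nan],     # 4 replicate gels, 3 spots;
                 [12.0, 4.5, 2.0],        # the last spot is seen on only
                 [11.0, np.nan, np.nan],  # 2 gels and is excluded
                 [ 9.0, 5.5, np.nan]])
print(np.percentile(spot_cvs(gels), 95))  # 95th-percentile CV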
Theory of the deformation of aligned polyethylene.
Hammad, A; Swinburne, T D; Hasan, H; Del Rosso, S; Iannucci, L; Sutton, A P
2015-08-08
Solitons are proposed as the agents of plastic and viscoelastic deformation in aligned polyethylene. Interactions between straight, parallel molecules are mapped rigorously onto the Frenkel-Kontorova model. It is shown that these molecular interactions distribute an applied load between molecules, with a characteristic transfer length equal to the soliton width. Load transfer leads to the introduction of tensile and compressive solitons at the chain ends to mark the onset of plasticity at a well-defined yield stress, which is much less than the theoretical pull-out stress. Interaction energies between solitons and an equation of motion for solitons are derived. The equation of motion is based on Langevin dynamics and the fluctuation-dissipation theorem and it leads to the rigorous definition of an effective mass for solitons. It forms the basis of a soliton dynamics in direct analogy to dislocation dynamics. Close parallels are drawn between solitons in aligned polymers and dislocations in crystals, including the configurational force on a soliton. The origins of the strain rate and temperature dependencies of the viscoelastic behaviour are discussed in terms of the formation energy of solitons. A failure mechanism is proposed involving soliton condensation under a tensile load.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shah, Nihar; Wei, Max; Letschert, Virginie
2015-10-01
Hydrofluorocarbons (HFCs), emitted from uses such as refrigerants and thermal insulating foam, are now the fastest growing greenhouse gases (GHGs), with global warming potentials (GWP) thousands of times higher than carbon dioxide (CO2). Because of the short lifetime of these molecules in the atmosphere, mitigating the amount of these short-lived climate pollutants (SLCPs) provides a faster path to climate change mitigation than control of CO2 alone. This has led to proposals from Africa, Europe, India, Island States, and North America to amend the Montreal Protocol on Substances that Deplete the Ozone Layer (Montreal Protocol) to phase down high-GWP HFCs. Simultaneously, energy efficiency market transformation programs such as standards, labeling and incentive programs are endeavoring to improve the energy efficiency of refrigeration and air conditioning equipment to provide life cycle cost, energy, GHG, and peak load savings. In this paper we provide an estimate of the magnitude of such GHG and peak electric load savings potential, for room air conditioning, if the refrigerant transition and energy efficiency improvement policies are implemented either separately or in parallel. We find that implementing HFC refrigerant transition and energy efficiency improvement policies in parallel for room air conditioning roughly doubles the benefit of either policy implemented separately. We estimate that shifting the 2030 world stock of room air conditioners from the low efficiency technology using high-GWP refrigerants to higher efficiency technology and low-GWP refrigerants in parallel would save between 340-790 gigawatts (GW) of peak load globally, which is roughly equivalent to avoiding 680-1550 peak power plants of 500 MW each. This would save 0.85 Gt/year annually in China, equivalent to over 8 Three Gorges dams, and over 0.32 Gt/year annually in India, equivalent to roughly twice India's 100 GW solar mission target. While there is some uncertainty associated with emissions and growth projections, moving to efficient room air conditioning (~30% more efficient than current technology) in parallel with low-GWP refrigerants in room air conditioning could avoid up to ~25 billion tonnes of CO2 in 2030, ~33 billion in 2040, and ~40 billion in 2050, i.e. cumulative savings of up to 98 billion tonnes of CO2 by 2050. Therefore, super-efficient room ACs using low-GWP refrigerants merit serious consideration to maximize peak load reduction and GHG savings.
Control system for a wound-rotor motor
Ellis, James N.
1983-01-01
A load switching circuit for switching two or more transformer taps under load carrying conditions includes first and second parallel connected bridge rectifier circuits which control the selective connection of a direct current load to taps of a transformer. The first bridge circuit is normally conducting so that the load is connected to a first tap through the first bridge circuit. To transfer the load to the second tap, a switch is operable to connect the second bridge circuit to a second tap, and when the second bridge circuit begins to conduct, the first bridge circuit ceases conduction because the potential at the second tap is higher than the potential at the first tap, and the load is thus connected to the second tap through the second bridge circuit. The load switching circuit is applicable in a motor speed controller for a wound-rotor motor for effecting tap switching as a function of motor speed while providing a stepless motor speed control characteristic.
Massively parallel information processing systems for space applications
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1979-01-01
NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
Grider, Gary A.; Poole, Stephen W.
2015-09-01
Collective buffering and data pattern solutions are provided for storage, retrieval, and/or analysis of data in a collective parallel processing environment. For example, a method can be provided for data storage in a collective parallel processing environment. The method comprises receiving data to be written for a plurality of collective processes within a collective parallel processing environment, extracting a data pattern for the data to be written for the plurality of collective processes, generating a representation describing the data pattern, and saving the data and the representation.
schwimmbad: A uniform interface to parallel processing pools in Python
NASA Astrophysics Data System (ADS)
Price-Whelan, Adrian M.; Foreman-Mackey, Daniel
2017-09-01
Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).
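A usage sketch in the spirit of the package documentation (the worker and task definitions are placeholders): the science code is written once against a pool's map, and the pool type is chosen at startup.

import schwimmbad

def worker(task):
    a, b = task
    return a ** 2 + b ** 2   # perfectly parallel: no inter-task communication

def main(pool):
    tasks = [(a, b) for a in range(4) for b in range(4)]
    results = list(pool.map(worker, tasks))
    pool.close()
    print(results)

if __name__ == "__main__":
    # choose_pool selects a serial, multiprocessing or MPI pool from the
    # flags, so the same script runs on a laptop or under mpiexec unchanged.
    main(schwimmbad.choose_pool(mpi=False, processes=4))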
Characterization of Damage in Triaxial Braid Composites Under Tensile Loading
NASA Technical Reports Server (NTRS)
Littell, Justin D.; Binienda, Wieslaw K.; Roberts, Gary D.; Goldberg, Robert K.
2009-01-01
Carbon fiber composites utilizing flattened, large tow yarns in woven or braided forms are being used in many aerospace applications. Their complex fiber architecture and large unit cell size present challenges in both understanding deformation processes and measuring reliable material properties. This report examines composites made using flattened 12k and 24k standard modulus carbon fiber yarns in a 0°/+60°/-60° triaxial braid architecture. Standard straight-sided tensile coupons are tested with the 0° axial braid fibers either parallel with or perpendicular to the applied tensile load (axial or transverse tensile test, respectively). Nonuniform surface strain resulting from the triaxial braid architecture is examined using photogrammetry. Local regions of high strain concentration are examined to identify where failure initiates and to determine the local strain at the time of initiation. Splitting within fiber bundles is the first failure mode observed at low to intermediate strains. For axial tensile tests, splitting is primarily in the 60° bias fibers, which were oriented 60° to the applied load. At higher strains, out-of-plane deformation associated with localized delamination between fiber bundles or damage within fiber bundles is observed. For transverse tensile tests, the splitting is primarily in the 0° axial fibers, which were oriented transverse to the applied load. The initiation and accumulation of local damage causes the global transverse stress-strain curves to become nonlinear and causes failure to occur at a reduced ultimate strain. Extensive delamination at the specimen edges is also observed.
NASA Astrophysics Data System (ADS)
Abdussalam, Ragba Mohamed
Thin-walled cylinders are used extensively in the food packaging and cosmetics industries. The cost of material is a major contributor to the overall cost and so improvements in design and manufacturing processes are always being sought. Shape optimisation provides one method for such improvements. Aluminium aerosol cans are a particular form of thin-walled cylinder with a complex shape consisting of truncated cone top, parallel cylindrical section and inverted dome base. They are manufactured in one piece by a reverse-extrusion process, which produces a vessel with a variable thickness from 0.31 mm in the cylinder up to 1.31 mm in the base for a 53 mm diameter can. During manufacture, packaging and charging, they are subjected to pressure, axial and radial loads and design calculations are generally outside the British and American pressure vessel codes. 'Design-by-test' appears to be the favoured approach. However, a more rigorous approach is needed in order to optimise the designs. Finite element analysis (FEA) is a powerful tool for predicting stress, strain and displacement behaviour of components and structures. FEA is also used extensively to model manufacturing processes. In this study, elastic and elastic-plastic FEA has been used to develop a thorough understanding of the mechanisms of yielding, 'dome reversal' (an inherent safety feature, where the base suffers elastic-plastic buckling at a pressure below the burst pressure) and collapse due to internal pressure loading and how these are affected by geometry. It has also been used to study the buckling behaviour under compressive axial loading. Furthermore, numerical simulations of the extrusion process (in order to investigate the effects of tool geometry, friction coefficient and boundary conditions) have been undertaken. Experimental verification of the buckling and collapse behaviours has also been carried out and there is reasonable agreement between the experimental data and the numerical predictions.
Wang, Tao; Zhang, Diandian; Sun, Yating; Zhou, Shanshan; Li, Lin; Shao, Jingjing
2018-04-01
A lab-scale ultrasound-enhanced Anammox reactor (R1) was established and irradiated once a week by ultrasound with the optimal parameters (frequency of 25 kHz, intensity of 0.2 W cm-2 and exposure time of 3 min) obtained from batch experiments. R1 and a control Anammox reactor (R2) without exposure to ultrasound were operated in parallel. The start-up period of the Anammox process in R1 (53 days) was shorter than that in R2 (61 days). The nitrogen loading-enhancing period in R1 (day 53-day 135) was also shorter than that in R2 (day 61-day 151). At the end of the nitrogen loading-enhancing period, the NLR (0.76 kg N m-3 d-1) and NRR (0.68 kg N m-3 d-1) of R1 were both higher than the NLR (0.66 kg N m-3 d-1) and NRR (0.56 kg N m-3 d-1) of R2. Moreover, the stability of the Anammox process in R1 was better than that in R2. The results demonstrated that periodical irradiation with ultrasound enhanced the start-up and operational performance of the Anammox reactor. Microbial community analysis indicated that the ultrasound accelerated the microbial succession from other bacteria to Anammox bacteria, thereby shortening the start-up period of the Anammox process from conventional activated sludge. It also indicated that the ultrasound strengthened the competitive advantage of Candidatus Kuenenia stuttgartiensis among the Anammox bacteria of the mature sludge, enhancing the nitrogen removal performance of the Anammox reactor under high nitrogen loading. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Kleinberg, L. L. (Inventor)
1984-01-01
A bandpass amplifier employing a field effect transistor amplifier first stage is described with a resistive load either a.c. or directly coupled to the non-inverting input of an operational amplifier second stage which is loaded in a Wien Bridge configuration. The bandpass amplifier may be operated with a signal injected into the gate terminal of the field effect transistor and the signal output taken from the output terminal of the operational amplifier. The operational amplifier stage appears as an inductive reactance, capacitive reactance and negative resistance at the non-inverting input of the operational amplifier, all of which appear in parallel with the resistive load of the field effect transistor.
Apparatus for combinatorial screening of electrochemical materials
Kepler, Keith Douglas [Belmont, CA; Wang, Yu [Foster City, CA
2009-12-15
A high throughput combinatorial screening method and apparatus for the evaluation of electrochemical materials using a single voltage source (2) is disclosed wherein temperature changes arising from the application of an electrical load to a cell array (1) are used to evaluate the relative electrochemical efficiency of the materials comprising the array. The apparatus may include an array of electrochemical cells (1) that are connected to each other in parallel or in series, an electronic load (2) for applying a voltage or current to the electrochemical cells (1), and a device (3), external to the cells, for monitoring the relative temperature of each cell when the load is applied.
Efficient transformer for electromagnetic waves
Miller, R.B.
A transformer structure for efficient transfer of electromagnetic energy from a transmission line to an unmatched load provides voltage multiplication and current division by a predetermined constant. Impedance levels are transformed by the square of that constant. The structure includes a wave splitter, connected to an input transmission device and to a plurality of output transmission devices. The output transmission devices are effectively connected in parallel to the input transmission device. The output transmission devices are effectively series connected to provide energy to a load. The transformer structure is particularly effective in increasing efficiency of energy transfer through an inverting convolute structure by capturing and transferring energy losses from the inverter to the load.
BALANCING THE LOAD: A VORONOI BASED SCHEME FOR PARALLEL COMPUTATIONS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinberg, Elad; Yalinewich, Almog; Sari, Re'em
2015-01-01
One of the key issues when running a simulation on multiple CPUs is maintaining a proper load balance throughout the run and minimizing communications between CPUs. We propose a novel method of utilizing a Voronoi diagram to achieve a nearly perfect load balance without the need of any global redistributions of data. As a showcase, we implement our method in RICH, a two-dimensional moving mesh hydrodynamical code, but it can be extended trivially to other codes in two or three dimensions. Our tests show that this method is indeed efficient and can be used in a large variety of existing hydrodynamical codes.
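A toy illustration of the underlying principle: work items are owned by the nearest generator, and purely local feedback reshapes the cells until loads equalize. Note that this sketch adapts per-generator additive weights rather than moving the generator points as the paper does, so it captures only the flavor of the method.

import numpy as np

rng = np.random.default_rng(0)
work = rng.random((10000, 2))      # positions of work items (e.g. cells)
gens = rng.random((8, 2))          # one generator point per CPU
w = np.zeros(len(gens))            # additive weights, adapted locally

target = len(work) / len(gens)
for _ in range(200):
    # Ownership: nearest generator in a weighted (Laguerre-like) metric.
    d = np.linalg.norm(work[:, None, :] - gens[None, :, :], axis=2)
    owner = (d - w).argmin(axis=1)
    counts = np.bincount(owner, minlength=len(gens))
    # Local feedback: overloaded CPUs shrink their cells, underloaded grow.
    w -= 1e-3 * (counts - target) / target

print(counts.min(), counts.max())  # spread narrows toward the target share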
NASA Technical Reports Server (NTRS)
Arbocz, Johann; Hol, J. M. A. M.; deVries, J.
1998-01-01
A rigorous solution is presented for the case of stiffened anisotropic cylindrical shells with general imperfections under combined loading, where the edge supports are provided by symmetrical or unsymmetrical elastic rings. The circumferential dependence is eliminated by a truncated Fourier series. The resulting nonlinear 2-point boundary value problem is solved numerically via the "Parallel Shooting Method". The changing deformation patterns resulting from the different degrees of interaction between the given initial imperfections and the specified end rings are displayed. Recommendations are made as to the minimum ring stiffnesses required for optimal load carrying configurations.
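The shooting idea at the heart of the method can be shown on a scalar two-point boundary value problem. The sketch below is single-segment simple shooting on the Bratu problem, an illustrative stand-in: the paper's parallel (multiple) shooting subdivides the interval for robustness and works on the Fourier-reduced shell equations instead.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

def residual(slope):
    # Integrate y'' = -exp(y) from x = 0 with y(0) = 0 and a trial slope;
    # the residual is the miss at the far boundary, y(1) - 0.
    sol = solve_ivp(lambda x, y: [y[1], -np.exp(y[0])],
                    (0.0, 1.0), [0.0, slope], rtol=1e-9)
    return sol.y[0, -1]

# Bratu problem y'' + e^y = 0, y(0) = y(1) = 0: root-find the initial
# slope that satisfies the far boundary condition.
slope = brentq(residual, 0.0, 1.0)
print(f"y'(0) = {slope:.6f}")   # about 0.549 for the lower-branch solution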
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. The new aCe language is ideally suited for parallel signal processing applications and system simulation, since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, this new C-based parallel language (aCe C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of aCe C and present a signal processing application (FFT).
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
Read, S J; Vanman, E J; Miller, L C
1997-01-01
We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.
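A minimal sketch of parallel constraint satisfaction in the connectionist sense: units encode hypotheses, symmetric weights encode consistency constraints, and all activations relax simultaneously until the network settles. The weights, evidence values and update rule below are invented for illustration.

import numpy as np

# Symmetric constraint weights among four hypothesis units:
# positive = mutually supporting, negative = contradictory.
W = np.array([[ 0.0,  0.6, -0.5,  0.0],
              [ 0.6,  0.0, -0.5,  0.2],
              [-0.5, -0.5,  0.0,  0.4],
              [ 0.0,  0.2,  0.4,  0.0]])
ext = np.array([0.1, 0.0, 0.3, 0.0])   # external evidence for each unit

a = np.zeros(4)                        # activations, bounded to [-1, 1]
for _ in range(100):
    net = W @ a + ext                  # every constraint applied in parallel
    a += 0.1 * (net - a)               # relax toward satisfying the constraints
    a = np.clip(a, -1.0, 1.0)

print(np.round(a, 2))                  # settled state = best-fitting interpretation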
Short-Term Load Forecasting Based Automatic Distribution Network Reconfiguration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Ding, Fei; Zhang, Yingchen
In a traditional dynamic network reconfiguration study, the optimal topology is determined at every scheduled time point by using the real load data measured at that time. The development of the load forecasting technique can provide an accurate prediction of the load power that will happen in a future time and provide more information about load changes. With the inclusion of load forecasting, the optimal topology can be determined based on the predicted load conditions during a longer time period instead of using a snapshot of the load at the time when the reconfiguration happens; thus, the distribution system operator can use this information to better operate the system reconfiguration and achieve optimal solutions. This paper proposes a short-term load forecasting approach to automatically reconfigure distribution systems in a dynamic and pre-event manner. Specifically, a short-term and high-resolution distribution system load forecasting approach is proposed with a forecaster based on support vector regression and parallel parameters optimization. The network reconfiguration problem is solved by using the forecasted load continuously to determine the optimal network topology with the minimum amount of loss at the future time. The simulation results validate and evaluate the proposed approach.
Short-Term Load Forecasting Based Automatic Distribution Network Reconfiguration: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Ding, Fei; Zhang, Yingchen
In the traditional dynamic network reconfiguration study, the optimal topology is determined at every scheduled time point by using the real load data measured at that time. The development of load forecasting techniques can provide accurate predictions of the load power that will occur at future times and more information about load changes. With the inclusion of load forecasting, the optimal topology can be determined based on the predicted load conditions during a longer time period instead of using a snapshot of the load at the time when the reconfiguration happens, thus providing information that allows the distribution system operator (DSO) to better operate the system reconfiguration and achieve optimal solutions. This paper therefore proposes a short-term load forecasting based approach for automatically reconfiguring distribution systems in a dynamic and pre-event manner. Specifically, a short-term and high-resolution distribution system load forecasting approach is proposed with a support vector regression (SVR) based forecaster and parallel parameters optimization. The network reconfiguration problem is solved by using the forecasted load continuously to determine the optimal network topology with the minimum loss at the future time. The simulation results validate and evaluate the proposed approach.
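A minimal sketch of the forecaster component: support vector regression on lagged load features, with scikit-learn's grid search standing in for the papers' parallel parameters optimization. The synthetic series, feature layout and parameter grid are all placeholders.

import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

def make_supervised(load, n_lags=48):
    # Turn a load series into (lagged-window, next-value) training pairs.
    X = np.stack([load[i:i + n_lags] for i in range(len(load) - n_lags)])
    return X, load[n_lags:]

t = np.arange(2000)                # placeholder feeder load: daily cycle + noise
load = 100 + 20 * np.sin(2 * np.pi * t / 48) \
       + np.random.default_rng(1).normal(0, 2, t.size)

X, y = make_supervised(load)
grid = {"C": [1, 10, 100], "gamma": ["scale", 0.01], "epsilon": [0.1, 1.0]}
# n_jobs=-1 evaluates the grid in parallel, echoing (not reproducing) the
# parallel parameters optimization described above.
model = GridSearchCV(SVR(), grid, cv=TimeSeriesSplit(n_splits=3), n_jobs=-1)
model.fit(X, y)
print(model.best_params_, model.predict(X[-1:]))  # one-step-ahead forecast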
NASA Astrophysics Data System (ADS)
Cura, Rémi; Perret, Julien; Paparoditis, Nicolas
2017-05-01
In addition to more traditional geographical data such as images (rasters) and vectors, point cloud data are becoming increasingly available. Such data are appreciated for their precision and true three-dimensional (3D) nature. However, managing point clouds can be difficult due to scaling problems and the specificities of this data type. Several methods exist but are usually fairly specialised and solve only one aspect of the management problem. In this work, we propose a comprehensive and efficient point cloud management system based on a database server that works on groups of points (patches) rather than individual points. This system is specifically designed to cover the basic needs of point cloud users: fast loading, compressed storage, powerful patch and point filtering, easy data access and exporting, and integrated processing. Moreover, the proposed system fully integrates metadata (like sensor position) and can conjointly use point clouds with other geospatial data, such as images, vectors, topology and other point clouds. Point cloud (parallel) processing can be done in-base with fast prototyping capabilities. Lastly, the system is built on open source technologies; therefore it can be easily extended and customised. We test the proposed system with several billion points obtained from Lidar (aerial and terrestrial) and stereo-vision. We demonstrate loading speeds in the ~50 million pts/h per process range, transparent-to-the-user compression ratios of 2:1 to 4:1 or greater, patch filtering in the 0.1 to 1 s range, and output in the 0.1 million pts/s per process range, along with classical processing methods, such as object detection.
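The patch idea can be illustrated with a short sketch: points are grouped by a coarse voxel key so that storage, filtering, and compression operate on groups rather than on individual points. The 1 m cell size and the box query below are illustrative assumptions, not the system's actual schema.

```python
# Sketch: grouping a point cloud into voxel patches and filtering at patch level.
import numpy as np
from collections import defaultdict

points = np.random.default_rng(1).uniform(0, 100, size=(100_000, 3))

cell = 1.0  # patch edge length in metres (an assumed value)
keys = np.floor(points / cell).astype(np.int64)

patches = defaultdict(list)
for key, pt in zip(map(tuple, keys), points):
    patches[key].append(pt)

# A patch-level filter touches only patch keys, never individual points,
# e.g. keep patches whose cell falls inside a query box:
selected = [k for k in patches if 10 <= k[0] < 20 and 10 <= k[1] < 20]
print(len(patches), "patches;", len(selected), "match the box query")
```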
David, Arthur; Perrin, Jean-Louis; Rosain, David; Rodier, Claire; Picot, Bernadette; Tournoud, Marie-George
2011-10-01
The aim of this study was to better understand the fate of nutrients discharged by sewage treatment plants into an intermittent Mediterranean river during a low-flow period. Many pollutants stored in the riverbed during the low-flow period can be transferred to the downstream environments during flood events. The study focused on two processes that affect the fate and transport of nutrients: a physical process (retention in the riverbed sediments) and a biological process (denitrification). A spatial campaign was carried out during a low-flow period to characterize the nutrient contents of both water and sediments in the Vène River. The results showed high nutrient concentrations in the water column downstream of the treated wastewater disposal (up to 13,315 μg N/L for ammonium and 2,901 μg P/L for total phosphorus). Nutrient concentrations decreased rapidly downstream of the disposal whereas nutrient contents in the sediments increased (up to 1,898 and 784 μg/g for total phosphorus and Kjeldahl nitrogen, respectively). According to an in situ experiment using sediment boxes placed in the riverbed for 85 days, we estimated that the nutrients trapped in the sediments represent 25% of the phosphorus loads and 10% of the nitrogen loads lost from the water column. In parallel, laboratory tests indicated that denitrification occurred in the Vène River, and we estimated that denitrification, likely coupled to nitrification processes during the 85 days of the experiment, was significantly involved in the removal of nitrogen loads from the water column (up to 38%) and was greater than the accumulation processes.
Using Parallel Processing for Problem Solving.
1979-12-01
Activities are the basic parallel processing primitive. Different goals of the system can be pursued in parallel by placing them in separate activities. Language primitives are provided for manipulating running activities. Viewpoints are a generalization of context...
NASA Astrophysics Data System (ADS)
Akil, Mohamed
2017-05-01
Real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis, and consequently many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool, and a very data-intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper presents a survey of approaches for the parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we compare various parallelizations of sequential watershed algorithms on shared-memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on its performance. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models, comparing OpenMP (an application programming interface for multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
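The tile-based decomposition strategy such surveys compare can be illustrated with a short sketch. Python's multiprocessing stands in for OpenMP/Pthreads here, and a simple gradient-magnitude operator stands in for the watershed flooding step; merging labels across tile borders, the genuinely hard part of a parallel watershed, is deliberately omitted. Everything in the example is an illustrative assumption rather than any of the surveyed implementations.

```python
# Sketch: split an image into strips and process them on separate workers.
import numpy as np
from multiprocessing import Pool

def process_tile(tile: np.ndarray) -> np.ndarray:
    gy, gx = np.gradient(tile.astype(np.float64))
    return np.hypot(gy, gx)  # per-tile relief image a watershed would flood

def parallel_process(image: np.ndarray, n_tiles: int = 4) -> np.ndarray:
    tiles = np.array_split(image, n_tiles, axis=0)  # horizontal strips
    with Pool(n_tiles) as pool:
        results = pool.map(process_tile, tiles)
    return np.vstack(results)

if __name__ == "__main__":
    img = np.random.default_rng(2).integers(0, 256, (2048, 2048))
    out = parallel_process(img)
    print(out.shape)
```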
Design of High Field Solenoids made of High Temperature Superconductors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bartalesi, Antonio; /Pisa U.
2010-12-01
This thesis starts from the analytical mechanical analysis of a superconducting solenoid loaded by self-generated Lorentz forces. A finite element model is also proposed and verified against the analytical results. To study the anisotropic behavior of a coil made of layers of superconductor and insulation, a finite element meso-mechanical model is proposed and designed. The resulting material properties are then used in the main solenoid analysis. In parallel, design work is performed as well: an existing Insert Test Facility (ITF) is adapted and structurally verified to support a coil made of YBa2Cu3O7, a High Temperature Superconductor (HTS). Finally, a technological winding process is proposed and the required tooling is designed.
Image Processing Using a Parallel Architecture.
1987-12-01
This study developed a set of low-level image processing tools on a parallel computer that allows concurrent processing of images... In this environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations... As a step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer...
Dynamic Electromechanical Characterization of the Ferroelectric Ceramic PZT 95/5
NASA Astrophysics Data System (ADS)
Setchell, R. E.; Chhabildas, L. C.; Furnish, M. D.; Montgomery, S. T.; Holman, G. T.
1997-07-01
Shock-induced depoling of the ferroelectric ceramic PZT 95/5 has been utilized in a number of pulsed power applications. The dynamic behavior of the poled ceramic is complex, with nonlinear coupling between mechanical and electrical variables. Recent efforts to improve numerical simulations of this process have been limited by the scarcity of relevant experimental studies within the last twenty years. Consequently, we have initiated an extensive experimental study of the dynamic electromechanical behavior of this material. Samples of the poled ceramic are shocked to axial stresses from 0.5 to 5 GPa in planar impact experiments and observed with laser interferometry (VISAR) to obtain transmitted wave profiles. Current generation due to shock-induced depoling is observed using different external loads to vary electric field strengths within the samples. Experimental configurations either have the remanent polarization parallel to the direction of shock motion (axially poled) or perpendicular (normally poled). Initial experiments on unpoled samples utilized PVDF stress gauges as well as VISAR, and extended prior data on shock loading and release behavior. (Supported by the U.S. Department of Energy under contract DE-AC04-94AL85000.)
[Parallel virtual reality visualization of extreme large medical datasets].
Tang, Min
2010-04-01
On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the Intranet and common-configuration computers of hospitals. Several kernel techniques are introduced, including the hardware structure, software framework, load balancing, and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated, and cut interactively and conveniently through the control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can serve as a good assistant in making clinical diagnoses.
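Maximum Intensity Projection parallelizes cleanly because the maximum is associative: each node can project its own slab of the volume, and the partial images combine exactly with one final element-wise maximum. The sketch below illustrates that decomposition with multiprocessing standing in for the PC cluster; the volume dimensions and worker count are arbitrary assumptions.

```python
# Sketch: slab-parallel Maximum Intensity Projection of a 3D volume.
import numpy as np
from multiprocessing import Pool

def slab_mip(slab: np.ndarray) -> np.ndarray:
    return slab.max(axis=0)  # project each slab along the viewing axis

if __name__ == "__main__":
    volume = np.random.default_rng(3).integers(0, 4096, (128, 256, 256), dtype=np.int32)
    slabs = np.array_split(volume, 4, axis=0)
    with Pool(4) as pool:
        partials = pool.map(slab_mip, slabs)
    mip = np.maximum.reduce(partials)  # max is associative, so slab results combine exactly
    print(mip.shape)
```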
NASA Technical Reports Server (NTRS)
Sanger, Eugen
1932-01-01
A method is presented for approximate static calculation, which is based on the customary assumption of rigid ribs, while taking into account the systematic errors in the calculation results due to this arbitrary assumption. The procedure is given in greater detail for semicantilever and cantilever wings with polygonal spar plan form and for wings under direct loading only. The last example illustrates the advantages of the use of influence lines for such wing structures and their practical interpretation.
Robust synchronization of spin-torque oscillators with an LCR load.
Pikovsky, Arkady
2013-09-01
We study the dynamics of a serial array of spin-torque oscillators with a parallel inductor-capacitor-resistor (LCR) load. In a large range of parameters the fully synchronous regime, where all the oscillators have the same state and the output field is maximal, is shown to be stable. However, such robust complete synchronization does not always develop from a random initial state; in many cases nontrivial clustering is observed, with a partial synchronization resulting in a quasiperiodic or chaotic mean-field dynamics.
Calculation of heat sink around cracks formed under pulsed heat load
NASA Astrophysics Data System (ADS)
Lazareva, G. G.; Arakcheev, A. S.; Kandaurov, I. V.; Kasatov, A. A.; Kurkuchekov, V. V.; Maksimova, A. G.; Popov, V. A.; Shoshin, A. A.; Snytnikov, A. V.; Trunev, Yu A.; Vasilyev, A. A.; Vyacheslavov, L. N.
2017-10-01
Experimental and numerical simulations were carried out of the conditions that cause intensive erosion and that are expected to occur in a fusion reactor. The influence of relevant pulsed heat loads on tungsten was simulated using a powerful electron beam source at BINP. Mechanical destruction, melting, and splashing of the material were observed. The laboratory experiments are accompanied by computational ones. The computational experiments made it possible to quantitatively describe the overheating near cracks oriented parallel to the surface.
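The essence of that computation can be shown with a minimal, dimensionless sketch: explicit finite-difference heat conduction in a 2D slab whose surface is heated, with a subsurface crack parallel to the surface modeled as a row of zero-conductivity cells so that no flux crosses it. All geometry and material numbers are normalized illustrative assumptions, not the BINP model.

```python
# Sketch: overheating above a surface-parallel crack under a surface heat load.
import numpy as np

nz = nx = 120
k = np.ones((nz, nx))            # normalized thermal conductivity
k[30, 20:100] = 0.0              # crack parallel to the surface: insulating gap
T = np.zeros((nz, nx))           # temperature rise above ambient

def face(a, b):
    # harmonic-mean face conductivity; zero wherever a crack cell is involved
    return np.where(a * b > 0, 2 * a * b / (a + b + 1e-30), 0.0)

dt = 0.2  # stable explicit step for unit grid spacing and k <= 1
for _ in range(3000):
    T[0, :] += 1.0 * dt          # uniform surface heat load during the pulse
    fz = face(k[1:, :], k[:-1, :]) * (T[1:, :] - T[:-1, :])  # vertical fluxes
    fx = face(k[:, 1:], k[:, :-1]) * (T[:, 1:] - T[:, :-1])  # horizontal fluxes
    T[:-1, :] += dt * fz
    T[1:, :] -= dt * fz
    T[:, :-1] += dt * fx
    T[:, 1:] -= dt * fx

# surface overheating above the crack vs. over intact material
print(T[0, 60] - T[0, 5])
```

Because the crack blocks conduction into the bulk, the surface region above it runs hotter than the surface over intact material, which is the overheating mechanism the abstract describes.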
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and amounts to about 200 K gates.
Search asymmetries: parallel processing of uncertain sensory information.
Vincent, Benjamin T
2011-08-01
What is the mechanism underlying search phenomena such as search asymmetry? Two-stage models such as Feature Integration Theory and Guided Search propose parallel pre-attentive processing followed by serial post-attentive processing. They claim search asymmetry effects are indicative of finding pairs of features, one processed in parallel, the other in serial. An alternative proposal is that a 1-stage parallel process is responsible, and search asymmetries occur when one stimulus has greater internal uncertainty associated with it than another. While the latter account is simpler, only a few studies have set out to empirically test its quantitative predictions, and many researchers still subscribe to the 2-stage account. This paper examines three separate parallel models (Bayesian optimal observer, max rule, and a heuristic decision rule). All three parallel models can account for search asymmetry effects and I conclude that either people can optimally utilise the uncertain sensory data available to them, or are able to select heuristic decision rules which approximate optimal performance.
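The max-rule account is easy to demonstrate numerically: give every item a noisy internal response, report "target present" when the maximum response exceeds a criterion, and an asymmetry falls out whenever the two stimulus types differ in internal uncertainty. The sketch below is a minimal Monte Carlo illustration; the signal strength, noise levels, and criterion are assumed values, not fits to any dataset.

```python
# Sketch: search asymmetry from a one-stage max rule with unequal uncertainty.
import numpy as np

rng = np.random.default_rng(4)
n_trials, set_size, d_prime, crit = 100_000, 8, 2.0, 1.6

def hit_minus_fa(sigma_target, sigma_distractor):
    # target-present trials: one target among set_size - 1 distractors
    present = np.maximum(
        rng.normal(d_prime, sigma_target, n_trials),
        rng.normal(0.0, sigma_distractor, (set_size - 1, n_trials)).max(axis=0),
    )
    # target-absent trials: distractors only
    absent = rng.normal(0.0, sigma_distractor, (set_size, n_trials)).max(axis=0)
    return (present > crit).mean() - (absent > crit).mean()

print("noisy target among clean distractors:", hit_minus_fa(1.5, 1.0))
print("clean target among noisy distractors:", hit_minus_fa(1.0, 1.5))
```

With these assumed numbers, the two search directions yield different hit-minus-false-alarm margins even though the decision rule itself is unchanged, which is the qualitative asymmetry the one-stage account predicts.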
Coupling between structure and liquids in a parallel stage space shuttle design
NASA Technical Reports Server (NTRS)
Kana, D. D.; Ko, W. L.; Francis, P. H.; Nagy, A.
1972-01-01
A study was conducted to determine the influence of liquid propellants on the dynamic loads for space shuttle vehicles. A parallel-stage configuration model was designed and tested to determine the influence of liquid propellants on coupled natural modes. A forty degree-of-freedom analytical model was also developed for predicting these modes. Currently available analytical models were used to represent the liquid contributions, even though coupled longitudinal and lateral motions are present in such a complex structure. Agreement between the results was found in the lower few modes.
Shahinpoor, M.
1995-07-25
A device is disclosed for electromagnetically accelerating projectiles. The invention features two parallel conducting circular plates, a plurality of electrode connections to both upper and lower plates, a support base, and a projectile magazine. A projectile is spring-loaded into a firing position concentrically located between the parallel plates. A voltage source is applied to the plates to cause current to flow in directions defined by selectable, discrete electrode connections on both upper and lower plates. Repulsive Lorentz forces are generated to eject the projectile in a 360 degree range of fire. 4 figs.
Unstructured grids on SIMD torus machines
NASA Technical Reports Server (NTRS)
Bjorstad, Petter E.; Schreiber, Robert
1994-01-01
Unstructured grids lead to unstructured communication on distributed memory parallel computers, a problem that has been considered difficult. Here, we consider adaptive, offline communication routing for a SIMD processor grid. Our approach is empirical. We use large data sets drawn from supercomputing applications instead of an analytic model of communication load. The chief contribution of this paper is an experimental demonstration of the effectiveness of certain routing heuristics. Our routing algorithm is adaptive, nonminimal, and is generally designed to exploit locality. We have a parallel implementation of the router, and we report on its performance.
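For orientation, the sketch below shows the torus geometry such a router works over: the wraparound hop distance along each ring, and a plain dimension-ordered route. The paper's router is adaptive and nonminimal, so this deterministic minimal version illustrates only the substrate, not the heuristics; the 16x16 grid is an arbitrary assumption.

```python
# Sketch: wraparound distances and dimension-ordered routing on a 2D torus.
def torus_hops(a: int, b: int, n: int) -> int:
    """Shortest signed distance from a to b on a ring of size n."""
    d = (b - a) % n
    return d if d <= n // 2 else d - n

def route(src, dst, shape=(16, 16)):
    """Dimension-ordered path src -> dst on a wraparound grid."""
    path, (x, y) = [tuple(src)], list(src)
    for axis, n in enumerate(shape):
        step = torus_hops((x, y)[axis], dst[axis], n)
        for _ in range(abs(step)):
            if axis == 0:
                x = (x + (1 if step > 0 else -1)) % n
            else:
                y = (y + (1 if step > 0 else -1)) % n
            path.append((x, y))
    return path

print(route((1, 1), (14, 3)))  # wraps around the short way in x
```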
Multiplexer/Demultiplexer Loading Tool (MDMLT)
NASA Technical Reports Server (NTRS)
Brewer, Lenox Allen; Hale, Elizabeth; Martella, Robert; Gyorfi, Ryan
2012-01-01
The purpose of the MDMLT is to improve the reliability and speed of loading multiplexers/demultiplexers (MDMs) in the Software Development and Integration Laboratory (SDIL) by automating the configuration management (CM) of the loads in the MDMs, automating the loading procedure, and providing the capability to load multiple or all MDMs concurrently. Loading may be accomplished in parallel or on single MDMs remotely. The MDMLT is a Web-based tool that is capable of loading the entire International Space Station (ISS) MDM configuration in parallel. It is able to load Flight Equivalent Units (FEUs) and enhanced, standard, and prototype MDMs, as well as both EEPROM (Electrically Erasable Programmable Read-Only Memory) and SSMMU (Solid State Mass Memory Unit) mass memory. The software has extensive configuration management to track loading history, and it can load the entire ISS MDM configuration of 49 MDMs in approximately 30 minutes, as opposed to the 36 hours previously required using the flight method of S-Band uplink. The laptop version recently added to the MDMLT suite allows remote lab loading, with the CM information entered into a common database when the laptop is reconnected to the network. This allows the program to reconfigure the test rigs quickly between shifts, allowing the lab to support a variety of onboard configurations during a single day, based on upcoming or current missions. The MDMLT Computer Software Configuration Item (CSCI) supports a Web-based command and control interface to the user. An interface to the SDIL File Transfer Protocol (FTP) server is supported to import Integrated Flight Loads (IFLs) and Internal Product Release Notes (IPRNs) into the database. An interface to the Monitor and Control System (MCS) is supported to control the power state and to enable or disable the debug port of the MDMs to be loaded. Two direct interfaces to the MDM are supported: a serial interface (debug port) to receive MDM memory dump data and the calculated checksum, and the Small Computer System Interface (SCSI) to transfer load files to MDMs with hard disks. File transfer from the MDM Loading Tool to EEPROM within the MDM is performed via the MIL-STD-1553 bus, making use of the Real-Time Input/Output Processors (RTIOPs) when using the rig-based MDMLT, and via a bus box when using the laptop MDMLT. The bus box is a cost-effective alternative to PC-1553 cards for the laptop. This system can be modified and adapted to any avionics laboratory for spacecraft computer loading, ship avionics, or aircraft avionics where multiple configurations and strong configuration management of software/firmware loads are required.
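Stripped of the hardware interfaces, the tool's core pattern is generic: fan a load out across many units concurrently, verify each against an expected checksum, and record the outcome for configuration management. The sketch below is a hypothetical skeleton of that pattern only; the names (load_unit, EXPECTED), the SHA-256 stand-in for the MDM dump-and-checksum pass, and the thread pool are all assumptions, and none of the MIL-STD-1553/SCSI/serial plumbing is modeled.

```python
# Sketch: concurrent load-and-verify with CM-style result recording (hypothetical).
import hashlib
from concurrent.futures import ThreadPoolExecutor

EXPECTED = {"mdm-01": "9f86d081", "mdm-02": "9f86d081"}  # assumed CM records

def load_unit(name: str, image: bytes) -> tuple[str, bool]:
    digest = hashlib.sha256(image).hexdigest()[:8]  # stand-in for dump-and-checksum
    return name, digest == EXPECTED[name]

image = b"test"  # stand-in for an Integrated Flight Load file
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(lambda n: load_unit(n, image), EXPECTED))

print(results)  # per-MDM pass/fail, ready to log into the CM database
```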
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-09
... Mississippi Department of Environmental Quality (MDEQ), on July 13, 2012, for parallel processing. This... of Contents I. What is parallel processing? II. Background III. What elements are required under... Executive Order Reviews I. What is parallel processing? Consistent with EPA regulations found at 40 CFR Part...
Double Take: Parallel Processing by the Cerebral Hemispheres Reduces Attentional Blink
ERIC Educational Resources Information Center
Scalf, Paige E.; Banich, Marie T.; Kramer, Arthur F.; Narechania, Kunjan; Simon, Clarissa D.
2007-01-01
Recent data have shown that parallel processing by the cerebral hemispheres can expand the capacity of visual working memory for spatial locations (J. F. Delvenne, 2005) and attentional tracking (G. A. Alvarez & P. Cavanagh, 2005). Evidence that parallel processing by the cerebral hemispheres can improve item identification has remained elusive.…
Paucke, Madlen; Oppermann, Frank; Koch, Iring; Jescheniak, Jörg D
2015-12-01
Previous dual-task picture-naming studies suggest that lexical processes require capacity-limited processing and prevent other tasks from being carried out in parallel. However, studies involving the processing of multiple pictures suggest that parallel lexical processing is possible. The present study investigated the specific costs that may arise when such parallel processing occurs. We used a novel dual-task paradigm, presenting 2 visual objects associated with different tasks and manipulating between-task similarity. With high similarity, a picture-naming task (T1) was combined with a phoneme-decision task (T2), so that lexical processes were shared across tasks. With low similarity, picture naming was combined with a size-decision T2 (nonshared lexical processes). In Experiment 1, we found that a manipulation of lexical processes (lexical frequency of the T1 object name) showed an additive propagation with low between-task similarity and an overadditive propagation with high between-task similarity. Experiment 2 replicated this differential forward propagation of the lexical effect and showed that it disappeared with longer stimulus onset asynchronies. Moreover, both experiments showed backward crosstalk, indexed as worse T1 performance with high between-task similarity compared with low similarity. Together, these findings suggest that conditions of high between-task similarity can lead to parallel lexical processing in both tasks, which, however, does not result in benefits but rather in extra performance costs. These costs can be attributed to crosstalk based on the dual-task binding problem arising from parallel processing. Hence, the present study reveals that capacity-limited lexical processing can run in parallel across dual tasks, but only at the expense of extraordinarily high costs.
Hempel, Nico; Bunn, Jeffrey R.; Nitschke-Pagel, Thomas; ...
2017-02-02
This research is dedicated to the experimental investigation of the residual stress relaxation in girth-welded pipes due to quasi-static bending loads. Ferritic-pearlitic steel pipes are welded with two passes, resulting in a characteristic residual stress state with high tensile residual stresses at the weld root. Also, four-point bending is applied to generate axial load stress causing changes in the residual stress state. These are determined both on the outer and inner surfaces of the pipes, as well as in the pipe wall, using X-ray and neutron diffraction. Focusing on the effect of tensile load stress, it is revealed that not only the tensile residual stresses are reduced due to exceeding the yield stress, but also the compressive residual stresses for equilibrium reasons. Furthermore, residual stress relaxation occurs both parallel and perpendicular to the applied load stress.
Dynamic strain distribution of FRP plate under blast loading
NASA Astrophysics Data System (ADS)
Saburi, T.; Yoshida, M.; Kubota, S.
2017-02-01
The dynamic strain distribution of a fiber-reinforced plastic (FRP) plate under blast loading was investigated using a Digital Image Correlation (DIC) image analysis method. The test FRP plates were mounted parallel to each other on a steel frame. 50 g of Composition C4 explosive was used as the blast loading source and set at the center of the FRP plates. The dynamic behavior of the FRP plate under blast loading was observed by two high-speed video cameras. The two sets of high-speed video image sequences were used to analyze the three-dimensional strain distribution of the FRP plate by means of the DIC method. A point strain profile extracted from the analyzed strain distribution data was compared with a strain profile measured directly with a strain gauge, and it was shown that the strain profile obtained under blast loading by the DIC method is quantitatively accurate.
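The step from DIC output to strain is a small numerical differentiation: given the measured displacement field between two frames, the in-plane strain components follow from the displacement gradients. The sketch below uses a synthetic displacement field as a stand-in for real image-correlation output; the field and grid spacing are assumptions.

```python
# Sketch: in-plane strain components from a DIC-style displacement field.
import numpy as np

y, x = np.mgrid[0:200, 0:300] * 1.0e-3           # grid coordinates, m
ux = 2e-3 * x + 1e-4 * np.sin(20 * y)            # assumed displacement field, m
uy = -5e-4 * y

dux_dy, dux_dx = np.gradient(ux, 1.0e-3)         # gradients on the 1 mm grid
duy_dy, duy_dx = np.gradient(uy, 1.0e-3)

exx = dux_dx                                     # normal strain along x
eyy = duy_dy                                     # normal strain along y
exy = 0.5 * (dux_dy + duy_dx)                    # tensor shear (engineering shear / 2)

print(exx.mean(), eyy.mean())                    # ~2e-3 and ~-5e-4, as built in
```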