Dynamic load balancing of applications
Wheat, Stephen R.
1997-01-01
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated.
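The overlapping-neighborhood scheme can be illustrated with a small Python sketch: each processor repeatedly averages load with its neighbors, and because neighborhoods overlap, purely local exchanges drive the whole machine toward global balance. The ring topology, neighborhood radius, and averaging rule below are illustrative assumptions, not the patented method itself.

    def balance_step(loads, radius=1):
        """One local balancing round: every processor moves toward the
        average load of its overlapping neighborhood."""
        n = len(loads)
        new_loads = []
        for i in range(n):
            # Overlapping neighborhood: this processor plus `radius`
            # neighbors on each side, wrapping around a ring of processors.
            neighborhood = [loads[(i + d) % n] for d in range(-radius, radius + 1)]
            new_loads.append(sum(neighborhood) / len(neighborhood))
        return new_loads

    loads = [10.0, 0.0, 0.0, 8.0, 2.0, 0.0]
    for _ in range(50):            # repeated local rounds propagate load globally
        loads = balance_step(loads)
    print([round(x, 2) for x in loads])   # every value approaches the global mean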
Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)
1994-01-01
Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance among processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker, Jove, while the others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove at a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while the other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new distribution should be adopted, including preliminary evaluation, partitioning, processor reassignment, cost evaluation, and a final decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine, the IBM SP2.
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems
NASA Technical Reports Server (NTRS)
Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.
2000-01-01
The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many sub-domains called blocks, and solve the governing equations over these blocks. The dynamic load balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computation. In environments with computers of different architectures, operating systems, CPU speeds, memory sizes, loads, and network speeds, balancing the loads and managing the communication between processors becomes crucial. Load balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of the changes in the computer environment during execution time. More recently, these tools were extended to a second operating system, Windows NT. In this paper, the problems associated with this application will be discussed. Also, the developed algorithms were combined with the load sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results will be presented on running the NASA-based code ADPAC to demonstrate the developed tools for dynamic load balancing.
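As a rough illustration of the block-distribution problem described above, the Python sketch below greedily gives each block to the processor that would finish its current work earliest, scaled by relative CPU speed. The block costs and speeds are invented numbers, and real tools of this kind also weigh memory, network speed, and background load, which this sketch ignores.

    import heapq

    def assign_blocks(block_costs, cpu_speeds):
        # Min-heap of (estimated finish time, processor id); faster CPUs
        # drain their queue sooner and therefore attract more blocks.
        heap = [(0.0, p) for p in range(len(cpu_speeds))]
        heapq.heapify(heap)
        assignment = {p: [] for p in range(len(cpu_speeds))}
        # Place the costliest blocks first (longest-processing-time rule).
        for b, cost in sorted(enumerate(block_costs), key=lambda x: -x[1]):
            t, p = heapq.heappop(heap)
            assignment[p].append(b)
            heapq.heappush(heap, (t + cost / cpu_speeds[p], p))
        return assignment

    print(assign_blocks([4, 9, 3, 7, 5, 1], cpu_speeds=[1.0, 2.5, 0.8]))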
The Feasibility of Adaptive Unstructured Computations On Petaflops Systems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Heber, Gerd; Gao, Guang; Saini, Subhash (Technical Monitor)
1999-01-01
This viewgraph presentation covers the advantages of mesh adaptation, unstructured grids, and dynamic load balancing. It illustrates parallel adaptive communications, and explains PLUM (Parallel dynamic load balancing for adaptive unstructured meshes), and PSAW (Proper Self Avoiding Walks).
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations
NASA Technical Reports Server (NTRS)
Chrisochoides, Nikos
1995-01-01
We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDEs) on multiprocessors. Multithreading is used as a means of exploring concurrency at the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask the overheads required for the dynamic balancing of processor workloads with the computations required for the actual numerical solution of the PDEs. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data-parallel adaptive PDE computations. Unfortunately, multithreading does not always reduce program complexity; it often hampers code reusability and increases software complexity.
A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam
In this paper, we introduce the Dynamic Load-balanced Tensor Contractions (DLTC) framework, a domain-specific library for efficient task-parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller units of work (tasks), represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load balancing runtime. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.
Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.
1999-01-01
The ability to dynamically adapt an unstructured grid is a powerful tool for efficiently solving computational problems with evolving physical features. In this paper, we report on our experience parallelizing an edge-based adaptation scheme, called 3D_TAG, using message passing. Results show excellent speedup when a realistic helicopter rotor mesh is randomly refined. However, performance deteriorates when the mesh is refined using a solution-based error indicator, since mesh adaptation for practical problems occurs in a localized region, creating a severe load imbalance. To address this problem, we have developed PLUM, a global dynamic load balancing framework for adaptive numerical computations. Even though PLUM primarily balances processor workloads for the solution phase, it reduces the load imbalance problem within mesh adaptation by repartitioning the mesh after targeting edges for refinement but before the actual subdivision. This dramatically improves the performance of parallel 3D_TAG, since refinement occurs in a more load-balanced fashion. We also present optimal and heuristic algorithms that, when applied to the default mapping of a parallel repartitioner, significantly reduce the data redistribution overhead. Finally, portability is examined by comparing performance on three state-of-the-art parallel machines.
Parallel Processing of Adaptive Meshes with Load Balancing
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology, and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than PLUM by overlapping processing and data migration.
PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Saini, Subhash (Technical Monitor)
1998-01-01
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper presents the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.
Applying graph partitioning methods in measurement-based dynamic load balancing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatele, Abhinav; Fourestier, Sebastien; Menon, Harshitha
Load imbalance leads to an increasing waste of resources as an application is scaled to more and more processors. Achieving the best parallel efficiency for a program requires optimal load balancing, which is an NP-hard problem. However, finding near-optimal solutions to this problem for complex computational science and engineering applications is becoming increasingly important. Charm++, a migratable-objects based programming model, provides a measurement-based dynamic load balancing framework. This framework instruments and then migrates over-decomposed objects to balance computational load and communication at runtime. This paper explores the use of graph partitioning algorithms, traditionally used for partitioning physical domains/meshes, for measurement-based dynamic load balancing of parallel applications. In particular, we present repartitioning methods developed in a graph partitioning toolbox called SCOTCH that consider the previous mapping to minimize migration costs. We also discuss a new imbalance reduction algorithm for graphs with irregular load distributions. We compare several load balancing algorithms using microbenchmarks on Intrepid and Ranger and evaluate the effect of communication, number of cores, and number of objects on the benefit achieved from load balancing. New algorithms developed in SCOTCH lead to better performance than the METIS partitioners in several cases, both in application execution time and in the number of objects migrated.
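The concern with migration cost can be made concrete with a refinement-style pass that starts from the previous mapping and moves objects only off overloaded processors. This Python sketch is a loose reconstruction under invented object loads and a 10% imbalance tolerance; it is not the SCOTCH or Charm++ code and it ignores the communication graph.

    def refine(prev_map, loads, num_procs, tolerance=1.10):
        proc_load = [0.0] * num_procs
        for obj, p in prev_map.items():
            proc_load[p] += loads[obj]
        target = sum(proc_load) / num_procs * tolerance
        new_map = dict(prev_map)              # start from the previous mapping
        for p in range(num_procs):
            # Move the smallest objects off an overloaded processor first,
            # so each individual migration is as cheap as possible.
            movable = sorted((o for o, q in prev_map.items() if q == p),
                             key=lambda o: loads[o])
            while proc_load[p] > target and movable:
                obj = movable.pop(0)
                dest = min(range(num_procs), key=lambda q: proc_load[q])
                if dest == p:
                    break
                proc_load[p] -= loads[obj]
                proc_load[dest] += loads[obj]
                new_map[obj] = dest
        return new_map

    prev = {"a": 0, "b": 0, "c": 0, "d": 1}
    print(refine(prev, loads={"a": 5, "b": 4, "c": 1, "d": 2}, num_procs=2))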
Dynamic Load Balancing Based on Constrained K-D Tree Decomposition for Parallel Particle Tracing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jiang; Guo, Hanqi; Yuan, Xiaoru
Particle tracing is a fundamental technique in flow field data visualization. In this work, we present a novel dynamic load balancing method for parallel particle tracing. Specifically, we employ a constrained k-d tree decomposition approach to dynamically redistribute tasks among processes. Each process is initially assigned a regularly partitioned block along with a duplicated ghost layer, within the memory limit. During particle tracing, the k-d tree decomposition is performed dynamically by constraining the cutting planes to the overlap range of the duplicated data. This ensures that each process is reassigned particles as evenly as possible, while the newly assigned particles of a process always lie within its block. Results show the good load balance and high efficiency of our method.
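In one dimension, the constrained-cut idea reduces to clamping the balancing cut plane to the range covered by duplicated ghost data, so every reassigned particle already lies in data its new owner holds. A minimal Python sketch under that simplification (the actual method recurses over a 3-D k-d tree):

    def constrained_cut(positions, overlap_lo, overlap_hi):
        positions = sorted(positions)
        median = positions[len(positions) // 2]          # ideal balanced cut
        cut = min(max(median, overlap_lo), overlap_hi)   # clamp to ghost overlap
        left = [x for x in positions if x < cut]
        right = [x for x in positions if x >= cut]
        return cut, left, right

    cut, left, right = constrained_cut([0.1, 0.4, 0.45, 0.5, 0.9, 0.95],
                                       overlap_lo=0.4, overlap_hi=0.6)
    print(cut, len(left), len(right))   # balanced cut inside the overlap range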
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Sohn, Andrew
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35% of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives almost a sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3% off the optimal solutions but requires only 1% of the computational time.
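The remapping step can be sketched in Python with a similarity matrix S[p][k] holding the amount of data of new partition k already resident on processor p (invented numbers below): greedily keeping the largest overlaps in place avoids most of the redistribution. The paper's heuristic and optimal remapping algorithms are more refined than this sketch.

    def heuristic_remap(S):
        # All (overlap, processor, partition) pairs, largest overlap first.
        pairs = sorted(((S[p][k], p, k) for p in range(len(S))
                        for k in range(len(S[0]))), reverse=True)
        used_p, used_k, mapping = set(), set(), {}
        for overlap, p, k in pairs:
            if p not in used_p and k not in used_k:
                mapping[k] = p            # partition k stays on processor p
                used_p.add(p)
                used_k.add(k)
        return mapping

    S = [[50, 5, 0], [10, 40, 20], [0, 30, 60]]
    print(heuristic_remap(S))   # {2: 2, 0: 0, 1: 1} keeps 150 units in place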
Dynamic load balancing algorithm for molecular dynamics based on Voronoi cells domain decompositions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fattebert, J.-L.; Richards, D.F.; Glosli, J.N.
2012-12-01
We present a new algorithm for automatic parallel load balancing in classical molecular dynamics. It assumes a spatial domain decomposition of particles into Voronoi cells. It is a gradient method which attempts to minimize a cost function by displacing Voronoi sites associated with each processor/sub-domain along steepest descent directions. Excellent load balance has been obtained for quasi-2D and 3D practical applications, with up to 440·10^6 particles on 65,536 MPI tasks.
Data Partitioning and Load Balancing in Parallel Disk Systems
NASA Technical Reports Server (NTRS)
Scheuermann, Peter; Weikum, Gerhard; Zabback, Peter
1997-01-01
Parallel disk systems provide opportunities for exploiting I/O parallelism in two possible ways, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent, self-reliant file system that aims to optimize striping by taking into account the requirements of the applications, and performs load balancing by judicious file allocation and dynamic redistributions of the data when access patterns change. Our system uses simple but effective heuristics that incur little overhead. We present performance experiments based on synthetic workloads and real-life traces.
Unstructured Adaptive Grid Computations on an Array of SMPs
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.
1996-01-01
Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. We have presented such a dynamic load balancing framework, called JOVE, in this paper. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits an 'array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantages over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array-of-SMPs architecture.
Load Balancing Unstructured Adaptive Grids for CFD Problems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid
1996-01-01
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.
Internet traffic load balancing using dynamic hashing with flow volume
NASA Astrophysics Data System (ADS)
Jo, Ju-Yeon; Kim, Yoohwan; Chao, H. Jonathan; Merat, Francis L.
2002-07-01
Sending IP packets over multiple parallel links is in extensive use in today's Internet and its use is growing due to its scalability, reliability and cost-effectiveness. To maximize the efficiency of parallel links, load balancing is necessary among the links, but it may cause the problem of packet reordering. Since packet reordering impairs TCP performance, it is important to reduce the amount of reordering. Hashing offers a simple solution to keep the packet order by sending a flow over a unique link, but static hashing does not guarantee an even distribution of the traffic amount among the links, which could lead to packet loss under heavy load. Dynamic hashing offers some degree of load balancing but suffers from load fluctuations and excessive packet reordering. To overcome these shortcomings, we have enhanced the dynamic hashing algorithm to utilize the flow volume information in order to reassign only the appropriate flows. This new method, called dynamic hashing with flow volume (DHFV), eliminates unnecessary flow reassignments of small flows and achieves load balancing very quickly without load fluctuation by accurately predicting the amount of transferred load between the links. In this paper we provide the general framework of DHFV and address the challenges in implementing DHFV. We then introduce two algorithms of DHFV with different flow selection strategies and show their performances through simulation.
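The gist of flow-volume-aware reassignment can be captured in a few lines of Python: only flows on an overloaded link are considered, largest first, and a flow moves only if the move actually lowers the larger of the two link loads. The byte counts and 20% overload threshold are illustrative, and the published DHFV algorithms differ in detail.

    def rebalance_links(flow_bytes, flow_link, num_links, threshold=1.2):
        link_load = [0] * num_links
        for f, link in flow_link.items():
            link_load[link] += flow_bytes[f]
        avg = sum(link_load) / num_links
        for link in range(num_links):
            if link_load[link] <= threshold * avg:
                continue
            # Consider this link's flows from largest to smallest, moving a
            # flow only when doing so lowers the maximum of the two loads, so
            # small flows are left alone and packet reordering stays rare.
            for f in sorted((f for f, l in flow_link.items() if l == link),
                            key=lambda f: -flow_bytes[f]):
                if link_load[link] <= threshold * avg:
                    break
                dest = link_load.index(min(link_load))
                if link_load[dest] + flow_bytes[f] < link_load[link]:
                    link_load[link] -= flow_bytes[f]
                    link_load[dest] += flow_bytes[f]
                    flow_link[f] = dest
        return flow_link

    flows = {"f1": 900, "f2": 100, "f3": 80, "f4": 60}
    print(rebalance_links(flows, {"f1": 0, "f2": 0, "f3": 1, "f4": 1}, num_links=2))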
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Dynamic Load Balancing for Adaptive Computations on Distributed-Memory Machines
NASA Technical Reports Server (NTRS)
1999-01-01
Dynamic load balancing is central to adaptive mesh-based computations on large-scale parallel computers. The principal investigator has investigated various issues of the dynamic load balancing problem under NASA JOVE and JAG grants. The major accomplishments of the project are two graph partitioning algorithms and a load balancing framework. The S-HARP dynamic graph partitioner is known to be the fastest among the known dynamic graph partitioners to date. It can partition a graph of over 100,000 vertices in 0.25 seconds on a 64-processor Cray T3E distributed-memory multiprocessor while maintaining scalability of over 16-fold speedup. Other known and widely used dynamic graph partitioners take a second or two while giving low scalability of a few-fold speedup on 64 processors. These results have been published in journals and peer-reviewed flagship conferences.
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid
1999-01-01
The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the viewpoint of portability on various multiprocessor platforms. We address this problem by developing PLUM, an automatic and architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Experiments on an SP2 and an Origin2000 demonstrate the portability of our approach, which achieves superb load balance at the cost of minimal extra overhead.
Load Balancing Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pearce, Olga Tkachyshyn
2014-12-01
The largest supercomputers have millions of independent processors, and concurrency levels are rapidly increasing. For ideal efficiency, developers of the simulations that run on these machines must ensure that computational work is evenly balanced among processors. Assigning work evenly is challenging because many large modern parallel codes simulate behavior of physical systems that evolve over time, and their workloads change over time. Furthermore, the cost of imbalanced load increases with scale because most large-scale scientific simulations today use a Single Program Multiple Data (SPMD) parallel programming model, and an increasing number of processors will wait for the slowest one at the synchronization points. To address load imbalance, many large-scale parallel applications use dynamic load balance algorithms to redistribute work evenly. The research objective of this dissertation is to develop methods to decide when and how to load balance the application, and to balance it effectively and affordably. We measure and evaluate the computational load of the application, and develop strategies to decide when and how to correct the imbalance. Depending on the simulation, a fast, local load balance algorithm may be suitable, or a more sophisticated and expensive algorithm may be required. We developed a model for comparison of load balance algorithms for a specific state of the simulation that enables the selection of a balancing algorithm that will minimize overall runtime.
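The when-to-balance decision described above boils down to a cost/benefit test. A minimal Python sketch of one such test, with all quantities as illustrative stand-ins for the measured values a model like the dissertation's would use:

    def should_rebalance(proc_loads, iters_left, balance_cost):
        t_max = max(proc_loads)                   # SPMD step time = slowest rank
        t_avg = sum(proc_loads) / len(proc_loads) # ideal balanced step time
        predicted_saving = (t_max - t_avg) * iters_left
        # Rebalance only if the predicted saving over the remaining
        # iterations outweighs the cost of running the balancer itself.
        return predicted_saving > balance_cost

    print(should_rebalance([1.0, 1.0, 2.5, 1.1], iters_left=200, balance_cost=40.0))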
S-HARP: A parallel dynamic spectral partitioner
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sohn, A.; Simon, H.
1998-01-01
Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP: fine-grain loop-level parallelism and coarse-grain recursive parallelism. The parallel partitioner has been implemented in Message Passing Interface on the Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64-processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15-fold speedup on 64 processors while ParMeTiS 1.0 gives a few-fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParMeTiS and Jostle on six computational meshes of size over 100,000 vertices.
PLUM: Parallel Load Balancing for Unstructured Adaptive Meshes. Degree awarded by Colorado Univ.
NASA Technical Reports Server (NTRS)
Oliker, Leonid
1998-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for computing large-scale problems that require grid modifications to efficiently resolve solution features. By locally refining and coarsening the mesh to capture physical phenomena of interest, such procedures make standard computational methods more cost effective. Unfortunately, an efficient parallel implementation of these adaptive methods is rather difficult to achieve, primarily due to the load imbalance created by the dynamically-changing nonuniform grid. This requires significant communication at runtime, leading to idle processors and adversely affecting the total execution time. Nonetheless, it is generally thought that unstructured adaptive-grid techniques will constitute a significant fraction of future high-performance supercomputing. Various dynamic load balancing methods have been reported to date; however, most of them either lack a global view of loads across processors or do not apply their techniques to realistic large-scale applications.
Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing
NASA Astrophysics Data System (ADS)
Adnan; Sato, Mitsuhisa
Lazy-task creation is an efficient method of overcoming the overhead of the grain-size problem in parallel computing. Work stealing is an effective load balancing strategy for parallel computing. In this paper, we present dynamic work stealing strategies in a lazy-task creation technique for efficient fine-grain task scheduling. The basic idea is to control load balancing granularity depending on the number of task parents in a stack. The dynamic-length strategy of work stealing uses run-time information, namely information on the load of the victim, to determine the number of tasks that a thief is allowed to steal. We compare it with the bottommost-first work stealing strategy used in StackThreads/MP, and with the fixed-length strategy of work stealing, where a thief requests to steal a fixed number of tasks, as well as with other multithreaded frameworks such as Cilk and OpenMP task implementations. The experiments show that the dynamic-length strategy of work stealing performs well on irregular workloads such as the UTS benchmark, as well as on regular workloads such as Fibonacci, Strassen's matrix multiplication, FFT, and sparse LU factorization. The dynamic-length strategy works better than the fixed-length strategy because it is more flexible; it can avoid load imbalance due to overstealing.
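The dynamic-length rule can be sketched in Python: the thief sizes its steal from run-time information about the victim's load, here simply half of the victim's task deque. The actual StackThreads/MP policies differ in detail; this is only the flavor of the idea.

    from collections import deque

    def steal(victim: deque, thief: deque):
        n = len(victim) // 2      # dynamic steal length from the victim's load
        for _ in range(n):
            # Steal from the opposite end to the one the victim works on,
            # preserving the victim's locality on its own newest tasks.
            thief.append(victim.popleft())
        return n

    victim, thief = deque(range(8)), deque()
    print(steal(victim, thief), list(thief), list(victim))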
Multiprocessing the Sieve of Eratosthenes
NASA Technical Reports Server (NTRS)
Bokhari, S.
1986-01-01
The Sieve of Eratosthenes for finding prime numbers in recent years has seen much use as a benchmark algorithm for serial computers while its intrinsically parallel nature has gone largely unnoticed. The implementation of a parallel version of this algorithm for a real parallel computer, the Flex/32, is described and its performance discussed. It is shown that the algorithm is sensitive to several fundamental performance parameters of parallel machines, such as spawning time, signaling time, memory access, and overhead of process switching. Because of the nature of the algorithm, it is impossible to get any speedup beyond 4 or 5 processors unless some form of dynamic load balancing is employed. We describe the performance of our algorithm with and without load balancing and compare it with theoretical lower bounds and simulated results. It is straightforward to understand this algorithm and to check the final results. However, its efficient implementation on a real parallel machine requires thoughtful design, especially if dynamic load balancing is desired. The fundamental operations required by the algorithm are very simple: this means that the slightest overhead appears prominently in performance data. The Sieve thus serves not only as a very severe test of the capabilities of a parallel processor but is also an interesting challenge for the programmer.
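The imbalance is easy to see: striking out the multiples of a prime p costs roughly N/p operations, so giving each processor an equal count of primes yields very unequal work. The Python sketch below assigns each sieving prime dynamically to the least-loaded of a few hypothetical workers; the worker count is illustrative.

    def sieve_costs(n):
        is_prime = [True] * (n + 1)
        costs = []                       # (prime, approximate striking cost N/p)
        for p in range(2, int(n ** 0.5) + 1):
            if is_prime[p]:
                costs.append((p, n // p))
                for m in range(p * p, n + 1, p):
                    is_prime[m] = False
        return costs

    def dynamic_assign(costs, workers=4):
        load = [0] * workers
        for prime, cost in costs:        # primes arrive in increasing order
            w = load.index(min(load))    # dynamic: least-loaded worker goes next
            load[w] += cost
        return load

    print(dynamic_assign(sieve_costs(100000)))   # near-equal per-worker loads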
A tool for simulating parallel branch-and-bound methods
NASA Astrophysics Data System (ADS)
Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail
2016-01-01
The Branch-and-Bound method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore, the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
Load Balancing Strategies for Multi-Block Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)
2002-01-01
The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies for overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
Adaptive mesh refinement and load balancing based on multi-level block-structured Cartesian mesh
NASA Astrophysics Data System (ADS)
Misaka, Takashi; Sasaki, Daisuke; Obayashi, Shigeru
2017-11-01
We developed a framework for a distributed-memory parallel computer that enables dynamic data management for adaptive mesh refinement and load balancing. We employed the simple data structure of the building cube method (BCM), where a computational domain is divided into multi-level cubic domains and each cube has the same number of grid points inside, realising a multi-level block-structured Cartesian mesh. Solution-adaptive mesh refinement, which works efficiently with the help of the dynamic load balancing, was implemented by dividing cubes based on mesh refinement criteria. The framework was investigated with the Laplace equation in terms of adaptive mesh refinement, load balancing and parallel efficiency. It was then applied to the incompressible Navier-Stokes equations to simulate a turbulent flow around a sphere. We considered wall-adaptive cube refinement where a non-dimensional wall distance y+ near the sphere is used as a criterion for mesh refinement. The results showed that the load imbalance due to y+-adaptive mesh refinement was corrected by the present approach. To utilise the BCM framework more effectively, we also tested cube-wise algorithm switching, where explicit and implicit time integration schemes are switched depending on the local Courant-Friedrichs-Lewy (CFL) condition in each cube.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thornquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient modes using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load balancing and iterative solvers.
Work stealing for GPU-accelerated parallel programs in a global address space framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram
Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
NASA Technical Reports Server (NTRS)
Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.
Dynamic load balancing for petascale quantum Monte Carlo applications: The Alias method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sudheer, C. D.; Krishnan, S.; Srinivasan, A.
Diffusion Monte Carlo is the most accurate widely used Quantum Monte Carlo method for the electronic structure of materials, but it requires frequent load balancing or population redistribution steps to maintain efficiency and avoid accumulation of systematic errors on parallel machines. The load balancing step can be a significant factor affecting performance, and will become more important as the number of processing elements increases. We propose a new dynamic load balancing algorithm, the Alias Method, and evaluate it theoretically and empirically. An important feature of the new algorithm is that the load can be perfectly balanced with each process receiving at most one message. It is also optimal in the maximum size of messages received by any process. We also optimize its implementation to reduce network contention, a process facilitated by the low messaging requirement of the algorithm. Empirical results on the petaflop Cray XT Jaguar supercomputer at ORNL show up to a 30% improvement in performance on 120,000 cores. The load balancing algorithm may be straightforwardly implemented in existing codes. The algorithm may also be employed by any method with many nearly identical computational tasks that requires load balancing.
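The at-most-one-message property can be demonstrated with a Vose-style pairing in Python: every underloaded process is filled to the average by exactly one donor, and a donor that falls below the average simply joins the receiver list to be topped up later by a single message of its own. This is a simplified reconstruction of the idea, not the authors' implementation.

    def plan_transfers(loads):
        avg = sum(loads) / len(loads)
        small = [p for p in range(len(loads)) if loads[p] < avg]
        large = [p for p in range(len(loads)) if loads[p] > avg]
        work = list(loads)
        transfers = []
        while small and large:
            r = small.pop()
            d = large[-1]
            need = avg - work[r]
            transfers.append((d, r, need))   # exactly one message arrives at r
            work[r] = avg
            work[d] -= need
            if work[d] < avg:
                large.pop()
                small.append(d)   # donor fell below average: now a receiver
            elif work[d] == avg:
                large.pop()
        return transfers

    print(plan_transfers([10, 2, 6, 6]))   # a single message: [(0, 1, 4.0)]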
Performance tradeoffs in static and dynamic load balancing strategies
NASA Technical Reports Server (NTRS)
Iqbal, M. A.; Saltz, J. H.; Bokhari, S. H.
1986-01-01
The problem of uniformly distributing the load of a parallel program over a multiprocessor system was considered. A program was analyzed whose structure permits the computation of the optimal static solution. Then four strategies for load balancing were described and their performance compared. The strategies are: (1) the optimal static assignment algorithm which is guaranteed to yield the best static solution, (2) the static binary dissection method which is very fast but sub-optimal, (3) the greedy algorithm, a static fully polynomial time approximation scheme, which estimates the optimal solution to arbitrary accuracy, and (4) the predictive dynamic load balancing heuristic which uses information on the precedence relationships within the program and outperforms any of the static methods. It is also shown that the overhead incurred by the dynamic heuristic is reduced considerably if it is started off with a static assignment provided by either of the other three strategies.
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Simon, Horst D.
1996-01-01
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate
NASA Technical Reports Server (NTRS)
Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel
1994-01-01
This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
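One-dimensional chain partitioning of the kind found best above can be sketched with prefix sums: cells along one coordinate carry measured work, and processor boundaries are placed where the running total crosses equal-work quantiles. The greedy cut and the cell weights below are illustrative; production chain partitioners place boundaries more carefully.

    from itertools import accumulate

    def chain_partition(cell_work, num_procs):
        prefix = list(accumulate(cell_work))
        total = prefix[-1]
        bounds, proc = [0], 1
        for i, running in enumerate(prefix):
            if running >= total * proc / num_procs and proc < num_procs:
                bounds.append(i + 1)     # chain for processor `proc` ends here
                proc += 1
        bounds.append(len(cell_work))
        return [(bounds[k], bounds[k + 1]) for k in range(num_procs)]

    # Each tuple is a contiguous [start, end) chain of cells for one processor.
    print(chain_partition([1, 1, 8, 2, 2, 2, 8, 1, 1], num_procs=3))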
Efficient Load Balancing and Data Remapping for Adaptive Grid Calculations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1997-01-01
Mesh adaption is a powerful tool for efficient unstructured- grid computations but causes load imbalance among processors on a parallel machine. We present a novel method to dynamically balance the processor workloads with a global view. This paper presents, for the first time, the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. Previous results indicated that mesh repartitioning and data remapping are potential bottlenecks for performing large-scale scientific calculations. We resolve these issues and demonstrate that our framework remains viable on a large number of processors.
The Distributed Diagonal Force Decomposition Method for Parallelizing Molecular Dynamics Simulations
Boršnik, Urban; Miller, Benjamin T.; Brooks, Bernard R.; Janežič, Dušanka
2011-01-01
Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007
NASA Astrophysics Data System (ADS)
Furuichi, M.; Nishiura, D.
2015-12-01
Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM) have been widely used to solve continuum and particle motions in the computational geodynamics field. These mesh-free methods are suitable for problems with complex geometry and boundaries. In addition, their Lagrangian nature allows non-diffusive advection useful for tracking history-dependent properties (e.g. rheology) of the material. These potential advantages over mesh-based methods offer effective numerical applications to geophysical flow and tectonic processes, for example, tsunami with free surfaces and floating bodies, magma intrusion with fracture of rock, and shear zone pattern generation of granular deformation. In order to investigate such geodynamical problems with particle-based methods, millions to billions of particles are required for realistic simulation. Parallel computing is therefore important for handling such huge computational costs. An efficient parallel implementation of the SPH and DEM methods is, however, known to be difficult, especially for distributed-memory architectures. Lagrangian methods inherently exhibit a workload imbalance problem when parallelized with domains fixed in space, because particles move around and workloads change during the simulation. Dynamic load balancing is therefore the key technique for performing large scale SPH and DEM simulations. In this work, we present parallel implementation techniques for the SPH and DEM methods utilizing dynamic load balancing algorithms, aimed at high resolution simulations over large domains on massively parallel supercomputer systems. Our method treats the imbalance in the measured execution time of each MPI process as the nonlinear term of the parallel domain decomposition and minimizes it with a Newton-like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our approach is suitable for handling particles with different calculation costs (e.g. boundary particles) as well as heterogeneous computer architectures. We analyze the parallel efficiency and scalability on supercomputer systems (K computer, Earth Simulator 3, etc.).
Parallelized reliability estimation of reconfigurable computer networks
NASA Technical Reports Server (NTRS)
Nicol, David M.; Das, Subhendu; Palumbo, Dan
1990-01-01
A parallelized system, ASSURE, for computing the reliability of embedded avionics flight control systems which are able to reconfigure themselves in the event of failure is described. ASSURE accepts a grammar that describes a reliability semi-Markov state-space. From this it creates a parallel program that simultaneously generates and analyzes the state-space, placing upper and lower bounds on the probability of system failure. ASSURE is implemented on a 32-node Intel iPSC/860, and has achieved high processor efficiencies on real problems. Through a combination of improved algorithms, exploitation of parallelism, and use of an advanced microprocessor architecture, ASSURE has reduced the execution time on substantial problems by a factor of one thousand over previous workstation implementations. Furthermore, ASSURE's parallel execution rate on the iPSC/860 is an order of magnitude faster than its serial execution rate on a Cray-2 supercomputer. While dynamic load balancing is necessary for ASSURE's good performance, it is needed only infrequently; the particular method of load balancing used does not substantially affect performance.
NASA Astrophysics Data System (ADS)
Wang, Yaping; Lin, Shunjiang; Yang, Zhibin
2017-05-01
In the traditional three-phase power flow calculation of the low voltage distribution network, the load model is described as constant power. Since this model cannot reflect the characteristics of actual loads, the result of the traditional calculation always differs from the actual situation. In this paper, a load model in which dynamic load represented by air conditioners is paralleled with static load represented by lighting loads is used to describe the characteristics of residential loads, and a three-phase power flow calculation model is proposed. The power flow calculation model includes the power balance equations of the three phases (A, B, C), the current balance equations of phase 0, and the torque balance equations of the induction motors in air conditioners. An alternating iterative algorithm that solves the induction motor torque balance equations together with the node balance equations is then proposed to solve the three-phase power flow model. This method is applied to an actual low voltage distribution network with residential loads, and calculations for three different operating states of air conditioners demonstrate the effectiveness of the proposed model and algorithm.
NASA Technical Reports Server (NTRS)
Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele
2004-01-01
In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms, and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.
NASA Technical Reports Server (NTRS)
Krasteva, Denitza T.
1998-01-01
Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.) This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++
NASA Technical Reports Server (NTRS)
Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.
1996-01-01
This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kodavasal, Janardhan; Harms, Kevin; Srivastava, Priyesh
A closed-cycle gasoline compression ignition engine simulation near top dead center (TDC) was used to profile the performance of a parallel commercial engine computational fluid dynamics code, as it was scaled on up to 4096 cores of an IBM Blue Gene/Q supercomputer. The test case has 9 million cells near TDC, with a fixed mesh size of 0.15 mm, and was run on configurations ranging from 128 to 4096 cores. Profiling was done for a small duration of 0.11 crank angle degrees near TDC during ignition. Optimization of input/output performance resulted in a significant speedup in reading restart files, and in an over 100-times speedup in writing restart files and files for post-processing. Improvements to communication resulted in a 1400-times speedup in the mesh load balancing operation during initialization, on 4096 cores. An improved, “stiffness-based” algorithm for load balancing chemical kinetics calculations was developed, which results in an over 3-times faster run-time near ignition on 4096 cores relative to the original load balancing scheme. With this improvement to load balancing, the code achieves over 78% scaling efficiency on 2048 cores, and over 65% scaling efficiency on 4096 cores, relative to 256 cores.
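The stiffness-based idea can be sketched in Python: cells are ranked by an estimated chemistry cost that grows with local stiffness (the cost proxy here is invented) and dealt out in serpentine order, so every rank receives a similar mix of cheap and stiff cells. The algorithm in the paper is more elaborate; this only conveys the cost-weighted flavor.

    def balance_chemistry(cell_stiffness, num_ranks):
        # Rank cells by estimated kinetics cost, stiffest (costliest) first.
        order = sorted(cell_stiffness, key=cell_stiffness.get, reverse=True)
        owner, rank, step = {}, 0, 1
        for cell in order:
            owner[cell] = rank
            nxt = rank + step
            if nxt < 0 or nxt >= num_ranks:
                step = -step            # bounce at the ends: serpentine dealing
            else:
                rank = nxt
        return owner

    stiff = {"c%d" % i: s for i, s in enumerate([9.0, 1.0, 4.0, 8.0, 2.0, 3.0])}
    print(balance_chemistry(stiff, num_ranks=2))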
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamblin, T; de Supinski, B R; Schulz, M
Good load balance is crucial on very large parallel systems, but the most sophisticated algorithms introduce dynamic imbalances through adaptation in domain decomposition or use of adaptive solvers. To observe and diagnose imbalance, developers need system-wide, temporally-ordered measurements from full-scale runs. This potentially requires data collection from multiple code regions on all processors over the entire execution. Doing this instrumentation naively can, in combination with the application itself, exceed available I/O bandwidth and storage capacity, and can induce severe behavioral perturbations. We present and evaluate a novel technique for scalable, low-error load balance measurement. This uses a parallel wavelet transform and other parallel encoding methods. We show that our technique collects and reconstructs system-wide measurements with low error. Compression time scales sublinearly with system size and data volume is several orders of magnitude smaller than the raw data. The overhead is low enough for online use in a production environment.
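The core idea, compressing per-rank load traces with a wavelet transform and keeping only the largest coefficients, can be illustrated with a serial Haar transform. This toy sketch makes no attempt to reproduce the parallel transform or the paper's encoding methods and error bounds.

```python
import numpy as np

# Toy, serial stand-in: wavelet-transform a load trace, keep only the
# largest ~5% of coefficients, and reconstruct with low error.

def haar(x):
    x = np.asarray(x, dtype=float)        # length must be a power of two
    coeffs = []
    while len(x) > 1:
        coeffs.append((x[0::2] - x[1::2]) / np.sqrt(2))
        x = (x[0::2] + x[1::2]) / np.sqrt(2)
    coeffs.append(x)
    return coeffs

def ihaar(coeffs):
    x = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        out = np.empty(2 * len(d))
        out[0::2] = (x + d) / np.sqrt(2)
        out[1::2] = (x - d) / np.sqrt(2)
        x = out
    return x

trace = np.sin(np.linspace(0, 20, 1024)) + 0.05 * np.random.randn(1024)
c = haar(trace)
cut = np.quantile(np.abs(np.concatenate(c)), 0.95)   # keep ~5% of coefficients
compressed = [np.where(np.abs(d) >= cut, d, 0.0) for d in c]
print("max reconstruction error:", np.max(np.abs(ihaar(compressed) - trace)))
```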
Automatic mesh refinement and parallel load balancing for Fokker-Planck-DSMC algorithm
NASA Astrophysics Data System (ADS)
Küchlin, Stephan; Jenny, Patrick
2018-06-01
Recently, a parallel Fokker-Planck-DSMC algorithm for rarefied gas flow simulation in complex domains at all Knudsen numbers was developed by the authors. Fokker-Planck-DSMC (FP-DSMC) is an augmentation of the classical DSMC algorithm, which mitigates the near-continuum computational-cost deficiencies of pure DSMC. At each time step, based on a local Knudsen number criterion, the discrete DSMC collision operator is dynamically switched to the Fokker-Planck operator, which is based on the integration of continuous stochastic processes in time, and has fixed computational cost per particle, rather than per collision. In this contribution, we present an extension of the previous implementation with automatic local mesh refinement and parallel load-balancing. In particular, we show how the properties of discrete approximations to space-filling curves enable an efficient implementation. Exemplary numerical studies highlight the capabilities of the new code.
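The use of space-filling curves for load balancing, mentioned above, can be sketched as follows: order cells by their Morton (Z-order) code, then cut the ordered list into chunks of roughly equal particle count. The 16-bit resolution and the cell tuple layout are assumptions for illustration, not the paper's implementation.

```python
# Sketch of space-filling-curve load balancing: cells sorted along a
# Morton (Z-order) curve are cut into contiguous, equal-weight chunks.

def interleave_bits(ix, iy, bits=16):
    # Morton code: interleave the bits of the two cell indices.
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code

def partition(cells, nranks):
    # cells: list of (ix, iy, particle_count) tuples (assumed layout)
    ordered = sorted(cells, key=lambda c: interleave_bits(c[0], c[1]))
    total = sum(c[2] for c in cells)
    target, rank = total / nranks, 0
    acc, parts = 0, [[] for _ in range(nranks)]
    for c in ordered:
        parts[rank].append(c)
        acc += c[2]
        if acc >= target * (rank + 1) and rank < nranks - 1:
            rank += 1
    return parts
```

Because the curve preserves spatial locality, each chunk stays geometrically compact, which keeps communication for short-range particle interactions mostly local.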
Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak
2001-01-01
The Information Power Grid (IPG) concept developed by NASA aims to provide a metacomputing platform for large-scale distributed computations, hiding the intricacies of a highly heterogeneous environment while maintaining adequate security. In this paper, we propose a latency-tolerant partitioning scheme that dynamically balances processor workloads on the IPG and minimizes data movement and runtime communication. By simulating an unsteady adaptive mesh application on a wide area network, we study the performance of our load balancer under the Globus environment. The number of IPG nodes, the number of processors per node, and the interconnect speeds are parameterized to derive conditions under which the IPG would be suitable for parallel distributed processing of such applications. Experimental results demonstrate that effective solutions are achieved when the IPG nodes are connected by a high-speed asynchronous interconnection network.
Load balancing for massively-parallel soft-real-time systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hailperin, M.
1988-09-01
Global load balancing, if practical, would allow the effective use of massively-parallel ensemble architectures for large soft-real-time problems. The challenge is to replace quick global communication, which is impractical in a massively-parallel system, with statistical techniques. In this vein, the author proposes a novel approach to decentralized load balancing based on statistical time-series analysis. Each site estimates the system-wide average load using information about past loads of individual sites and attempts to match that average. This estimation process is practical because the soft-real-time systems of interest naturally exhibit loads that are periodic, in a statistical sense akin to seasonality in econometrics. It is shown how this load-characterization technique can be the foundation for a load-balancing system in an architecture employing cut-through routing and an efficient multicast protocol.
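A toy rendering of the estimation idea follows; the window length and migration rule are assumptions for illustration, and the paper's time-series machinery is considerably more sophisticated.

```python
from collections import deque

# Sketch: each site keeps roughly one statistical period of load samples
# (its own and any piggybacked peer reports), estimates the system-wide
# average, and sheds work only when clearly above that estimate.

class SiteEstimator:
    def __init__(self, period):
        self.history = deque(maxlen=period)   # ~one period of samples

    def observe(self, load_sample):
        self.history.append(load_sample)

    def estimated_average(self):
        return sum(self.history) / len(self.history)

    def should_shed(self, my_load, slack=1.1):
        # Migrate work only when clearly above the estimated average;
        # the slack factor damps reactions to noise.
        return bool(self.history) and my_load > slack * self.estimated_average()
```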
DNS load balancing in the CERN cloud
NASA Astrophysics Data System (ADS)
Reguero Naredo, Ignacio; Lobato Pardavila, Lorena
2017-10-01
Load Balancing is one of the technologies enabling deployment of large-scale applications on cloud resources. A DNS Load Balancer Daemon (LBD) has been developed at CERN as a cost-effective way to balance applications that accept DNS timing dynamics and do not require persistence. It currently serves over 450 load-balanced aliases with two small VMs acting as master and slave. The aliases are mapped to DNS subdomains. These subdomains are managed with DDNS according to a load metric, which is collected from the alias member nodes with SNMP. Over the last years, several improvements were made to the software, for instance support for IPv6, parallelization of the status requests, implementation of the client in Python to allow multiple aliases with differentiated states on the same machine, and support for application state. The configuration of the Load Balancer is currently managed by a Puppet type. It discovers the alias member nodes and gets the alias definitions from the Ermis REST service. The Aiermis self-service GUI for the management of the LB aliases has been produced; it is based on the Ermis service above, which implements a form of Load Balancing as a Service (LBaaS). The Ermis REST API has authorisation based on Foreman hostgroups. The CERN DNS LBD is Open Software with the Apache 2 license.
Data decomposition method for parallel polygon rasterization considering load balancing
NASA Astrophysics Data System (ADS)
Zhou, Chen; Chen, Zhenjie; Liu, Yongxue; Li, Feixue; Cheng, Liang; Zhu, A.-xing; Li, Manchun
2015-12-01
It is essential to adopt parallel computing technology to rapidly rasterize massive polygon data. In parallel rasterization, it is difficult to design an effective data decomposition method. Conventional methods ignore load balancing of polygon complexity in parallel rasterization and thus fail to achieve high parallel efficiency. In this paper, a novel data decomposition method based on polygon complexity (DMPC) is proposed. First, four factors that possibly affect the rasterization efficiency were investigated. Then, a metric represented by the boundary number and raster pixel number in the minimum bounding rectangle was developed to calculate the complexity of each polygon. Using this metric, polygons were rationally allocated according to the polygon complexity, and each process could achieve balanced loads of polygon complexity. To validate the efficiency of DMPC, it was used to parallelize different polygon rasterization algorithms and tested on different datasets. Experimental results showed that DMPC could effectively parallelize polygon rasterization algorithms. Furthermore, the implemented parallel algorithms with DMPC could achieve good speedup ratios of at least 15.69 and generally outperformed conventional decomposition methods in terms of parallel efficiency and load balancing. In addition, the results showed that DMPC exhibited consistently better performance for different spatial distributions of polygons.
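A hedged sketch of the DMPC idea follows. The abstract names the two ingredients of the metric (boundary number and raster pixel count of the minimum bounding rectangle) but not their weighting, so a simple 1:1 combination and the dictionary layout are assumed here.

```python
# Sketch of DMPC-style allocation: score each polygon by a complexity
# metric built from its boundary (vertex) count and the raster pixel count
# of its minimum bounding rectangle, then hand polygons to the currently
# lightest process so per-process complexity sums are balanced.

def complexity(polygon, cell_size):
    xmin, ymin, xmax, ymax = polygon["mbr"]          # assumed keys
    pixels = ((xmax - xmin) / cell_size) * ((ymax - ymin) / cell_size)
    return polygon["n_vertices"] + pixels            # assumed 1:1 weighting

def allocate(polygons, nprocs, cell_size):
    buckets = [[] for _ in range(nprocs)]
    loads = [0.0] * nprocs
    for poly in sorted(polygons, key=lambda p: -complexity(p, cell_size)):
        i = loads.index(min(loads))                  # lightest process so far
        buckets[i].append(poly)
        loads[i] += complexity(poly, cell_size)
    return buckets
```

The point of the metric is that a polygon's rasterization cost depends on its shape complexity and raster footprint, not merely on how many polygons a process receives.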
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahn, Tae-Hyuk; Sandu, Adrian; Watson, Layne T.
2015-08-01
Ensembles of simulations are employed to estimate the statistics of possible future states of a system, and are widely used in important applications such as climate change and biological modeling. Ensembles of runs can naturally be executed in parallel. However, when the CPU times of individual simulations vary considerably, a simple strategy of assigning an equal number of tasks per processor can lead to serious work imbalances and low parallel efficiency. This paper presents a new probabilistic framework to analyze the performance of dynamic load balancing algorithms for ensembles of simulations where many tasks are mapped onto each processor, and where the individual compute times vary considerably among tasks. Four load balancing strategies are discussed: most-dividing, all-redistribution, random-polling, and neighbor-redistribution. Simulation results with a stochastic budding yeast cell cycle model are consistent with the theoretical analysis. Because of scalability concerns for the global rebalancing algorithms, it is especially significant that a global decrease in load imbalance is provable for the local rebalancing algorithms. The overall simulation time is reduced by up to 25%, and the total processor idle time by 85%.
Dynamic programming on a shared-memory multiprocessor
NASA Technical Reports Server (NTRS)
Edmonds, Phil; Chu, Eleanor; George, Alan
1993-01-01
Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance the work load while keeping synchronization cost low. In particular, for a multiprocessor having p processors, an analysis of the best algorithm shows that the arithmetic cost is O(n^3/(6p)) and that the synchronization cost is O(|log_C n|) if p is much less than n, where C = (2p-1)/(2p+1) and n is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the work load and producing high efficiency.
Performance Analysis and Portability of the PLUM Load Balancing System
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.
1998-01-01
The ability to dynamically adapt an unstructured mesh is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult. To address this problem, we have developed PLUM, an automatic portable framework for performing adaptive numerical computations in a message-passing environment. PLUM requires that all data be globally redistributed after each mesh adaption to achieve load balance. We present an algorithm for minimizing this remapping overhead by guaranteeing an optimal processor reassignment. We also show that the data redistribution cost can be significantly reduced by applying our heuristic processor reassignment algorithm to the default mapping of the parallel partitioner. Portability is examined by comparing performance on a SP2, an Origin2000, and a T3E. Results show that PLUM can be successfully ported to different platforms without any code modifications.
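The remapping step can be illustrated with a greedy stand-in (not PLUM's provably optimal reassignment): given a similarity matrix S[p][q] recording how much of new partition q already resides on processor p, repeatedly match the highest-similarity unused pair. All names are illustrative assumptions.

```python
# Hedged sketch of processor reassignment after repartitioning: keep each
# new partition on the processor that already holds most of its data, so
# the global redistribution moves as little data as possible.

def reassign(similarity):
    nprocs = len(similarity)
    pairs = sorted(((similarity[p][q], p, q)
                    for p in range(nprocs) for q in range(nprocs)),
                   reverse=True)
    used_p, used_q, mapping = set(), set(), {}
    for s, p, q in pairs:
        if p not in used_p and q not in used_q:
            mapping[q] = p            # partition q stays with processor p
            used_p.add(p)
            used_q.add(q)
    return mapping
```

A greedy pass like this is a common heuristic baseline; PLUM's contribution is guaranteeing an optimal reassignment rather than a merely good one.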
GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation.
Hess, Berk; Kutzner, Carsten; van der Spoel, David; Lindahl, Erik
2008-03-01
Molecular simulation is an extremely useful, but computationally very expensive tool for studies of chemical and biomolecular systems. Here, we present a new implementation of our molecular simulation toolkit GROMACS which now both achieves extremely high performance on single processors from algorithmic optimizations and hand-coded routines and simultaneously scales very well on parallel machines. The code encompasses a minimal-communication domain decomposition algorithm, full dynamic load balancing, a state-of-the-art parallel constraint solver, and efficient virtual site algorithms that allow removal of hydrogen atom degrees of freedom to enable integration time steps up to 5 fs for atomistic simulations also in parallel. To improve the scaling properties of the common particle mesh Ewald electrostatics algorithms, we have in addition used a Multiple-Program, Multiple-Data approach, with separate node domains responsible for direct and reciprocal space interactions. Not only does this combination of algorithms enable extremely long simulations of large systems, but it also provides excellent simulation performance on quite modest numbers of standard cluster nodes.
Comparing the Performance of Two Dynamic Load Distribution Methods
NASA Technical Reports Server (NTRS)
Kale, L. V.
1987-01-01
Parallel processing of symbolic computations on a message-passing multi-processor presents one challenge: to effectively utilize the available processors, the load must be distributed uniformly to all the processors. However, the structure of these computations cannot be predicted in advance, so static scheduling methods are not applicable. In this paper, we compare the performance of two dynamic, distributed load balancing methods with extensive simulation studies. The two schemes are the Contracting Within a Neighborhood (CWN) scheme proposed by us, and the Gradient Model proposed by Lin and Keller. We conclude that although simpler, CWN is significantly more effective at distributing the work than the Gradient Model.
A parallel implementation of an off-lattice individual-based model of multicellular populations
NASA Astrophysics Data System (ADS)
Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe
2015-07-01
As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
2017-04-13
Several OmpSs application ports are described: a basic image processing algorithm, a mini-application representative of an ocean modelling code, a parallel benchmark, and a communication-avoiding version of the QR algorithm. Several improvements to the OmpSs model were also made, along with a port of the dynamic load balancing library to OmpSs and several updates to the tools infrastructure.
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers
NASA Technical Reports Server (NTRS)
Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)
1997-01-01
The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from robust implementations that are transportable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented in the widely-used NASA multi-block CFD packages ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation itself introduces no changes to the numerical results, hence the original fidelity of the packages is identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures, from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed-memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains on up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft achieve 75 percent load-balanced execution using data coalescing and the two levels of parallelism. The SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms on which the robustness of the implementation is tested. The performance behavior on these platforms with a variety of realistic problems will be included as this on-going study progresses.
Chen, Weiliang; De Schutter, Erik
2017-01-01
Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation.
Scan Directed Load Balancing for Highly-Parallel Mesh-Connected Computers
1991-07-01
Biagioni, Edoardo S.; Prins, Jan F.
Department of Computer Science, University of North Carolina at Chapel Hill
Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing
NASA Technical Reports Server (NTRS)
Sterling, T. L.; Zima, H. P.
2002-01-01
Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing. The paper concludes with a discussion of related research activities and an outlook to future work.
NASA Astrophysics Data System (ADS)
Furuichi, Mikito; Nishiura, Daisuke
2017-10-01
We developed dynamic load-balancing algorithms for particle simulation methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), the Moving Particle Semi-implicit method (MPS), and the Discrete Element Method (DEM). These are needed to handle billions of particles modeled on large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in each column to be different for each row. The imbalances in execution time between parallel logical processes are treated as a nonlinear residual. Load balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domains frequently by monitoring the performance of each computational process, because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture, which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using the Earth Simulator and K computer supercomputer systems.
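A one-dimensional analogue of the iterative adjustment conveys the flavor: treat per-process times as a residual and nudge each cut toward the slower neighbor. The relaxation factor is an assumed tuning knob; the actual method wraps this in a nonlinear solver with a multigrid smoother and works on two-dimensional jagged decompositions.

```python
# 1-D sketch of iterative rebalancing: boundaries drift toward the slower
# neighbor each step instead of being recomputed from scratch, which keeps
# communication and memory costs low when rebalancing frequently.

def adjust_boundaries(bounds, times, relax=0.25):
    # bounds: sorted cut positions in (0, 1) between len(times) sub-domains
    # times:  measured execution time of each sub-domain's process
    new = list(bounds)
    for i in range(len(bounds)):
        imbalance = times[i + 1] - times[i]      # > 0: right neighbor slower
        left = bounds[i - 1] if i > 0 else 0.0
        right = bounds[i + 1] if i + 1 < len(bounds) else 1.0
        # Shift proportionally to the relative imbalance; a small relax
        # factor keeps the cuts ordered in practice.
        new[i] += relax * imbalance / (times[i] + times[i + 1]) * (right - left)
    return new
```

Iterating this on freshly measured times converges toward equal per-process times, absorbing drift from non-uniform particle motion without a global repartition.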
Parallel Programming Strategies for Irregular Adaptive Applications
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance for such computations. In this work, we examine two typical irregular adaptive applications, Dynamic Remeshing and N-Body, under competing programming methodologies and across various parallel architectures. The Dynamic Remeshing application simulates flow over an airfoil, and refines localized regions of the underlying unstructured mesh. The N-Body experiment models two neighboring Plummer galaxies that are about to undergo a merger. Both problems demonstrate dramatic changes in processor workloads and interprocessor communication with time; thus, dynamic load balancing is a required component.
Parallel adaptive wavelet collocation method for PDEs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048³ using as many as 2048 CPU cores.
NASA Technical Reports Server (NTRS)
Eidson, T. M.; Erlebacher, G.
1994-01-01
While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem - a periodic tridiagonal solver - are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamics simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solutions for multiple right-hand sides (RHS) of the system of equations. Each RHS solution is independent and thus can be computed in parallel. Thus a Gaussian elimination type algorithm can be used in a parallel computation, and the more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data are moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy instead passes the data across processor boundaries in a chained manner, which usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm is shown to directly transform a sequential Gaussian elimination type algorithm into the parallel, chained, load-balanced algorithm.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.; Joy, K. I.
2013-07-01
In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time O(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory.
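The structure of iterated processor-pair-wise balancing can be sketched with hypercube-style pairwise averaging: in round k, processor i pairs with i XOR 2^k and the pair splits its combined load evenly, reaching the global average after log2(N) rounds. This shows only the communication pattern; the actual algorithm exchanges Monte Carlo particles over MPI rather than averaging plain numbers.

```python
# Sketch of O(log N) pairwise balancing on a (logical) hypercube.
# After each round, every processor's load equals the average of a
# progressively larger group; after log2(n) rounds it is the global mean.

def pairwise_balance(loads):
    n = len(loads)              # assumed to be a power of two
    k = 1
    while k < n:
        for i in range(n):
            j = i ^ k           # partner in this round
            if i < j:
                avg = (loads[i] + loads[j]) / 2
                loads[i] = loads[j] = avg
        k <<= 1
    return loads

print(pairwise_balance([8.0, 0.0, 4.0, 4.0]))   # -> [4.0, 4.0, 4.0, 4.0]
```

No processor ever inspects the whole workload, which is what keeps the scheme scalable to millions of processors.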
A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics
NASA Technical Reports Server (NTRS)
Kurdila, Andrew J.
1990-01-01
While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combined range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the following features: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.
An efficient dynamic load balancing algorithm
NASA Astrophysics Data System (ADS)
Lagaros, Nikos D.
2014-01-01
In engineering problems, randomness and uncertainties are inherent. Robust design procedures, formulated in the framework of multi-objective optimization, have been proposed in order to take into account sources of randomness and uncertainty. These design procedures require orders of magnitude more computational effort than conventional analysis or optimum design processes, since a very large number of finite element analyses must be performed. It is therefore imperative to exploit the capabilities of computing resources in order to deal with this kind of problem. In particular, parallel computing can be implemented at the level of metaheuristic optimization, by exploiting the physical parallelization feature of the nondominated sorting evolution strategies method, as well as at the level of the repeated structural analyses required for assessing the behavioural constraints and for calculating the objective functions. In this study an efficient dynamic load balancing algorithm for optimum exploitation of available computing resources is proposed and, without loss of generality, is applied to computing the desired Pareto front. In such problems the computation of the complete Pareto front with feasible designs only constitutes a very challenging task. The proposed algorithm achieves linear speedup factors and almost 100% efficiency relative to the sequential procedure.
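The practical effect of dynamic balancing in this setting, workers pulling the next analysis as soon as they finish one instead of receiving a fixed share, can be sketched with a self-scheduling worker pool. The evaluate function below is a stand-in for a structural finite element analysis; it and the design list are illustrative assumptions, not the paper's code.

```python
from multiprocessing import Pool

# Sketch of dynamic (self-scheduling) work distribution: with highly
# variable analysis times, a free worker always takes the next pending
# design, so no processor idles while others still hold a backlog.

def evaluate(design):
    # Stand-in for a finite element analysis returning objective values.
    return sum(x * x for x in design), max(design)

if __name__ == "__main__":
    designs = [[i * 0.1, (1000 - i) * 0.1] for i in range(1000)]
    with Pool() as pool:
        # imap_unordered hands out one design at a time as workers free up.
        pareto_candidates = list(pool.imap_unordered(evaluate, designs))
    print(len(pareto_candidates), "designs evaluated")
```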
Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P
2014-10-30
Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of a large molecule with a high-quality basis set, running on up to 1024 cores of a high performance cluster machine.
Dynamics of a split torque helicopter transmission
NASA Technical Reports Server (NTRS)
Rashidi, Majid; Krantz, Timothy
1992-01-01
A high reduction ratio split torque gear train has been proposed as an alternative to a planetary configuration for the final stage of a helicopter transmission. A split torque design allows a high ratio of power-to-weight for the transmission. The design studied in this work includes a pivoting beam that acts to balance thrust loads produced by the helical gear meshes in each of two parallel power paths. When the thrust loads are balanced, the torque is split evenly. A mathematical model was developed to study the dynamics of the system. The effects of time varying gear mesh stiffness, static transmission errors, and flexible bearing supports are included in the model. The model was demonstrated with a test case. Results show that although the gearbox has a symmetric configuration, the simulated dynamic behavior of the first and second compound gears are not the same. Also, results show that shaft location and mesh stiffness tuning are significant design parameters that influence the motions of the system.
Implicit schemes and parallel computing in unstructured grid CFD
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.
1995-01-01
The development of implicit schemes for obtaining steady state solutions to the Euler and Navier-Stokes equations on unstructured grids is outlined. Applications are presented that compare the convergence characteristics of various implicit methods. Next, the development of explicit and implicit schemes to compute unsteady flows on unstructured grids is discussed. Next, the issues involved in parallelizing finite volume schemes on unstructured meshes in an MIMD (multiple instruction/multiple data stream) fashion are outlined. Techniques for partitioning unstructured grids among processors and for extracting parallelism in explicit and implicit solvers are discussed. Finally, some dynamic load balancing ideas, which are useful in adaptive transient computations, are presented.
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.
1989-01-01
Several techniques for performing static and dynamic load balancing in vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.
Numerical computation of solar neutrino flux attenuated by the MSW mechanism
NASA Astrophysics Data System (ADS)
Kim, Jai Sam; Chae, Yoon Sang; Kim, Jung Dae
1999-07-01
We compute the survival probability of an electron neutrino in its flight through the solar core experiencing the Mikheyev-Smirnov-Wolfenstein effect, with all three neutrino species considered. We adopted a hybrid method that uses an accurate approximation formula in the non-resonance region and numerical integration in the non-adiabatic resonance region. The key to our algorithm is to use the importance sampling method for sampling the neutrino creation energy and position, and to find the optimum radii at which to start and stop numerical integration. We further developed a parallel algorithm for a message passing parallel computer. By using an idea of job tokens, we have developed a dynamic load balancing mechanism which is effective under any irregular load distribution.
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems.
Niethammer, Christoph; Becker, Stefan; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin
2014-10-14
The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.
A Domain Decomposition Parallelization of the Fast Marching Method
NASA Technical Reports Server (NTRS)
Herrmann, M.
2003-01-01
In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on load balancing separately the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G_0-based parallelization will be investigated.
A parallel algorithm for multi-level logic synthesis using the transduction method. M.S. Thesis
NASA Technical Reports Server (NTRS)
Lim, Chieng-Fai
1991-01-01
The Transduction Method has been shown to be a powerful tool in the optimization of multilevel networks. Many tools such as the SYLON synthesis system (X90), (CM89), (LM90) have been developed based on this method. A parallel implementation is presented of SYLON-XTRANS (XM89) on an eight-processor Encore Multimax shared-memory multiprocessor. It minimizes multilevel networks consisting of simple gates through parallel pruning, gate substitution, gate merging, generalized gate substitution, and gate input reduction. This implementation, called Parallel TRANSduction (PTRANS), also uses partitioning to break large circuits up and performs inter- and intra-partition dynamic load balancing. With this, good speedups and high processor efficiencies are achievable without sacrificing the resulting circuit quality.
Load Balancing in Stochastic Networks: Algorithms, Analysis, and Game Theory
2014-04-16
The classic randomized load balancing model is the so-called supermarket model, which describes a system in which customers arrive to a service center with n parallel servers. Keywords: mean-field limits, supermarket model, thresholds, game, randomized load balancing.
Grid computing in large pharmaceutical molecular modeling.
Claus, Brian L; Johnson, Stephen R
2008-07-01
Most major pharmaceutical companies have employed grid computing to expand their compute resources with the intention of minimizing additional financial expenditure. Historically, one of the issues restricting widespread utilization of the grid resources in molecular modeling is the limited set of suitable applications amenable to coarse-grained parallelization. Recent advances in grid infrastructure technology coupled with advances in application research and redesign will enable fine-grained parallel problems, such as quantum mechanics and molecular dynamics, which were previously inaccessible to the grid environment. This will enable new science as well as increase resource flexibility to load balance and schedule existing workloads.
Dynamic load balance scheme for the DSMC algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jin; Geng, Xiangren; Jiang, Dingwu
The direct simulation Monte Carlo (DSMC) algorithm, devised by Bird, has been used over a wide range of rarefied flow problems in the past 40 years. While DSMC is suitable for parallel implementation on powerful multi-processor architectures, it also introduces a large load imbalance across the processor array, even for small examples. The load imposed on a processor by a DSMC calculation is determined to a large extent by the total number of simulator particles upon it. Since most flows are impulsively started with an initial distribution of particles that is quite different from the steady state, the total number of simulator particles changes dramatically, and a load balance based upon an initial distribution of particles breaks down as the steady state of the flow is reached. The load imbalance and huge computational cost of DSMC have limited its application to rarefied or simple transitional flows. In this paper, by taking advantage of METIS, a software package for partitioning unstructured graphs, and taking the total number of simulator particles in each cell as weight information, repartitioning based upon the principle that each processor handles approximately an equal total of simulator particles has been achieved. The computation pauses several times to renew the total number of simulator particles in each processor and repartition the whole domain again, so the load balance across the processor array holds for the duration of the computation and the parallel efficiency is improved effectively. The benchmark solution of a cylinder submerged in hypersonic flow has been simulated numerically, as well as hypersonic flow past a complex wing-body configuration. The results show that, for both cases, the computational time can be reduced by about 50%.
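A sketch of the repartitioning step under the stated principle follows, assuming the pymetis Python binding for METIS (the paper drives METIS directly rather than through Python): weight each cell by its current simulator-particle count and re-partition the cell adjacency graph whenever the counts have drifted.

```python
import pymetis  # assumed Python binding for the METIS partitioner

# Sketch: partition the cell adjacency graph with per-cell vertex weights
# equal to the current simulator-particle count, so each processor ends up
# with roughly the same particle total. Rerun periodically as the particle
# distribution evolves toward steady state.

def repartition(adjacency, particles_per_cell, nprocs):
    # adjacency: adjacency[c] = list of cells sharing a face with cell c
    _, parts = pymetis.part_graph(nprocs,
                                  adjacency=adjacency,
                                  vweights=list(particles_per_cell))
    return parts   # parts[c] = processor that owns cell c after rebalancing
```

Because METIS also minimizes edge cut, the rebalanced sub-domains stay contiguous, which limits the particle-migration traffic between neighboring processors.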
Time Warp Operating System, Version 2.5.1
NASA Technical Reports Server (NTRS)
Bellenot, Steven F.; Gieselman, John S.; Hawley, Lawrence R.; Peterson, Judy; Presley, Matthew T.; Reiher, Peter L.; Springer, Paul L.; Tupman, John R.; Wedel, John J., Jr.; Wieland, Frederick P.;
1993-01-01
Time Warp Operating System, TWOS, is a special purpose computer program designed to support parallel simulation of discrete events. It is a complete implementation of the Time Warp software mechanism, which implements a distributed protocol for virtual synchronization based on rollback of processes and annihilation of messages. TWOS supports simulations and other computations in which both virtual time and dynamic load balancing are used. The program utilizes underlying resources of the operating system. Written in the C programming language.
Multi-jagged: A scalable parallel spatial partitioning algorithm
Deveci, Mehmet; Rajamanickam, Sivasankaran; Devine, Karen D.; ...
2015-03-18
Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those requiring geometric locality of data (particle methods, crash simulations). We present, to our knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In contrast to the traditional recursive coordinate bisection algorithm (RCB), which recursively bisects subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our algorithm does recursive multi-section with a given number of parts in each dimension. By computing multiple cut lines concurrently and intelligently deciding when to migrate data while computing the partition, we minimize data movement compared to efficient implementations of recursive bisection. We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and scales better than RCB in terms of run-time without degrading the load balance. Lastly, our implementation partitions 24 billion points into 65,536 parts within a few seconds and exhibits near perfect weak scaling up to 6K cores.
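A toy serial sketch of one multi-jagged level in two dimensions follows; equal point counts stand in for weights, and the concurrent cut-line computation that gives the real algorithm its speed is omitted.

```python
import numpy as np

# Sketch of a jagged partition: split points into px slabs along x, then
# split each slab independently along y, so the y-cuts may differ from slab
# to slab ("jagged"), unlike a regular grid of cuts.

def multijagged(points, px, py):
    pts = points[np.argsort(points[:, 0])]
    slabs = np.array_split(pts, px)              # equal-count x slabs
    parts = []
    for slab in slabs:
        slab = slab[np.argsort(slab[:, 1])]
        parts.extend(np.array_split(slab, py))   # independent y cuts per slab
    return parts                                  # px * py equal-sized parts

pts = np.random.rand(100000, 2)
parts = multijagged(pts, px=4, py=8)              # 32 balanced parts
```

Producing px * py parts in two sweeps, instead of log2(px * py) rounds of bisection, is what lets the parallel version compute many cut lines concurrently.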
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations
Mitchell, William F.
1998-01-01
Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given.
Parallel Computing Strategies for Irregular Algorithms
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
High performance computing in biology: multimillion atom simulations of nanoscale systems
Sanbonmatsu, K. Y.; Tung, C.-S.
2007-01-01
Computational methods have been used in biology for sequence analysis (bioinformatics), all-atom simulation (molecular dynamics and quantum calculations), and more recently for modeling biological networks (systems biology). Of these three techniques, all-atom simulation is currently the most computationally demanding, in terms of compute load, communication speed, and memory load. Breakthroughs in electrostatic force calculation and dynamic load balancing have enabled molecular dynamics simulations of large biomolecular complexes. Here, we report simulation results for the ribosome, using approximately 2.64 million atoms, the largest all-atom biomolecular simulation published to date. Several other nanoscale systems with different numbers of atoms were studied to measure the performance of the NAMD molecular dynamics simulation program on the Los Alamos National Laboratory Q Machine. We demonstrate that multimillion atom systems represent a 'sweet spot' for the NAMD code on large supercomputers. NAMD displays an unprecedented 85% parallel scaling efficiency for the ribosome system on 1024 CPUs. We also review recent targeted molecular dynamics simulations of the ribosome that prove useful for studying conformational changes of this large biomolecular complex in atomic detail.
Algorithms for parallel flow solvers on message passing architectures
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1995-01-01
The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique, even though it has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines, and from these the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process can be determined. Theoretical predictions of pipeline performance with and without this optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of careless grid partitioning in flow solvers that employ pipeline algorithms: if grid blocks at boundaries are not at least as large in the wall-normal direction as those immediately adjacent to them, then the first processor in the pipeline will receive a computational load that is less than that of subsequent processors, magnifying the pipeline slowdown effect. Extra compensation is needed for grid boundary effects, even if all grid blocks are equally sized.
Towards scalable Byzantine fault-tolerant replication
NASA Astrophysics Data System (ADS)
Zbierski, Maciej
2017-08-01
Byzantine fault-tolerant (BFT) replication is a powerful technique, enabling distributed systems to remain available and correct even in the presence of arbitrary faults. Unfortunately, existing BFT replication protocols are mostly load-unscalable, i.e. they fail to respond with adequate performance increase whenever new computational resources are introduced into the system. This article proposes a universal architecture facilitating the creation of load-scalable distributed services based on BFT replication. The suggested approach exploits parallel request processing to fully utilize the available resources, and uses a load balancer module to dynamically adapt to the properties of the observed client workload. The article additionally provides a discussion on selected deployment scenarios, and explains how the proposed architecture could be used to increase the dependability of contemporary large-scale distributed systems.
Hallock, Michael J.; Stone, John E.; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida
2014-01-01
Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems. PMID:24882911
Hallock, Michael J; Stone, John E; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida
2014-05-01
Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems.
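Readers unfamiliar with performance-weighted spatial decomposition may find a sketch helpful. The following minimal Python fragment (hypothetical names; not the authors' code) splits the Z-slabs of a lattice among GPUs in proportion to a measured per-device throughput, which is the basic idea behind balancing devices of varying performance and memory capacity:

    # Sketch: split nz lattice slabs among devices proportional to measured rates.
    def partition_slabs(nz, rates):
        total = sum(rates)
        bounds, start = [], 0
        for i, r in enumerate(rates):
            # Last device takes the remainder to absorb rounding.
            width = nz - start if i == len(rates) - 1 else round(nz * r / total)
            bounds.append((start, start + width))
            start += width
        return bounds  # half-open slab ranges [(z0, z1), ...], one per GPU

    # Example: a fast GPU gets twice the slabs of two slower ones.
    print(partition_slabs(96, [2.0, 1.0, 1.0]))  # [(0, 48), (48, 72), (72, 96)]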
Parallel volume ray-casting for unstructured-grid data on distributed-memory architectures
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu
1995-01-01
As computing technology continues to advance, computational modeling of scientific and engineering problems produces data of increasing complexity: large in size and unstructured in shape. Volume visualization of such data is a challenging problem. This paper proposes a distributed parallel solution that makes ray-casting volume rendering of unstructured-grid data practical. Both the data and the rendering process are distributed among processors. At each processor, ray-casting of local data is performed independent of the other processors. The global image composing processes, which require inter-processor communication, are overlapped with the local ray-casting processes to achieve maximum parallel efficiency. This algorithm differs from previous ones in four ways: it is completely distributed, less view-dependent, reasonably scalable, and flexible. Without using dynamic load balancing, test results on the Intel Paragon using from two to 128 processors show, on average, about 60% parallel efficiency.
Multidimensional spectral load balancing
Hendrickson, Bruce A.; Leland, Robert W.
1996-12-24
A method of and apparatus for graph partitioning involving the use of a plurality of eigenvectors of the Laplacian matrix of the graph of the problem for which load balancing is desired. The invention is particularly useful for optimizing parallel computer processing of a problem and for minimizing total pathway lengths of integrated circuits in the design stage.
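For illustration of the underlying idea only (not the patented multidimensional method), a minimal spectral bisection sketch in Python/NumPy splits a graph at the median of the Fiedler vector, the eigenvector of the second-smallest eigenvalue of the Laplacian; the multidimensional variant uses several such eigenvectors at once:

    import numpy as np

    def spectral_bisection(adj):
        """Split vertices into two balanced halves using the Fiedler vector.
        adj: dense symmetric adjacency matrix (numpy array)."""
        lap = np.diag(adj.sum(axis=1)) - adj   # graph Laplacian L = D - A
        _, vecs = np.linalg.eigh(lap)          # eigenvectors, ascending eigenvalues
        fiedler = vecs[:, 1]                   # second-smallest eigenvalue's vector
        return fiedler >= np.median(fiedler)   # median split keeps the halves balanced

    # Example: a 6-vertex path graph splits cleanly into its two ends.
    A = np.diag(np.ones(5), 1); A += A.T
    print(spectral_bisection(A))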
Dynamic Calibration of the NASA Ames Rotor Test Apparatus Steady/Dynamic Rotor Balance
NASA Technical Reports Server (NTRS)
Peterson, Randall L.; vanAken, Johannes M.
1996-01-01
The NASA Ames Rotor Test Apparatus was modified to include a Steady/Dynamic Rotor Balance. The dynamic calibration procedures and configurations are discussed. Random excitation was applied at the rotor hub, and vibratory force and moment responses were measured on the steady/dynamic rotor balance. Transfer functions were computed using the load cell data and the vibratory force and moment responses from the rotor balance. Calibration results showing the influence of frequency bandwidth, hub mass, rotor RPM, thrust preload, and dynamic loads through the stationary push rods are presented and discussed.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
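The paper's own balancing strategy is not reproduced in the abstract; a common low-cost approach with the same flavor is greedy longest-processing-time (LPT) scheduling, sketched below in Python with hypothetical names: documents with estimated processing costs are assigned, heaviest first, to the currently least-loaded worker.

    import heapq

    def lpt_assign(costs, nworkers):
        """Assign task costs to workers: heaviest task to least-loaded worker."""
        heap = [(0.0, w) for w in range(nworkers)]     # (current load, worker id)
        assignment = {w: [] for w in range(nworkers)}
        for task in sorted(range(len(costs)), key=lambda t: -costs[t]):
            load, w = heapq.heappop(heap)
            assignment[w].append(task)
            heapq.heappush(heap, (load + costs[task], w))
        return assignment

    # Example: 6 documents of varying size onto 3 workers.
    print(lpt_assign([9, 7, 6, 5, 4, 2], 3))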
Shake Test Results and Dynamic Calibration Efforts for the Large Rotor Test Apparatus
NASA Technical Reports Server (NTRS)
Russell, Carl R.
2014-01-01
A shake test of the Large Rotor Test Apparatus (LRTA) was performed in an effort to enhance NASA's capability to measure dynamic hub loads for full-scale rotor tests. This paper documents the results of the shake test as well as efforts to calibrate the LRTA balance system to measure dynamic loads. Dynamic rotor loads are the primary source of vibration in helicopters and other rotorcraft, leading to passenger discomfort and damage due to fatigue of aircraft components. There are novel methods being developed to reduce rotor vibrations, but measuring the actual vibration reductions on full-scale rotors remains a challenge. In order to measure rotor forces on the LRTA, a balance system in the non-rotating frame is used. The forces at the balance can then be translated to the hub reference frame to measure the rotor loads. Because the LRTA has its own dynamic response, the balance system must be calibrated to include the natural frequencies of the test rig.
NASA Astrophysics Data System (ADS)
Penner, Joyce E.; Andronova, Natalia; Oehmke, Robert C.; Brown, Jonathan; Stout, Quentin F.; Jablonowski, Christiane; van Leer, Bram; Powell, Kenneth G.; Herzog, Michael
2007-07-01
One of the most important advances needed in global climate models is the development of atmospheric General Circulation Models (GCMs) that can reliably treat convection. Such GCMs require high resolution in local convectively active regions, both in the horizontal and vertical directions. During previous research we have developed an Adaptive Mesh Refinement (AMR) dynamical core that can adapt its grid resolution horizontally. Our approach utilizes a finite volume numerical representation of the partial differential equations with floating Lagrangian vertical coordinates and requires resolving dynamical processes on small spatial scales. For the latter it uses a newly developed general-purpose library, which facilitates 3D block-structured AMR on spherical grids. The library manages neighbor information as the blocks adapt, and handles the parallel communication and load balancing, freeing the user to concentrate on the scientific modeling aspects of their code. In particular, this library defines and manages adaptive blocks on the sphere, provides user interfaces for interpolation routines and supports the communication and load-balancing aspects for parallel applications. We have successfully tested the library in a 2-D (longitude-latitude) implementation. During the past year, we have extended the library to treat adaptive mesh refinement in the vertical direction. Preliminary results are discussed. This research project is characterized by an interdisciplinary approach involving atmospheric science, computer science and mathematical/numerical aspects. The work is done in close collaboration between the Atmospheric Science, Computer Science and Aerospace Engineering Departments at the University of Michigan and NOAA GFDL.
NASA Astrophysics Data System (ADS)
Yue, Yingchao; Fan, Wenhui; Xiao, Tianyuan; Ma, Cheng
2013-07-01
High level architecture (HLA) is the open standard in the collaborative simulation field. Scholars have paid close attention to theoretical research on, and engineering applications of, collaborative simulation based on HLA/RTI, which extends HLA in various aspects such as functionality and efficiency. However, related study on the load balancing problem of HLA collaborative simulation is insufficient. Without load balancing, collaborative simulation under HLA/RTI may encounter performance reduction or even fatal errors. In this paper, load balancing is further divided into static problems and dynamic problems. A multi-objective model is established and the randomness of model parameters is taken into consideration for static load balancing, which makes the model more credible. A Monte Carlo based optimization algorithm (MCOA) is developed to gain static load balance. For dynamic load balancing, a new type of dynamic load balancing problem is put forward with regard to variable-structured collaborative simulation under HLA/RTI. In order to minimize the influence on the running collaborative simulation, an ordinal optimization based algorithm (OOA) is devised to shorten the optimization time. Furthermore, the two algorithms are adopted in simulation experiments of different scenarios, which demonstrate their effectiveness and efficiency. An engineering experiment on collaborative simulation under HLA/RTI of high-speed electric multiple units (EMU) is also conducted to identify the credibility of the proposed models and the supportive utility of MCOA and OOA for practical engineering systems. The proposed research ensures compatibility with traditional HLA, enhances the ability to assign simulation loads onto computing units both statically and dynamically, improves the performance of the collaborative simulation system and makes full use of hardware resources.
NASA Astrophysics Data System (ADS)
Salafian, Iman; Stewart, Blake; Newman, Matthew; Zygielbaum, Arthur I.; Terry, Benjamin
2017-04-01
A four cable-driven parallel manipulator (CDPM), consisting of sophisticated spectrometers and imagers, is under development for use in acquiring phenotypic and environmental data over an acre-sized crop field. To obtain accurate and high quality data from the instruments, the end effector must be stable during sensing. One of the factors that reduces stability is the center of mass offset of the end effector, which can cause a pendulum effect or undesired tilt angle. The purpose of this work is to develop a system and method for balancing the center of mass of a 12th-scale CDPM to minimize vibration that can cause error in the acquired data. A simple method for balancing the end effector is needed to enable end users of the CDPM to arbitrarily add and remove sensors and imagers from the end effector as their experiments may require. A Center of Mass Balancing System (CMBS) is developed in this study which consists of an adjustable system of weights and a gimbal for tilt mitigation. An electronic circuit board including an orientation sensor, wireless data communication, and load cells was designed to validate the CMBS. To measure improvements gained by the CMBS, several static and dynamic experiments are carried out. In the experiments, the dynamic vibrations due to the translational motion and static orientation were measured with and without CMBS use. The results show that the CMBS system improves the stability of the end-effector by decreasing vibration and static tilt angle.
Method of up-front load balancing for local memory parallel processors
NASA Technical Reports Server (NTRS)
Baffes, Paul Thomas (Inventor)
1990-01-01
In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of sixty to seventy-five percent.
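A rough illustration of the iterative merging idea follows (a Python sketch under assumptions, not the patented method; the load values and threshold semantics here are invented for the example): starting from many artificial process sets, repeatedly merge the two lightest sets until only as many sets as processing units remain, rejecting merges that would exceed a threshold.

    import heapq

    def merge_process_sets(loads, nunits, threshold):
        """Merge the two lightest process sets until nunits remain."""
        heap = list(loads)
        heapq.heapify(heap)
        while len(heap) > nunits:
            a = heapq.heappop(heap)
            b = heapq.heappop(heap)
            if a + b > threshold:
                raise ValueError("partition threshold exceeded; adjust inputs")
            heapq.heappush(heap, a + b)
        return sorted(heap)

    # Example: 8 process sets reduced to 3 processing units.
    print(merge_process_sets([1, 1, 2, 2, 3, 3, 4, 4], 3, threshold=10))  # [5, 7, 8]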
Computer Science Techniques Applied to Parallel Atomistic Simulation
NASA Astrophysics Data System (ADS)
Nakano, Aiichiro
1998-03-01
Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) Space-time multiresolution schemes; ii) fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and a M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
Hielscher, Andreas H; Bartel, Sebastian
2004-02-01
Optical tomography (OT) is a fast developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computational-intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.
Dynamically allocating sets of fine-grained processors to running computations
NASA Technical Reports Server (NTRS)
Middleton, David
1988-01-01
Researchers explore an approach to using general purpose parallel computers which involves mapping hardware resources onto computations instead of mapping computations onto hardware. Problems such as processor allocation, task scheduling and load balancing, which have traditionally proven to be challenging, change significantly under this approach and may become amenable to new attacks. Researchers describe the implementation of this approach used by the FFP Machine whose computation and communication resources are repeatedly partitioned into disjoint groups that match the needs of available tasks from moment to moment. Several consequences of this system are examined.
Recent Progress on the Parallel Implementation of Moving-Body Overset Grid Schemes
NASA Technical Reports Server (NTRS)
Wissink, Andrew; Allen, Edwin (Technical Monitor)
1998-01-01
Viscous calculations about geometrically complex bodies in which there is relative motion between component parts are among the most computationally demanding problems facing CFD researchers today. This presentation documents results from the first two years of a CHSSI-funded effort within the U.S. Army AFDD to develop scalable dynamic overset grid methods for unsteady viscous calculations with moving-body problems. The first part of the presentation will focus on results from OVERFLOW-D1, a parallelized moving-body overset grid scheme that employs traditional Chimera methodology. The two processes that dominate the cost of such problems are the flow solution on each component and the intergrid connectivity solution. Parallel implementations of the OVERFLOW flow solver and DCF3D connectivity software are coupled with a proposed two-part static-dynamic load balancing scheme and tested on the IBM SP and Cray T3E multi-processors. The second part of the presentation will cover some recent results from OVERFLOW-D2, a new flow solver that employs Cartesian grids with various levels of refinement, facilitating solution adaption. A study of the parallel performance of the scheme on large distributed-memory multiprocessor computer architectures will be reported.
Monitoring dynamic loads on wind tunnel force balances
NASA Technical Reports Server (NTRS)
Ferris, Alice T.; White, William C.
1989-01-01
Two devices have been developed at NASA Langley to monitor the dynamic loads incurred during wind-tunnel testing. The Balance Dynamic Display Unit (BDDU) displays and monitors the combined static and dynamic forces and moments in the orthogonal axes. The Balance Critical Point Analyzer scales and sums each normalized signal from the BDDU to obtain combined dynamic and static signals that represent the dynamic loads at predefined high-stress points. Each instrument multiplexes six analog signals so that each channel is displayed sequentially as one-sixth of the horizontal axis on a single oscilloscope trace. This display format thus permits the operator to quickly and easily monitor the combined static and dynamic levels of up to six channels at the same time.
Predicting Flows of Rarefied Gases
NASA Technical Reports Server (NTRS)
LeBeau, Gerald J.; Wilmoth, Richard G.
2005-01-01
DSMC Analysis Code (DAC) is a flexible, highly automated, easy-to-use computer program for predicting flows of rarefied gases -- especially flows of upper-atmospheric, propulsion, and vented gases impinging on spacecraft surfaces. DAC implements the direct simulation Monte Carlo (DSMC) method, which is widely recognized as standard for simulating flows at densities so low that the continuum-based equations of computational fluid dynamics are invalid. DAC enables users to model complex surface shapes and boundary conditions quickly and easily. The discretization of a flow field into computational grids is automated, thereby relieving the user of a traditionally time-consuming task while ensuring (1) appropriate refinement of grids throughout the computational domain, (2) determination of optimal settings for temporal discretization and other simulation parameters, and (3) satisfaction of the fundamental constraints of the method. In so doing, DAC ensures an accurate and efficient simulation. In addition, DAC can utilize parallel processing to reduce computation time. The domain decomposition needed for parallel processing is completely automated, and the software employs a dynamic load-balancing mechanism to ensure optimal parallel efficiency throughout the simulation.
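To make the load-balancing step concrete, here is a minimal sketch (Python, hypothetical names; DAC's actual mechanism is not described in the abstract) of re-partitioning a 1D chain of simulation cells so that each rank receives a contiguous block with roughly equal total particle count:

    def repartition_cells(particles_per_cell, nranks):
        """Contiguous 1D partition of cells with roughly equal particle counts."""
        prefix = [0]
        for n in particles_per_cell:
            prefix.append(prefix[-1] + n)
        total, ncells = prefix[-1], len(particles_per_cell)
        cuts = [0]
        for k in range(1, nranks):
            # Place each cut where the prefix sum best matches k/nranks of the
            # load, keeping at least one cell for every remaining rank.
            lo, hi = cuts[-1] + 1, ncells - (nranks - k)
            cuts.append(min(range(lo, hi + 1),
                            key=lambda j: abs(prefix[j] - total * k / nranks)))
        cuts.append(ncells)
        return [(cuts[k], cuts[k + 1]) for k in range(nranks)]

    # Example: particle counts peak mid-domain, so the middle rank gets fewer cells.
    print(repartition_cells([1, 2, 8, 9, 8, 2, 1, 1], 3))   # [(0, 3), (3, 4), (4, 8)]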
Distributed computing for membrane-based modeling of action potential propagation.
Porras, D; Rogers, J M; Smith, W M; Pollard, A E
2000-08-01
Action potential propagation simulations with physiologic membrane currents and macroscopic tissue dimensions are computationally expensive. We, therefore, analyzed distributed computing schemes to reduce execution time in workstation clusters by parallelizing solutions with message passing. Four schemes were considered in two-dimensional monodomain simulations with the Beeler-Reuter membrane equations. Parallel speedups measured with each scheme were compared to theoretical speedups, recognizing the relationship between speedup and code portions that executed serially. A data decomposition scheme based on total ionic current provided the best performance. Analysis of communication latencies in that scheme led to a load-balancing algorithm in which measured speedups at 89 +/- 2% and 75 +/- 8% of theoretical speedups were achieved in homogeneous and heterogeneous clusters of workstations. Speedups in this scheme with the Luo-Rudy dynamic membrane equations exceeded 3.0 with eight distributed workstations. Cluster speedups were comparable to those measured during parallel execution on a shared memory machine.
Load Balancing Strategies for Multiphase Flows on Structured Grids
NASA Astrophysics Data System (ADS)
Olshefski, Kristopher; Owkes, Mark
2017-11-01
The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are only designed for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated including brute force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies are tested with a simple one-dimensional model prior to implementation into the three-dimensional NGA code. Current results show load balancing will reduce computational time by at least 30%.
Tile-based Level of Detail for the Parallel Age
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niski, K; Cohen, J D
Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach to parallelizing rendering with level of detail, based on hierarchical screen-space tiles. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level-of-detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.
NASA Astrophysics Data System (ADS)
Chan, Chia-Hsin; Tu, Chun-Chuan; Tsai, Wen-Jiin
2017-01-01
High efficiency video coding (HEVC) not only improves the coding efficiency drastically compared to the well-known H.264/AVC but also introduces coding tools for parallel processing, one of which is tiles. Tile partitioning is allowed to be arbitrary in HEVC, but how to decide tile boundaries remains an open issue. An adaptive tile boundary (ATB) method is proposed to select a better tile partitioning to improve load balancing (ATB-LoadB) and coding efficiency (ATB-Gain) with a unified scheme. Experimental results show that, compared to ordinary uniform-space partitioning, the proposed ATB can save up to 17.65% of encoding times in parallel encoding scenarios and can reduce up to 0.8% of total bit rates for coding efficiency.
NASA Technical Reports Server (NTRS)
Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos
1996-01-01
An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modifications to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load balance, and issues affecting single-node code performance are discussed.
Deformation and fracture of explosion-welded Ti/Al plates: A synchrotron-based study
DOE Office of Scientific and Technical Information (OSTI.GOV)
E, J. C.; Huang, J. Y.; Bie, B. X.
Here, explosion-welded Ti/Al plates are characterized with energy dispersive spectroscopy and x-ray computed tomography, and exhibit a smooth, well-jointed interface. We perform dynamic and quasi-static uniaxial tension experiments on Ti/Al with the loading direction either perpendicular or parallel to the Ti/Al interface, using a mini split Hopkinson tension bar and a material testing system in conjunction with time-resolved synchrotron x-ray imaging. X-ray imaging and strain-field mapping reveal different deformation mechanisms responsible for anisotropic bulk-scale responses, including yield strength, ductility and rate sensitivity. Deformation and fracture are achieved predominantly in the Al layer for perpendicular loading, but both the Ti and Al layers as well as the interface play a role for parallel loading. The rate sensitivity of Ti/Al follows those of the constituent metals. For perpendicular loading, a single deformation band develops in the Al layer under quasi-static loading, while multiple deformation bands nucleate simultaneously under dynamic loading, leading to a higher dynamic fracture strain. For parallel loading, the interface impedes the growth of deformation and results in increased ductility of Ti/Al under quasi-static loading, while interface fracture occurs under dynamic loading due to the disparity in Poisson's contraction.
Deformation and fracture of explosion-welded Ti/Al plates: A synchrotron-based study
E, J. C.; Huang, J. Y.; Bie, B. X.; ...
2016-08-02
Here, explosion-welded Ti/Al plates are characterized with energy dispersive spectroscopy and x-ray computed tomography, and exhibit a smooth, well-jointed interface. We perform dynamic and quasi-static uniaxial tension experiments on Ti/Al with the loading direction either perpendicular or parallel to the Ti/Al interface, using a mini split Hopkinson tension bar and a material testing system in conjunction with time-resolved synchrotron x-ray imaging. X-ray imaging and strain-field mapping reveal different deformation mechanisms responsible for anisotropic bulk-scale responses, including yield strength, ductility and rate sensitivity. Deformation and fracture are achieved predominantly in the Al layer for perpendicular loading, but both the Ti and Al layers as well as the interface play a role for parallel loading. The rate sensitivity of Ti/Al follows those of the constituent metals. For perpendicular loading, a single deformation band develops in the Al layer under quasi-static loading, while multiple deformation bands nucleate simultaneously under dynamic loading, leading to a higher dynamic fracture strain. For parallel loading, the interface impedes the growth of deformation and results in increased ductility of Ti/Al under quasi-static loading, while interface fracture occurs under dynamic loading due to the disparity in Poisson's contraction.
NASA Astrophysics Data System (ADS)
Work, Paul R.
1991-12-01
This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
Short-term Power Load Forecasting Based on Balanced KNN
NASA Astrophysics Data System (ADS)
Lv, Xianlong; Cheng, Xingong; YanShuang; Tang, Yan-mei
2018-03-01
To improve the accuracy of load forecasting, a short-term load forecasting model based on a balanced KNN algorithm is proposed. According to the load characteristics, the massive historical power load data are divided into scenes by the K-means algorithm. In view of unbalanced load scenes, the balanced KNN algorithm is proposed to classify the scenes accurately. The local weighted linear regression algorithm is used to fit and predict the load. Adopting the Apache Hadoop programming framework for cloud computing, the proposed algorithm model is parallelized and improved to enhance its ability to deal with massive, high-dimension data. The household electricity consumption data for a residential district are analyzed on a 23-node cloud computing cluster, and experimental results show that the load forecasting accuracy and execution time of the proposed model are better than those of the traditional forecasting algorithm.
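For readers unfamiliar with the fitting step, a minimal locally weighted linear regression sketch in Python/NumPy follows, using a Gaussian kernel; variable names and parameters are illustrative, not taken from the paper:

    import numpy as np

    def lwlr_predict(x_query, X, y, tau=1.0):
        """Locally weighted linear regression: fit a line around x_query."""
        Xa = np.column_stack([np.ones(len(X)), X])          # add intercept column
        w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))    # Gaussian kernel weights
        W = np.diag(w)
        theta = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ y)
        return theta[0] + theta[1] * x_query

    # Example: predict load at hour 6.5 from a smooth daily profile.
    hours = np.arange(24, dtype=float)
    load = 50 + 20 * np.sin(hours / 24 * 2 * np.pi)
    print(round(lwlr_predict(6.5, hours, load, tau=2.0), 2))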
Multi-agent grid system Agent-GRID with dynamic load balancing of cluster nodes
NASA Astrophysics Data System (ADS)
Satymbekov, M. N.; Pak, I. T.; Naizabayeva, L.; Nurzhanov, Ch. A.
2017-12-01
This work presents a system designed for automated load balancing that analyses the load of compute nodes and subsequently migrates virtual machines from loaded nodes to less loaded ones. The system increases the performance of cluster nodes and helps in the timely processing of data. The grid system balances the work of cluster nodes; the relevance of the system lies in applying multi-agent balancing to the solution of such problems.
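A toy sketch of the migration policy just described (assumed details: numeric VM loads and a simple relative-imbalance threshold; not the authors' implementation):

    def balance_by_migration(nodes, threshold=0.2):
        """Migrate VMs from the most- to the least-loaded node until the load
        spread falls below the threshold. nodes: {name: [vm loads]}."""
        moves = []
        while True:
            load = {n: sum(vms) for n, vms in nodes.items()}
            hot = max(load, key=load.get)
            cold = min(load, key=load.get)
            mean = sum(load.values()) / len(load)
            if not nodes[hot] or (load[hot] - load[cold]) <= threshold * mean:
                return moves
            vm = min(nodes[hot])                    # try the smallest VM first
            if vm >= load[hot] - load[cold]:        # migration would not help
                return moves
            nodes[hot].remove(vm)
            nodes[cold].append(vm)
            moves.append((vm, hot, cold))

    cluster = {"n1": [4, 3, 3], "n2": [1], "n3": [2, 1]}
    print(balance_by_migration(cluster))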
NASA Technical Reports Server (NTRS)
Harper, Richard
1989-01-01
In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.
BALANCING THE LOAD: A VORONOI BASED SCHEME FOR PARALLEL COMPUTATIONS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinberg, Elad; Yalinewich, Almog; Sari, Re'em
2015-01-01
One of the key issues when running a simulation on multiple CPUs is maintaining a proper load balance throughout the run and minimizing communications between CPUs. We propose a novel method of utilizing a Voronoi diagram to achieve a nearly perfect load balance without the need of any global redistributions of data. As a showcase, we implement our method in RICH, a two-dimensional moving mesh hydrodynamical code, but it can be extended trivially to other codes in two or three dimensions. Our tests show that this method is indeed efficient and can be used in a large variety of existing hydrodynamical codes.
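The precise generator-update rule is not given in the abstract; as an illustration of the general idea only, the Lloyd-style sketch below (Python/NumPy) assigns work items to the nearest generator point and moves each generator to the centroid of its items, which tends toward a balanced Voronoi partition without any global redistribution:

    import numpy as np

    def voronoi_balance(points, gens, iters=25):
        """Lloyd-style relaxation: nearest-generator ownership, centroid updates."""
        for _ in range(iters):
            d = np.linalg.norm(points[:, None, :] - gens[None, :, :], axis=2)
            owner = d.argmin(axis=1)                    # Voronoi assignment
            for g in range(len(gens)):
                mine = points[owner == g]
                if len(mine):
                    gens[g] = mine.mean(axis=0)         # move generator to centroid
        return owner

    rng = np.random.default_rng(0)
    pts = rng.random((4000, 2))                         # uniform work items
    owner = voronoi_balance(pts, rng.random((8, 2)))
    print(np.bincount(owner, minlength=8))              # roughly equal per-CPU counts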
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2014-01-01
This paper presents one-of-a-kind MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting of the telescope. A temporally fourth-order Runge-Kutta and spatially fifth-order WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32,000 cores and 4 billion cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregularities caused by the highly complex geometry. Limits to scaling beyond 32K cores are identified, and targeted code optimizations are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Littlefield, R.J.
1990-02-01
To implement an efficient data-parallel program on a non-shared memory MIMD multicomputer, data and computations must be properly partitioned to achieve good load balance and locality of reference. Programs with irregular data reference patterns often require irregular partitions. Although good partitions may be easy to determine, they can be difficult or impossible to implement in programming languages that provide only regular data distributions, such as blocked or cyclic arrays. We are developing Onyx, a programming system that provides a shared memory model of distributed data structures and extends the concept of data distribution to include irregular and dynamic distributions. This provides a powerful means to specify irregular partitions. Perhaps surprisingly, programs using it can also execute efficiently. In this paper, we describe and evaluate the Onyx implementation of a model problem that repeatedly executes an irregular but fixed data reference pattern. On an NCUBE hypercube, the speed of the Onyx implementation is comparable to that of carefully handwritten message-passing code.
Mapping a battlefield simulation onto message-passing parallel architectures
NASA Technical Reports Server (NTRS)
Nicol, David M.
1987-01-01
Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.
2015-03-01
We present Π4U, an extensible framework for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
Adaptive Load-Balancing Algorithms Using Symmetric Broadcast Networks
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
In a distributed-computing environment, it is important to ensure that the processor workloads are adequately balanced. Among numerous load-balancing algorithms, a unique approach due to Das and Prasad defines a symmetric broadcast network (SBN) that provides a robust communication pattern among the processors in a topology-independent manner. In this paper, we propose and analyze three novel SBN-based load-balancing algorithms, and implement them on an SP2. A thorough experimental study with Poisson-distributed synthetic loads demonstrates that these algorithms are very effective in balancing system load while minimizing processor idle time. They also compare favorably with several other existing load-balancing techniques. Additional experiments performed with real data demonstrate that the SBN approach is effective in adaptive computational science and engineering applications where dynamic load balancing is extremely crucial.
A Dynamic Calibration Method for Experimental and Analytical Hub Load Comparison
NASA Technical Reports Server (NTRS)
Kreshock, Andrew R.; Thornburgh, Robert P.; Wilbur, Matthew L.
2017-01-01
This paper presents the results from an ongoing effort to produce improved correlation between analytical hub force and moment predictions and those measured during wind-tunnel testing on the Aeroelastic Rotor Experimental System (ARES), a conventional rotor testbed commonly used at the Langley Transonic Dynamics Tunnel (TDT). A frequency-dependent transformation between loads at the rotor hub and outputs of the testbed balance is produced from frequency response functions measured during vibration testing of the system. The resulting transformation is used as a dynamic calibration of the balance to transform hub loads predicted by comprehensive analysis into predicted balance outputs. In addition to detailing the transformation process, this paper also presents a set of wind-tunnel test cases, with comparisons between the measured balance outputs and transformed predictions from the comprehensive analysis code CAMRAD II. The modal response of the testbed is discussed and compared to a detailed finite-element model. Results reveal that the modal response of the testbed exhibits a number of characteristics that make accurate dynamic balance predictions challenging, even with the use of the balance transformation.
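The transformation itself can be sketched as a frequency-domain matrix multiply. Assuming a measured frequency response matrix H(f) that maps the six hub load components to the six balance outputs at each FFT bin (names and shapes are hypothetical; the paper's calibration procedure is richer than this):

    import numpy as np

    def hub_to_balance(hub_loads_t, H):
        """Transform predicted hub load time histories into balance outputs.
        hub_loads_t: (nt, 6) time histories; H: (nt//2 + 1, 6, 6) FRF matrix
        sampled at the rFFT bin frequencies."""
        nt = hub_loads_t.shape[0]
        L = np.fft.rfft(hub_loads_t, axis=0)            # load spectra, shape (nf, 6)
        B = np.einsum('fij,fj->fi', H, L)               # apply the FRF bin by bin
        return np.fft.irfft(B, n=nt, axis=0)            # back to the time domain

    # Sanity check with an identity FRF: outputs reproduce the inputs.
    nt = 256
    loads = np.random.default_rng(1).standard_normal((nt, 6))
    H = np.tile(np.eye(6), (nt // 2 + 1, 1, 1))
    print(np.allclose(hub_to_balance(loads, H), loads))  # True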
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines: 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.
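One simple way to realize dynamic CPU/accelerator balancing (a sketch under assumptions; the LAMMPS scheme itself is more involved) is to adjust the fraction of work given to the GPU from measured per-step times, so both sides tend to finish together:

    def update_gpu_fraction(frac, t_gpu, t_cpu):
        """Rebalance the GPU work fraction from measured step times."""
        rate_gpu = frac / t_gpu            # work completed per unit time on the GPU
        rate_cpu = (1.0 - frac) / t_cpu    # and on the CPU cores
        return rate_gpu / (rate_gpu + rate_cpu)

    # Toy model: GPU got 50% of the work but finished 4x faster -> give it 80%.
    f = 0.5
    for _ in range(5):
        f = update_gpu_fraction(f, t_gpu=0.25 * (f / 0.5), t_cpu=1.0 * ((1 - f) / 0.5))
        print(round(f, 3))                 # converges to 0.8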
Analytical study of pressure balancing in gas film seals
NASA Technical Reports Server (NTRS)
Zuk, J.
1973-01-01
The load factor is investigated for subsonic and choked flow conditions, laminar and turbulent flows, and various seal entrance conditions. Both parallel sealing surfaces and surfaces with small linear deformation were investigated. The load factor for subsonic flow depends strongly on pressure ratio; under choked flow conditions, however the load factor is found to depend more strongly on film thickness and flow entrance conditions rather than pressure ratio. The importance of generating hydrodynamic forces to keep the seal balanced under severe and multipoint operation is also discussed.
Development and Application of a Parallel LCAO Cluster Method
NASA Astrophysics Data System (ADS)
Patton, David C.
1997-08-01
CPU intensive steps in the SCF electronic structure calculations of clusters and molecules with a first-principles LCAO method have been fully parallelized via a message passing paradigm. Identification of the parts of the code that are composed of many independent compute-intensive steps is discussed in detail as they are the most readily parallelized. Most of the parallelization involves spatially decomposing numerical operations on a mesh. One exception is the solution of Poisson's equation which relies on distribution of the charge density and multipole methods. The method we use to parallelize this part of the calculation is quite novel and is covered in detail. We present a general method for dynamically load-balancing a parallel calculation and discuss how we use this method in our code. The results of benchmark calculations of the IR and Raman spectra of PAH molecules such as anthracene (C_14H_10) and tetracene (C_18H_12) are presented. These benchmark calculations were performed on an IBM SP2 and a SUN Ultra HPC server with both MPI and PVM. Scalability and speedup for these calculations is analyzed to determine the efficiency of the code. In addition, performance and usage issues for MPI and PVM are presented.
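The dynamic load-balancing method itself is described only at a high level in the abstract; the sketch below shows the generic self-scheduling pattern it resembles, using Python's multiprocessing to hand out chunks of independent compute-intensive steps to whichever worker becomes free:

    from multiprocessing import Pool

    def integrate_chunk(chunk_id):
        # Stand-in for a compute-intensive, independent mesh operation.
        return chunk_id, sum(i * i for i in range(10000 + 1000 * chunk_id))

    if __name__ == "__main__":
        # Dynamic scheduling: imap_unordered feeds idle workers the next chunk,
        # so expensive chunks do not hold up the rest of the calculation.
        with Pool(processes=4) as pool:
            for chunk_id, val in pool.imap_unordered(integrate_chunk, range(32)):
                pass  # accumulate results as they arrive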
Unstructured P2P Network Load Balance Strategy Based on Multilevel Partitioning of Hypergraph
NASA Astrophysics Data System (ADS)
Feng, Lv; Chunlin, Gao; Kaiyang, Ma
2017-05-01
With the rapid development of computer performance and distributed technology, P2P-based resource sharing plays an important role in the Internet. The number of P2P network users continues to increase, and the highly dynamic nature of the system makes it difficult for a node to obtain the load of other nodes. Therefore, a dynamic load balance strategy based on hypergraphs is proposed in this article. The scheme develops from the idea of multilevel partitioning in hypergraph theory. It adopts optimized multilevel partitioning algorithms to partition the P2P network into several small areas, and assigns each area a supernode for the management and load transferring of the nodes in that area. When global scheduling is difficult to achieve, load balancing within a number of small areas can be prioritized; through node load balance in each small area, the whole network achieves relative load balance. The experiments indicate that the load distribution of network nodes in our scheme is noticeably more compact. It effectively solves the imbalance problems in P2P networks and improves the scalability and bandwidth utilization of the system.
Adaptive Load-Balancing Algorithms using Symmetric Broadcast Networks
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan A. (Technical Monitor)
2002-01-01
In a distributed computing environment, it is important to ensure that the processor workloads are adequately balanced. Among numerous load-balancing algorithms, a unique approach due to Das and Prasad defines a symmetric broadcast network (SBN) that provides a robust communication pattern among the processors in a topology-independent manner. In this paper, we propose and analyze three efficient SBN-based dynamic load-balancing algorithms, and implement them on an SGI Origin2000. A thorough experimental study with Poisson-distributed synthetic loads demonstrates that our algorithms are effective in balancing system load. By optimizing completion time and idle time, the proposed algorithms are shown to compare favorably with several existing approaches.
NASA Astrophysics Data System (ADS)
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
Evolution of a minimal parallel programming model
Lusk, Ewing; Butler, Ralph; Pieper, Steven C.
2017-04-30
Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.
NASA Technical Reports Server (NTRS)
Kikuchi, Hideaki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Saini, Subhash
2003-01-01
Scalability of a low-cost, Intel Xeon-based, multi-Teraflop Linux cluster is tested for two high-end scientific applications: Classical atomistic simulation based on the molecular dynamics method and quantum mechanical calculation based on the density functional theory. These scalable parallel applications use space-time multiresolution algorithms and feature computational-space decomposition, wavelet-based adaptive load balancing, and spacefilling-curve-based data compression for scalable I/O. Comparative performance tests are performed on a 1,024-processor Linux cluster and a conventional higher-end parallel supercomputer, 1,184-processor IBM SP4. The results show that the performance of the Linux cluster is comparable to that of the SP4. We also study various effects, such as the sharing of memory and L2 cache among processors, on the performance.
An Analysis of Performance Enhancement Techniques for Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)
2002-01-01
The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajbhandari, Samyam; NIkam, Akshay; Lai, Pai-Wei
Tensor contractions represent the most compute-intensive core kernels in ab initio computational quantum chemistry and nuclear physics. Symmetries in these tensor contractions make them difficult to load balance and scale to large distributed systems. In this paper, we develop an efficient and scalable algorithm to contract symmetric tensors. We introduce a novel approach that avoids data redistribution in contracting symmetric tensors while also avoiding redundant storage and maintaining load balance. We present experimental results on two parallel supercomputers for several symmetric contractions that appear in the CCSD quantum chemistry method. We also present a novel approach to tensor redistribution that can take advantage of parallel hyperplanes when the initial distribution has replicated dimensions, and use collective broadcast when the final distribution has replicated dimensions, making the algorithm very efficient.
Large-Scale Distributed Computational Fluid Dynamics on the Information Power Grid Using Globus
NASA Technical Reports Server (NTRS)
Barnard, Stephen; Biswas, Rupak; Saini, Subhash; VanderWijngaart, Robertus; Yarrow, Maurice; Zechtzer, Lou; Foster, Ian; Larsson, Olle
1999-01-01
This paper describes an experiment in which a large-scale scientific application developed for tightly-coupled parallel machines is adapted to the distributed execution environment of the Information Power Grid (IPG). A brief overview of the IPG and a description of the computational fluid dynamics (CFD) algorithm are given. The Globus metacomputing toolkit is used as the enabling device for the geographically-distributed computation. Modifications related to latency hiding and load balancing were required for an efficient implementation of the CFD application in the IPG environment. Performance results on a pair of SGI Origin 2000 machines indicate that real scientific applications can be effectively implemented on the IPG; however, a significant amount of continued effort is required to make such an environment useful and accessible to scientists and engineers.
Parallel/Vector Integration Methods for Dynamical Astronomy
NASA Astrophysics Data System (ADS)
Fukushima, Toshio
1999-01-01
This paper reviews three recent works on numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit (PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of the ODE in the form of a Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to yield an acceleration factor of around 50 on parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel, and the trial integrations are further accelerated by balancing the computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas, like the implicit Hermite integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
Parallelization Issues and Particle-In-Cell Codes.
NASA Astrophysics Data System (ADS)
Elster, Anne Cathrine
1994-01-01
"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field, show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies, becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache -line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.
Scan line graphics generation on the massively parallel processor
NASA Technical Reports Server (NTRS)
Dorband, John E.
1988-01-01
Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel, and their results are applied to the Z buffer in large groups. Performing the pixel value calculations, facilitating load balancing across the processors, and applying the results to the Z buffer efficiently in parallel require special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
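A toy illustration of applying a large group of computed pixels to a Z buffer in one data-parallel step; a scatter-min update stands in for the MPP's sort-based virtual routing, and all names and sizes here are invented for the example.

```python
import numpy as np

zbuf = np.full(8, np.inf)                    # tiny 8-pixel Z buffer
pix  = np.array([2, 5, 2, 7, 5])             # target pixel of each computed fragment
z    = np.array([0.9, 0.4, 0.3, 0.8, 0.6])   # fragment depths

# Apply the whole group at once: for each pixel keep the smallest depth.
# np.minimum.at resolves duplicate pixel indices correctly.
np.minimum.at(zbuf, pix, z)
print(zbuf)   # pixel 2 keeps 0.3, pixel 5 keeps 0.4, pixel 7 keeps 0.8
```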
1989-03-01
comparison between the two. Tyre self-excited vibration can be caused by lack of uniformity and/or out-of-balance. The authors suggest that driving ... safety is best described by the 'Dynamic Load Factor', which relates the minimum rolling dynamic load to the static tyre load. Dynamic Load Factors are
Validation of a robotic balance system for investigations in the control of human standing balance.
Luu, Billy L; Huryn, Thomas P; Van der Loos, H F Machiel; Croft, Elizabeth A; Blouin, Jean-Sébastien
2011-08-01
Previous studies have shown that human body sway during standing approximates the mechanics of an inverted pendulum pivoted at the ankle joints. In this study, a robotic balance system incorporating a Stewart platform base was developed to provide a new technique to investigate the neural mechanisms involved in standing balance. The robotic system, programmed with the mechanics of an inverted pendulum, controlled the motion of the body in response to a change in applied ankle torque. The ability of the robotic system to replicate the load properties of standing was validated by comparing the load stiffness generated when subjects balanced their own body to the robot's mechanical load programmed with a low (concentrated-mass model) or high (distributed-mass model) inertia. The results show that static load stiffness was not significantly (p > 0.05) different for standing and the robotic system. Dynamic load stiffness for the robotic system increased with the frequency of sway, as predicted by the mechanics of an inverted pendulum, with the higher inertia being accurately matched to the load properties of the human body. This robotic balance system accurately replicated the physical model of standing and represents a useful tool to simulate the dynamics of a standing person. © 2011 IEEE
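For intuition, the single-link inverted-pendulum model the robot is programmed with can be simulated in a few lines. The parameter values and the simple stabilizing ankle torque below are illustrative assumptions, not those of the actual apparatus.

```python
import numpy as np

# Inverted-pendulum model of quiet standing: J*theta'' = m*g*h*sin(theta) - T_ankle
m, h, g = 75.0, 0.95, 9.81            # body mass (kg), COM height above ankle (m)
J = m * h**2                          # concentrated-mass (low-inertia) variant
theta, omega, dt = np.deg2rad(2.0), 0.0, 0.001

for step in range(2000):              # 2 s of simulated sway, explicit Euler
    T_ankle = 900.0 * theta + 250.0 * omega   # a simple stabilizing PD torque
    alpha = (m * g * h * np.sin(theta) - T_ankle) / J
    omega += alpha * dt
    theta += omega * dt
print(f"lean after 2 s: {np.rad2deg(theta):.3f} deg")   # decays toward upright
```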
Robertazzi, Thomas G.; Skiena, Steven; Wang, Kai
2017-08-08
Provided are an apparatus and method for load-balancing of a three-phase electric power distribution system having a multi-phase feeder, including obtaining topology information of the feeder identifying supply points for customer loads and feeder sections between the supply points, obtaining customer information that includes peak customer load at each of the points between each of the feeder sections, performing a phase balancing analysis, and recommending phase assignment at the customer load supply points.
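The patent abstract does not disclose the phase-balancing analysis itself; as a purely illustrative baseline consistent with it, a greedy heuristic can assign each customer supply point to the currently lightest-loaded phase.

```python
def assign_phases(peak_loads):
    """Greedy phase balancing: largest peak loads first, each assigned to the
    phase with the smallest running total. peak_loads: list of (point, kW)."""
    totals = {"A": 0.0, "B": 0.0, "C": 0.0}
    plan = {}
    for point, load in sorted(peak_loads, key=lambda x: -x[1]):
        phase = min(totals, key=totals.get)   # lightest phase so far
        totals[phase] += load
        plan[point] = phase
    return plan, totals

plan, totals = assign_phases(
    [("p1", 10), ("p2", 9), ("p3", 8), ("p4", 7), ("p5", 6), ("p6", 5)])
print(plan)
print(totals)   # in this example each phase ends at 15 kW
```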
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2016-01-01
This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta scheme and a spatially fifth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses the irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
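A hedged sketch of one common way to rebalance AMR blocks by measured execution time: longest-processing-time-first onto the least-loaded rank. The paper's actual scheme may differ, and the block costs below are invented.

```python
import heapq

def balance_blocks(block_times, nprocs):
    """Reassign AMR blocks using last-measured per-block execution times:
    sort blocks by cost, then repeatedly give the most expensive remaining
    block to the least-loaded rank. block_times maps block id -> seconds."""
    heap = [(0.0, rank, []) for rank in range(nprocs)]   # (load, rank, blocks)
    heapq.heapify(heap)
    for blk, t in sorted(block_times.items(), key=lambda kv: -kv[1]):
        load, rank, blocks = heapq.heappop(heap)         # least-loaded rank
        blocks.append(blk)
        heapq.heappush(heap, (load + t, rank, blocks))
    return sorted(heap)

for load, rank, blocks in balance_blocks(
        {"b0": 4.0, "b1": 3.5, "b2": 3.0, "b3": 2.0, "b4": 1.5}, 2):
    print(rank, round(load, 2), blocks)
```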
Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2015-01-01
This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta scheme and a spatially fifth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses the irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak
2003-01-01
The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement strategies on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the roles of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Details of a sophisticated graph partitioning technique for grid grouping are also provided. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.
Simulation and Analysis of Three-Phase Rectifiers for Aerospace Power Applications
NASA Technical Reports Server (NTRS)
Truong, Long V.; Birchenough, Arthur G.
2004-01-01
Due to the nature of planned planetary missions, fairly large advanced power systems are required for the spacecraft. These future high-power spacecraft are expected to use dynamic power conversion systems incorporating high-speed alternators as the three-phase AC electrical power source. One of the early design considerations in such systems is the type of rectification to be used with the AC source for DC user loads. This paper addresses the issues involved with two different rectification methods, namely the conventional six-pulse and twelve-pulse schemes. Two circuit configurations involving parallel combinations of the six- and twelve-pulse rectifiers were selected for the simulation. The rectifiers' input and output power waveforms are thoroughly examined through simulations. The effects of the parasitic load for power balancing and of the filter components for reducing the ripple voltage at the DC loads are also included in the analysis. Details of the simulation circuits, simulation results, and design examples for reducing the risk of damage to spacecraft engines are presented and discussed.
A Latency-Tolerant Partitioner for Distributed Computing on the Information Power Grid
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Kwak, Dochan (Technical Monitor)
2001-01-01
NASA's Information Power Grid (IPG) is an infrastructure designed to harness the power of geographically distributed computers, databases, and human expertise, in order to solve large-scale realistic computational problems. This type of meta-computing environment is necessary to present a unified virtual machine to application developers that hides the intricacies of a highly heterogeneous environment and yet maintains adequate security. In this paper, we present a novel partitioning scheme, called MinEX, that dynamically balances processor workloads while minimizing data movement and runtime communication, for applications that are executed in a parallel distributed fashion on the IPG. We also analyze the conditions that are required for the IPG to be an effective tool for such distributed computations. Our results show that MinEX is a viable load balancer provided the nodes of the IPG are connected by a high-speed asynchronous interconnection network.
Split torque transmission load sharing
NASA Technical Reports Server (NTRS)
Krantz, T. L.; Rashidi, M.; Kish, J. G.
1992-01-01
Split torque transmissions are attractive alternatives to conventional planetary designs for helicopter transmissions. The split torque designs can offer lighter weight and fewer parts but have not been used extensively for lack of experience, especially with obtaining proper load sharing. Two split torque designs that use different load sharing methods have been studied. Precise indexing and alignment of the geartrain to produce acceptable load sharing has been demonstrated. An elastomeric torque splitter that has large torsional compliance and damping produces even better load sharing while reducing dynamic transmission error and noise. However, the elastomeric torque splitter as now configured cannot cover the full range of operating conditions of a fielded system. A thrust balancing load sharing device was evaluated. Friction forces that oppose the motion of the balance mechanism are significant. A static analysis suggests increasing the helix angle of the input pinion of the thrust balancing design. Also, dynamic analysis of this design predicts good load sharing and significant torsional response to accumulated pitch errors of the gears.
Variable Acceleration Force Calibration System (VACS)
NASA Technical Reports Server (NTRS)
Rhew, Ray D.; Parker, Peter A.; Johnson, Thomas H.; Landman, Drew
2014-01-01
Conventionally, force balances have been calibrated manually, using a complex system of free-hanging precision weights, bell cranks, and/or other mechanical components. Conventional methods may provide sufficient accuracy in some instances, but are often quite complex and labor-intensive, requiring three to four man-weeks to complete each full calibration. To ensure accuracy, gravity-based loading is typically utilized. However, this often causes difficulty when applying loads in three simultaneous, orthogonal axes. A complex system of levers, cranks, and cables must be used, introducing increased sources of systematic error, and significantly increasing the time and labor required to complete the calibration. One aspect of the VACS is a method wherein the mass utilized for calibration is held constant and the acceleration is changed to generate relatively large forces with relatively small test masses. Multiple forces can be applied to a force balance without changing the test mass, and dynamic forces can be applied by rotation or oscillating acceleration. If rotational motion is utilized, a mass is rigidly attached to a force balance, and the mass is exposed to a rotational field. A large force can be applied by utilizing a large rotational velocity. A centrifuge or rotating table can be used to create the rotational field, and fixtures can be utilized to position the force balance. The acceleration may also be linear. For example, a table that moves linearly and accelerates in a sinusoidal manner may also be utilized. The test mass does not have to move in a path that is parallel to the ground, and no re-leveling is therefore required. Balance deflection corrections may be applied passively by monitoring the orientation of the force balance with a three-axis accelerometer package. Deflections are measured during each test run, and adjustments with respect to the true applied load can be made during the post-processing stage. This paper will present the development and testing of the VACS concept.
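The leverage of the rotational approach is easy to check numerically: the calibration force grows with the square of the angular rate, F = m*omega^2*r. The mass, radius, and spin rate below are arbitrary illustrative values.

```python
import math

m = 0.5                                # test mass, kg
r = 0.75                               # radius from the spin axis, m
rpm = 600.0
omega = rpm * 2.0 * math.pi / 60.0     # angular rate, rad/s

force = m * omega**2 * r               # centripetal calibration force, N
print(f"{force:.0f} N from a {m} kg mass")   # ~1480 N, vs ~4.9 N of its weight
```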
Energy to the Edge (E2E) U.S. Army Rapid Equipping Force
2014-03-21
generators, parallel multiple sources, prioritize loads, and balance loads. Smart grids are based on complex algorithms and controls. 3. Reduce...stations are not able to be serviced by prime power because of their location in the middle of a very active airfield and fueling a system that consist
Visual display and alarm system for wind tunnel static and dynamic loads
NASA Technical Reports Server (NTRS)
Hanly, Richard D.; Fogarty, James T.
1987-01-01
A wind tunnel balance monitor and alarm system developed at NASA Ames Research Center will produce several beneficial results. The costs of wind tunnel delays because of inadvertent balance damage and the costs of balance repair or replacement can be greatly reduced or eliminated with better real-time information on the balance static and dynamic loading. The wind tunnel itself will have enhanced utility with the elimination of overly cautious limits on test conditions. The microprocessor-based system features automatic scaling and 16 multicolored LED bargraphs to indicate both static and dynamic components of the signals from eight individual channels. Five individually programmable alarm levels are available with relay closures for internal or external visual and audible warning devices and other functions such as automatic activation of external recording devices, model positioning mechanisms, or tunnel shutdown.
Visual display and alarm system for wind tunnel static and dynamic loads
NASA Technical Reports Server (NTRS)
Hanly, Richard D.; Fogarty, James T.
1987-01-01
A wind tunnel balance monitor and alarm system developed at NASA Ames Research Center will produce several beneficial results. The costs of wind tunnel delays because of inadvertent balance damage and the costs of balance repair or replacement can be greatly reduced or eliminated with better real-time information on the balance static and dynamic loading. The wind tunnel itself will have enhanced utility with the elimination of overly cautious limits on test conditions. The microprocessor-based system features automatic scaling and 16 multicolored LED bargraphs to indicate both static and dynamic components of the signals from eight individual channels. Five individually programmable alarm levels are available with relay closures for internal or external visual and audible warning devices and other functions such as automatic activation of external recording devices, model positioning mechanism, or tunnel shutdown.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arumugam, Kamesh
Efficient parallel implementation of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. This requires exploiting the data-parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of them employ irregular algorithms which exhibit data-dependent control-flow and irregular memory accesses. Furthermore, these applications are often iterative with dependency between steps, making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particle beam dynamics is one such application, where the distribution of work and the memory access pattern at each time step are irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control-flow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine-learning-based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-flow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particle beam dynamics as a motivating example throughout the dissertation to present our new approach, though it should be equally applicable to a wide range of irregular applications. The machine learning approach presented here uses predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation and to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improve the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance.
We used these optimization techniques and anticipation strategy to design a cache-aware, memory-efficient parallel algorithm to address the irregularities in the parallel implementation of charged particle beam dynamics simulation on different HPC architectures. Experimental results using a diverse mix of HPC architectures show that our approach of using an anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and improving resource utilization.
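A toy illustration of the anticipation idea: forecast each work unit's next-step cost from earlier steps and pre-assign work accordingly. A simple exponential moving average stands in here for the dissertation's supervised models; all names and numbers are invented.

```python
def ema_forecast(history, alpha=0.5):
    """Exponentially weighted forecast of the next value in a cost series."""
    f = history[0]
    for x in history[1:]:
        f = alpha * x + (1 - alpha) * f   # recent steps weigh more
    return f

# per-step measured costs for two work units over the last three time steps
step_costs = {"unit0": [5.0, 5.5, 6.2], "unit1": [2.0, 1.8, 1.9]}
forecast = {u: ema_forecast(h) for u, h in step_costs.items()}
print(forecast)   # schedule the next step using these anticipated costs
```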
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Zhang, Yingchen; Muljadi, Eduard
In this paper, a network reconfiguration approach driven by short-term load forecasting is proposed and solved in a parallel manner. Specifically, a support vector regression (SVR) based short-term load forecasting approach is designed to provide an accurate load prediction and benefit the network reconfiguration. Because of the nonconvexity of the three-phase balanced optimal power flow, a second-order cone program (SOCP) based approach is used to relax the optimal power flow problem. Then, the alternating direction method of multipliers (ADMM) is used to compute the optimal power flow in a distributed manner. Considering the limited number of switches and the increasing computation capability, the proposed network reconfiguration is solved in a parallel way. The numerical results demonstrate the feasibility and effectiveness of the proposed approach.
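An illustrative SVR short-term load forecaster in the spirit of the abstract: predict the next hour's load from the previous 24 hourly values. The synthetic data, window length, kernel, and parameters are assumptions; the paper's feature design may differ.

```python
import numpy as np
from sklearn.svm import SVR

# synthetic hourly load with a daily cycle plus noise (60 days)
rng = np.random.default_rng(0)
hours = np.arange(24 * 60)
load = 100 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

window = 24                                    # use the previous day as features
X = np.array([load[i:i + window] for i in range(load.size - window)])
y = load[window:]

model = SVR(kernel="rbf", C=100.0, epsilon=0.5).fit(X[:-24], y[:-24])
pred = model.predict(X[-24:])                  # forecast the held-out last day
print(f"mean abs error: {np.abs(pred - y[-24:]).mean():.2f}")
```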
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.
1989-01-01
Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge-based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondences between the images involved, which form a sequence of stereo image pairs. The researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
Evaluating SPLASH-2 Applications Using MapReduce
NASA Astrophysics Data System (ADS)
Zhu, Shengkai; Xiao, Zhiwei; Chen, Haibo; Chen, Rong; Zhang, Weihua; Zang, Binyu
MapReduce has become prevalent for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance and load balancing from programmers, MapReduce significantly simplifies the programming of large clusters. Due to these features, researchers have also explored the use of MapReduce in other application domains, such as machine learning, textual retrieval and statistical translation, among others.
A nonrecursive 'Order N' preconditioned conjugate gradient/range space formulation of MDOF dynamics
NASA Technical Reports Server (NTRS)
Kurdila, A. J.; Menon, R.; Sunkel, John
1991-01-01
This paper addresses two requirements of present-day mechanical system simulation: algorithms that induce parallelism on a fine scale, and transient simulation methods that are automatically load balancing for a wide collection of system topologies and hardware configurations. To this end, a combined range space/preconditioned conjugate gradient formulation of multidegree-of-freedom dynamics is developed which, by employing a regular ordering of the system connectivity graph, makes it possible to derive an extremely efficient preconditioner from the range space metric (as opposed to the system coefficient matrix). Because of the effectiveness of the preconditioner, the method can achieve performance rates that depend linearly on the number of substructures. The method, termed 'Order N', does not require the assembly of system mass or stiffness matrices, and is therefore amenable to implementation on workstations. Using this method, a 13-substructure model of the Space Station was constructed.
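For reference, a textbook preconditioned conjugate gradient loop is sketched below. The paper's novelty is the range-space-metric preconditioner; here it is replaced by a generic Jacobi preconditioner purely for illustration.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=200):
    """Standard preconditioned conjugate gradient for SPD A x = b.
    M_inv plays the role of the preconditioner (here: inverse diagonal)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# SPD test problem (1D Laplacian) with a Jacobi preconditioner
n = 50
A = np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
b = np.ones(n)
x = pcg(A, b, np.diag(1.0 / np.diag(A)))
print(np.linalg.norm(A @ x - b))   # ~0: the residual has converged
```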
NASA Astrophysics Data System (ADS)
Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.
2018-05-01
In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.
NASA Astrophysics Data System (ADS)
Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.
2018-01-01
In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.
Inverse Force Determination on a Small Scale Launch Vehicle Model Using a Dynamic Balance
NASA Technical Reports Server (NTRS)
Ngo, Christina L.; Powell, Jessica M.; Ross, James C.
2017-01-01
A launch vehicle can experience large unsteady aerodynamic forces in the transonic regime that, while usually lasting only tens of seconds during launch, could be devastating if structural components and electronic hardware are not designed to account for them. These aerodynamic loads are difficult to measure experimentally and even harder to estimate computationally. The current method for estimating buffet loads is through the use of a few hundred unsteady pressure transducers in a wind tunnel test. Even with a large number of point measurements, the computed integrated load is not an accurate enough representation of the total load caused by buffeting. This paper discusses an attempt at using a dynamic balance to experimentally determine buffet loads on a generic-scale hammerhead launch vehicle model tested at NASA Ames Research Center's 11- by 11-foot transonic wind tunnel. To use a dynamic balance, the structural characteristics of the model needed to be identified so that the natural modal response could be removed from the aerodynamic forces. A finite element model was created of a simplified version of the model to evaluate the natural modes of the balance flexures, assist in model design, and compare to experimental data. Several modal tests were conducted on the model in two different configurations to check for non-linearity and to estimate the dynamic characteristics of the model. The experimental results were used in an inverse force determination technique with a pseudo-inverse frequency response function. Due to the non-linearity, the model not being axisymmetric, and inconsistent data between the two shake tests with different mounting configurations, it was difficult to create a frequency response matrix that satisfied all input and output conditions for the wind tunnel configuration to accurately predict unsteady aerodynamic loads.
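The core inverse step can be sketched in a few lines: at each frequency line, recover the applied force vector from measured responses via the pseudo-inverse of the frequency response function (FRF) matrix, F(w) = pinv(H(w)) X(w). The matrices below are synthetic stand-ins, not wind tunnel data.

```python
import numpy as np

n_out, n_in, n_freq = 6, 2, 4      # more response channels than force inputs
rng = np.random.default_rng(1)
H = rng.normal(size=(n_freq, n_out, n_in)) + 1j * rng.normal(size=(n_freq, n_out, n_in))
F_true = rng.normal(size=(n_freq, n_in)) + 1j * rng.normal(size=(n_freq, n_in))
X = np.einsum("fij,fj->fi", H, F_true)     # simulated response spectra X = H F

# inverse force determination: least-squares force estimate per frequency line
F_est = np.array([np.linalg.pinv(H[k]) @ X[k] for k in range(n_freq)])
print(np.allclose(F_est, F_true))          # exact here; noisy data would not be
```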
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Zuwei; Zhao, Haibo, E-mail: klinsmannzhb@163.com; Zheng, Chuguang
2015-01-15
This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining a Markov jump model, a weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of the particle size distribution with low statistical noise over the full size range and, as far as possible, to reduce the number of time loopings. Here three coagulation rules are highlighted, and it is found that constructing an appropriate coagulation rule provides a route to attain a compromise between the accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates used for acceptance-rejection processes by single-looping over all particles, and meanwhile the mean time-step of the coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is greatly reduced, becoming proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are processed in parallel by the many cores of a GPU, which can execute massively threaded data-parallel tasks to obtain a remarkable speedup ratio (compared with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10,000 simulation particles per cell). These accelerating approaches to PBMC are demonstrated in a physically realistic Brownian coagulation case. The computational accuracy is validated against a benchmark solution of the discrete-sectional method. The simulation results show that the comprehensive approach can attain a very favorable improvement in cost without sacrificing computational accuracy.
Analysis of tasks for dynamic man/machine load balancing in advanced helicopters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jorgensen, C.C.
1987-10-01
This report considers task allocation requirements imposed by advanced helicopter designs incorporating mixes of human pilots and intelligent machines. Specifically, it develops an analogy between load balancing using distributed non-homogeneous multiprocessors and human team functions. A taxonomy is presented which can be used to identify task combinations likely to cause overload for dynamic scheduling and process allocation mechanisms. Designer criteria are given for function decomposition, separation of control from data, and communication handling for dynamic tasks. Possible effects of NP-complete scheduling problems are noted, and a class of combinatorial optimization methods is examined.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1997-01-01
Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate computational fluid dynamics (CFD) simulations and other engineering applications. The work plan was to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify their effectiveness and potential, and 3) incorporate the newly developed algorithms into actual simulation packages. This plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested, based on two different approaches: (1) adopting a mathematical geometry that better describes the fluid, and (2) using a compact scheme to gain high-order accuracy in the numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
NASA Astrophysics Data System (ADS)
Yan, Beichuan; Regueiro, Richard A.
2018-02-01
A three-dimensional (3D) DEM code for simulating complex-shaped granular particles is parallelized using the message passing interface (MPI). The concepts of link-block, ghost/border layer, and migration layer are put forward for the design of the parallel algorithm, and a theoretical function for 3D DEM scalability and memory usage is derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as minimizing communication overhead, maintaining dynamic load balance, handling particle migrations across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, and eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and is tested on up to 2048 compute nodes for simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code, including speedup, efficiency, scalability, and granularity across five orders of magnitude of simulation scale (number of particles), are provided, and they demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value both in laboratory studies on the micromechanical properties of granular materials and in many realistic engineering applications involving granular materials.
Hines, Michael L; Eichner, Hubert; Schürmann, Felix
2008-08-01
Neuron tree topology equations can be split into two subtrees and solved on different processors with no change in accuracy, stability, or computational effort; communication costs involve only sending and receiving two double precision values by each subtree at each time step. Splitting cells is useful in attaining load balance in neural network simulations, especially when there is a wide range of cell sizes and the number of cells is about the same as the number of processors. For compute-bound simulations load balance results in almost ideal runtime scaling. Application of the cell splitting method to two published network models exhibits good runtime scaling on twice as many processors as could be effectively used with whole-cell balancing.
Scalable and balanced dynamic hybrid data assimilation
NASA Astrophysics Data System (ADS)
Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa
2017-04-01
Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter EKF. Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother VEnKS. In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble but now using past iterations as surrogate observations until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. However, this can be avoided by isolating the forecast model completely from the minimization process by implementing the latter as a wrapper code whose only link to the model is calling for many parallel and totally independent model runs, all of them implemented as parallel model runs themselves. The only bottleneck in the process is the gathering and scattering of initial and final model state snapshots before and after the parallel runs which requires a very efficient and low-latency communication network. However, the volume of data communicated is small and the intervening minimization steps are only 3D-Var, which means their computational load is negligible compared with the fully parallel model runs. We present example results of scalable VEnKF with the 4D lake and shallow sea model COHERENS, assimilating simultaneously continuous in situ measurements in a single point and infrequent satellite images that cover a whole lake, with the fully scalable VEnKF.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Huang, Zhenyu; Chavarría-Miranda, Daniel
Contingency analysis is a key function in the Energy Management System (EMS) to assess the impact of various combinations of power system component failures based on state estimation. Contingency analysis is also extensively used in power market operation for feasibility test of market solutions. High performance computing holds the promise of faster analysis of more contingency cases for the purpose of safe and reliable operation of today's power grids with less operating margin and more intermittent renewable energy sources. This paper evaluates the performance of counter-based dynamic load balancing schemes for massive contingency analysis under different computing environments. Insights from the performance evaluation can be used as guidance for users to select suitable schemes in the application of massive contingency analysis. Case studies, as well as MATLAB simulations, of massive contingency cases using the Western Electricity Coordinating Council power grid model are presented to illustrate the application of high performance computing with counter-based dynamic load balancing schemes.
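A minimal sketch of a counter-based scheme of the kind evaluated here, under the assumption that a shared counter hands out the next contingency case to whichever process becomes idle; the sleep stands in for actual case evaluation.

```python
import multiprocessing as mp
import random
import time

def worker(counter, lock, n_cases, done):
    """Each process increments the shared counter to claim the next case,
    so faster processes naturally take on more cases (dynamic balancing)."""
    while True:
        with lock:
            case = counter.value
            if case >= n_cases:
                return
            counter.value += 1
        time.sleep(random.uniform(0.0, 0.005))   # stand-in for case evaluation
        with lock:
            done.value += 1

if __name__ == "__main__":
    n_cases = 200
    counter, done = mp.Value("i", 0), mp.Value("i", 0)
    lock = mp.Lock()
    procs = [mp.Process(target=worker, args=(counter, lock, n_cases, done))
             for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(done.value, "contingency cases analyzed")
```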
Treadmill Exercise with Increased Body Loading Enhances Post Flight Functional Performance
NASA Technical Reports Server (NTRS)
Bloomberg, J. J.; Batson, C. D.; Buxton, R. E.; Feiveson, A. H.; Kofman, I. S.; Laurie, S.; Lee, S. M. C.; Miller, C. A.; Mulavara, A. P.; Peters, B. T.;
2014-01-01
The goals of the Functional Task Test (FTT) study were to determine the effects of space flight on functional tests that are representative of high priority exploration mission tasks and to identify the key underlying physiological factors that contribute to decrements in performance. Ultimately this information will be used to assess performance risks and inform the design of countermeasures for exploration class missions. We have previously shown that for Shuttle, ISS and bed rest subjects functional tasks requiring a greater demand for dynamic control of postural equilibrium (i.e. fall recovery, seat egress/obstacle avoidance during walking, object translation, jump down) showed the greatest decrement in performance. Functional tests with reduced requirements for postural stability (i.e. hatch opening, ladder climb, manual manipulation of objects and tool use) showed little reduction in performance. These changes in functional performance were paralleled by similar decrements in sensorimotor tests designed to specifically assess postural equilibrium and dynamic gait control. The bed rest analog allows us to investigate the impact of axial body unloading in isolation on both functional tasks and on the underlying physiological factors that lead to decrements in performance and then compare them with the results obtained in our space flight study. These results indicate that body support unloading experienced during space flight plays a central role in postflight alteration of functional task performance. Given the importance of body-support loading we set out to determine if there is a relationship between the load experienced during inflight treadmill exercise (produced by a harness and bungee system) and postflight functional performance. ISS crewmembers (n=13) were tested using the FTT protocol before and after 6 months in space. Crewmembers were tested three times before flight, and on 1, 6, and 30 days after landing. To determine how differences in body-support loading experienced during inflight treadmill exercise impacts postflight functional performance, the loading history for each subject during inflight treadmill (T2) exercise was correlated with postflight measures of performance. Crewmembers who walked on the treadmill with higher pull-down loads had less decrement in postflight postural stability and dynamic locomotor control than those subjects who exercised with lighter loads. These data point to the importance of providing significant body loading during inflight treadmill exercise. This and the addition of specific balance training may further mitigate decrements in critical mission tasks that require dynamic postural stability and mobility. Inflight treadmill exercise provides a multi-disciplinary platform to provide sensorimotor, aerobic and bone mechanical stimuli benefits. Forward work will focus on the development of an inflight training system that will integrate aerobic, resistive and balance training modalities into a single interdisciplinary countermeasure system for exploration class missions.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Concurrent processing simulation of the space station
NASA Technical Reports Server (NTRS)
Gluck, R.; Hale, A. L.; Sunkel, John W.
1989-01-01
The development of a new capability for the time-domain simulation of multibody dynamic systems and its application to the study of large-angle rotational maneuvers of the Space Station is described. The effort was divided into three sequential tasks, each of which required significant advancement of the state of the art: (1) the development of an explicit mathematical model, via symbol manipulation, of a flexible multibody dynamic system; (2) the development of a methodology for balancing the computational load of an explicit mathematical model for concurrent processing; and (3) the implementation and successful simulation of the above on a prototype Custom Architectured Parallel Processing System (CAPPS) containing eight processors. The throughput rate achieved by the CAPPS, operating at only 70 percent efficiency, was 3.9 times greater than that obtained sequentially by the IBM 3090 supercomputer simulating the same problem. More significantly, analysis of the results leads to the conclusion that the relative cost effectiveness of concurrent vs. sequential digital computation will grow substantially as the computational load is increased. This is a welcome development in an era when very complex and cumbersome mathematical models of large space vehicles must be used as substitutes for full-scale testing, which has become impractical.
The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation
NASA Astrophysics Data System (ADS)
Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru
1999-09-01
We have developed a new parallel tree method which will be called the forest method hereafter. This new method uses the sectional Voronoi tessellation (SVT) for the domain decomposition. The SVT decomposes the whole space into polyhedra and allows their flat borders to move by assigning different weights. The forest method determines these weights based on the load balancing among processors by means of overload diffusion (OLD). Moreover, since all the borders are flat, before receiving the data from other processors, each processor can collect enough data to calculate the gravity force with precision. Both the SVT and the OLD are coded in a highly vectorizable manner to suit vector parallel processors. The parallel code based on the forest method with the Message Passing Interface is run on various platforms so that wide portability is guaranteed. Extensive calculations with 15 processors of a Fujitsu VPP300/16R indicate that the code can calculate the gravity force exerted on 10^5 particles each second for an idealized dark halo. This code is found to enable an N-body simulation with 10^7 or more particles over a wide dynamic range and is therefore a very powerful tool for the study of galaxy formation and large-scale structure in the universe.
NASA Astrophysics Data System (ADS)
Frey, Davide; Guerraoui, Rachid; Kermarrec, Anne-Marie; Koldehofe, Boris; Mogensen, Martin; Monod, Maxime; Quéma, Vivien
Gossip-based information dissemination protocols are considered easy to deploy, scalable and resilient to network dynamics. Load-balancing is inherent in these protocols as the dissemination work is evenly spread among all nodes. Yet, large-scale distributed systems are usually heterogeneous with respect to network capabilities such as bandwidth. In practice, a blind load-balancing strategy might significantly hamper the performance of the gossip dissemination.
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response
NASA Technical Reports Server (NTRS)
Abdi, Frank
1996-01-01
A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software, GENOA, is dedicated to parallel and high-speed analysis to perform probabilistic evaluation of the high-temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models, combining their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis, assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives achieved in the development were: (1) utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of the structure, material and processing of high-temperature composites affordable; (2) computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism and increase convergence rates through high- and low-level processor assignment; (4) creation of the framework for a portable parallel architecture for machine-independent Multiple Instruction Multiple Data (MIMD), Single Instruction Multiple Data (SIMD), hybrid and distributed-workstation types of computers; and (5) market evaluation. The results of the Phase 2 effort provide a good basis for continuation and warrant a Phase 3 government and industry partnership.
Perry, Stephen D; Radtke, Alison; Goodwin, Chris R
2007-01-01
The purpose of this study was to determine the influence of different midsole hardnesses on dynamic balance control during unexpected gait termination. Twelve healthy young female adults were asked to walk along an 8-m walkway, looking straight ahead. During 25% of the trials, they were signaled (via an audio buzzer) to terminate gait within the next two steps. The four experimental conditions were: (1) soft (A15); (2) standard (A33); (3) hard (A50); (4) barefoot. Center of mass (COM) position relative to the lateral base of support (BOS), the center of mass-center of pressure (COM-COP) difference and the vertical loading rate were used to evaluate the influence of midsole material on dynamic balance control. The results were a decrease in the medial-lateral range of COM with respect to the lateral BOS, a reduction in the maximum COM-COP difference and an increase in the vertical loading rate due to the presence and hardness level of the midsole material when compared to the barefoot condition. The primary outcomes of this study illustrate the influence of midsole hardness as an impediment to dynamic balance control during responses to gait termination. In conclusion, the present study suggests that variations in midsole material, and even its presence, impair the dynamic balance control system.
Parallel deterministic neutronics with AMR in 3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clouse, C.; Ferguson, J.; Hendrickson, C.
1997-12-31
AMTRAN, a three-dimensional Sn neutronics code with adaptive mesh refinement (AMR), has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block-refined AMR is used with linear finite element representations for the fluxes, which allows for a straightforward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.
Database Reorganization in Parallel Disk Arrays with I/O Service Stealing
NASA Technical Reports Server (NTRS)
Zabback, Peter; Onyuksel, Ibrahim; Scheuermann, Peter; Weikum, Gerhard
1996-01-01
We present a model for data reorganization in parallel disk systems that is geared towards load balancing in an environment with periodic access patterns. Data reorganization is performed by disk cooling, i.e. migrating files or extents from the hottest disks to the coldest ones. We develop an approximate queueing model for determining the effective arrival rates of cooling requests and discuss its use in assessing the costs versus benefits of cooling.
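A hedged sketch of the disk-cooling rule described above: repeatedly migrate the hottest extent from the hottest disk to the coldest one, stopping when a move would no longer narrow the gap. The simple temperature test below stands in for the paper's queueing-model cost/benefit analysis, and all heat values are invented.

```python
def cool(disks, rounds=10):
    """disks: list of dicts mapping extent -> heat (access rate).
    Migrate the hottest extent from the hottest disk to the coldest disk,
    but only while the move actually narrows the temperature gap."""
    for _ in range(rounds):
        temps = [sum(d.values()) for d in disks]
        hot, cold = temps.index(max(temps)), temps.index(min(temps))
        extent = max(disks[hot], key=disks[hot].get)     # hottest extent
        if disks[hot][extent] >= temps[hot] - temps[cold]:
            break                                        # move would not pay off
        disks[cold][extent] = disks[hot].pop(extent)     # migrate it
    return [round(sum(d.values()), 1) for d in disks]

# three disks; the first starts far hotter than the others
print(cool([{"e1": 4.0, "e2": 5.0, "e5": 3.0}, {"e3": 1.0}, {"e4": 2.0}]))
```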
Studying effects of non-equilibrium radiative transfer via HPC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holladay, Daniel
This report presents slides on Ph.D. Research Goals; Local Thermodynamic Equilibrium (LTE) Implications; Calculating an Opacity; Opacity: Pictographic Representation; Opacity: Pictographic Representation; Opacity: Pictographic Representation; Collisional Radiative Modeling; Radiative and Collisional Excitation; Photo and Electron Impact Ionization; Autoionization; The Rate Matrix; Example: Total Photoionization Rate; The Rate Coefficients; inlinlte version 1.1; inlinlte: Verification; New Capabilities: Rate Matrix – Flexibility; Memory Option Comparison; Improvements over Previous DCA Solver; Inter- and Intra-node Load Balancing; Load Balance – Full Picture; Load Balance – Full Picture; Load Balance – Internode; Load Balance – Scaling; Description; Performance; xRAGE Simulation; Post-process @ 2hr; Post-process @ 4hr; Post-process @ 8hr; Takeaways; Performance for 1 Realization; Motivation for QOI; Multigroup Er; Transport and NLTE Large Effects (1mm, 1keV); Transport Large Effect, NLTE Lesser (1mm, 750eV); Blastwave Diagnostic – Description & Performance; Temperature Comparison; NLTE Has an Effect on Dynamics at the Wall; NLTE Has a Lesser Effect in the Foam; Global Takeaways; The End.
Research on a Method of Geographical Information Service Load Balancing
NASA Astrophysics Data System (ADS)
Li, Heyuan; Li, Yongxing; Xue, Zhiyong; Feng, Tao
2018-05-01
With the development of geographical information service technologies, how to achieve intelligent scheduling and high-concurrency access to geographical information service resources based on load balancing is a focal point of current research. This paper presents a dynamic load balancing algorithm. In the algorithm, each type of geographical information service is matched with a corresponding server group; the RED algorithm is then combined with a double-threshold method to judge the load state of each server node; finally, the service is scheduled on a weighted probabilistic basis over a given period. Lastly, an experimental system is built on a server cluster, which demonstrates the effectiveness of the method presented in this paper.
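An illustrative sketch of the final, weighted probabilistic scheduling step: requests are dispatched to a server node with probability proportional to its remaining capacity, and nodes past an overload threshold get weight zero. The threshold, weights, and server names are assumptions, not the paper's parameters.

```python
import random

def dispatch(servers, threshold=0.9):
    """Pick a server with probability proportional to its spare capacity;
    nodes above the (RED-style) overload threshold are excluded entirely."""
    weights = [0.0 if s["load"] > threshold else (1.0 - s["load"])
               for s in servers]
    return random.choices(servers, weights=weights, k=1)[0]["name"]

servers = [{"name": "gis-1", "load": 0.35},
           {"name": "gis-2", "load": 0.70},
           {"name": "gis-3", "load": 0.95}]   # above threshold: never chosen
print([dispatch(servers) for _ in range(5)])
```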
I/O load balancing for big data HPC applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paul, Arnab K.; Goyal, Arpit; Wang, Feiyi
High Performance Computing (HPC) big data problems require efficient distributed storage systems. However, at scale, such storage systems often experience load imbalance and resource contention due to two factors: the bursty nature of scientific application I/O, and the complex I/O path that is without centralized arbitration and control. For example, the extant Lustre parallel file system, which supports many HPC centers, comprises numerous components connected via custom network topologies, and serves the varying demands of a large number of users and applications. Consequently, some storage servers can be more loaded than others, which creates bottlenecks and reduces overall application I/O performance. Existing solutions typically focus on per-application load balancing, and thus are not as effective given their lack of a global view of the system. In this paper, we propose a data-driven approach to load balance the I/O servers at scale, targeted at Lustre deployments. To this end, we design a global mapper on the Lustre Metadata Server, which gathers runtime statistics from key storage components on the I/O path, and applies Markov chain modeling and a minimum-cost maximum-flow algorithm to decide where data should be placed. Evaluation using a realistic system simulator and a real setup shows that our approach yields better load balancing, which in turn can improve end-to-end performance.
Population-based learning of load balancing policies for a distributed computer system
NASA Technical Reports Server (NTRS)
Mehra, Pankaj; Wah, Benjamin W.
1993-01-01
Effective load-balancing policies use dynamic resource information to schedule tasks in a distributed computer system. We present a novel method for automatically learning such policies. At each site in our system, we use a comparator neural network to predict the relative speedup of an incoming task using only the resource-utilization patterns obtained prior to the task's arrival. Outputs of these comparator networks are broadcast periodically over the distributed system, and the resource schedulers at each site use these values to determine the best site for executing an incoming task. The delays incurred in propagating workload information and tasks from one site to another, as well as the dynamic and unpredictable nature of workloads in multiprogrammed multiprocessors, may cause the workload pattern at the time of execution to differ from patterns prevailing at the times of load-index computation and decision making. Our load-balancing policy accommodates this uncertainty by using certain tunable parameters. We present a population-based machine-learning algorithm that adjusts these parameters in order to achieve high average speedups with respect to local execution. Our results show that our load-balancing policy, when combined with the comparator neural network for workload characterization, is effective in exploiting idle resources in a distributed computer system.
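The policy above can be caricatured in a few lines: each site broadcasts the output of its comparator network, and the scheduler discounts stale broadcasts before deciding whether remote execution beats local execution. The names and the two tunable parameters below are hypothetical stand-ins for the parameters the population-based learner would adjust.

def best_site(sites, now, decay=0.1, min_gain=1.05):
    """Choose where to run an incoming task from periodically broadcast
    comparator-network outputs. decay penalizes stale broadcasts; min_gain
    guards against marginal remote gains."""
    def score(site):
        staleness = now - site["timestamp"]          # age of the broadcast
        return site["predicted_speedup"] / (1.0 + decay * staleness)
    candidate = max(sites, key=score)
    # Run locally (return None) unless the discounted speedup is large enough.
    return candidate if score(candidate) >= min_gain else None

sites = [{"id": 0, "predicted_speedup": 1.4, "timestamp": 9.0},
         {"id": 1, "predicted_speedup": 1.9, "timestamp": 2.0}]
print(best_site(sites, now=10.0))  # the stale site 1 is discounted; site 0 wins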
Dynamic balance abilities of collegiate men for the bench press.
Piper, Timothy J; Radlo, Steven J; Smith, Thomas J; Woodward, Ryan W
2012-12-01
This study investigated the dynamic balance detection ability of college men for the bench press exercise. Thirty-five college men (mean ± SD: age = 22.4 ± 2.76 years, bench press experience = 8.3 ± 2.79 years, and estimated 1RM = 120.1 ± 21.8 kg) completed 1 repetition of the bench press for each of 3 bar loading arrangements. In a randomized fashion, subjects performed the bench press with a 20-kg barbell loaded with one of the following: a balanced load, one 20-kg plate on each side; an imbalanced asymmetrical load, one 20-kg plate on one side and a 20-kg plate plus a 1.25-kg plate on the other side; or an imbalanced asymmetrical center of mass, one 20-kg plate on one side and sixteen 1.25-kg plates on the other side. Subjects were blindfolded and wore ear protection throughout all testing to reduce their ability to otherwise detect the loads. Binomial data analysis indicated that subjects correctly detected the imbalance of the imbalanced asymmetrical center of mass condition (p[correct detection] = 0.89, p < 0.01) but did not correctly detect the balanced condition (p[correct detection] = 0.46, p = 0.74) or the imbalanced asymmetrical condition (p[correct detection] = 0.60, p = 0.31). Although a substantial shift in the center of mass of the plates led to detection of barbell imbalance, the minor change of adding 1.25 kg (2.5 lb) to one side in the asymmetrical condition did not result in consistent detection. Our data indicate that a biofeedback loop capable of supporting balance detection was established only under a high degree of imbalance. Because balance detection was absent in both the even and the slightly uneven loading conditions, the inclusion of upper-body balance training may be futile if exercises cannot establish such a feedback loop and thereby elicit an improvement in balance performance.
Selective randomized load balancing and mesh networks with changing demands
NASA Astrophysics Data System (ADS)
Shepherd, F. B.; Winzer, P. J.
2006-05-01
We consider the problem of building cost-effective networks that are robust to dynamic changes in demand patterns. We compare several architectures using demand-oblivious routing strategies. Traditional approaches include single-hop architectures based on a (static or dynamic) circuit-switched core infrastructure and multihop (packet-switched) architectures based on point-to-point circuits in the core. To address demand uncertainty, we seek minimum cost networks that can carry the class of hose demand matrices. Apart from shortest-path routing, Valiant's randomized load balancing (RLB), and virtual private network (VPN) tree routing, we propose a third, highly attractive approach: selective randomized load balancing (SRLB). This is a blend of dual-hop hub routing and randomized load balancing that combines the advantages of both architectures in terms of network cost, delay, and delay jitter. In particular, we give empirical analyses for the cost (in terms of transport and switching equipment) for the discussed architectures, based on three representative carrier networks. Of these three networks, SRLB maintains the resilience properties of RLB while achieving significant cost reduction over all other architectures, including RLB and multihop Internet protocol/multiprotocol label switching (IP/MPLS) networks using VPN-tree routing.
Shake Test Results and Dynamic Calibration Efforts for the Large Rotor Test Apparatus
NASA Technical Reports Server (NTRS)
Russell, Carl R.
2014-01-01
Prior to the full-scale wind tunnel test of the UH-60A Airloads rotor, a shake test was completed on the Large Rotor Test Apparatus. The goal of the shake test was to characterize the oscillatory response of the test rig and provide a dynamic calibration of the balance to accurately measure vibratory hub loads. This paper provides a summary of the shake test results, including balance, shaft bending gauge, and accelerometer measurements. Sensitivity to hub mass and angle of attack was investigated during the shake test. Hub mass was found to have an important impact on the vibratory forces and moments measured at the balance, especially near the UH-60A 4/rev frequency. Comparisons were made between the accelerometer data and an existing finite-element model, showing agreement on mode shapes, but not on natural frequencies. Finally, the results of a simple dynamic calibration are presented, showing the effects of changes in hub mass. The results show that the shake test data can be used to correct in-plane loads measurements up to 10 Hz and normal loads up to 30 Hz.
Accelerating semantic graph databases on commodity clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Castellana, Vito G.; Haglin, David J.
We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL-to-C++ compiler, a library of parallel graph methods, and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lusk, Ewing; Butler, Ralph; Pieper, Steven C.
Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today's (and tomorrow's) largest supercomputers; and we illustrate the use of ADLB with a Green's function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.
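ADLB itself is an MPI library built around putting and getting work units; the minimal sketch below mimics only the self-scheduling idea using Python multiprocessing rather than ADLB's actual API, so every name here is illustrative. Faster workers naturally pull more tasks, which is the essence of the model.

from multiprocessing import Process, Queue

def worker(tasks, results):
    # Self-scheduled loop: each worker pulls the next available task,
    # so faster workers automatically take on more of the work.
    while True:
        task = tasks.get()
        if task is None:                  # poison pill: no more work
            return
        results.put((task, task * task))  # stand-in for a real work unit

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    ntasks, nworkers = 100, 4
    for t in range(ntasks):
        tasks.put(t)
    for _ in range(nworkers):
        tasks.put(None)                   # one pill per worker
    procs = [Process(target=worker, args=(tasks, results))
             for _ in range(nworkers)]
    for p in procs:
        p.start()
    out = [results.get() for _ in range(ntasks)]
    for p in procs:
        p.join()
    print(len(out), "results")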
Development of a 5-Component Balance for Water Tunnel Applications
NASA Technical Reports Server (NTRS)
Suarez, Carlos J.; Kramer, Brian R.; Smith, Brooke C.
1999-01-01
The principal objective of this research/development effort was to develop a multi-component strain gage balance to measure both static and dynamic forces and moments on models tested in flow visualization water tunnels. A balance was designed that allows measuring normal and side forces, and pitching, yawing and rolling moments (no axial force). The balance mounts internally in the model and is used in a manner typical of wind tunnel balances. The key differences between a water tunnel balance and a wind tunnel balance are the requirement for very high sensitivity since the loads are very low (typical normal force is 90 grams or 0.2 lbs), the need for waterproofing the gage elements, and the small size required to fit into typical water tunnel models. The five-component balance was calibrated and demonstrated linearity in the responses of the primary components to applied loads, very low interactions between the sections, and no hysteresis. Static experiments were conducted in the Eidetics water tunnel with delta wings and F/A-18 models. The data were compared to forces and moments from wind tunnel tests of the same or similar configurations. The comparison showed very good agreement, providing confidence that loads can be measured accurately in the water tunnel with a relatively simple multi-component internal balance. The success of the static experiments encouraged the use of the balance for dynamic experiments. Among the advantages of conducting dynamic tests in a water tunnel are less demanding motion and data acquisition rates than in a wind tunnel test (because of the low-speed flow) and the capability of performing flow visualization and force/moment (F/M) measurements simultaneously with relative simplicity. This capability of simultaneous flow visualization and F/M measurement proved extremely useful for explaining the results obtained during these dynamic tests. In general, the development of this balance should encourage the use of water tunnels for a wider range of quantitative and qualitative experiments, especially during the preliminary phase of aircraft design.
Multiscale Multilevel Approach to Solution of Nanotechnology Problems
NASA Astrophysics Data System (ADS)
Polyakov, Sergey; Podryga, Viktoriia
2018-02-01
The paper is devoted to a multiscale multilevel approach for the solution of nanotechnology problems on supercomputer systems. The approach combines continuum mechanics models with Newtonian dynamics for individual particles. This combination includes three scale levels: macroscopic, mesoscopic and microscopic. For gas-metal technical systems the following models are used. The quasihydrodynamic system of equations is used as the mathematical model at the macrolevel for gas and solid states. The system of Newton equations is used as the mathematical model at the meso- and microlevels; it is written for nanoparticles of the medium and larger particles moving in the medium. The numerical implementation of the approach is based on the method of splitting into physical processes. The quasihydrodynamic equations are solved by the finite volume method on grids of different types. The Newton equations of motion are solved by Verlet integration in each cell of the grid independently or in groups of connected cells. In the framework of the general methodology, four classes of algorithms and methods for their parallelization are provided. The parallelization uses the principles of geometric parallelism and efficient partitioning of the computational domain. A special dynamic algorithm is used for load balancing the solvers. The developed approach was tested on the example of nitrogen outflow from a high-pressure balloon into a vacuum chamber through a micronozzle and a microchannel. The obtained results confirm the high efficiency of the developed methodology.
Sivakumar, B; Bhalaji, N; Sivakumar, D
2014-01-01
In mobile ad hoc networks, connectivity is always a concern. Due to the dynamic behavior of mobile nodes, efficiency can be achieved only under the assumption of a good network infrastructure. The presence of critical links results in deterioration, which should be detected in advance in order to retain the prevailing communication setup. This paper presents a short survey of specialized algorithms and protocols in the recent literature related to energy-efficient load balancing for critical link detection. It also suggests a machine-learning-based hybrid power-aware approach for handling critical nodes via load balancing.
Sivakumar, B.; Bhalaji, N.; Sivakumar, D.
2014-01-01
In mobile ad hoc networks, connectivity is always a concern. Due to the dynamic behavior of mobile nodes, efficiency can be achieved only under the assumption of a good network infrastructure. The presence of critical links results in deterioration, which should be detected in advance in order to retain the prevailing communication setup. This paper presents a short survey of specialized algorithms and protocols in the recent literature related to energy-efficient load balancing for critical link detection. It also suggests a machine-learning-based hybrid power-aware approach for handling critical nodes via load balancing. PMID:24790546
A dynamic re-partitioning strategy based on the distribution of key in Spark
NASA Astrophysics Data System (ADS)
Zhang, Tianyu; Lian, Xin
2018-05-01
Spark is a memory-based distributed data processing framework that can process massive data volumes and has become a focus of Big Data research. However, the performance of Spark Shuffle depends on the distribution of the data. Spark's naive hash partition function cannot guarantee load balance when the data are skewed: job completion time is then dominated by the node with the most data to process. To handle this problem, dynamic sampling is used. During task execution, a histogram counts the key frequency distribution on each node, from which the global key frequency distribution is generated. After analyzing this distribution, load-balanced data partitioning is achieved. Results show that the Dynamic Re-Partitioning function outperforms the default hash partition, Fine Partition, and the Balanced-Schedule strategy; it can reduce task execution time and improve the efficiency of the whole cluster.
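The abstract does not spell out how the global key histogram becomes a partition assignment. One plausible reading, sketched below with hypothetical names, is a greedy heaviest-key-first placement onto the currently lightest partition, so a hot key cannot pile onto an already loaded node.

from collections import Counter
import heapq

def build_partitioner(key_counts, num_partitions):
    """Greedy assignment: place the heaviest keys first onto the currently
    lightest partition, so skewed keys do not all land on one node."""
    heap = [(0, p) for p in range(num_partitions)]  # (load, partition id)
    heapq.heapify(heap)
    assignment = {}
    for key, count in key_counts.most_common():
        load, part = heapq.heappop(heap)
        assignment[key] = part
        heapq.heappush(heap, (load + count, part))
    return assignment

# Example: a skewed key histogram gathered by dynamic sampling.
hist = Counter({"a": 900, "b": 100, "c": 90, "d": 80, "e": 70})
print(build_partitioner(hist, 3))  # "a" gets a partition mostly to itself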
Effects of a dynamic balance training protocol on podalic support in older women. Pilot Study.
Battaglia, Giuseppe; Bellafiore, Marianna; Bianco, Antonino; Paoli, Antonio; Palma, Antonio
2010-01-01
The foot provides the only direct contact with supporting surfaces and therefore plays an important role in all postural tasks. Changes in the musculoskeletal and neurological characteristics of the foot with advancing age can alter plantar loading patterns and postural balance. Several studies have reported that exercise training improves postural performance in elderly individuals. The aim of our study was to investigate the effectiveness of a dynamic balance training protocol performed for 5 weeks on the support surface, percentage distribution of load in both feet, and body balance performance in healthy elderly women. Ten subjects (age 68.67 ± 5.50 years; BMI 28.17 ± 3.35) were evaluated with a monopodalic performance test and baropodometric analyses before and after the training period. We found significant improvements in unipedal balance performance times for the left and right foot of 20.18% and 26.23%, respectively (p < 0.05). The support surface of the right foot significantly increased in response to the training protocol, in particular in both the forefoot and rearfoot regions (p < 0.05). In addition, before the training period, load distribution on the left foot was greater than on the right one; equal load redistribution was measured on both feet in response to exercise (p > 0.05). The increased support surface and equal redistribution of body weight on both feet obtained in response to our training protocol may be postural adaptations sufficient to improve static balance in elderly women.
Lumped transmission line avalanche pulser
Booth, R.
1995-07-18
A lumped linear avalanche transistor pulse generator utilizes stacked transistors in parallel within a stage and couples a plurality of said stages, in series with increasing zener diode limited voltages per stage and decreasing balanced capacitance load per stage to yield a high voltage, high and constant current, very short pulse. 8 figs.
Lumped transmission line avalanche pulser
Booth, Rex
1995-01-01
A lumped linear avalanche transistor pulse generator utilizes stacked transistors in parallel within a stage and couples a plurality of said stages, in series with increasing zener diode limited voltages per stage and decreasing balanced capacitance load per stage to yield a high voltage, high and constant current, very short pulse.
A general parallel sparse-blocked matrix multiply for linear scaling SCF theory
NASA Astrophysics Data System (ADS)
Challacombe, Matt
2000-06-01
A general approach to the parallel sparse-blocked matrix-matrix multiply is developed in the context of linear scaling self-consistent-field (SCF) theory. The data-parallel message passing method uses non-blocking communication to overlap computation and communication. The space filling curve heuristic is used to achieve data locality for sparse matrix elements that decay with "separation". Load balance is achieved by solving the bin packing problem for blocks with variable size. With this new method as the kernel, parallel performance of the simplified density matrix minimization (SDMM) for solution of the SCF equations is investigated for RHF/6-31G** water clusters and RHF/3-21G estane globules. Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for (H2O)200 with 95 Origin 2000 processors. Scalability is found to be limited by load imbalance, which increases with decreasing granularity, due primarily to the inhomogeneous distribution of variable block sizes.
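The abstract combines two ideas: space-filling-curve ordering for locality and load balance over variable-size blocks. The sketch below is hypothetical and simplified, it orders 2-D block coordinates along a Z-order curve and then cuts the ordered list into contiguous chunks of roughly equal cost, a curve-based partitioning rather than the paper's bin-packing formulation.

def morton2d(x, y, bits=16):
    # Interleave the bits of (x, y) into a Z-order (space-filling curve) key.
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def partition_blocks(blocks, nprocs):
    """blocks: list of ((x, y), cost) for variable-size matrix blocks.
    Sort along the curve for locality, then cut the ordered list into
    contiguous chunks of roughly equal total cost."""
    ordered = sorted(blocks, key=lambda b: morton2d(*b[0]))
    target = sum(c for _, c in ordered) / nprocs
    parts = [[] for _ in range(nprocs)]
    cur, acc = 0, 0.0
    for blk in ordered:
        if acc >= target and cur < nprocs - 1:
            cur, acc = cur + 1, 0.0
        parts[cur].append(blk)
        acc += blk[1]
    return parts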
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.
1999-01-01
The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
NASA Astrophysics Data System (ADS)
Fonseca, R. A.; Vieira, J.; Fiuza, F.; Davidson, A.; Tsung, F. S.; Mori, W. B.; Silva, L. O.
2013-12-01
A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ~10^6 cores and sustained performance over ~2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios.
Modeling high-temperature superconductors and metallic alloys on the Intel iPSC/860
NASA Astrophysics Data System (ADS)
Geist, G. A.; Peyton, B. W.; Shelton, W. A.; Stocks, G. M.
Oak Ridge National Laboratory has embarked on several computational Grand Challenges, which require the close cooperation of physicists, mathematicians, and computer scientists. One of these projects is the determination of the material properties of alloys from first principles and, in particular, the electronic structure of high-temperature superconductors. While the present focus of the project is on superconductivity, the approach is general enough to permit study of other properties of metallic alloys such as strength and magnetic properties. This paper describes the progress to date on this project. We include a description of a self-consistent KKR-CPA method, parallelization of the model, and the incorporation of a dynamic load balancing scheme into the algorithm. We also describe the development and performance of a consolidated KKR-CPA code capable of running on CRAYs, workstations, and several parallel computers without source code modification. Performance of this code on the Intel iPSC/860 is also compared to a CRAY 2, CRAY YMP, and several workstations. Finally, some density of state calculations of two perovskite superconductors are given.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
NASA Astrophysics Data System (ADS)
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten
2017-11-01
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, theoretical models and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also demonstrate the new approach's capabilities by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
Karpievitch, Yuliya V; Almeida, Jonas S
2006-01-01
Background Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources than are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. Results mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. Conclusion Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet. PMID:16539707
Karpievitch, Yuliya V; Almeida, Jonas S
2006-03-15
Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources than are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet.
Kinematics and dynamics analysis of a quadruped walking robot with parallel leg mechanism
NASA Astrophysics Data System (ADS)
Wang, Hongbo; Sang, Lingfeng; Hu, Xing; Zhang, Dianfan; Yu, Hongnian
2013-09-01
It is desirable for a walking robot for the elderly and the disabled to have large load capacity, high stiffness, and stability. However, existing walking robots cannot achieve these requirements because of their weight-payload ratio and simple functions. Therefore, enhancing the capacity and functions of walking robots is an important research issue. According to walking requirements and combining modularization and reconfigurable ideas, a quadruped/biped reconfigurable walking robot with a parallel leg mechanism is proposed. The proposed robot can be used as both a biped and a quadruped walking robot. The kinematics and performance analysis of a 3-UPU parallel mechanism, the basic leg mechanism of the quadruped walking robot, are conducted and the structural parameters are optimized. The results show that the performance of the walking robot is optimal when the circumradii R, r of the upper and lower platforms of the leg mechanism are 161.7 mm and 57.7 mm, respectively. Based on these optimal results, the kinematics and dynamics of the quadruped walking robot in the static walking mode are derived with the application of parallel mechanism and influence coefficient theory, and the optimal coordinated distribution of the dynamic load for the quadruped walking robot with over-determinate inputs is analyzed, which resolves the dynamic load coupling caused by the branches' constraints on the robot during walking. Besides laying a theoretical foundation for development of the prototype, the kinematics and dynamics studies of the quadruped walking robot also advance the theoretical research on quadruped walking and the practical applications of parallel mechanisms.
NASA Astrophysics Data System (ADS)
Gan, Chee Kwan; Challacombe, Matt
2003-05-01
Recently, early onset linear scaling computation of the exchange-correlation matrix has been achieved using hierarchical cubature [J. Chem. Phys. 113, 10037 (2000)]. Hierarchical cubature differs from other methods in that the integration grid is adaptive and purely Cartesian, which allows for a straightforward domain decomposition in parallel computations; the volume enclosing the entire grid may be simply divided into a number of nonoverlapping boxes. In our data parallel approach, each box requires only a fraction of the total density to perform the necessary numerical integrations due to the finite extent of Gaussian-orbital basis sets. This inherent data locality may be exploited to reduce communications between processors as well as to avoid memory and copy overheads associated with data replication. Although the hierarchical cubature grid is Cartesian, naive boxing leads to irregular work loads due to strong spatial variations of the grid and the electron density. In this paper we describe equal time partitioning, which employs time measurement of the smallest sub-volumes (corresponding to the primitive cubature rule) to load balance grid-work for the next self-consistent-field iteration. After start-up from a heuristic center of mass partitioning, equal time partitioning exploits smooth variation of the density and grid between iterations to achieve load balance. With the 3-21G basis set and a medium quality grid, equal time partitioning applied to taxol (62 heavy atoms) attained a speedup of 61 out of 64 processors, while for a 110 molecule water cluster at standard density it achieved a speedup of 113 out of 128. The efficiency of equal time partitioning applied to hierarchical cubature improves as the grid work per processor increases. With a fine grid and the 6-311G(df,p) basis set, calculations on the 26 atom molecule α-pinene achieved a parallel efficiency better than 99% with 64 processors. For more coarse grained calculations, superlinear speedups are found to result from reduced computational complexity associated with data parallelism.
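A one-dimensional caricature of equal time partitioning: measured per-sub-volume times from the previous SCF iteration determine where the partition boundaries should sit for the next one. The function and the example values below are hypothetical, not taken from the paper.

def equal_time_cuts(cell_times, nprocs):
    """Choose cut points so that each processor's share of the measured
    per-sub-volume times is roughly equal in the next SCF iteration."""
    total = sum(cell_times)
    cuts, acc = [], 0.0
    for i, t in enumerate(cell_times):
        acc += t
        if len(cuts) < nprocs - 1 and acc >= total * (len(cuts) + 1) / nprocs:
            cuts.append(i + 1)   # partition boundary after cell i
    return cuts

print(equal_time_cuts([1, 1, 8, 1, 1, 1, 8, 1], 2))  # -> [4]: 11 vs 11 time units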
On delay adjustment for dynamic load balancing in distributed virtual environments.
Deng, Yunhua; Lau, Rynson W H
2012-04-01
Distributed virtual environments (DVEs) have become very popular in recent years, due to the rapid growth of applications such as massively multiplayer online games (MMOGs). As the number of concurrent users increases, scalability becomes one of the major challenges in designing an interactive DVE system. One solution to this scalability problem is to adopt a multi-server architecture. While some methods focus on the quality of partitioning the load among the servers, others focus on the efficiency of the partitioning process itself. However, all these methods neglect the effect of network delay among the servers on the accuracy of the load balancing solutions. As we show in this paper, the change in the load of the servers due to network delay affects the performance of the load balancing algorithm. In this work, we conduct a formal analysis of this problem and discuss two efficient delay adjustment schemes to address it. Our experimental results show that our proposed schemes can significantly improve the performance of the load balancing algorithm with negligible computational overhead.
A transient-enhanced NMOS low dropout voltage regulator with parallel feedback compensation
NASA Astrophysics Data System (ADS)
Han, Wang; Lin, Tan
2016-02-01
This paper presents a transient-enhanced NMOS low-dropout regulator (LDO) for portable applications with parallel feedback compensation. The parallel feedback structure adds a dynamic zero to obtain an adequate phase margin over a load current variation from 0 to 1 A. A class-AB error amplifier and a fast charging/discharging unit are adopted to enhance the transient performance. The proposed LDO has been implemented in a 0.35 μm BCD process. Experimental results show that the regulator can operate with a minimum dropout voltage of 150 mV at a maximum 1 A load and a quiescent current of 165 μA. Under a full-range load current step, the voltage undershoot and overshoot of the proposed LDO are reduced to 38 mV and 27 mV, respectively.
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; ...
2015-06-01
This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
BowMapCL: Burrows-Wheeler Mapping on Multiple Heterogeneous Accelerators.
Nogueira, David; Tomas, Pedro; Roma, Nuno
2016-01-01
The computational demand of exact-search procedures has pressed the exploitation of parallel processing accelerators to reduce the execution time of many applications. However, this often imposes strict restrictions in terms of the problem size and implementation efforts, mainly due to their possibly distinct architectures. To circumvent this limitation, a new exact-search alignment tool (BowMapCL) based on the Burrows-Wheeler Transform and FM-Index is presented. Contrasting to other alternatives, BowMapCL is based on a unified implementation using OpenCL, allowing the exploitation of multiple and possibly different devices (e.g., NVIDIA, AMD/ATI, and Intel GPUs/APUs). Furthermore, to efficiently exploit such heterogeneous architectures, BowMapCL incorporates several techniques to promote its performance and scalability, including multiple buffering, work-queue task-distribution, and dynamic load-balancing, together with index partitioning, bit-encoding, and sampling. When compared with state-of-the-art tools, the attained results showed that BowMapCL (using a single GPU) is 2× to 7.5× faster than mainstream multi-threaded CPU BWT-based aligners, like Bowtie, BWA, and SOAP2; and up to 4× faster than the best performing state-of-the-art GPU implementations (namely, SOAP3 and HPG-BWT). When multiple and completely distinct devices are considered, BowMapCL efficiently scales the offered throughput, ensuring a convenient load-balance of the involved processing in the several distinct devices.
DOE Office of Scientific and Technical Information (OSTI.GOV)
HENNIGAN, GARY; SHADID, JOHN; SJAARDEMA, GREGORY
2009-06-08
Nem_spread reads its input command file (default name nem_spread.inp), takes the named ExodusII geometry definition, and spreads the geometry (and, optionally, results) contained in that file out to a parallel disk system. The decomposition is taken from a scalar Nemesis load balance file generated by the companion utility nem_slice.
Efficient parallelization for AMR MHD multiphysics calculations; implementation in AstroBEAR
NASA Astrophysics Data System (ADS)
Carroll-Nellenback, Jonathan J.; Shroyer, Brandon; Frank, Adam; Ding, Chen
2013-03-01
Current adaptive mesh refinement (AMR) simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch or grid based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level with preference going to the finer level grids. This allows for global load balancing instead of level by level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. While we see improvements of up to 30% on deep simulations run on a few cores, the speedup is typically more modest (5-20%) for larger scale simulations. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors. Using this distributed approach we are able to get reasonable scaling efficiency (>80%) out to 12288 cores and up to 8 levels of AMR - independent of the use of threading.
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Knee Kinetics during Squats of Varying Loads and Depths in Recreationally Trained Females.
Flores, Victoria; Becker, James; Burkhardt, Eric; Cotter, Joshua
2018-03-06
The back squat exercise is typically practiced with varying squat depths and barbell loads. However, depth has been inconsistently defined, resulting in unclear safety precautions when squatting with loads. Additionally, females exhibit anatomical and kinematic differences from males which may predispose them to knee joint injuries. The purpose of this study was to characterize peak knee extensor moments (pKEMs) at three commonly practiced squat depths of above parallel, parallel, and full depth, and with three loads of 0% (unloaded), 50%, and 85% of depth-specific one repetition maximum (1RM) in recreationally active females. Nineteen females (age, 25.1 ± 5.8 years; body mass, 62.5 ± 10.2 kg; height, 1.6 ± 0.10 m; mean ± SD) performed squats of randomized depth and load. Inverse dynamics were used to obtain pKEMs from three-dimensional knee kinematics. Depth and load had significant interaction effects on pKEMs (p = 0.014). Significantly greater pKEMs were observed at full depth compared to parallel depth with 50% 1RM load (p = 0.001, d = 0.615) and 85% 1RM load (p = 0.010, d = 0.714). Greater pKEMs were also observed at full depth compared to above-parallel depth with 50% 1RM load (p = 0.003, d = 0.504). The results indicate that the effect of load on female pKEMs does not follow a progressively increasing pattern with either increasing depth or load. Therefore, when high knee loading is a concern, individuals must carefully consider both the depth of squat being performed and the relative load they are using.
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sitaraman, Hariswaran; Grout, Ray
2015-10-30
The load balancing strategies for hybrid solvers that involve grid-based partial differential equation solution coupled with particle tracking are presented in this paper. A typical Message Passing Interface (MPI) based parallelization of grid-based solves is done using a spatial domain decomposition, while particle tracking is primarily done using one of two techniques. One technique is to distribute the particles to the MPI ranks to whose grid they belong, while the other is to share the particles equally among all ranks, irrespective of their spatial location. The former technique provides spatial locality for field interpolation but cannot assure load balance in terms of the number of particles, which is achieved by the latter. The two techniques are compared for a case of particle tracking in a homogeneous isotropic turbulence box as well as a turbulent jet case. We performed a strong scaling study on more than 32,000 cores, which results in particle densities representative of anticipated exascale machines. The use of alternative implementations of MPI collectives and efficient load equalization strategies are studied to reduce data communication overheads.
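The two particle-distribution techniques contrasted above can be sketched in a few lines; the data layout and names below are hypothetical, with plain dictionaries standing in for MPI rank-local storage.

def distribute_by_owner(particles, rank_of_cell):
    """Technique 1: send each particle to the MPI rank that owns its grid
    cell, preserving spatial locality for field interpolation."""
    buckets = {}
    for p in particles:
        buckets.setdefault(rank_of_cell(p["cell"]), []).append(p)
    return buckets

def distribute_equally(particles, nranks):
    """Technique 2: deal particles out round-robin so every rank holds the
    same count, at the price of remote field interpolation."""
    buckets = {r: [] for r in range(nranks)}
    for i, p in enumerate(particles):
        buckets[i % nranks].append(p)
    return buckets

particles = [{"cell": c} for c in [0, 0, 0, 1, 2, 2]]
print(distribute_by_owner(particles, rank_of_cell=lambda c: c % 2))  # skewed
print(distribute_equally(particles, nranks=2))                       # balanced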
NASA Astrophysics Data System (ADS)
Liu, Wei; Li, Ying-jun; Jia, Zhen-yuan; Zhang, Jun; Qian, Min
2011-01-01
In the working processes of huge heavy-load manipulators, such as the free forging machine, hydraulic die-forging press, forging manipulator, heavy grasping manipulator, and large displacement manipulator, measurement of the six-dimensional heavy force/torque and real-time force feedback at the operation interface are the basis for realizing coordinated operation control and force compliance control. They are also an effective way to raise control accuracy and achieve highly efficient manufacturing. To solve the dynamic measurement problem of six-dimensional time-varying heavy loads in extreme manufacturing processes, a novel principle of parallel load sharing for six-dimensional heavy force/torque is put forward. The measuring principle of the six-dimensional force sensor is analyzed, and the spatial model is built and decoupled. The load sharing ratios are analyzed and calculated in the vertical and horizontal directions. The mapping relationship between the six-dimensional heavy force/torque to be measured and the output force is built. The finite element model of the parallel piezoelectric six-dimensional heavy force/torque sensor is set up, and its static characteristics are analyzed with the ANSYS software. The main parameters affecting the load sharing ratio are analyzed. Experiments on load sharing with different diameters of the parallel axis are designed. The results show that the six-dimensional heavy force/torque sensor has good linearity, with non-linearity errors less than 1%. The parallel axis provides a good load sharing effect: the larger the diameter, the better the effect. The experimental results are in accordance with the FEM analysis. The sensor has the advantages of a large measuring range, good linearity, high natural frequency, and high rigidity. It can be widely used in extreme environments for real-time accurate measurement of six-dimensional time-varying huge loads on manipulators.
Performance analysis of parallel branch and bound search with the hypercube architecture
NASA Technical Reports Server (NTRS)
Mraz, Richard T.
1987-01-01
With the availability of commercial parallel computers, researchers are examining new classes of problems which might benefit from parallel computing. This paper presents results of an investigation of the class of search-intensive problems. The specific problem discussed is the least-cost branch and bound search method for deadline job scheduling. The object-oriented design methodology was used to map the problem into a parallel solution. While the initial design was good for a prototype, the best performance resulted from fine-tuning the algorithm for a specific computer. The experiments analyze the computation time, the speedup over a VAX 11/785, and the load balance of the problem when using a loosely coupled multiprocessor system based on the hypercube architecture.
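For readers unfamiliar with the serial version of the method, the generic sketch below shows least-cost branch and bound as a best-first search with pruning; the function names are hypothetical and the frontier here is a single heap, whereas the parallel version discussed above would distribute it across the hypercube nodes.

import heapq
from itertools import count

def least_cost_bnb(root, expand, bound, is_goal):
    """Serial least-cost branch and bound: always expand the frontier node
    with the smallest lower bound, pruning nodes that cannot beat the
    incumbent solution."""
    tie = count()             # tiebreaker so search nodes need not be comparable
    best, incumbent = float("inf"), None
    frontier = [(bound(root), next(tie), root)]
    while frontier:
        b, _, node = heapq.heappop(frontier)
        if b >= best:
            continue          # cannot improve on the incumbent: prune
        if is_goal(node):
            best, incumbent = b, node
            continue
        for child in expand(node):
            cb = bound(child)
            if cb < best:
                heapq.heappush(frontier, (cb, next(tie), child))
    return incumbent, best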
Analysis of dynamic behavior of multiple-stage planetary gear train used in wind driven generator.
Wang, Jungang; Wang, Yong; Huo, Zhipu
2014-01-01
A dynamic model of a multiple-stage planetary gear train, composed of a two-stage planetary gear train and a one-stage parallel-axis gear, is proposed for use in a wind driven generator to analyze the influence of revolution speed and mesh error on the dynamic load sharing characteristic, based on lumped parameter theory. The dynamic equation of the model is solved numerically to analyze the load distribution uniformity of the system. It is shown that the load sharing property of the system is significantly affected by mesh error and rotational speed; the load sharing coefficients and rates of change of the internal and external meshing differ markedly from each other. The study provides a useful theoretical guideline for the design of multiple-stage planetary gear trains for wind driven generators.
Analysis of Dynamic Behavior of Multiple-Stage Planetary Gear Train Used in Wind Driven Generator
Wang, Jungang; Wang, Yong; Huo, Zhipu
2014-01-01
A dynamic model of a multiple-stage planetary gear train, composed of a two-stage planetary gear train and a one-stage parallel-axis gear, is proposed for use in a wind driven generator to analyze the influence of revolution speed and mesh error on the dynamic load sharing characteristic, based on lumped parameter theory. The dynamic equation of the model is solved numerically to analyze the load distribution uniformity of the system. It is shown that the load sharing property of the system is significantly affected by mesh error and rotational speed; the load sharing coefficients and rates of change of the internal and external meshing differ markedly from each other. The study provides a useful theoretical guideline for the design of multiple-stage planetary gear trains for wind driven generators. PMID:24511295
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erdmann, Thorsten; Albert, Philipp J.; Schwarz, Ulrich S.
2013-11-07
Non-processive molecular motors have to work together in ensembles in order to generate appreciable levels of force or movement. In skeletal muscle, for example, hundreds of myosin II molecules cooperate in thick filaments. In non-muscle cells, by contrast, small groups with few tens of non-muscle myosin II motors contribute to essential cellular processes such as transport, shape changes, or mechanosensing. Here we introduce a detailed and analytically tractable model for this important situation. Using a three-state crossbridge model for the myosin II motor cycle and exploiting the assumptions of fast power stroke kinetics and equal load sharing between motors in equivalent states, we reduce the stochastic reaction network to a one-step master equation for the binding and unbinding dynamics (parallel cluster model) and derive the rules for ensemble movement. We find that for constant external load, ensemble dynamics is strongly shaped by the catch bond character of myosin II, which leads to an increase of the fraction of bound motors under load and thus to firm attachment even for small ensembles. This adaptation to load results in a concave force-velocity relation described by a Hill relation. For external load provided by a linear spring, myosin II ensembles dynamically adjust themselves towards an isometric state with constant average position and load. The dynamics of the ensembles is now determined mainly by the distribution of motors over the different kinds of bound states. For increasing stiffness of the external spring, there is a sharp transition beyond which myosin II can no longer perform the power stroke. Slow unbinding from the pre-power-stroke state protects the ensembles against detachment.
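A one-step master equation of this kind is easy to simulate directly. The Gillespie-style sketch below tracks only the number of bound motors, with a catch-bond unbinding rate that decreases with the load per bound motor under equal load sharing; all rate expressions and parameter values are illustrative assumptions, not the paper's.

import math
import random

def bound_motor_trace(N=20, k_on=40.0, k_off0=80.0, F=50.0, F_d=12.6, t_end=1.0):
    """Gillespie simulation of a one-step master equation for the number i
    of bound motors in a cluster of N. The catch-bond character enters as an
    unbinding rate that decreases with the load per bound motor (F / i)."""
    i, t, trace = 1, 0.0, [(0.0, 1)]
    while t < t_end and i > 0:
        r_bind = (N - i) * k_on
        r_unbind = i * k_off0 * math.exp(-(F / i) / F_d)  # catch bond
        total = r_bind + r_unbind
        t += random.expovariate(total)                    # next event time
        i += 1 if random.random() < r_bind / total else -1
        trace.append((t, i))
    return trace  # the cluster has detached if the trace ends with i == 0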
Analysis of series resonant converter with series-parallel connection
NASA Astrophysics Data System (ADS)
Lin, Bor-Ren; Huang, Chien-Lan
2011-02-01
In this study, a parallel inductor-inductor-capacitor (LLC) resonant converter, series-connected on the primary side and parallel-connected on the secondary side, is presented for server power supply systems. Based on series resonant behaviour, the power metal-oxide-semiconductor field-effect transistors are turned on at zero-voltage switching and the rectifier diodes are turned off at zero-current switching. Thus, the switching losses on the power semiconductors are reduced. In the proposed converter, the primary windings of the two LLC converters are connected in series, so the two converters carry the same primary current and supply balanced load currents. On the output side, the two LLC converters are connected in parallel to share the load current and to reduce the current stress on the secondary windings and the rectifier diodes. In this article, the principle of operation, steady-state analysis, and design considerations of the proposed converter are provided and discussed. Experiments with a laboratory prototype with a 24 V/21 A output for server power supply were performed to verify the effectiveness of the proposed converter.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jain, Atul K.
The overall objective of this DOE-funded project is to combine scientific and computational challenges in climate modeling by expanding our understanding of the biogeophysical-biogeochemical processes and their interactions in the northern high latitudes (NHLs) using an earth system modeling (ESM) approach, and by adopting an adaptive parallel runtime system in an ESM to achieve efficient and scalable climate simulations through improved load balancing algorithms.
A Proposal for Kelly Criterion-Based Lossy Network Compression
2016-03-01
Computational evaluation of load carriage effects on gait balance stability.
Mummolo, Carlotta; Park, Sukyung; Mangialardi, Luigi; Kim, Joo H
2016-01-01
Evaluating the effects of load carriage on gait balance stability is important in various applications. However, their quantification has not been rigorously addressed in the current literature, partially due to the lack of relevant computational indices. The novel Dynamic Gait Measure (DGM) characterizes gait balance stability by quantifying the relative effects of inertia in terms of the zero-moment point, the ground projection of the center of mass, and the time-varying foot support region. In this study, the DGM is formulated in terms of the gait parameters that explicitly reflect the gait strategy of a given walking pattern and is used for computational evaluation of the distinct balance stability of loaded walking. The observed gait adaptations caused by load carriage (decreased single support duration, inertia effects, and step length) result in decreased DGM values (p < 0.0001), which indicate that loaded walking motions are more statically stable than unloaded normal walking. Comparison of the DGM with other common gait stability indices (the maximum Floquet multiplier and the margin of stability) validates the unique characterization capability of the DGM, which is consistently informative of the presence of the added load.
Dynamic Imbalance Would Counter Offcenter Thrust
NASA Technical Reports Server (NTRS)
Mccanna, Jason
1994-01-01
Dynamic imbalance generated by offcenter thrust on rotating body eliminated by shifting some of mass of body to generate opposing dynamic imbalance. Technique proposed originally for spacecraft including massive crew module connected via long, lightweight intermediate structure to massive engine module, such that artificial gravitation in crew module generated by rotating spacecraft around axis parallel to thrust generated by engine. Also applicable to dynamic balancing of rotating terrestrial equipment to which offcenter forces applied.
Vasseur, David A; Fox, Jeremy W
2011-10-01
Consumers acquire essential nutrients by ingesting the tissues of resource species. When these tissues contain essential nutrients in a suboptimal ratio, consumers may benefit from ingesting a mixture of nutritionally complementary resource species. We investigate the joint ecological and evolutionary consequences of competition for complementary resources, using an adaptive dynamics model of two consumers and two resources that differ in their relative content of two essential nutrients. In the absence of competition, a nutritionally balanced diet rarely maximizes fitness because of the dynamic feedbacks between uptake rate and resource density, whereas in sympatry, nutritionally balanced diets maximize fitness because competing consumers with different nutritional requirements tend to equalize the relative abundances of the two resources. Adaptation from allopatric to sympatric fitness optima can generate character convergence, divergence, and parallel shifts, depending not on the degree of diet overlap but on the match between resource nutrient content and consumer nutrient requirements. Contrary to previous verbal arguments that suggest that character convergence leads to neutral stability, coadaptation of competing consumers always leads to stable coexistence. Furthermore, we show that incorporating costs of consuming or excreting excess nonlimiting nutrients selects for nutritionally balanced diets and so promotes character convergence. This article demonstrates that resource-use overlap has little bearing on coexistence when resources are nutritionally complementary, and it highlights the importance of using mathematical models to infer the stability of ecoevolutionary dynamics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hirata, So
2003-11-20
We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirement, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
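As a point of reference for the contraction-ordering step described above, the sketch below exhaustively searches binary contraction sequences for a small multiple tensor contraction, scoring each pairwise contraction by the product of the dimensions of all involved indices. The uniform index dimension and the toy index strings are assumptions for illustration; TCE's actual cost model also accounts for memory and symmetry.

    from itertools import combinations

    DIM = {i: 10 for i in "abcdijk"}   # assumed uniform index dimensions

    def pair_cost(t1, t2):
        """Flop estimate for contracting two tensors (given as index sets):
        proportional to the product of the dimensions of all indices."""
        cost = 1
        for i in set(t1) | set(t2):
            cost *= DIM[i]
        return cost, frozenset(set(t1) ^ set(t2))  # shared indices summed away

    def best_order(tensors):
        """Exhaustively pick the binary contraction sequence of minimal cost."""
        tensors = [frozenset(t) for t in tensors]
        if len(tensors) == 1:
            return 0, []
        best = None
        for i, j in combinations(range(len(tensors)), 2):
            c, merged = pair_cost(tensors[i], tensors[j])
            rest = [t for k, t in enumerate(tensors) if k not in (i, j)]
            sub_c, sub_order = best_order(rest + [merged])
            if best is None or c + sub_c < best[0]:
                best = (c + sub_c, [(i, j)] + sub_order)
        return best

    # A toy three-tensor term; the cheap orders contract "bcjk" first.
    print(best_order(["abij", "bcjk", "cd"]))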
GSHR-Tree: a spatial index tree based on dynamic spatial slot and hash table in grid environments
NASA Astrophysics Data System (ADS)
Chen, Zhanlong; Wu, Xin-cai; Wu, Liang
2008-12-01
Computational Grids enable the coordinated sharing of large-scale distributed, heterogeneous computing resources that can be used to solve computationally intensive problems in science, engineering, and commerce. Grid spatial applications are made possible by high-speed networks and a new generation of Grid middleware that resides between the networks and traditional GIS applications. The integration of multi-source, heterogeneous spatial information, the management of distributed spatial resources, and the sharing and cooperative use of spatial data and Grid services are the key problems to resolve in the development of a Grid GIS. The spatial index mechanism is a key technology of the Grid GIS and spatial database, and its performance affects the holistic performance of a GIS in Grid environments. In order to improve the efficiency of parallel processing of massive spatial data in a distributed parallel computing grid environment, this paper presents GSHR-Tree, a grid slot hash parallel spatial index. Based on a hash table and dynamic spatial slots, it improves the structure of the classical parallel R-tree index and makes full use of the good qualities of both the R-tree and the hash data structure, yielding a new parallel spatial index that can meet the needs of parallel grid computing over massive spatial data in a distributed network. The algorithm splits space into multiple slots and maps these slots to sites in the distributed, parallel system; each site organizes the spatial objects in its spatial slots into an R-tree. On the basis of this tree structure, the index data are distributed among multiple nodes in the grid network using a large-node R-tree method, and imbalance arising during processing can be adjusted quickly by means of a dynamic adjustment algorithm. The structure takes into account the distribution, replication, and transfer operations of a spatial index in the grid environment, and its design ensures load balance in parallel computation, making it fit for parallel processing of spatial information in distributed network environments. Instead of the recursive comparison of spatial objects used in the original R-tree, the algorithm builds the spatial index with binary code operations, which computers execute more efficiently, and uses extended dynamic hash codes for bit comparison. In GSHR-Tree, a new server is assigned to the network whenever a split of a full node is required. We describe a flexible allocation protocol that copes with a temporary shortage of storage resources: it uses a distributed, balanced binary spatial tree that scales with insertions to potentially any number of storage servers through splits of the overloaded ones. An application manipulates the GSHR-Tree structure from a node in the grid environment; the node addresses the tree through an image that splits can make outdated, which may generate addressing errors, solved by forwarding among the servers. In this paper, a spatial index data distribution algorithm that limits the number of servers is also proposed, improving storage utilization at the cost of additional messages. The GSHR-Tree scheme is believed to fit the needs of new applications using ever larger sets of spatial data.
Our proposal constitutes a flexible storage allocation method for a distributed spatial index. The insertion policy can be tuned dynamically to cope with periods of storage shortage; in such cases storage balancing should be favored for better space utilization, at the price of extra message exchanges between servers. The structure makes a compromise between updating the duplicated index and transforming the spatial index data. To meet the needs of grid computing, GSHR-Tree has a flexible structure that can satisfy new needs in the future, and it provides R-tree capabilities for large spatial datasets stored over interconnected servers. The analysis, including the experiments, confirmed the efficiency of our design choices; the scheme should fit the needs of new applications of spatial data using ever larger datasets. Using the system response time of the parallel spatial range query as the performance evaluation factor, the simulation experiments confirm the sound design and high performance of the presented indexing structure.
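A minimal sketch of the slot-and-hash idea described above, under assumed details: space is quantized into slots whose binary (Morton) codes can be compared with cheap bit operations, and each slot hashes to a server that maintains a local R-tree for its objects. Names and parameters are illustrative.

    def interleave_bits(x, y, bits=8):
        """Z-order (Morton) code: interleave the bits of quantized x and y,
        so slot identity is decided by bit operations, not recursive
        rectangle comparisons."""
        code = 0
        for b in range(bits):
            code |= ((x >> b) & 1) << (2 * b)
            code |= ((y >> b) & 1) << (2 * b + 1)
        return code

    def slot_for(px, py, extent=1024.0, bits=8):
        """Quantize a point into the 2^bits x 2^bits grid of spatial slots."""
        qx = min(int(px / extent * (1 << bits)), (1 << bits) - 1)
        qy = min(int(py / extent * (1 << bits)), (1 << bits) - 1)
        return interleave_bits(qx, qy, bits)

    def server_for(slot, n_servers):
        return hash(slot) % n_servers   # each server R-trees its own slots

    print(server_for(slot_for(310.5, 77.2), n_servers=16))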
On the Impact of Execution Models: A Case Study in Computational Chemistry
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Halappanavar, Mahantesh; Krishnamoorthy, Sriram
2015-05-25
Efficient utilization of high-performance computing (HPC) platforms is an important and complex problem. Execution models, abstract descriptions of the dynamic runtime behavior of the execution stack, have significant impact on the utilization of HPC systems. Using a computational chemistry kernel as a case study and a wide variety of execution models combined with load balancing techniques, we explore the impact of execution models on the utilization of an HPC system. We demonstrate a 50 percent improvement in performance by using work stealing relative to a more traditional static scheduling approach. We also use a novel semi-matching technique for load balancing that has comparable performance to a traditional hypergraph-based partitioning implementation, which is computationally expensive. Using this study, we found that execution model design choices and assumptions can limit critical optimizations such as global, dynamic load balancing and finding the correct balance between available work units and different system and runtime overheads. With the emergence of multi- and many-core architectures and the consequent growth in the complexity of HPC platforms, we believe that these lessons will be beneficial to researchers tuning diverse applications on modern HPC platforms, especially on emerging dynamic platforms with energy-induced performance variability.
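The abstract does not include the scheduler's code; the toy round-based simulation below illustrates the work-stealing idea it credits with the 50 percent improvement: idle workers pull from the back of a random victim's queue instead of waiting out a static assignment. Task durations and the stealing policy are assumptions.

    import random
    from collections import deque

    def work_stealing_sim(n_workers=4, tasks=100, seed=1):
        """Round-based toy: all tasks start on worker 0 (a worst-case
        static assignment); idle workers steal from random victims."""
        rng = random.Random(seed)
        queues = [deque(rng.randint(1, 5) for _ in range(tasks))] + \
                 [deque() for _ in range(n_workers - 1)]
        busy = [0] * n_workers          # remaining ticks of the running task
        done, t = 0, 0
        while done < tasks:
            for w in range(n_workers):
                if busy[w] == 0:
                    if queues[w]:
                        busy[w] = queues[w].popleft()      # own work: front
                    else:
                        victim = rng.randrange(n_workers)  # steal: back
                        if victim != w and queues[victim]:
                            busy[w] = queues[victim].pop()
                if busy[w] > 0:
                    busy[w] -= 1
                    if busy[w] == 0:
                        done += 1
            t += 1
        return t

    print("makespan with stealing:", work_stealing_sim())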
Accelerating Climate Simulations Through Hybrid Computing
NASA Technical Reports Server (NTRS)
Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark
2009-01-01
Unconventional multi-core processors (e.g., IBM Cell B/E and NVIDIA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) an identical MPI implementation is required on both systems; and (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors and two IBM QS22 Cell blades, connected with InfiniBand), allowing for seamless offloading of compute-intensive functions to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approximately 10% network overhead.
Monte Carlo simulation of biomolecular systems with BIOMCSIM
NASA Astrophysics Data System (ADS)
Kamberaj, H.; Helms, V.
2001-12-01
A new Monte Carlo simulation program, BIOMCSIM, is presented that has been developed in particular to simulate the behaviour of biomolecular systems, leading to insights and understanding of their functions. The computational complexity in Monte Carlo simulations of high density systems, with large molecules like proteins immersed in a solvent medium, or when simulating the dynamics of water molecules in a protein cavity, is enormous. The program presented in this paper seeks to provide these desirable features putting special emphasis on simulations in grand canonical ensembles. It uses different biasing techniques to increase the convergence of simulations, and periodic load balancing in its parallel version, to maximally utilize the available computer power. In periodic systems, the long-ranged electrostatic interactions can be treated by Ewald summation. The program is modularly organized, and implemented using an ANSI C dialect, so as to enhance its modifiability. Its performance is demonstrated in benchmark applications for the proteins BPTI and Cytochrome c Oxidase.
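BIOMCSIM's own source is not shown in the abstract; as a reminder of what a grand-canonical move involves, the sketch below implements the standard Metropolis acceptance rule for a trial particle insertion. The parameter values in the example call are arbitrary.

    import math, random

    def accept_insertion(n, volume, mu, beta, lam3, dU, rng=random.Random(0)):
        """Standard GCMC insertion acceptance:
        min(1, V / (lam3 * (N + 1)) * exp(beta * (mu - dU))),
        where dU is the energy change of the trial insertion and lam3 is
        the cubed thermal de Broglie wavelength."""
        acc = min(1.0, volume / (lam3 * (n + 1)) * math.exp(beta * (mu - dU)))
        return rng.random() < acc

    print(accept_insertion(n=100, volume=1000.0, mu=-2.0, beta=1.0,
                           lam3=1.0, dU=-1.5))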
Dynamic Transfers Of Tasks Among Computers
NASA Technical Reports Server (NTRS)
Liu, Howard T.; Silvester, John A.
1989-01-01
Allocation scheme gives jobs to idle computers. Ideal resource-sharing algorithm should have following characteristics: dynamic, decentralized, and heterogeneous. Proposed enhanced receiver-initiated dynamic algorithm (ERIDA) for resource sharing fulfills all above criteria. Provides method for balancing workload among hosts, resulting in improved response time and throughput of total system. Adjusts dynamically to traffic load of each station.
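A minimal sketch of receiver-initiated load sharing in the spirit described above (the actual ERIDA policy is not given in this brief; the threshold and the number of probes below are assumptions): lightly loaded hosts probe peers and pull work, so the transfer cost is paid by the idle party.

    import random

    def receiver_initiated_step(queues, threshold=2, probes=3,
                                rng=random.Random(0)):
        """Each under-loaded host probes a few random peers and pulls one
        job from the first peer found above the load threshold."""
        for me, q in enumerate(queues):
            if len(q) >= threshold:
                continue                      # busy enough: do nothing
            for _ in range(probes):
                peer = rng.randrange(len(queues))
                if peer != me and len(queues[peer]) > threshold:
                    q.append(queues[peer].pop())   # transfer one job
                    break

    hosts = [list(range(9)), [], [0], []]     # imbalanced initial load
    receiver_initiated_step(hosts)
    print([len(q) for q in hosts])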
A physics-motivated Centroidal Voronoi Particle domain decomposition method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fu, Lin, E-mail: lin.fu@tum.de; Hu, Xiangyu Y., E-mail: xiangyu.hu@tum.de; Adams, Nikolaus A., E-mail: nikolaus.adams@tum.de
2017-04-15
In this paper, we propose a novel domain decomposition method for large-scale simulations in continuum mechanics by merging the concepts of Centroidal Voronoi Tessellation (CVT) and Voronoi Particle dynamics (VP). The CVT is introduced to achieve a high-level compactness of the partitioning subdomains by the Lloyd algorithm which monotonically decreases the CVT energy. The number of computational elements between neighboring partitioning subdomains, which scales the communication effort for parallel simulations, is optimized implicitly as the generated partitioning subdomains are convex and simply connected with small aspect-ratios. Moreover, Voronoi Particle dynamics employing physical analogy with a tailored equation of state is developed, which relaxes the particle system towards the target partition with good load balance. Since the equilibrium is computed by an iterative approach, the partitioning subdomains exhibit locality and the incremental property. Numerical experiments reveal that the proposed Centroidal Voronoi Particle (CVP) based algorithm produces high-quality partitioning with high efficiency, independently of computational-element types. Thus it can be used for a wide range of applications in computational science and engineering.
A physics-motivated Centroidal Voronoi Particle domain decomposition method
NASA Astrophysics Data System (ADS)
Fu, Lin; Hu, Xiangyu Y.; Adams, Nikolaus A.
2017-04-01
In this paper, we propose a novel domain decomposition method for large-scale simulations in continuum mechanics by merging the concepts of Centroidal Voronoi Tessellation (CVT) and Voronoi Particle dynamics (VP). The CVT is introduced to achieve a high-level compactness of the partitioning subdomains by the Lloyd algorithm which monotonically decreases the CVT energy. The number of computational elements between neighboring partitioning subdomains, which scales the communication effort for parallel simulations, is optimized implicitly as the generated partitioning subdomains are convex and simply connected with small aspect-ratios. Moreover, Voronoi Particle dynamics employing physical analogy with a tailored equation of state is developed, which relaxes the particle system towards the target partition with good load balance. Since the equilibrium is computed by an iterative approach, the partitioning subdomains exhibit locality and the incremental property. Numerical experiments reveal that the proposed Centroidal Voronoi Particle (CVP) based algorithm produces high-quality partitioning with high efficiency, independently of computational-element types. Thus it can be used for a wide range of applications in computational science and engineering.
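The core of the CVT step in both listings above is plain Lloyd iteration, sketched below on a cloud of computational-element coordinates (the physics-motivated Voronoi Particle relaxation with its tailored equation of state is not reproduced here).

    import numpy as np

    def lloyd_cvt(points, n_parts=4, iters=20, seed=0):
        """Lloyd iteration: assign elements to the nearest generator, then
        move each generator to the centroid of its cell; each sweep
        monotonically decreases the CVT energy."""
        rng = np.random.default_rng(seed)
        gens = points[rng.choice(len(points), n_parts, replace=False)]
        for _ in range(iters):
            d = np.linalg.norm(points[:, None, :] - gens[None, :, :], axis=2)
            owner = d.argmin(axis=1)          # Voronoi assignment
            for k in range(n_parts):
                cell = points[owner == k]
                if len(cell):
                    gens[k] = cell.mean(axis=0)
        return gens, owner

    pts = np.random.default_rng(1).random((2000, 2))
    gens, owner = lloyd_cvt(pts)
    print(np.bincount(owner))   # element counts per partitioning subdomain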
Dynamic strain distribution of FRP plate under blast loading
NASA Astrophysics Data System (ADS)
Saburi, T.; Yoshida, M.; Kubota, S.
2017-02-01
The dynamic strain distribution of a fiber-reinforced plastic (FRP) plate under blast loading was investigated using a Digital Image Correlation (DIC) image analysis method. The test FRP plates were mounted parallel to each other on a steel frame, and 50 g of Composition C4 explosive, set in the center of the FRP plates, was used as the blast loading source. The dynamic behavior of the FRP plates under blast loading was observed by two high-speed video cameras, and the two image sequences were used to analyze the three-dimensional strain distribution of the FRP by means of the DIC method. A point strain profile extracted from the analyzed strain distribution data was compared with a strain profile observed directly with a strain gauge, showing that the strain profile obtained under blast loading by the DIC method is quantitatively accurate.
A parallel Discrete Element Method to model collisions between non-convex particles
NASA Astrophysics Data System (ADS)
Rakotonirina, Andriarimina Daniel; Delenne, Jean-Yves; Wachs, Anthony
2017-06-01
In many dry granular and suspension flow configurations, particles can be highly non-spherical. It is now well established in the literature that particle shape affects the flow dynamics or the microstructure of the particle assembly in assorted ways, e.g., the compacity of a packed bed or heap, dilation under shear, resistance to shear, momentum transfer between translational and angular motions, and the ability to form arches and block the flow. In this talk, we suggest an accurate and efficient way to model collisions between particles of (almost) arbitrary shape. For that purpose, we develop a Discrete Element Method (DEM) combined with a soft-particle contact model. The collision detection algorithm handles contacts between bodies of various shapes and sizes. For non-convex bodies, our strategy is based on decomposing a non-convex body into a set of convex ones. Our novel method can therefore be called the "glued-convex method" (in the sense of clumping convex bodies together), an extension of the popular "glued-spheres" method, and is implemented in our own granular dynamics code Grains3D. Since the whole problem is solved explicitly, our fully MPI-parallelized code Grains3D exhibits very high scalability when dynamic load balancing is not required. In particular, simulations on up to a few thousand cores in configurations involving up to a few tens of millions of particles can readily be performed. We apply our enhanced numerical model to (i) the collapse of a granular column made of convex particles and (ii) the microstructure of a heap of non-convex particles in a cylindrical reactor.
Robust synchronization of spin-torque oscillators with an LCR load.
Pikovsky, Arkady
2013-09-01
We study the dynamics of a serial array of spin-torque oscillators with a parallel inductor-capacitor-resistor (LCR) load. In a large range of parameters the fully synchronous regime, where all the oscillators have the same state and the output field is maximal, is shown to be stable. However, such robust complete synchronization does not always develop from a random initial state; in many cases nontrivial clustering is observed, with partial synchronization resulting in a quasiperiodic or chaotic mean-field dynamics.
Accelerating Climate and Weather Simulations through Hybrid Computing
NASA Technical Reports Server (NTRS)
Zhou, Shujia; Cruz, Carlos; Duffy, Daniel; Tucker, Robert; Purcell, Mark
2011-01-01
Unconventional multi- and many-core processors (e.g. IBM (R) Cell B.E.(TM) and NVIDIA (R) GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g. Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization (TM) (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand(R) (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also demonstrate the capabilities of the new approach by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work: an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; ...
2017-11-27
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also demonstrate the capabilities of the new approach by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work: an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
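A minimal sketch of the a priori wall rearrangement described above, under assumed inputs: given a per-cell cost estimate along one axis (higher where the resolution, and hence the force computation, is more expensive), subdomain walls are placed at equal shares of the cumulative cost rather than at equal spatial intervals.

    import numpy as np

    def place_walls(cell_cost, n_sub):
        """Put n_sub - 1 walls so each subdomain carries an approximately
        equal share of the estimated computation cost."""
        cum = np.cumsum(cell_cost, dtype=float)
        targets = cum[-1] * np.arange(1, n_sub) / n_sub
        return np.searchsorted(cum, targets)   # wall positions (cell indices)

    # Hypothetical profile: an expensive high-resolution region mid-box,
    # e.g. the atomistic region of an adaptive resolution simulation.
    cost = np.concatenate([np.ones(400), 8 * np.ones(200), np.ones(400)])
    print(place_walls(cost, n_sub=4))   # walls crowd around the costly region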
Parallel Performance Optimizations on Unstructured Mesh-based Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas
2015-01-01
This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
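A common way to quantify the load imbalance this paper attacks is the ratio of the most loaded process to the average, sketched below; 1.0 is perfect balance, and parallel efficiency degrades roughly by this factor. The example per-process cell counts are invented.

    def imbalance_factor(part_loads):
        """Max-over-mean load imbalance metric for a mesh partition."""
        return max(part_loads) / (sum(part_loads) / len(part_loads))

    print(imbalance_factor([1300, 900, 1100, 700]))   # naive partition: 1.3
    print(imbalance_factor([1010, 990, 1005, 995]))   # tuned partition: ~1.01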
NASA Astrophysics Data System (ADS)
Ji, X.; Shen, C.
2017-12-01
Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model: due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocity both demand prohibitively small time steps, even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off; the high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. The model applies a semi-implicit, semi-Lagrangian (SISL) scheme to solve the dynamic wave equations and, with the assistance of the multi-mesh method, adaptively uses the dynamic wave equation only in areas of deep inundation. The model thereby achieves a balance between accuracy and computational cost.
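The reason high resolution and large flow velocity are so expensive for an explicit scheme, and the motivation for the SISL discretization above, is the CFL bound on the time step, sketched here for a shallow-water-type flood wave (the Courant number and sample values are arbitrary).

    import numpy as np

    def cfl_dt(u, h, depth=None, courant=0.7, g=9.81):
        """Largest stable explicit step: dt <= C * h / max wave speed,
        with speed |u| + sqrt(g * depth) for gravity-driven flood waves."""
        speed = np.abs(u)
        if depth is not None:
            speed = speed + np.sqrt(g * np.maximum(depth, 0.0))
        return courant * h / speed.max()

    # A 2 m cell with up to 3 m/s flow over up to 1.5 m of water.
    print(cfl_dt(u=np.array([0.5, 3.0, 1.2]), h=2.0,
                 depth=np.array([0.2, 1.5, 0.8])))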
Large scale cardiac modeling on the Blue Gene supercomputer.
Reumann, Matthias; Fitch, Blake G; Rayshubskiy, Aleksandr; Keller, David U; Weiss, Daniel L; Seemann, Gunnar; Dössel, Olaf; Pitman, Michael C; Rice, John J
2008-01-01
Multi-scale, multi-physical heart models have not yet been able to include a high degree of accuracy and resolution with respect to model detail and spatial resolution due to computational limitations of current systems. We propose a framework to compute large-scale cardiac models. Decomposition of anatomical data into segments to be distributed on a parallel computer is carried out by optimal recursive bisection (ORB). The algorithm takes into account a computational load parameter which has to be adjusted according to the cell models used. The diffusion term is realized by the monodomain equations. The anatomical dataset was given by both ventricles of the Visible Female dataset at a 0.2 mm resolution. Heterogeneous anisotropy was included in the computation. Model weights as input for the decomposition and load balancing were set to (a) 1 for tissue and 0 for non-tissue elements; (b) 10 for tissue and 1 for non-tissue elements. Scaling results for 512, 1024, 2048, 4096, and 8192 computational nodes were obtained for 10 ms simulation time. The simulations were carried out on an IBM Blue Gene/L parallel computer. A 1 s simulation was then carried out on 2048 nodes for the optimal model load. Load balance did not differ significantly across computational nodes even when the number of data elements distributed to each node differed greatly. Since the ORB algorithm did not take into account the computational load due to communication cycles, the speedup is close to optimal for the computation time but not optimal overall due to the communication overhead. However, the simulation times were reduced from 87 minutes on 512 nodes to 11 minutes on 8192 nodes. This work demonstrates that it is possible to run simulations of the presented detailed cardiac model within hours for the simulation of a heart beat.
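A minimal sketch of weighted recursive bisection in the spirit of the ORB decomposition above: cut the longest axis at the weighted median and recurse, so tissue-heavy regions get smaller, denser subdomains. Everything below (power-of-two part count, toy weights) is an assumption for illustration.

    import numpy as np

    def orb(points, weights, n_parts):
        """Recursive bisection at the weighted median of the longest axis;
        returns index arrays (one per part) into the original points."""
        if n_parts == 1:
            return [np.arange(len(points))]
        axis = np.ptp(points, axis=0).argmax()
        order = points[:, axis].argsort()
        cum = np.cumsum(weights[order])
        cut = np.searchsorted(cum, cum[-1] / 2.0)   # weighted median
        parts = []
        for idx in (order[:cut + 1], order[cut + 1:]):
            for sub in orb(points[idx], weights[idx], n_parts // 2):
                parts.append(idx[sub])
        return parts

    rng = np.random.default_rng(0)
    pts = rng.random((10000, 3))
    w = np.where(rng.random(10000) < 0.3, 10.0, 1.0)  # tissue vs non-tissue
    print([round(w[p].sum()) for p in orb(pts, w, 8)]) # per-part loads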
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Neural simulations on multi-core architectures.
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Rizk, Yehia M.
1999-01-01
The presentation discusses recent studies of the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package; this approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications; the _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of a "Gauss-Seidel" fashion. The programming effort for the _MPI code is greater than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones to a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which run in parallel. The total volume of grid points in each group is approximately balanced. A proper number of threads is initially allocated to each group, and in subsequent iterations during the run time, the number of threads is adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in OVERFLOW.
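A minimal sketch of the grouping strategy described for the MLP version, under assumptions (greedy longest-first packing; invented zone sizes): zones are dealt to the currently lightest group so the total volume of grid points per group comes out approximately balanced.

    import heapq

    def group_zones(zone_sizes, n_groups):
        """Greedy longest-processing-time packing of zones into groups."""
        heap = [(0, g, []) for g in range(n_groups)]   # (load, id, zones)
        heapq.heapify(heap)
        for z, size in sorted(enumerate(zone_sizes), key=lambda t: -t[1]):
            load, g, zones = heapq.heappop(heap)       # lightest group
            zones.append(z)
            heapq.heappush(heap, (load + size, g, zones))
        return sorted(heap)

    # Hypothetical overset-grid zone sizes (thousands of points).
    for load, g, zones in group_zones([512, 256, 256, 128, 96, 64, 64, 32], 3):
        print(f"group {g}: load={load}k zones={zones}")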
Scaling Irregular Applications through Data Aggregation and Software Multithreading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel
Bioinformatics, data analytics, semantic databases, and knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees, or unstructured grids. These data structures are usually very large, requiring significantly more memory than is available on single shared-memory systems, and they are difficult to partition on distributed-memory systems. They also present poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications because it allows using a shared-memory abstraction across distributed-memory clusters. However, current PGAS languages and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine-grained, dynamic, and unbalanced parallelism of irregular applications. In this paper we present GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per-node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and to tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers, and communication threads) that realizes the overall functionality. We compare our approach with other PGAS models, such as UPC running on GASNet, and with hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.
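GMT's interface is not given in the abstract; the sketch below illustrates only the multi-level aggregation idea it describes, with invented names: fine-grained remote requests are buffered per destination node and shipped as one block when a buffer fills, trading a little latency for far better network bandwidth.

    class Aggregator:
        """Buffer fine-grained requests per destination; flush in blocks."""
        def __init__(self, n_nodes, block=1024, send=print):
            self.buffers = [[] for _ in range(n_nodes)]
            self.block, self.send = block, send

        def put(self, node, request):
            buf = self.buffers[node]
            buf.append(request)
            if len(buf) >= self.block:   # full block: one bulk transfer
                self.flush(node)

        def flush(self, node):
            if self.buffers[node]:
                self.send((node, self.buffers[node]))
                self.buffers[node] = []

    agg = Aggregator(n_nodes=4, block=3)
    for i in range(7):
        agg.put(i % 2, ("get", i))    # irregular accesses to nodes 0 and 1
    agg.flush(0); agg.flush(1)        # drain the remainders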
Theory of the deformation of aligned polyethylene.
Hammad, A; Swinburne, T D; Hasan, H; Del Rosso, S; Iannucci, L; Sutton, A P
2015-08-08
Solitons are proposed as the agents of plastic and viscoelastic deformation in aligned polyethylene. Interactions between straight, parallel molecules are mapped rigorously onto the Frenkel-Kontorova model. It is shown that these molecular interactions distribute an applied load between molecules, with a characteristic transfer length equal to the soliton width. Load transfer leads to the introduction of tensile and compressive solitons at the chain ends to mark the onset of plasticity at a well-defined yield stress, which is much less than the theoretical pull-out stress. Interaction energies between solitons and an equation of motion for solitons are derived. The equation of motion is based on Langevin dynamics and the fluctuation-dissipation theorem and it leads to the rigorous definition of an effective mass for solitons. It forms the basis of a soliton dynamics in direct analogy to dislocation dynamics. Close parallels are drawn between solitons in aligned polymers and dislocations in crystals, including the configurational force on a soliton. The origins of the strain rate and temperature dependencies of the viscoelastic behaviour are discussed in terms of the formation energy of solitons. A failure mechanism is proposed involving soliton condensation under a tensile load.
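The paper derives the mapping rigorously; for orientation only, the generic Frenkel-Kontorova energy onto which such molecular interactions are mapped has the standard form (generic form only; the paper's fitted parameters are not reproduced here):

    E = \sum_n \left[ \frac{K}{2}\,(u_{n+1}-u_n)^2
        + V\left(1 - \cos\frac{2\pi u_n}{a}\right) \right]

with u_n the displacement of unit n along the chain, K the elastic coupling between neighbors, and V and a the amplitude and period of the substrate potential supplied by the surrounding molecules. Solitons are the kink solutions that interpolate between adjacent minima of the substrate potential.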
NASA Astrophysics Data System (ADS)
Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.
2011-10-01
A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models are now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
NASA Astrophysics Data System (ADS)
Rivier, Leonard Gilles
Using an efficient parallel code solving the primitive equations of atmospheric dynamics, the jet structure of a Jupiter-like atmosphere is modeled. In the first part of this thesis, a parallel spectral code solving both the shallow water equations and the multi-level primitive equations of atmospheric dynamics is built. The implementation of this code, called BOB, is done so that it runs effectively on an inexpensive cluster of workstations. A one-dimensional decomposition and transposition method ensuring load balance among processes is used. The Legendre transform is cache-blocked, and computing the Legendre polynomials used in the spectral method on the fly produces a lower memory footprint and enables high-resolution runs on machines with relatively small memory. Performance studies are done using a cluster of workstations located at the National Center for Atmospheric Research (NCAR). BOB's performance is compared to the parallel benchmark code PSTSWM and to the dynamical core of NCAR's CCM3.6.6; in both cases, the comparison favors BOB. In the second part of this thesis, the primitive-equation version of the code described in part I is used to study the formation of organized zonal jets and equatorial superrotation in a planetary atmosphere whose parameters are chosen to best model the upper atmosphere of Jupiter. Two levels are used in the vertical, and only large-scale forcing is present. The model is forced towards a baroclinically unstable flow, so that eddies are generated by baroclinic instability. We consider several types of forcing, acting on either the temperature or the momentum field. We show that only under very specific parametric conditions do zonally elongated structures form and persist, resembling the jet structure observed near the cloud-level top (1 bar) on Jupiter. We also study the effect of an equatorial heat source, meant as a crude representation of the effect of the deep convective planetary interior on the outer atmospheric layer. We show that such heat forcing is able to produce strong equatorial superrotating winds, one of the most striking features of the Jovian circulation.
Dynamic Response of Acrylonitrile Butadiene Styrene Under Impact Loading (Open Access)
2016-03-16
of contraction and expansion was observed as the impact load was applied. This multistage deformation behavior may be attributable to the ring formed ... ABS fabricated by FDM. Results of the experimental characterization show that rasters formed parallel to the loading direction fabricated in the ... formed using a solid ABS block to determine the mechanical property at various strain rates (Fig. 1). Through the analysis of the solid ABS, a linear
NASA Astrophysics Data System (ADS)
Tran Quoc, Tinh; Khong Trong, Toan; Luong Van, Hai
2018-04-01
In this paper, the Improved Moving Element Method (IMEM) is used to analyze the dynamic response of Euler-Bernoulli beam structures resting on a dynamic foundation model and subjected to a moving load. The effects of the characteristic foundation parameters, such as the Winkler stiffness, the Pasternak shear layer, the viscoelastic dashpot, and the characteristic foundation mass parameter, are considered. The beams are modeled by moving elements while the load is fixed. Based on the principle of virtual balance and the theory of the moving element method, the differential equation of motion of the system is established and solved by numerical integration using the Newmark algorithm. The influence of the foundation mass and the roughness of the beam surface on the dynamic response of the beam is examined in detail.
[Parallel virtual reality visualization of extreme large medical datasets].
Tang, Min
2010-04-01
On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the Intranet and commonly configured computers of hospitals. This paper introduces several kernel techniques, including the hardware structure, software framework, load balance, and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated, and cut interactively and conveniently through the control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can serve as a good assistant in making clinical diagnoses.
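A minimal sketch of the parallel Maximum Intensity Projection named above, with assumed details (slab decomposition over a process pool): since max is associative, each worker projects its slab and the partial images are combined with an element-wise max.

    import numpy as np
    from multiprocessing import Pool

    def mip_slab(slab):
        """Maximum Intensity Projection of one slab along the view axis."""
        return slab.max(axis=0)

    def parallel_mip(volume, n_workers=4):
        slabs = np.array_split(volume, n_workers, axis=0)
        with Pool(n_workers) as pool:
            partials = pool.map(mip_slab, slabs)   # one slab per worker
        return np.maximum.reduce(partials)         # combine partial MIPs

    if __name__ == "__main__":
        vol = np.random.default_rng(0).random((128, 256, 256)).astype(np.float32)
        print(parallel_mip(vol).shape)             # (256, 256) projected image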
Lee, Suemin; Shim, Jemyung
2015-01-01
[Purpose] The purpose of this study was to measure and observe the changes in dynamic plantar pressures when school children carried specific bag loads, and to determine whether improved physical balance after an eight-week spinal stabilization exercise program can influence plantar pressures. [Subjects] The subjects were 10 school students with Cobb angles of 10° or greater. [Methods] Gait View Pro 1.0 (Alfoots, Korea) was used to measure the pressure of the participants' feet. Spinal stabilization exercises used TOGU Multi-roll Functional (TOGU, Germany) training. Dynamic plantar pressures were measured with bag loads of 0% (no bag) and 15% of the subjects' body weight. The independent t-test was performed to analyze changes in plantar pressures. [Results] The plantar pressure measurements with a bag load of 0% of subjects' body weight before and after the spinal stabilization exercise program were not significantly different, but those of two foot areas with a 15% load were statistically significant (mt5, 67.32±24.25 vs. 51.77±25.52 kPa; lat heel, 126.00±20.46 vs. 102.08±23.87 kPa). [Conclusion] After performance of the spinal stabilization exercises, subjects' overall plantar pressures were reduced, which may suggest that physical balance improved. PMID:26311964
Dynamics of a split torque helicopter transmission
NASA Astrophysics Data System (ADS)
Krantz, Timothy L.
1994-06-01
Split torque designs, proposed as alternatives to traditional planetary designs for helicopter main rotor transmissions, can save weight and be more reliable than traditional designs. This report presents the results of an analytical study of the system dynamics and performance of a split torque gearbox that uses a balance beam mechanism for load sharing. The Lagrange method was applied to develop a system of equations of motion. The mathematical model includes time-varying gear mesh stiffness, friction, and manufacturing errors. Cornell's method for calculating the stiffness of spur gear teeth was extended and applied to helical gears. The phenomenon of sidebands spaced at shaft frequencies about gear mesh fundamental frequencies was simulated by modeling total composite gear errors as sinusoid functions. Although the gearbox has symmetric geometry, the loads and motions of the two power paths differ. Friction must be considered to properly evaluate the balance beam mechanism. For the design studied, the balance beam is not an effective device for load sharing unless the coefficient of friction is less than 0.003. The complete system stiffness as represented by the stiffness matrix used in this analysis must be considered to precisely determine the optimal tooth indexing position.
Dynamics of a split torque helicopter transmission. M.S. Thesis - Cleveland State Univ.
NASA Technical Reports Server (NTRS)
Krantz, Timothy L.
1994-01-01
Split torque designs, proposed as alternatives to traditional planetary designs for helicopter main rotor transmissions, can save weight and be more reliable than traditional designs. This report presents the results of an analytical study of the system dynamics and performance of a split torque gearbox that uses a balance beam mechanism for load sharing. The Lagrange method was applied to develop a system of equations of motion. The mathematical model includes time-varying gear mesh stiffness, friction, and manufacturing errors. Cornell's method for calculating the stiffness of spur gear teeth was extended and applied to helical gears. The phenomenon of sidebands spaced at shaft frequencies about gear mesh fundamental frequencies was simulated by modeling total composite gear errors as sinusoid functions. Although the gearbox has symmetric geometry, the loads and motions of the two power paths differ. Friction must be considered to properly evaluate the balance beam mechanism. For the design studied, the balance beam is not an effective device for load sharing unless the coefficient of friction is less than 0.003. The complete system stiffness as represented by the stiffness matrix used in this analysis must be considered to precisely determine the optimal tooth indexing position.
Li, Ying-Jun; Yang, Cong; Wang, Gui-Cong; Zhang, Hui; Cui, Huan-Yong; Zhang, Yong-Liang
2017-09-01
This paper presents a novel integrated piezoelectric six-dimensional force sensor which can realize dynamic measurement of multi-dimensional space loads. Firstly, the composition of the sensor, the spatial layout of the force-sensitive components, and the measurement principle are analyzed and designed; in the theoretical analysis, the piezoelectric six-dimensional force sensor exhibits no inter-dimensional interference. Based on its working principle and deformation compatibility, this paper derives the parallel load-sharing principle of the piezoelectric six-dimensional force sensor and identifies the main factors that affect the load-sharing ratio. A finite element model of the piezoelectric six-dimensional force sensor is established. To verify the load-sharing principle of the sensor, a load-sharing test device for the piezoelectric force sensor was designed and fabricated, and a load-sharing experimental platform was set up. The experimental results are in accordance with the theoretical analysis and simulation results. The experiments show that multi-dimensional, heavy-force measurement can be realized by the parallel arrangement of the load-sharing ring and the force-sensitive element in the novel integrated piezoelectric six-dimensional force sensor, and that the ideal load-sharing effect can be achieved with appropriate size parameters. This paper provides important guidance for the design of force-measuring devices based on the load-sharing mode.
Multitasking the three-dimensional transport code TORT on CRAY platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azmy, Y.Y.; Barnett, D.A.; Burre, C.A.
1996-04-01
The multitasking options in the three-dimensional neutral particle transport code TORT, originally implemented for Cray's CTSS operating system, are revived and extended to run on Cray Y/MP and C90 computers using the UNICOS operating system. These include two coarse-grained domain decompositions: across octants, and across directions within an octant, termed Octant Parallel (OP) and Direction Parallel (DP), respectively. Parallel performance of the DP is significantly enhanced by increasing the task grain size and reducing load imbalance via dynamic scheduling of the discrete angles among the participating tasks. Substantial wall-clock speedup factors, approaching 4.5 using 8 tasks, have been measured in a time-sharing environment, and generally depend on the test problem specifications, number of tasks, and machine loading during execution.
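A minimal sketch of the dynamic (self-scheduling) assignment of discrete angles credited above with reducing load imbalance, using an invented stand-in for the per-angle sweep: tasks pull the next angle from a shared queue, so faster tasks absorb the imbalance instead of idling behind a static map.

    import queue, threading

    def sweep(angle):
        """Stand-in for a transport sweep over one discrete angle."""
        return sum(i * angle for i in range(1000))

    def direction_parallel(angles, n_tasks=4):
        q = queue.Queue()
        for a in angles:
            q.put(a)
        results, lock = [], threading.Lock()

        def worker():
            while True:
                try:
                    a = q.get_nowait()   # self-scheduling: pull next angle
                except queue.Empty:
                    return
                r = sweep(a)
                with lock:
                    results.append((a, r))

        threads = [threading.Thread(target=worker) for _ in range(n_tasks)]
        for t in threads: t.start()
        for t in threads: t.join()
        return results

    print(len(direction_parallel(range(1, 25))))   # 24 angles swept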
The research of hourglass worm dynamic balancing simulation based on SolidWorks motion
NASA Astrophysics Data System (ADS)
Wang, Zhuangzhuang; Yang, Jie; Liu, Pingyi; Zhao, Junpeng
2018-02-01
The hourglass worm is extensively used in industry due to its heavy-load capacity and large reduction ratio. Unbalanced mass distributions of varying size appear in the design of a single-head worm. As machines develop towards higher speed and precision, the vibration and shock caused by the unbalanced mass distribution of rotating parts must be considered, and the balance grade of these parts must meet higher requirements. A method based on theoretical analysis and simulation in the SolidWorks Motion software is presented in this paper; a virtual dynamic-balance simulation test of the hourglass worm was carried out during product design to ensure that the hourglass worm meets the dynamic balance requirements in the design process. This can effectively support the structural design of the hourglass worm and provides a way of thinking about and designing similar products.
Wang, Zihao; Chen, Yu; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Liu, Zhiyong; Sun, Fei; Zhang, Fa
2018-03-01
Electron tomography (ET) is an important technique for studying the three-dimensional structures of biological ultrastructure. Recently, ET has reached sub-nanometer resolution for investigating the native and conformational dynamics of macromolecular complexes when combined with the sub-tomogram averaging approach. Due to the limited sampling angles, ET reconstruction typically suffers from the "missing wedge" problem. Using a validation procedure, iterative compressed-sensing optimized nonuniform fast Fourier transform (NUFFT) reconstruction (ICON) demonstrates its power in restoring validated missing information for low-signal-to-noise-ratio biological ET datasets. However, the huge computational demand has become a bottleneck for the application of ICON. In this work, we implemented a parallel acceleration of ICON on many-integrated-core (MIC) Xeon Phi cards, called ICON-MIC, to address this demand. We parallelize the element-wise matrix operations and use efficient matrix summation to reduce the cost of matrix computation. We also developed parallel versions of the NUFFT on MIC to achieve high acceleration of ICON through more efficient fast Fourier transform (FFT) calculation. We then propose a hybrid task allocation strategy (two-level load balancing) to improve the overall performance of ICON-MIC by making full use of the idle resources on the Tianhe-2 supercomputer. Experimental results using two different datasets show that ICON-MIC has high accuracy for biological specimens under different noise levels and achieves a significant acceleration, up to 13.3×, compared with the CPU version. Further, ICON-MIC has good scalability efficiency and overall performance on the Tianhe-2 supercomputer.
ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms
NASA Astrophysics Data System (ADS)
Pilla, Laércio L.; Bozzetti, Tiago C.; Castro, Márcio; Navaux, Philippe O. A.; Méhaut, Jean-François
2015-10-01
Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that enables the variation of a vast range of parameters that affect performance. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose a specific algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use in this paper with an evaluation of Charm++ periodic load balancers that stresses their characteristics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Etingov, Pavel; Makarov, Yuri; Subbarao, Kris
RUT software is designed for use by the Balancing Authorities to predict and display additional requirements caused by the variability and uncertainty in load and generation. The prediction is made for the next operating hours as well as for the next day. The tool predicts possible deficiencies in generation capability and ramping capability. This deficiency of balancing resources can cause serious risks to power system stability and also impact real-time market energy prices. The tool dynamically and adaptively correlates changing system conditions with the additional balancing needs triggered by the interplay between forecasted and actual load and output of variable resources. The assessment is performed using a specially developed probabilistic algorithm incorporating multiple sources of uncertainty including wind, solar and load forecast errors. The tool evaluates required generation for a worst case scenario, with a user-specified confidence level.
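The tool's probabilistic algorithm is proprietary to the description above; the sketch below shows only the general shape of such a computation, with invented error statistics: combine load and variable-generation forecast-error samples and read the balancing requirement off the confidence-level quantiles of the net imbalance.

    import numpy as np

    def balancing_requirement(load_err, wind_err, solar_err, confidence=0.95):
        """Down/up balancing capacity needs as quantiles of the net
        imbalance (load error minus variable-generation errors), in MW."""
        total = load_err - wind_err - solar_err
        lo, hi = np.quantile(total, [(1 - confidence) / 2,
                                     (1 + confidence) / 2])
        return lo, hi

    rng = np.random.default_rng(0)   # invented historical error samples
    req = balancing_requirement(rng.normal(0, 50, 10000),
                                rng.normal(0, 80, 10000),
                                rng.normal(0, 30, 10000))
    print([round(x) for x in req])   # (downward MW, upward MW)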
Paralleling power MOSFETs in their active region: Extended range of passively forced current sharing
NASA Technical Reports Server (NTRS)
Niedra, Janis M.
1989-01-01
A simple passive circuit that improves current balance in paralleled power MOSFETs that are not precisely matched and that are operated in their active region from a common gate drive is exhibited. A nonlinear circuit consisting of diodes and resistors generates the differential gate potential required to correct for unbalance while maintaining low losses over a range of current. The application of a thin tape-wound magnetic core to effect dynamic current balance is also reviewed, and a simple theory is presented showing that, for operation in the active region, the branch currents tend to revert to their normal unbalanced values even if the core is not driven into saturation. Results of several comparative experiments are given.
Inflight Treadmill Exercise Can Serve as Multi-Disciplinary Countermeasure System
NASA Technical Reports Server (NTRS)
Bloomberg, J. J.; Batson, C. D.; Buxton, R. E.; Feiveson, A. H.; Kofman, I. S.; Laurie, S.; Lee, S. M. C.; Miller, C. A.; Mulavara, A. P.; Peters, B. T.;
2014-01-01
The goals of the Functional Task Test (FTT) study were to determine the effects of space flight on functional tests that are representative of high-priority exploration mission tasks and to identify the key underlying physiological factors that contribute to decrements in performance. Ultimately this information will be used to assess performance risks and inform the design of countermeasures for exploration-class missions. We have previously shown that for Shuttle, ISS, and bed rest subjects, functional tasks requiring a greater demand for dynamic control of postural equilibrium (i.e., fall recovery, seat egress/obstacle avoidance during walking, object translation, jump down) showed the greatest decrement in performance. Functional tests with reduced requirements for postural stability (i.e., hatch opening, ladder climb, manual manipulation of objects, and tool use) showed little reduction in performance. These changes in functional performance were paralleled by similar decrements in sensorimotor tests designed to specifically assess postural equilibrium and dynamic gait control. The bed rest analog allows us to investigate the impact of axial body unloading in isolation on both functional tasks and on the underlying physiological factors that lead to decrements in performance, and then compare them with the results obtained in our space flight study. These results indicate that body-support unloading experienced during space flight plays a central role in postflight alteration of functional task performance. These data also support the concept that space flight may cause central adaptation of converging body-load somatosensory and vestibular input during gravitational transitions [1]. Therefore, we conclude that providing significant body-support loading during in-flight treadmill exercise, along with balance training, is necessary to mitigate decrements in critical mission tasks that require dynamic postural stability and mobility. Data obtained from space flight and bed rest support the notion that in-flight treadmill exercise, in addition to providing aerobic exercise and mechanical stimuli to the bone, also has a number of sensorimotor benefits by providing: 1) a balance challenge during locomotion requiring segmental coordination in response to a downward force; 2) body-support loading during performance of a full-body active motor task; 3) oscillatory stimulation of the otoliths and synchronized periodic foot impacts that facilitate the coordination of gait motions and tune the full-body gaze control system; and 4) appropriate sensory input (foot tactile input, muscle and tendon stretch input) to spinal locomotor central pattern generators required for the control of locomotion. Forward work will focus on a follow-up bed rest study that incorporates aerobic and resistance exercise with a treadmill balance and gait training system that can serve as an integrated interdisciplinary countermeasure system for future exploration-class missions.
NASA Technical Reports Server (NTRS)
French, K. W., Jr.
1985-01-01
The flexibility of the PHOENICS computational fluid dynamics package was assessed along two general avenues: parallel modeling and analog modeling. In parallel modeling the dependent and independent variables retain their identity to within some scaling factors, even though the boundary conditions and especially the constitutive relations do not correspond to any realistic fluid dynamic situation. PHOENICS was used to generate a CFD model that should exhibit the physical anomalies of a granular medium and permit reasonable similarity with boundary conditions typical of membrane or porous-piston loading. A considerable portion of the study was spent probing the existing code, with a bias toward rate-type behavior, and disabling any inherent fluid behavior. The final stages of the study were directed at the more specific problem of multiaxis loading of a cylindrical geometry, with attention to the appearance of bulging and cross-slab shear failure modes.
Adaptive Control of Four-Leg VSC Based DSTATCOM in Distribution System
NASA Astrophysics Data System (ADS)
Singh, Bhim; Arya, Sabha Raj
2014-01-01
This work discusses the experimental performance of a four-leg Distribution Static Compensator (DSTATCOM) using an adaptive-filter-based approach. The filter estimates the reference supply currents by extracting the fundamental active power components of the three-phase distorted load currents. This control algorithm is implemented on an assembled DSTATCOM for harmonics elimination, neutral current compensation and load balancing under nonlinear loads. Experimental results are discussed, and they show that the DSTATCOM is an effective solution, performing satisfactorily under load dynamics.
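As a rough illustration of the adaptive-filter idea described above (a sketch of the general technique, not the authors' algorithm), the in-phase fundamental component of a distorted load current can be tracked with a one-weight LMS loop; the weight converges to the amplitude of the fundamental active component, from which balanced sinusoidal reference supply currents can be built. Names and gains here are hypothetical:

    import math

    def fundamental_active_weight(i_load, sin_ref, mu=0.01):
        # LMS weight w tracks the in-phase fundamental amplitude of i_load
        # against the unit reference sin_ref; the residual e carries harmonics.
        w = 0.0
        for i_k, s_k in zip(i_load, sin_ref):
            e = i_k - w * s_k      # residual: harmonics, unbalance, noise
            w += mu * e * s_k      # LMS update toward the in-phase fundamental
        return w

    # Toy check: 10*sin(x) plus a third harmonic; w should settle near 10.
    xs = [2 * math.pi * k / 100 for k in range(2000)]
    i_load = [10 * math.sin(x) + 2 * math.sin(3 * x) for x in xs]
    print(fundamental_active_weight(i_load, [math.sin(x) for x in xs]))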
Read, S J; Vanman, E J; Miller, L C
1997-01-01
We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.
The critical loads and levels approach for nitrogen
T.A. Clair; T. Blett; J. Aherne; M.P.M. Aidar; R. Artz; W.J. Bealey; W. Budd; J.N. Cape; C.J. Curtis; L. Duan; M.E. Fenn; P. Groffman; R. Haeuber; J.R. Hall; J.-P. Hettelingh; D. López-Hernández; B. Mathieson; L. Pardo; M. Posch; R.V. Pouyat; T. Spranger; H. Sverdrup; H. van Dobben; A. van Hinsberg
2014-01-01
This chapter reports the findings of a Working Group to review the critical loads (CLs) and levels approach for nitrogen (N). The three main approaches to estimating CLs are empirical, mass balance and dynamic modelling. Examples are given of recent developments in Europe, North America and Asia and it is concluded that other countries should be encouraged to develop...
NASA Technical Reports Server (NTRS)
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.
2013-01-01
The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTM architecture wraps access to data with thread management to move threads to the home processor for that data, so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to the IDs of the processors where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.
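The map-reduce-with-global-queue strategy mentioned in the abstract can be sketched as follows; this is a minimal illustration of the pattern, not MTTM code, and it omits thread migration and cache homing:

    import queue
    import threading

    tasks = queue.Queue()
    results, results_lock = [], threading.Lock()

    def worker():
        while True:
            try:
                job = tasks.get_nowait()   # draw the next job from the shared queue
            except queue.Empty:
                return                     # queue drained: worker exits
            r = job()                      # "map" step
            with results_lock:
                results.append(r)

    for i in range(100):
        tasks.put(lambda i=i: i * i)
    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(sum(results))                    # "reduce" step

Because every idle thread pulls from the same queue, faster threads automatically take more jobs and the load balances itself without a central scheduler.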
NASA Technical Reports Server (NTRS)
Lee, Jeh Won
1990-01-01
The objective is the theoretical analysis and the experimental verification of the dynamics and control of a two-link flexible manipulator with a flexible parallel link mechanism. Nonlinear equations of motion of the lightweight manipulator are derived by the Lagrangian method in symbolic form to better understand the structure of the dynamic model. The resulting equations of motion have a structure which is useful for reducing the number of terms calculated, checking correctness, or extending the model to higher order. A manipulator with a flexible parallel link mechanism is a constrained dynamic system whose equations are sensitive to numerical integration error. This constrained system is solved using singular value decomposition of the constraint Jacobian matrix. Elastic motion is expressed by the assumed mode method. Mode shape functions of each link are chosen using load-interfaced component mode synthesis. The discrepancies between the analytical model and the experiment are explained using a simplified and a detailed finite element model.
Parallel processing approach to transform-based image coding
NASA Astrophysics Data System (ADS)
Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.
1991-06-01
This paper describes a flexible parallel processing architecture designed for use in real-time video processing. The system consists of floating point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual-ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform-based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed; results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification are described.
CFTLB: a novel cross-layer fault tolerant and load balancing protocol for WMN
NASA Astrophysics Data System (ADS)
Krishnaveni, N. N.; Chitra, K.
2017-12-01
Wireless mesh networks (WMNs) form a wireless backbone for multi-hop transmission among the routers and clients in an extensible coverage area. To improve the throughput of WMNs with multiple gateways (GWs), several issues related to GW selection, load balancing and frequent link failures, caused by the presence of dynamic obstacles and channel interference, should be addressed. This paper presents a novel cross-layer fault tolerant and load balancing (CFTLB) protocol to overcome these issues. Initially, the neighbouring GWs are searched and their channel loads are calculated; when a new node arrives, the GW with the least channel load is selected. The proposed algorithm finds alternate GWs and calculates their channel availability under high-load scenarios: if the current load on a GW is high, another GW is found, its channel availability is calculated, channel switching is initiated, and communication with the mesh client is established effectively. The hashing technique used in the proposed CFTLB verifies the status of the packets, and the protocol achieves better performance in terms of router average throughput, throughput and average channel access time, and lower end-to-end delay, communication overhead and average data loss in the channel, compared to existing protocols.
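The gateway-selection rule can be paraphrased in a few lines; the field names and threshold below are hypothetical placeholders, not part of the CFTLB specification:

    def select_gateway(gateways, load_threshold):
        # gateways: list of dicts such as
        # {'id': 'gw1', 'channel_load': 0.4, 'channels_free': 2}
        best = min(gateways, key=lambda g: g['channel_load'])
        if best['channel_load'] <= load_threshold:
            return best
        # High-load case: fall back to an alternate GW with a free channel.
        alternates = [g for g in gateways
                      if g is not best and g['channels_free'] > 0]
        return min(alternates, key=lambda g: g['channel_load']) if alternates else best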
A 10 bit 200 MS/s pipeline ADC using loading-balanced architecture in 0.18 μm CMOS
NASA Astrophysics Data System (ADS)
Wang, Linfeng; Meng, Qiao; Zhi, Hao; Li, Fei
2017-07-01
A new loading-balanced architecture for a high-speed, low-power-consumption pipeline analog-to-digital converter (ADC) is presented in this paper. The proposed ADC uses SHA-less, op-amp- and capacitor-sharing techniques and a capacitor-scaling scheme to reduce the die area and power consumption. A new capacitor-sharing scheme is proposed to cancel the extra reset phase of the feedback capacitors. The non-standard inter-stage gain increases the feedback factor of the first stage and makes it equal to that of the second stage, by which the load capacitor of the op-amp shared by the first and second stages is balanced. As for the fourth stage, the capacitor and op-amp no longer scale down. From the system's point of view, all load capacitors of the shared OTAs are balanced by employing the loading-balanced architecture, and the die area and power consumption are optimized maximally. The ADC is implemented in a 0.18 μm 1P6M CMOS technology and occupies a die area of 1.2 × 1.2 mm². The measurement results show a 55.58 dB signal-to-noise-and-distortion ratio (SNDR) and a 62.97 dB spurious-free dynamic range (SFDR) with a 25 MHz input operating at a 200 MS/s sampling rate. The proposed ADC consumes 115 mW at 200 MS/s from a 1.8 V supply.
Scott-Pandorf, Melissa M; O'Connor, Daniel P; Layne, Charles S; Josić, Kresimir; Kurz, Max J
2009-09-01
With human exploration of the moon and Mars on the horizon, research considerations for space suit redesign have surfaced. The portable life support system (PLSS) used in conjunction with the space suit during the Apollo missions may have influenced the dynamic balance of the gait pattern. This investigation explored potential issues with the PLSS design that may arise during Mars exploration. A better understanding of how the location of the PLSS load influences the dynamic stability of the gait pattern may provide insight, such that crews may have more productive missions with a smaller risk of injury and of damaging equipment during falls. We explored the influence the PLSS load position had on the dynamic stability of the walking pattern. While walking, participants wore a device built to simulate possible PLSS load configurations. Floquet and Lyapunov analysis techniques were used to quantify the dynamic stability of the gait pattern. The dynamic stability of the gait pattern was influenced by the position of the load. PLSS loads that were placed high and forward on the torso resulted in less dynamically stable walking patterns than loads placed evenly and low on the torso. Furthermore, the kinematic results demonstrated that all joints of the lower extremity may be important for adjusting to different load placements and maintaining dynamic stability. Space scientists and engineers may want to consider PLSS designs that distribute loads evenly and low, and space suit designs that will not limit the sagittal plane range of motion at the lower extremity joints.
SIERRA Multimechanics Module: Aria User Manual Version 4.44
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sierra Thermal /Fluid Team
2017-04-01
Aria is a Galerkin finite element based program for solving coupled-physics problems described by systems of PDEs and is capable of solving nonlinear, implicit, transient and direct-to-steady state problems in two and three dimensions on parallel architectures. The suite of physics currently supported by Aria includes thermal energy transport, species transport, and electrostatics as well as generalized scalar, vector and tensor transport equations. Additionally, Aria includes support for manufacturing process flows via the incompressible Navier-Stokes equations specialized to a low Reynolds number (Re < 1) regime. Enhanced modeling support of manufacturing processing is made possible through use of either arbitrary Lagrangian-Eulerian (ALE) or level set based free and moving boundary tracking in conjunction with quasi-static nonlinear elastic solid mechanics for mesh control. Coupled physics problems are solved in several ways including fully-coupled Newton's method with analytic or numerical sensitivities, fully-coupled Newton-Krylov methods and a loosely-coupled nonlinear iteration about subsets of the system that are solved using combinations of the aforementioned methods. Error estimation, uniform and dynamic h-adaptivity and dynamic load balancing are some of Aria's more advanced capabilities. Aria is based upon the Sierra Framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sierra Thermal/Fluid Team
Aria is a Galerkin finite element based program for solving coupled-physics problems described by systems of PDEs and is capable of solving nonlinear, implicit, transient and direct-to-steady state problems in two and three dimensions on parallel architectures. The suite of physics currently supported by Aria includes thermal energy transport, species transport, and electrostatics as well as generalized scalar, vector and tensor transport equations. Additionally, Aria includes support for manufacturing process flows via the incompressible Navier-Stokes equations specialized to a low Reynolds number (Re < 1) regime. Enhanced modeling support of manufacturing processing is made possible through use of either arbitrary Lagrangian-Eulerian (ALE) or level set based free and moving boundary tracking in conjunction with quasi-static nonlinear elastic solid mechanics for mesh control. Coupled physics problems are solved in several ways including fully-coupled Newton's method with analytic or numerical sensitivities, fully-coupled Newton-Krylov methods and a loosely-coupled nonlinear iteration about subsets of the system that are solved using combinations of the aforementioned methods. Error estimation, uniform and dynamic h-adaptivity and dynamic load balancing are some of Aria's more advanced capabilities. Aria is based upon the Sierra Framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sierra Thermal /Fluid Team
Aria is a Galerkin finite element based program for solving coupled-physics problems described by systems of PDEs and is capable of solving nonlinear, implicit, transient and direct-to-steady state problems in two and three dimensions on parallel architectures. The suite of physics currently supported by Aria includes thermal energy transport, species transport, and electrostatics as well as generalized scalar, vector and tensor transport equations. Additionally, Aria includes support for manufacturing process flows via the incompressible Navier-Stokes equations specialized to a low Reynolds number (Re < 1) regime. Enhanced modeling support of manufacturing processing is made possible through use of either arbitrary Lagrangian-Eulerian (ALE) or level set based free and moving boundary tracking in conjunction with quasi-static nonlinear elastic solid mechanics for mesh control. Coupled physics problems are solved in several ways including fully-coupled Newton's method with analytic or numerical sensitivities, fully-coupled Newton-Krylov methods and a loosely-coupled nonlinear iteration about subsets of the system that are solved using combinations of the aforementioned methods. Error estimation, uniform and dynamic h-adaptivity and dynamic load balancing are some of Aria's more advanced capabilities. Aria is based upon the Sierra Framework.
Gianico, A; Braguglia, C M; Cesarini, R; Mininni, G
2013-09-01
The performance of thermophilic digestion of waste activated sludge, either untreated or thermally pretreated, was evaluated through semi-continuous tests carried out at organic loading rates in the range of 1-3.7 kg VS/(m³·d). Although the thermal pretreatment at T=134 °C proved effective in solubilizing organic matter, no significant gain in organics degradation was observed. However, the digestion of pretreated sludge showed significant soluble COD removal (more than 55%), whereas no removal occurred in the control reactors. The lower the initial sludge biodegradability, the higher the observed efficiency of digestion of the thermally pretreated sludge, in particular as regards higher biogas and methane production rates with respect to the parallel digestion of untreated sludge. The heat balance of the combined thermal hydrolysis/thermophilic digestion process, applied to full-scale scenarios, showed positive values for direct combustion of methane. In the case of combined heat and power generation, attractive electric energy recoveries were obtained, with a positive heat balance at high load. Copyright © 2013. Published by Elsevier Ltd.
1980-02-01
maneuver conditions, and transmit the net axial thrust force between the turbine and fan sections due to pressure and aerodynamic gas loads. ... stiffness composite material shaft. Both the balancing demonstration and the composite shaft design had as their objective the management of small turbofan ... [OCR residue from a table of contents and list of illustrations; recoverable figure titles: High Speed Balancing Program Schedule; Teledyne CAE Model 471-11DX Turbofan Engine]
Devi, D Chitra; Uthariaraj, V Rhymend
2016-01-01
Cloud computing uses the concepts of scheduling and load balancing to migrate tasks to underutilized VMs for effective resource sharing. Scheduling of nonpreemptive tasks in the cloud computing environment is an irreversible constraint, and hence tasks have to be assigned to the most appropriate VMs at the initial placement itself. In practice, arriving jobs consist of multiple interdependent tasks, and their independent tasks may execute on multiple VMs or on multiple cores of the same VM. Moreover, jobs arrive during the run time of the server at varying random intervals under various load conditions. The participating heterogeneous resources are managed by allocating the tasks to appropriate resources by static or dynamic scheduling, making cloud computing more efficient and thus improving user satisfaction. The objective of this work is to introduce and evaluate the proposed scheduling and load balancing algorithm, which considers the capabilities of each virtual machine (VM), the task length of each requested job, and the interdependency of multiple tasks. The performance of the proposed algorithm is studied by comparison with existing methods.
Devi, D. Chitra; Uthariaraj, V. Rhymend
2016-01-01
Cloud computing uses the concepts of scheduling and load balancing to migrate tasks to underutilized VMs for effective resource sharing. Scheduling of nonpreemptive tasks in the cloud computing environment is an irreversible constraint, and hence tasks have to be assigned to the most appropriate VMs at the initial placement itself. In practice, arriving jobs consist of multiple interdependent tasks, and their independent tasks may execute on multiple VMs or on multiple cores of the same VM. Moreover, jobs arrive during the run time of the server at varying random intervals under various load conditions. The participating heterogeneous resources are managed by allocating the tasks to appropriate resources by static or dynamic scheduling, making cloud computing more efficient and thus improving user satisfaction. The objective of this work is to introduce and evaluate the proposed scheduling and load balancing algorithm, which considers the capabilities of each virtual machine (VM), the task length of each requested job, and the interdependency of multiple tasks. The performance of the proposed algorithm is studied by comparison with existing methods. PMID:26955656
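One plausible reading of the placement rule (a sketch only; the authors' algorithm additionally models task interdependency) is a greedy earliest-finish assignment that weighs VM capability against task length:

    import heapq

    def place_tasks(task_lengths, vm_speeds):
        heap = [(0.0, vm) for vm in range(len(vm_speeds))]  # (finish time, VM)
        heapq.heapify(heap)
        assignment = []
        for length in sorted(task_lengths, reverse=True):    # longest first
            finish, vm = heapq.heappop(heap)                 # earliest-free VM
            finish += length / vm_speeds[vm]                 # capability-aware cost
            assignment.append((length, vm))
            heapq.heappush(heap, (finish, vm))
        return assignment

    print(place_tasks([8, 3, 5, 1, 9, 2], vm_speeds=[1.0, 2.0]))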
The Potsdam Parallel Ice Sheet Model (PISM-PIK) - Part 1: Model description
NASA Astrophysics Data System (ADS)
Winkelmann, R.; Martin, M. A.; Haseloff, M.; Albrecht, T.; Bueler, E.; Khroulev, C.; Levermann, A.
2010-08-01
We present the Potsdam Parallel Ice Sheet Model (PISM-PIK), developed at the Potsdam Institute for Climate Impact Research to be used for simulations of large-scale ice sheet-shelf systems. It is derived from the Parallel Ice Sheet Model (Bueler and Brown, 2009). Velocities are calculated by superposition of two shallow stress balance approximations within the entire ice covered region: the shallow ice approximation (SIA) is dominant in grounded regions and accounts for shear deformation parallel to the geoid. The plug-flow type shallow shelf approximation (SSA) dominates the velocity field in ice shelf regions and serves as a basal sliding velocity in grounded regions. Ice streams naturally emerge through this approach and can be identified diagnostically as regions with a significant contribution of membrane stresses to the local momentum balance. All lateral boundaries in PISM-PIK are free to evolve, including the grounding line and ice fronts. Ice shelf margins in particular are modeled using Neumann boundary conditions for the SSA equations, reflecting a hydrostatic stress imbalance along the vertical calving face. The ice front position is modeled using a subgrid scale representation of calving front motion (Albrecht et al., 2010) and a physically motivated dynamic calving law based on horizontal spreading rates. The model is validated within the Marine Ice Sheet Model Intercomparison Project (MISMIP) and is used for a dynamic equilibrium simulation of Antarctica under present-day conditions in the second part of this paper (Martin et al., 2010).
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters
Bajaj, Chandrajit
2009-01-01
Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction and rendering in conjunction with a new specialized piece of image compositing hardware called the Metabuffer. In this paper, we focus on the back-end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations for loading large data from disk. We also describe an isosurface compression scheme that is efficient for progressive extraction, transmission and storage of isosurfaces. PMID:19756231
Vortex-induced dynamic loads on a non-spinning volleyball
NASA Astrophysics Data System (ADS)
Qing-ding, Wei; Rong-sheng, Lin; Zhi-jie, Liu
1988-09-01
An experiment on vortex-induced dynamic loads on a non-spinning volleyball was conducted in a wind tunnel. The flow past the volleyball was visualized, and the aerodynamic load was measured using a strain gauge balance. The separation on the volleyball was measured with a hot-film probe. The experimental results suggest that under the action of an unstable tail vortex system the separation region is changeable, and that the fluctuations of the drag and lateral forces are of the same order of magnitude as the mean drag, regardless of whether the seam of the volleyball is symmetric or asymmetric with respect to the flow. Based on the experimental data, a numerical simulation of volleyball swerve motion was made.
Modeling the fracture of ice sheets on parallel computers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Waisman, Haim; Bell, Robin; Keyes, David
2010-03-01
The objective of this project is to investigate the complex fracture of ice and understand its role within larger ice sheet simulations and global climate change. At the present time, ice fracture is not explicitly considered within ice sheet models, due in part to the large computational costs associated with accurate modeling of this complex phenomenon. However, fracture not only plays an extremely important role in regional behavior but also influences ice dynamics over much larger zones in ways that are currently not well understood. Dramatic illustrations of fracture-induced phenomena most notably include the recent collapse of ice shelves in Antarctica (e.g. the partial collapse of the Wilkins shelf in March of 2008 and the diminishing extent of the Larsen B shelf from 1998 to 2002). Other fracture examples include ice calving (fracture of icebergs), which is presently approximated in simplistic ways within ice sheet models, and the draining of supraglacial lakes through a complex network of cracks, a so-called ice sheet plumbing system, that is believed to cause accelerated ice sheet flows due essentially to lubrication of the contact surface with the ground. These dramatic changes are emblematic of the ongoing change in the Earth's polar regions and highlight the important role of fracturing ice. To model ice fracture, a simulation capability will be designed centered around extended finite elements and solved by specialized multigrid methods on parallel computers. In addition, appropriate dynamic load balancing techniques will be employed to ensure an approximately equal amount of work for each processor.
Vertical Scan (V-SCAN) for 3-D Grid Adaptive Mesh Refinement for an atmospheric Model Dynamical Core
NASA Astrophysics Data System (ADS)
Andronova, N. G.; Vandenberg, D.; Oehmke, R.; Stout, Q. F.; Penner, J. E.
2009-12-01
One of the major building blocks of a rigorous representation of cloud evolution in global atmospheric models is a parallel adaptive grid MPI-based communication library (an Adaptive Blocks for Locally Cartesian Topologies library -- ABLCarT), which manages the block-structured data layout, handles ghost cell updates among neighboring blocks and splits a block as refinements occur. The library has several modules that provide a layer of abstraction for adaptive refinement: blocks, which contain individual cells of user data; shells - the global geometry for the problem, including a sphere, reduced sphere, and now a 3D sphere; a load balancer for placement of blocks onto processors; and a communication support layer which encapsulates all data movement. A major performance concern with adaptive mesh refinement is how to represent calculations that need to be sequenced in a particular order along a direction, such as calculating integrals along a specific path (e.g. atmospheric pressure or geopotential in the vertical dimension). This concern is compounded if the blocks have varying levels of refinement, or are scattered across different processors, as can be the case in parallel computing. In this paper we describe an implementation in ABLCarT of a vertical scan operation, which allows computing along vertical paths in the correct order across blocks, transparently to their resolution and processor location. We test this functionality on a 2D and a 3D advection problem, which tests the performance of the model's dynamics (transport) and physics (sources and sinks) for different model resolutions needed for the inclusion of cloud formation.
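The vertical scan can be pictured, for a single column of blocks, as an ordinary prefix operation applied in physical order rather than storage order; this toy sketch (not ABLCarT code) carries a running value across blocks that may be listed in any order:

    def vertical_scan(column_blocks):
        # column_blocks: list of (z_start, cell_values); order in the list is
        # arbitrary, as blocks may live on different processors.
        carry = 0.0
        scanned = []
        for z_start, values in sorted(column_blocks, key=lambda b: b[0]):
            out = []
            for v in values:       # integrate upward within the block
                carry += v
                out.append(carry)
            scanned.append((z_start, out))
        return scanned             # e.g. a column pressure or geopotential integral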
An architecture for real-time vision processing
NASA Technical Reports Server (NTRS)
Chien, Chiun-Hong
1994-01-01
To study the feasibility of developing an architecture for real-time vision processing, a task queue server and parallel algorithms for two vision operations were designed and implemented on an i860-based Mercury Computing System 860VS array processor. The proposed architecture treats each vision function as a task or set of tasks which may be recursively divided into subtasks and processed by multiple processors coordinated by a task queue server accessible to all processors. Each idle processor subsequently fetches a task and associated data from the task queue server for processing and posts the result to shared memory for later use. Load balancing can be carried out within the processing system without the need for a centralized controller. The author concludes that real-time vision processing cannot be achieved without both sequential and parallel vision algorithms and a good parallel vision architecture.
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids
NASA Astrophysics Data System (ADS)
Ma, Xinrong; Duan, Zhijian
2018-04-01
High-order resolution discontinuous Galerkin finite element methods (DGFEM) are known to be good methods for solving the Euler and Navier-Stokes equations on unstructured grids, but they consume substantial computational resources. An efficient parallel algorithm is presented for solving the compressible Euler equations. Moreover, a multigrid strategy based on a three-stage, third-order TVD Runge-Kutta scheme is used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of the unsteady compressible Euler equations. In order to keep each processor load balanced, the domain decomposition method is employed. Numerical experiments were performed for inviscid transonic flow problems around the NACA0012 airfoil and the M6 wing. The results indicate that our parallel algorithm can improve acceleration and efficiency significantly, and is suitable for computing complex flow fields.
Aprà, E; Kowalski, K
2016-03-08
In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.
Cain, Stephen M; McGinnis, Ryan S; Davidson, Steven P; Vitali, Rachel V; Perkins, Noel C; McLean, Scott G
2016-01-01
We utilize an array of wireless inertial measurement units (IMUs) to measure the movements of subjects (n=30) traversing an outdoor balance beam (zigzag and sloping) as quickly as possible both with and without load (20.5kg). Our objectives are: (1) to use IMU array data to calculate metrics that quantify performance (speed and stability) and (2) to investigate the effects of load on performance. We hypothesize that added load significantly decreases subject speed yet results in increased stability of subject movements. We propose and evaluate five performance metrics: (1) time to cross beam (less time=more speed), (2) percentage of total time spent in double support (more double support time=more stable), (3) stride duration (longer stride duration=more stable), (4) ratio of sacrum M-L to A-P acceleration (lower ratio=less lateral balance corrections=more stable), and (5) M-L torso range of motion (smaller range of motion=less balance corrections=more stable). We find that the total time to cross the beam increases with load (t=4.85, p<0.001). Stability metrics also change significantly with load, all indicating increased stability. In particular, double support time increases (t=6.04, p<0.001), stride duration increases (t=3.436, p=0.002), the ratio of sacrum acceleration RMS decreases (t=-5.56, p<0.001), and the M-L torso lean range of motion decreases (t=-2.82, p=0.009). Overall, the IMU array successfully measures subject movement and gait parameters that reveal the trade-off between speed and stability in this highly dynamic balance task. Copyright © 2015 Elsevier B.V. All rights reserved.
Characterization of flexure hinges for the French watt balance experiment
NASA Astrophysics Data System (ADS)
Pinot, Patrick; Genevès, Gérard
2014-08-01
In the French watt balance experiment, the translation and rotation functions must have no backlash, no friction, and no need for lubricants. In addition, errors in position and movement must be below 100 nm. Flexure hinges can meet all of these criteria. Different materials, profile shapes and machining techniques have been studied. The flexure pivots have been characterized using three techniques: 1) an optical microscope and, if necessary, an SEM to observe the surface inhomogeneities; 2) a mass comparator to determine the bending stiffness of unloaded pivots; 3) a loaded beam oscillating freely under vacuum to study the dynamic behavior of loaded pivots.
Waiting can be an optimal conservation strategy, even in a crisis discipline.
Iacona, Gwenllian D; Possingham, Hugh P; Bode, Michael
2017-09-26
Biodiversity conservation projects confront immediate and escalating threats with limited funding. Conservation theory suggests that the best response to the species extinction crisis is to spend money as soon as it becomes available, and this is often an explicit constraint placed on funding. We use a general dynamic model of a conservation landscape to show that this decision to "front-load" project spending can be suboptimal if a delay allows managers to use resources more strategically. Our model demonstrates the existence of temporal efficiencies in conservation management, which parallel the spatial efficiencies identified by systematic conservation planning. The optimal timing of decisions balances the rate of biodiversity decline (e.g., the relaxation of extinction debts, or the progress of climate change) against the rate at which spending appreciates in value (e.g., through interest, learning, or capacity building). We contrast the benefits of acting and waiting in two ecosystems where restoration can mitigate forest bird extinction debts: South Australia's Mount Lofty Ranges and Paraguay's Atlantic Forest. In both cases, conservation outcomes cannot be maximized by front-loading spending, and the optimal solution recommends substantial delays before managers undertake conservation actions. Surprisingly, these delays allow superior conservation benefits to be achieved, in less time than front-loading. Our analyses provide an intuitive and mechanistic rationale for strategic delay, which contrasts with the orthodoxy of front-loaded spending for conservation actions. Our results illustrate the conservation efficiencies that could be achieved if decision makers choose when to spend their limited resources, as opposed to just where to spend them.
Probabilistic structural mechanics research for parallel processing computers
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Martin, William R.
1991-01-01
Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods has been hampered by their computationally intensive nature. Solution of PSM problems requires repeated analyses of structures that are often large and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make the solution of large-scale PSM problems practical.
Vibration energy harvesting using a piezoelectric circular diaphragm array.
Wang, Wei; Yang, Tongqing; Chen, Xurui; Yao, Xi
2012-09-01
This paper presents a method for harvesting electric energy from mechanical vibration using a mechanically excited piezoelectric circular membrane array. The piezoelectric circular diaphragm array consists of four plates with series and parallel connection, and the electrical characteristics of the array are examined under dynamic conditions. With an optimal load resistor of 160 kΩ, an output power of 28 mW was generated from the array in series connection at 150 Hz under a prestress of 0.8 N and a vibration acceleration of 9.8 m/s(2), whereas a maximal output power of 27 mW can be obtained from the array in parallel connection through a resistive load of 11 kΩ under the same frequency, prestress, and acceleration conditions. The results show that using a piezoelectric circular diaphragm array can significantly increase the output of energy compared with the use of a single plate. By choosing an appropriate connection pattern (series or parallel connections) among the plates, the equivalent impedance of the energy harvesting devices can be tailored to meet the matched load of different applications for maximal power output.
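The roughly 15-fold spread between the two optimal loads is what elementary source-impedance matching predicts for four nominally identical plates, each with internal impedance Z of about 1/(2πfC); to first order, and neglecting parasitics,

\[
R_{\mathrm{opt}}^{\mathrm{series}} \approx 4Z, \qquad
R_{\mathrm{opt}}^{\mathrm{parallel}} \approx \tfrac{Z}{4}, \qquad
\frac{R_{\mathrm{opt}}^{\mathrm{series}}}{R_{\mathrm{opt}}^{\mathrm{parallel}}} \approx 16,
\]

which is consistent with the measured 160 kΩ / 11 kΩ ≈ 14.5 at 150 Hz.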
NASA Astrophysics Data System (ADS)
Wang, Liping; Jiang, Yao; Li, Tiemin
2014-09-01
Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used for advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that can easily suffer deformations, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy with consideration of the deformations of its links during the motion process. Based on the dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. Then the kinematic errors of the machine are derived with consideration of the deformations of the links. Through further derivation, the accuracy of the machine is given in a simple explicit expression, which will be helpful for increasing the calculation speed. The accuracy of this machine when following a selected circular path is simulated. The influences of the magnitude of the maximum acceleration and of external loads on the running accuracy of the machine are investigated. The results show that external loads deteriorate the accuracy of the machine tremendously when their direction coincides with the direction of the worst stiffness of the machine. The proposed method provides a solution for predicting the running accuracy of parallel kinematic machines and can also be used in their design optimization as well as in the selection of suitable running parameters.
NASA Astrophysics Data System (ADS)
Ajiatmo, Dwi; Robandi, Imam
2017-03-01
This paper proposes a control scheme for a photovoltaic array, battery and supercapacitor connected in parallel for use in a solar vehicle. Based on the features of battery charging, the control scheme consists of three modes: dynamic irradiance mode, constant load mode and constant-voltage charging mode. The shift among the three modes is realized by controlling the duty cycle of the MOSFET of the boost converter system; meanwhile, a higher voltage, more suitable for the application, can be obtained. Compared with the normal charging method with parallel-connected current limiting, the proposed control scheme shortens the charging time and increases the use of the power generated from the PV array. Simulations and analysis were conducted in Matlab/Simulink to determine the performance of the system in the transient and steady states. The simulation results demonstrate the suitability of the proposed concept.
Short-Term Load Forecasting Based Automatic Distribution Network Reconfiguration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Ding, Fei; Zhang, Yingchen
In a traditional dynamic network reconfiguration study, the optimal topology is determined at every scheduled time point by using the real load data measured at that time. The development of the load forecasting technique can provide an accurate prediction of the load power that will happen in a future time and provide more information about load changes. With the inclusion of load forecasting, the optimal topology can be determined based on the predicted load conditions during a longer time period instead of using a snapshot of the load at the time when the reconfiguration happens; thus, the distribution system operator can use this information to better operate the system reconfiguration and achieve optimal solutions. This paper proposes a short-term load forecasting approach to automatically reconfigure distribution systems in a dynamic and pre-event manner. Specifically, a short-term and high-resolution distribution system load forecasting approach is proposed with a forecaster based on support vector regression and parallel parameters optimization. The network reconfiguration problem is solved by using the forecasted load continuously to determine the optimal network topology with the minimum amount of loss at the future time. The simulation results validate and evaluate the proposed approach.
Short-Term Load Forecasting Based Automatic Distribution Network Reconfiguration: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Ding, Fei; Zhang, Yingchen
In the traditional dynamic network reconfiguration study, the optimal topology is determined at every scheduled time point by using the real load data measured at that time. The development of load forecasting technique can provide accurate prediction of load power that will happen in future time and provide more information about load changes. With the inclusion of load forecasting, the optimal topology can be determined based on the predicted load conditions during the longer time period instead of using the snapshot of load at the time when the reconfiguration happens, and thus it can provide information to the distribution system operator (DSO) to better operate the system reconfiguration to achieve optimal solutions. Thus, this paper proposes a short-term load forecasting based approach for automatically reconfiguring distribution systems in a dynamic and pre-event manner. Specifically, a short-term and high-resolution distribution system load forecasting approach is proposed with support vector regression (SVR) based forecaster and parallel parameters optimization. And the network reconfiguration problem is solved by using the forecasted load continuously to determine the optimal network topology with the minimum loss at the future time. The simulation results validate and evaluate the proposed approach.
Short-Term Load Forecasting-Based Automatic Distribution Network Reconfiguration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Ding, Fei; Zhang, Yingchen
In a traditional dynamic network reconfiguration study, the optimal topology is determined at every scheduled time point by using the real load data measured at that time. The development of the load forecasting technique can provide an accurate prediction of the load power that will happen in a future time and provide more information about load changes. With the inclusion of load forecasting, the optimal topology can be determined based on the predicted load conditions during a longer time period instead of using a snapshot of the load at the time when the reconfiguration happens; thus, the distribution system operator can use this information to better operate the system reconfiguration and achieve optimal solutions. This paper proposes a short-term load forecasting approach to automatically reconfigure distribution systems in a dynamic and pre-event manner. Specifically, a short-term and high-resolution distribution system load forecasting approach is proposed with a forecaster based on support vector regression and parallel parameters optimization. The network reconfiguration problem is solved by using the forecasted load continuously to determine the optimal network topology with the minimum amount of loss at the future time. The simulation results validate and evaluate the proposed approach.
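A minimal sketch of the forecasting half of this approach, assuming lagged-load features and untuned hyperparameters (the papers tune the SVR parameters in parallel), could look like:

    import numpy as np
    from sklearn.svm import SVR

    def make_lagged(series, lags=24):
        # Predict the next value from the previous `lags` values.
        X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
        y = np.array(series[lags:])
        return X, y

    history = np.sin(np.linspace(0, 20, 500)) + 5.0   # stand-in load series
    X, y = make_lagged(history)
    model = SVR(kernel='rbf', C=10.0, epsilon=0.01).fit(X[:-1], y[:-1])
    next_load = model.predict(X[-1:])   # fed to the reconfiguration optimizer
    print(next_load)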
Breaking Barriers to Low-Cost Modular Inverter Production & Use
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bogdan Borowy; Leo Casey; Jerry Foshage
2005-05-31
The goal of this cost-share contract is to advance key technologies to reduce size, weight and cost while enhancing performance and reliability of a Modular Inverter Product for Distributed Energy Resources (DER). Efforts address technology development to meet the technical needs of the DER market: protection, isolation, reliability, and quality. Program activities build on SatCon Technology Corporation inverter experience (e.g., AIPM, Starsine, PowerGate) for photovoltaic, fuel cell and energy storage applications. Efforts focused on four technical areas: capacitors, cooling, voltage sensing and control of parallel inverters. Capacitor efforts developed a hybrid capacitor approach for conditioning SatCon's AIPM unit supply voltages by incorporating several types and sizes to store energy and filter at high, medium and low frequencies while minimizing parasitics (ESR and ESL). Cooling efforts converted the liquid-cooled AIPM module to an air-cooled unit using augmented-fin, impingement-flow cooling. Voltage sensing efforts successfully modified the existing AIPM sensor board to allow several, application-dependent configurations and to enable voltage sensor galvanic isolation. Parallel inverter control efforts realized a reliable technique to control individual inverters, connected in a parallel configuration, without a communication link. Individual inverter currents, AC and DC, were balanced in the paralleled modules by introducing a delay to the individual PWM gate pulses. The load current sharing is robust and independent of load type (i.e., linear and nonlinear, resistive and/or inductive). It is a simple yet powerful method for paralleling individual devices that dramatically improves the reliability and fault tolerance of parallel inverter power systems. A patent application has been made based on this control technology.
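The communication-less sharing technique is described only qualitatively above; one way to picture it (a conceptual sketch with illustrative gains, not SatCon's patented method) is a local control law in which each module trims its own gate-pulse delay according to its measured current:

    def update_delay(i_measured, i_rated_share, delay, k=1e-8, d_max=2e-6):
        # A module carrying more than its rated share delays its PWM gate
        # pulses slightly, shedding current; one carrying less advances them.
        delay += k * (i_measured - i_rated_share)
        return min(max(delay, 0.0), d_max)

Because each inverter needs only its own current measurement, no inter-module communication link is required, which is what makes the scheme fault tolerant.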
Machine Learning Based Online Performance Prediction for Runtime Parallelization and Task Scheduling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, J; Ma, X; Singh, K
2008-10-09
With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as well as developing next-generation software requires assistance from hardware, compilers and runtime systems to exploit parallelism transparently within applications. These systems must decompose applications into tasks that can be executed in parallel and then schedule those tasks to minimize load imbalance. However, many systems lack a priori knowledge about the execution time of all tasks to perform effective load balancing with low scheduling overhead. In this paper, we approach this fundamental problem using machine learning techniques first to generate performance models for all tasks and then applying those models to perform automatic performance prediction across program executions. We also extend an existing scheduling algorithm to use generated task cost estimates for online task partitioning and scheduling. We implement the above techniques in the pR framework, which transparently parallelizes scripts in the popular R language, and evaluate their performance and overhead with both a real-world application and a large number of synthetic representative test scripts. Our experimental results show that our proposed approach significantly improves task partitioning and scheduling, with maximum improvements of 21.8%, 40.3% and 22.1% and average improvements of 15.9%, 16.9% and 4.2% for LMM (a real R application) and synthetic test cases with independent and dependent tasks, respectively.
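The two stages (learned cost models, then cost-aware partitioning) can be sketched as follows; the regression model and features are stand-ins, not the paper's pR implementation:

    import heapq
    from sklearn.linear_model import LinearRegression

    def schedule(train_feats, train_times, new_feats, n_workers):
        model = LinearRegression().fit(train_feats, train_times)
        costs = model.predict(new_feats)          # predicted task run times
        heap = [(0.0, w) for w in range(n_workers)]
        heapq.heapify(heap)
        plan = [[] for _ in range(n_workers)]
        for t in sorted(range(len(costs)), key=lambda i: -costs[i]):
            load, w = heapq.heappop(heap)         # least-loaded worker
            plan[w].append(t)
            heapq.heappush(heap, (load + costs[t], w))
        return plan

    print(schedule([[10], [20], [40]], [1.1, 2.0, 4.2],
                   [[5], [30], [15], [25]], n_workers=2))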
Information Dynamics as Foundation for Network Management
2014-12-04
developed to adapt to channel dynamics in a mobile network environment. We devise a low-complexity online scheduling algorithm integrated with the ... has been accepted for the Journal on Network and Systems Management in 2014. - RINC programmable platform for Infrastructure-as-a-Service public ... backend servers. Rather than implementing load balancing in dedicated appliances, commodity SDN switches can perform this function. We design
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Huang, Zhenyu; Rice, Mark J.
Contingency analysis studies are necessary to assess the impact of possible power system component failures. The results of the contingency analysis are used to ensure grid reliability, and in power market operation for the feasibility test of market solutions. Currently, these studies are performed in real time based on the current operating conditions of the grid with a pre-selected contingency list, which might result in overlooking some critical contingencies caused by variable system status. To have a complete picture of a power grid, more contingencies need to be studied to improve grid reliability. High-performance computing techniques hold the promise of being able to perform the analysis for more contingency cases within a much shorter time frame. This paper evaluates the performance of counter-based dynamic load balancing schemes for a massive contingency analysis program on 10,000+ cores. One million N-2 contingency analysis cases with a Western Electricity Coordinating Council power grid model have been used to demonstrate the performance. Speedups of 3964 with 4096 cores and 7877 with 10,240 cores are obtained. This paper reports the performance of the load balancing scheme with a single counter and two counters, describes disk I/O issues, and discusses other potential techniques for further improving the performance.
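The counter-based scheme itself is simple enough to sketch; here, in place of the paper's MPI implementation on 10,000+ cores, worker processes fetch the next case index from one shared counter so that faster workers automatically process more cases:

    import multiprocessing as mp

    def worker(counter, n_cases, out):
        done = 0
        while True:
            with counter.get_lock():
                case = counter.value    # atomically claim the next case
                counter.value += 1
            if case >= n_cases:
                break
            done += 1                   # placeholder for solving contingency `case`
        out.put(done)

    if __name__ == '__main__':
        counter, out = mp.Value('i', 0), mp.Queue()
        procs = [mp.Process(target=worker, args=(counter, 1000, out))
                 for _ in range(4)]
        for p in procs: p.start()
        for p in procs: p.join()
        print([out.get() for _ in procs])   # per-worker case counts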
Load balance in total knee arthroplasty: an in vitro analysis.
El-Hawary, Ron; Roth, Sandra E; King, Graham J W; Chess, David G; Johnson, James A
2006-09-01
One of the goals of total knee arthroplasty (TKA) is to balance the loads between the compartments of the knee. An instrumented load cell that measures compartment loads in real time is utilized to evaluate conventional, qualitative methods of achieving this balance. TKA was performed on 10 cadaveric knees. Prior to and after load balancing, compartment forces were measured at flexion angles of 0-90 degrees. Knees were randomly assigned into one of two groups, based upon whether or not the surgeons could visualize the load cell's output during balancing. Prior to attempting load balance, there were significant differences between the medial and lateral compartment loads for all knees (p < 0.05). After attempting balance with the aid of the load cell, there was equal load balance at all angles studied. Without the aid of the load cell, balance was not consistently achieved at every angle. Conventional load balancing techniques in TKA are not perfect. Copyright 2006 John Wiley & Sons, Ltd.
Foreign Area Studies: India. A Syllabus.
ERIC Educational Resources Information Center
Brown, Emily C., Ed.
Developed for a one-semester college credit course, this syllabus encourages a cross-cultural approach to the study of Indian society. The objective is to provide students not only with a balanced view of India but also with an idea of the dynamics of change. Emphasis is upon paralleling social and political issues in the United States with those of…
Micropollutant loads in the urban water cycle.
Musolff, Andreas; Leschik, Sebastian; Reinstorf, Frido; Strauch, Gerhard; Schirmer, Mario
2010-07-01
The assessment of micropollutants in the urban aquatic environment is a challenging task since both the water balance and the contaminant concentrations are characterized by a pronounced variability in time and space. In this study the water balance of a central European urban drainage catchment is quantified for a period of one year. On the basis of a concentration monitoring of several micropollutants, a contaminant mass balance for the study area's wastewater, surface water, and groundwater is derived. The release of micropollutants from the catchment was mainly driven by the discharge of the wastewater treatment plant. However, combined sewer overflows (CSO) released significant loads of caffeine, bisphenol A, and technical 4-nonylphenol. Since an estimated fraction of 9.9-13.0% of the wastewater's dry weather flow was lost as sewer leakages to the groundwater, considerable loads of bisphenol A and technical 4-nonylphenol were also released by the groundwater pathway. The different temporal dynamics of release loads by CSO as an intermittent source and groundwater as well as treated wastewater as continuous pathways may induce acute as well as chronic effects on the receiving aquatic ecosystem. This study points out the importance of the pollution pathway CSO and groundwater for the contamination assessments of urban water resources.
New Parallel computing framework for radiation transport codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostin, M.A.; /Michigan State U., NSCL; Mokhov, N.V.
A new parallel computing framework has been developed for use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is largely independent of the radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one, thus combining the results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where interference from other users is possible.
Zhao, Jinsong; Wang, Zhipeng; Zhang, Chuanbi; Yang, Chifu; Bai, Wenjie; Zhao, Zining
2018-06-01
The shaking table based on an electro-hydraulic servo parallel mechanism has the advantage of a strong carrying capacity. However, the strong coupling caused by an eccentric load not only affects the control precision in degree-of-freedom space, but also complicates the system control. A novel decoupling control strategy based on modal space is proposed to solve the coupling problem for a parallel mechanism with an eccentric load. The phenomenon of strong dynamic coupling among the degrees of freedom is described by experiments, and its influence on control design is discussed. Considering the particularity of plane motion, the dynamic model is built by the Lagrangian method to avoid complex calculations. The dynamic equations of the coupled physical space are transformed into the dynamic equations of a decoupled modal space by using the weighted orthogonality of the modal mode shapes with respect to the mass matrix and the stiffness matrix. In the modal space, the adjustments of the modal channels are independent of each other. Moreover, the paper discusses identical closed-loop dynamic characteristics of the modal channels, which realize decoupling of the degree-of-freedom space; thus a modal-space three-state feedback control is proposed to expand the frequency bandwidth of each modal channel, ensuring near-identical responses over a larger frequency range. Experimental results show that the modal-space three-state feedback control proposed in this paper can effectively reduce the strong coupling among the degree-of-freedom channels, verifying the effectiveness of the proposed modal-space state feedback control strategy for improving the control performance of the electro-hydraulic servo plane redundant driving mechanism. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
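Schematically (a textbook statement of the transformation, with damping omitted for brevity), the modal decoupling rests on

\[
M\ddot{q} + Kq = f, \qquad q = \Phi\,\eta, \qquad
\Phi^{\mathsf{T}} M \Phi = I, \qquad
\Phi^{\mathsf{T}} K \Phi = \operatorname{diag}(\omega_1^2,\dots,\omega_n^2),
\]

so that each modal coordinate obeys an independent equation \(\ddot{\eta}_i + \omega_i^2\,\eta_i = (\Phi^{\mathsf{T}} f)_i\), and each channel can be closed separately, here with three-state feedback.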
A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations
NASA Technical Reports Server (NTRS)
Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos
2009-01-01
A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building blocks of the framework are: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of the domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results for this approach are presented and discussed in detail, along with future work and improvements.
A Force Balanced Fragmentation Method for ab Initio Molecular Dynamic Simulation of Protein.
Xu, Mingyuan; Zhu, Tong; Zhang, John Z H
2018-01-01
A force-balanced generalized molecular fractionation with conjugate caps (FB-GMFCC) method is proposed for ab initio molecular dynamics simulation of proteins. In this approach, the energy of the protein is computed by a linear combination of the QM energies of individual residues and of molecular fragments that account for the two-body interaction of hydrogen bonds between backbone peptides. The atomic forces on the capped H atoms are corrected to conserve the total force on the protein. Using this approach, an ab initio molecular dynamics simulation of an Ace-(ALA)9-NME linear peptide showed conservation of the total energy of the system throughout the simulation. Further, a more robust 110 ps ab initio molecular dynamics simulation was performed for a protein with 56 residues and 862 atoms in explicit water. Compared with the classical force field, the ab initio molecular dynamics simulations gave a better description of the geometry of the peptide bonds. Although further development is still needed, the current approach is highly efficient, trivially parallel, and can be applied to ab initio molecular dynamics simulation studies of large proteins.
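Schematically, the fragmentation ansatz can be written as below; this is a hedged reading of the abstract, with the conjugate-cap bookkeeping terms omitted.

```latex
% Total energy as a linear combination of residue QM energies plus
% two-body hydrogen-bond corrections between backbone peptides:
E_{\mathrm{protein}} \approx \sum_{i=1}^{N} E^{\mathrm{QM}}_{\mathrm{res},\,i}
  \;+\; \sum_{\langle i,j \rangle \in \mathrm{HB}}
  \left( E^{\mathrm{QM}}_{ij} - E^{\mathrm{QM}}_{i} - E^{\mathrm{QM}}_{j} \right)
```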
Community Detection on the GPU
DOE Office of Scientific and Technical Information (OSTI.GOV)
Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh
We present and evaluate a new GPU algorithm based on the Louvain method for community detection. Our algorithm is the first for this problem that parallelizes the access to individual edges. In this way we can fine-tune the load balance when processing networks with nodes of highly varying degrees. This is achieved by scaling the number of threads assigned to each node according to its degree. Extensive experiments show that we obtain speedups up to a factor of 270 compared to the sequential algorithm. The algorithm consistently outperforms other recent shared-memory implementations and is only one order of magnitude slower than the current fastest parallel Louvain method running on a Blue Gene/Q supercomputer using more than 500K threads.
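The degree-proportional thread assignment can be pictured with a prefix sum over node degrees, as in this pure-Python stand-in for the index math each GPU thread performs (illustrative only; the paper's kernels are on the GPU):

```python
# Thread t works on flat edge slot t; a prefix sum over degrees (CSR row
# offsets) maps the slot back to its node, so a node's thread count
# scales with its degree and high-degree nodes get more workers.
import bisect
import itertools

degrees = [1, 8, 2, 5]                                # hypothetical degrees
offsets = [0] + list(itertools.accumulate(degrees))   # CSR row offsets

def node_of_slot(t):
    """Node whose adjacency list contains flat edge slot t."""
    return bisect.bisect_right(offsets, t) - 1

owners = [node_of_slot(t) for t in range(offsets[-1])]
# node 1 (degree 8) is served by 8 slots, node 0 (degree 1) by just one
```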
Inducer Hydrodynamic Load Measurement Devices
NASA Technical Reports Server (NTRS)
Skelley, Stephen E.; Zoladz, Thomas F.
2002-01-01
Marshall Space Flight Center (MSFC) has demonstrated two measurement devices for sensing and resolving the hydrodynamic loads on fluid machinery. The first - a derivative of the six component wind tunnel balance - senses the forces and moments on the rotating device through a weakened shaft section instrumented with a series of strain gauges. This "rotating balance" was designed to directly measure the steady and unsteady hydrodynamic loads on an inducer, thereby defining both the amplitude and frequency content associated with operating in various cavitation modes. The second device - a high frequency response pressure transducer surface mounted on a rotating component - was merely an extension of existing technology for application in water. MSFC has recently completed experimental evaluations of both the rotating balance and surface-mount transducers in a water test loop. The measurement bandwidth of the rotating balance was severely limited by the relative flexibility of the device itself, resulting in an unexpectedly low structural bending mode and invalidating the higher frequency response data. Despite these limitations, measurements confirmed that the integrated loads on the four-bladed inducer respond to both cavitation intensity and cavitation phenomena. Likewise, the surface-mount pressure transducers were subjected to a range of temperatures and flow conditions in a non-rotating environment to record bias shifts and transfer functions between the transducers and a reference device. The pressure transducer static performance was within manufacturer's specifications and dynamic response accurately followed that of the reference.
Inducer Hydrodynamic Load Measurement Devices
NASA Technical Reports Server (NTRS)
Skelley, Stephen E.; Zoladz, Thomas F.; Turner, Jim (Technical Monitor)
2002-01-01
Marshall Space Flight Center (MSFC) has demonstrated two measurement devices for sensing and resolving the hydrodynamic loads on fluid machinery. The first - a derivative of the six-component wind tunnel balance - senses the forces and moments on the rotating device through a weakened shaft section instrumented with a series of strain gauges. This rotating balance was designed to directly measure the steady and unsteady hydrodynamic loads on an inducer, thereby defining both the amplitude and frequency content associated with operating in various cavitation modes. The second device - a high frequency response pressure transducer surface mounted on a rotating component - was merely an extension of existing technology for application in water. MSFC has recently completed experimental evaluations of both the rotating balance and surface-mount transducers in a water test loop. The measurement bandwidth of the rotating balance was severely limited by the relative flexibility of the device itself, resulting in an unexpectedly low structural bending mode and invalidating the higher-frequency response data. Despite these limitations, measurements confirmed that the integrated loads on the four-bladed inducer respond to both cavitation intensity and cavitation phenomena. Likewise, the surface-mount pressure transducers were subjected to a range of temperatures and flow conditions in a non-rotating environment to record bias shifts and transfer functions between the transducers and a reference device. The pressure transducer static performance was within manufacturer's specifications and dynamic response accurately followed that of the reference.
Self-Avoiding Walks Over Adaptive Triangular Grids
NASA Technical Reports Server (NTRS)
Heber, Gerd; Biswas, Rupak; Gao, Guang R.; Saini, Subhash (Technical Monitor)
1999-01-01
Space-filling curves are a popular approach based on geometric embedding for linearizing computational meshes. We present a new O(n log n) combinatorial algorithm for constructing a self-avoiding walk through a two-dimensional mesh containing n triangles. We show that, for hierarchical adaptive meshes, the algorithm can be locally adapted and easily parallelized by taking advantage of the regularity of the refinement rules. The proposed approach should be very useful for the runtime partitioning and load balancing of adaptive unstructured grids.
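Once such a linearization exists, load balancing reduces to cutting the walk into equal contiguous chunks; a minimal sketch, with triangle ids standing in for mesh elements:

```python
# Hedged sketch: given any linearization of the n triangles (a space-
# filling curve, or the self-avoiding walk constructed by the paper),
# balanced partitioning is just equal contiguous cuts of the sequence.
def partition_walk(walk, nparts):
    n = len(walk)
    bounds = [n * p // nparts for p in range(nparts + 1)]
    return [walk[bounds[p]:bounds[p + 1]] for p in range(nparts)]

walk = list(range(10))              # stand-in for a walk over 10 triangles
parts = partition_walk(walk, 4)     # [[0,1], [2,3,4], [5,6], [7,8,9]]
```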
Development of a multicomponent force and moment balance for water tunnel applications, volume 1
NASA Technical Reports Server (NTRS)
Suarez, Carlos J.; Malcolm, Gerald N.; Kramer, Brian R.; Smith, Brooke C.; Ayers, Bert F.
1994-01-01
The principal objective of this research effort was to develop a multicomponent strain gauge balance to measure forces and moments on models tested in flow visualization water tunnels. An internal balance was designed that allows measuring normal and side forces, and pitching, yawing, and rolling moments (no axial force). The five-component balance responded linearly to applied loads, with low interactions between the sections and no hysteresis. Static experiments (which are discussed in this volume) were conducted in the Eidetics water tunnel with delta wings and a model of the F/A-18. Experiments with the F/A-18 model included a thorough baseline study and investigations of the effect of control surface deflections and of several Forebody Vortex Control (FVC) techniques. Results were compared to wind tunnel data and, in general, the agreement is very satisfactory. The results of the static tests provide confidence that loads can be measured accurately in the water tunnel with a relatively simple multicomponent internal balance. Dynamic experiments were also performed using the balance, and the results are discussed in detail in Volume 2 of this report.
Effects of age and loading rate on equine cortical bone failure.
Kulin, Robb M; Jiang, Fengchun; Vecchio, Kenneth S
2011-01-01
Although clinical bone fractures occur predominantly under impact loading (as occurs during sporting accidents, falls, high-speed impacts or other catastrophic events), experimentally validated studies on the dynamic fracture behavior of bone, at the loading rates associated with such events, remain limited. In this study, a series of tests were performed on femoral specimens obtained post-mortem from equine donors ranging in age from 6 months to 28 years. Fracture toughness and compressive tests were performed under both quasi-static and dynamic loading conditions in order to determine the effects of loading rate and age on the mechanical behavior of the cortical bone. Fracture toughness experiments were performed using a four-point bending geometry on single and double-notch specimens in order to measure fracture toughness, as well as observe differences in crack initiation between dynamic and quasi-static experiments. Compressive properties were measured on bone loaded parallel and transverse to the osteonal growth direction. Fracture propagation was then analyzed using scanning electron and scanning confocal microscopy to observe the effects of microstructural toughening mechanisms at different strain rates. Specimens from each horse were also analyzed for dry, wet and mineral densities, as well as weight percent mineral, in order to investigate possible influences of composition on mechanical behavior. Results indicate that bone has a higher compressive strength, but lower fracture toughness when tested dynamically as compared to quasi-static experiments. Fracture toughness also tends to decrease with age when measured quasi-statically, but shows little change with age under dynamic loading conditions, where brittle "cleavage-like" fracture behavior dominates.
Parallel satellite orbital situational problems solver for space missions design and control
NASA Astrophysics Data System (ADS)
Atanassov, Atanas Marinov
2016-11-01
Solving different scientific problems for space applications requires carrying out observations, measurements, or active experiments during time intervals in which specific geometric and physical conditions are fulfilled. Solving situational problems to determine the time intervals when satellite instruments work optimally is an important part of every stage of space mission preparation and realization. A universal, flexible, and robust approach to situation analysis that is easily portable to new satellite missions can significantly reduce mission preparation times and costs. Every situational problem may be based on one or more situational conditions. Simultaneously solving different kinds of situational problems, each based on a different number and type of situational conditions and each satisfied on different segments of the satellite orbit, requires irregular calculations. Three formal approaches are presented. The first relates to the description of situational problems, which allows flexibility in assembling them and representing them in computer memory. The second concerns a situational problem solver organized as a processor that executes specific code for every particular situational condition. The third relates to parallelization of the solver using threads and dynamic scheduling based on a "pool of threads" abstraction, which ensures a good load balance. The developed situational problems solver is intended for incorporation into multi-physics, multi-satellite space mission design and simulation tools.
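A minimal sketch of the pool-of-threads idea (not the author's solver): orbit segments are tested for a situational condition by a thread pool, whose idle threads pick up the next segment and thereby absorb the irregular per-segment cost; the condition below is a placeholder.

```python
# Idle threads grab the next orbit segment, balancing irregular work.
from concurrent.futures import ThreadPoolExecutor

def condition_satisfied(segment):
    t0, t1 = segment
    return (t0 // 600) % 2 == 0     # placeholder geometric/physical test

segments = [(t, t + 60) for t in range(0, 86400, 60)]   # one day, 60 s steps
with ThreadPoolExecutor(max_workers=8) as pool:
    flags = list(pool.map(condition_satisfied, segments))

intervals = [seg for seg, ok in zip(segments, flags) if ok]
```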
A comparison of queueing, cluster and distributed computing systems
NASA Technical Reports Server (NTRS)
Kaplan, Joseph A.; Nelson, Michael L.
1993-01-01
Using workstation clusters for distributed computing has become popular with the proliferation of inexpensive, powerful workstations. Workstation clusters offer both a cost-effective alternative to batch processing and an easy entry into parallel computing. However, a number of workstations on a network does not constitute a cluster; cluster management software is necessary to harness the collective computing power. A variety of cluster management and queuing systems are compared: Distributed Queueing Systems (DQS), Condor, Load Leveler, Load Balancer, Load Sharing Facility (LSF - formerly Utopia), Distributed Job Manager (DJM), Computing in Distributed Networked Environments (CODINE), and NQS/Exec. The systems differ in their design philosophy and implementation. Based on published reports on the different systems and conversations with the systems' developers and vendors, a comparison of the systems is made on the integral issues of clustered computing.
A compositional reservoir simulator on distributed memory parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rame, M.; Delshad, M.
1995-12-31
This paper presents the application of distributed memory parallel computers to field-scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general-purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field-scale applications such as a tracer flood and a polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
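The subdomain-extension step corresponds to a standard ghost-cell (halo) exchange; a 1-D sketch in Python with mpi4py, not the UTCHEM code itself:

```python
# Each rank swaps its boundary columns with its neighbours so the stencil
# can be applied locally without further communication.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.full((8, 10), float(rank))    # 8x8 cells + one ghost column/side
left  = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

from_right, from_left = np.zeros(8), np.zeros(8)
# my first real column goes to the left neighbour's right ghost, and so on
comm.Sendrecv(np.ascontiguousarray(local[:, 1]), dest=left,
              recvbuf=from_right, source=right)
comm.Sendrecv(np.ascontiguousarray(local[:, -2]), dest=right,
              recvbuf=from_left, source=left)
local[:, -1], local[:, 0] = from_right, from_left
```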
Evaluation of a load cell model for dynamic calibration of the rotor systems research aircraft
NASA Technical Reports Server (NTRS)
Duval, R. W.; Bahrami, H.; Wellman, B.
1985-01-01
The Rotor Systems Research Aircraft uses load cells to isolate the rotor/transmission system from the fuselage. An analytical model of the relationship between applied rotor loads and the resulting load cell measurements is derived by applying a force-and-moment balance to the isolated rotor/transmission system. The model is then used to estimate the applied loads from measured load cell data, as obtained from a ground-based shake test. Using nominal design values for the parameters, the estimation errors, for the case of lateral forcing, were shown to be on the order of the sensor measurement noise in all but the roll axis. An unmodeled external load appears to be the source of the error in this axis.
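The estimation step amounts to inverting the load cell model in a least-squares sense; a sketch with placeholder numbers (the real model matrix comes from the force-and-moment balance derived in the paper):

```python
# Model matrix A maps applied rotor loads x to load cell outputs y;
# the applied loads are recovered from measurements by least squares.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(7, 6))            # 7 load cells, 6 load components
x_true = np.array([1000.0, 0.0, -500.0, 50.0, 0.0, 20.0])
y = A @ x_true + rng.normal(scale=2.0, size=7)   # noisy measurements

x_est, *_ = np.linalg.lstsq(A, y, rcond=None)
residual = y - A @ x_est   # large residuals hint at an unmodeled load,
                           # as the paper observed in the roll axis
```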
NASA Astrophysics Data System (ADS)
Stepanova, L. V.
2017-12-01
Atomistic simulations of the central crack growth process in an infinite plane medium under mixed-mode loading are performed using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a classical molecular dynamics code. The inter-atomic potential used in this investigation is an Embedded Atom Method (EAM) potential. Plane specimens with an initial central crack are subjected to mixed-mode loadings. The simulation cell contains 400,000 atoms. The crack propagation direction angles are obtained and analyzed for mixity parameter values ranging from pure tensile loading to pure shear loading and for a wide range of temperatures (from 0.1 K to 800 K). It is shown that the crack propagation direction angles obtained by molecular dynamics coincide with those given by the multi-parameter fracture criteria based on the strain energy density and on the multi-parameter description of the crack-tip fields. The multi-parameter fracture criteria are based on the multi-parameter stress field description taking into account the higher-order terms of the Williams series expansion of the crack-tip fields.
Optimal pre-scheduling of problem remappings
NASA Technical Reports Server (NTRS)
Nicol, David M.; Saltz, Joel H.
1987-01-01
A large class of scientific computational problems can be characterized as a sequence of steps where a significant amount of computation occurs at each step, but the work performed at each step is not necessarily identical. Two good examples of this type of computation are: (1) regridding methods which change the problem discretization during the course of the computation, and (2) methods for solving sparse triangular systems of linear equations. Recent work has investigated a means of mapping such computations onto parallel processors; the method defines a family of static mappings with differing degrees of importance placed on the conflicting goals of good load balance and low communication/synchronization overhead. The performance tradeoffs are controllable by adjusting the parameters of the mapping method. To achieve good performance it may be necessary to dynamically change these parameters at run-time, but such changes can impose additional costs. If the computation's behavior can be determined prior to its execution, it is possible to construct an optimal parameter schedule using a low-order-polynomial-time dynamic programming algorithm. Since the latter can be expensive, the performance effect of a linear-time scheduling heuristic is studied on one of the model problems, where it is shown to be effective and nearly optimal.
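A toy version of such a dynamic program, assuming the per-step imbalance penalty is known in advance and each remap costs C; this illustrates the idea, not the paper's exact formulation.

```python
# growth[k] is the imbalance penalty of a step taken k steps after the
# last remap; choose remap points to minimize total cost in O(T^2) time.
def optimal_remap_cost(T, growth, C):
    best = [0.0] * (T + 1)                 # best[t]: cost of steps t..T-1,
    for t in range(T - 1, -1, -1):         # given a (free) remap at step t
        run, cands = 0.0, []
        for nxt in range(t + 1, T + 1):    # next remap at step nxt (or none)
            run += growth[nxt - 1 - t]
            cands.append(run + (C if nxt < T else 0.0) + best[nxt])
        best[t] = min(cands)
    return best[0]

cost = optimal_remap_cost(10, growth=[1.5 * k for k in range(10)], C=4.0)
```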
Zhao, Yongli; Chen, Zhendong; Zhang, Jie; Wang, Xinbo
2016-07-25
Driven by the advent of 5G mobile communications, the all-IP architecture of mobile core networks, i.e., the evolved packet core (EPC) proposed by 3GPP, has been greatly challenged by users' demands for higher data rates and more reliable end-to-end connections, as well as operators' demands for low operational costs. These challenges can potentially be met by software defined optical networking (SDON), which enables dynamic resource allocation according to user requirements. In this article, a novel network architecture for the mobile core network is proposed based on SDON. A software defined network (SDN) controller is designed to realize coordinated control over the different entities in EPC networks. We analyze the requirements of the EPC-lightpath (EPCL) in the data plane and propose an optical switch load balancing (OSLB) algorithm for resource allocation in the optical layer. The procedures for establishing and adjusting EPCLs are demonstrated on an SDON-based EPC testbed with an extended OpenFlow protocol. We also evaluate the OSLB algorithm through simulation in terms of bandwidth blocking ratio, traffic load distribution, and resource utilization ratio, compared with link-based load balancing (LLB) and MinHops algorithms.
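The flavor of switch-load-balanced path selection can be sketched as below (a hedged reading of the OSLB idea, not the authors' algorithm): prefer the candidate lightpath whose most-loaded switch is least loaded, breaking ties by hop count.

```python
# Pick a lightpath by minimizing the bottleneck switch load, then hops.
def pick_lightpath(candidates, switch_load):
    """candidates: list of paths, each a list of switch ids."""
    return min(candidates,
               key=lambda p: (max(switch_load[s] for s in p), len(p)))

switch_load = {'A': 0.7, 'B': 0.2, 'C': 0.4, 'D': 0.3}   # hypothetical
paths = [['A', 'D'], ['B', 'C', 'D']]
best = pick_lightpath(paths, switch_load)   # -> ['B', 'C', 'D']
```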
Ashtiani, Mohammed N; Mahmood-Reza, Azghani
2017-01-01
Postural control after an applied perturbation involves neural and muscular efforts to limit the center of mass (CoM) motion. Linear dynamical approaches may not unveil all the complexities of these body efforts. This study was aimed at determining two nonlinear dynamics parameters (fractal dimension (FD) and largest Lyapunov exponent (LLE)) in addition to the linear standing metrics of balance in perturbed stance. Sixteen healthy young males were subjected to sudden rotations of the standing platform. Vision and cognition during standing were also interfered with. Motion capture was used to measure the lower limb joints and the CoM displacements. The CoM path length as a linear parameter was increased by elimination of vision (p<0.01) and by adding a cognitive load (p<0.01). The CoM nonlinear metric FD was decreased by the cognitive loads (p<0.001). The visual interference increased the FD of all joints when the task included the cognitive loads (p<0.01). The slightly positive LLE values showed weakly chaotic behavior of the whole body; the local joint rotations indicated higher LLEs. Increasing the task difficulty by adding sensory interference had different effects on the parameters. The linear and nonlinear metrics of the perturbed stance showed that a combination of them may properly represent the body behavior.
Dynamics of bubble collapse under vessel confinement in 2D hydrodynamic experiments
NASA Astrophysics Data System (ADS)
Shpuntova, Galina; Austin, Joanna
2013-11-01
One trauma mechanism in biomedical treatment techniques based on the application of cumulative pressure pulses generated either externally (as in shock-wave lithotripsy) or internally (by laser-induced plasma) is the collapse of voids. However, prediction of void-collapse driven tissue damage is a challenging problem, involving complex and dynamic thermomechanical processes in a heterogeneous material. We carry out a series of model experiments to investigate the hydrodynamic processes of voids collapsing under dynamic loading in configurations designed to model cavitation with vessel confinement. The baseline case of void collapse near a single interface is also examined. Thin sheets of tissue-surrogate polymer materials with varying acoustic impedance are used to create one or two parallel material interfaces near the void. Shadowgraph photography and two-color, single-frame particle image velocimetry quantify bubble collapse dynamics including jetting, interface dynamics and penetration, and the response of the surrounding material. Research supported by NSF Award #0954769, ``CAREER: Dynamics and damage of void collapse in biological materials under stress wave loading.''
Comprehensive analysis of helicopters with bearingless rotors
NASA Technical Reports Server (NTRS)
Murthy, V. R.
1988-01-01
A modified Galerkin method is developed to analyze the dynamic problems of multiple-load-path bearingless rotor blades. The development and selection of functions are quite parallel to CAMRAD procedures, greatly facilitating the implementation of the method into the CAMRAD program. Software implementing the modified Galerkin method is developed to determine the free vibration characteristics of multiple-load-path rotor blades undergoing coupled flapwise bending, chordwise bending, twisting, and extensional motions. Results are being obtained as the software is debugged.
NASA Astrophysics Data System (ADS)
Stepanova, Larisa; Bronnikov, Sergej
2018-03-01
The crack growth directional angles in an isotropic linear elastic plane with a central crack under mixed-mode loading conditions are found for the full range of the mixity parameter. Two fracture criteria of traditional linear fracture mechanics (maximum tangential stress and minimum strain energy density) are used. Atomistic simulations of the central crack growth process in an infinite plane medium under mixed-mode loading are performed using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a classical molecular dynamics code. The inter-atomic potential used in this investigation is an Embedded Atom Method (EAM) potential. Plane specimens with an initial central crack were subjected to mixed-mode loadings. The simulation cell contains 400,000 atoms. The crack propagation direction angles are obtained and analyzed for mixity parameter values ranging from pure tensile loading to pure shear loading and for a wide range of temperatures (from 0.1 K to 800 K). It is shown that the crack propagation direction angles obtained by the molecular dynamics method coincide with those given by the multi-parameter fracture criteria based on the strain energy density and on the multi-parameter description of the crack-tip fields.
Holviala, Jarkko H S; Sallinen, Janne M; Kraemer, William J; Alen, Markku J; Häkkinen, Keijo K T
2006-05-01
Progressive strength training can lead to substantial increases in maximal strength and mass of trained muscles, even in older women and men, but little information is available about the effects of strength training on functional capabilities and balance. Thus, the effects of 21 weeks of heavy resistance training--including lower loads performed with high movement velocities--twice a week on isometric maximal force (ISOmax) and force-time curve (force produced in 500 milliseconds, F0-500) and dynamic 1 repetition maximum (1RM) strength of the leg extensors, 10-m walking time (10WALK), and a dynamic balance test (DYN.D) were investigated in 26 middle-aged (MI; 52.8 +/- 2.4 years) and 22 older women (O; 63.8 +/- 3.8 years). 1RM, ISOmax, and F0-500 increased significantly in MI by 28 +/- 10%, 20 +/- 19%, and 31 +/- 34%, and in O by 27 +/- 8%, 20 +/- 16%, and 18 +/- 45%, respectively. 10WALK shortened and DYN.D improved (MI and O, p < 0.001 for both). The present strength-training protocol led to large increases in maximal and explosive strength characteristics of the leg extensors and in walking speed, as well as to an improvement in the present dynamic balance test performance in both age groups. Although a training-induced increase in explosive strength is an important factor for aging women, there are other factors that contribute to improvements in dynamic balance capacity. This study indicates that total body heavy resistance training, including explosive dynamic training, may be applied in rehabilitation or preventive exercise protocols in aging women to improve dynamic balance capabilities.
Application of an impedance matching transformer to a plasma focus.
Bures, B L; James, C; Krishnan, M; Adler, R
2011-10-01
A plasma focus was constructed using an impedance matching transformer to improve power transfer between the pulse power and the dynamic plasma load. The system relied on two switches and twelve transformer cores to produce a 100 kA pulse in short circuit on the secondary at 27 kV on the primary with 110 J stored. With the two transformer systems in parallel, the Thevenin equivalent circuit parameters on the secondary side of the driver are: C = 10.9 μF, V_0 = 4.5 kV, L = 17 nH, and R = 5 mΩ. An equivalent direct drive circuit would require a large number of switches in parallel to achieve the same Thevenin equivalent. The benefits of this approach are replacement of consumable switches with non-consumable transformer cores, reduction of the driver inductance and resistance as viewed by the dynamic load, and reduction of the stored energy needed to produce a given peak current. The system is designed to operate at 100 Hz, so minimizing the stored energy results in less load on the thermal management system. When operated at 1 Hz, the neutron yield from the transformer-matched plasma focus was similar to the neutron yield from a conventional (directly driven) plasma focus at the same peak current.
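The quoted figures are mutually consistent under a simple series-RLC estimate that ignores damping; a quick check:

```python
# Undamped series-RLC discharge estimate from the quoted parameters.
import math

C, V0, L, R = 10.9e-6, 4.5e3, 17e-9, 5e-3   # values from the abstract
Z0 = math.sqrt(L / C)                        # surge impedance, ~0.039 ohm
I_peak = V0 / Z0                             # ~114 kA undamped
E_stored = 0.5 * C * V0**2                   # ~110 J, matching the text
# with the 5 mohm resistance included, the real peak lands near the
# quoted 100 kA
```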
A versatile computer package for mechanism analysis, part 2: Dynamics and balance
NASA Astrophysics Data System (ADS)
Davies, T.
The algorithms required for the shaking force components, the shaking moment about the crankshaft axis, and the input torque and bearing load components are discussed using the textile machine as a focus for the discussion. The example is also used to provide illustrations of the output for options on the hodograph of the shaking force vector. This provides estimates of the optimum contrarotating masses and their locations for a generalized primary Lanchester balancer. The suitability of generalized Lanchester balancers particularly for textile machinery, and the overall strategy used during the development of the package are outlined.
Hunter, Eric J.; Titze, Ingo R.
2012-01-01
Objectives: To quantify the recovery of voice following a 2-hour vocal loading exercise (oral reading). Methods: 86 adult participants tracked their voice recovery using short vocal tasks and perceptual ratings after an initial vocal loading exercise and for the following two days. Results: Short-term recovery was apparent, with 90% recovery within 4-6 hours and full recovery at 12-18 hours. Recovery was shown to be similar to a dermal wound healing trajectory. Conclusions: The new recovery trajectory highlighted by the vocal loading exercise in the current study is called a vocal recovery trajectory. By comparing vocal fatigue to dermal wound healing, this trajectory is parallel to a chronic wound healing trajectory (as opposed to an acute wound healing trajectory). This parallel suggests that vocal fatigue from the daily use of the voice could be treated as a chronic wound, with the healing and repair mechanisms in a state of constant repair. In addition, there is likely a vocal fatigue threshold at which point the level of tissue damage would shift the chronic healing trajectory to an acute healing trajectory. PMID:19663377
The Influence of External Load on Quadriceps Muscle and Tendon Dynamics during Jumping.
Earp, Jacob E; Newton, Robert U; Cormie, Prue; Blazevich, Anthony J
2017-11-01
Tendons possess both viscous (rate-dependent) and elastic (rate-independent) properties that determine tendon function. During high-speed movements external loading increases both the magnitude (FT) and rate (RFDT) of tendon loading. The influence of external loading on muscle and tendon dynamics during maximal vertical jumping was explored. Ten resistance-trained men performed parallel-depth, countermovement vertical jumps with and without additional load (0%, 30%, 60%, and 90% of maximum squat lift strength), while joint kinetics and kinematics, quadriceps tendon length (LT) and patellar tendon FT and RFDT were estimated using integrated ultrasound, motion analysis and force platform data and muscle tendon modelling. Estimated FT and RFDT, but not peak LT, increased with external loading. Temporal comparisons between 0% and 90% loads revealed that FT was greater with 90% loading throughout the majority of the movement (11%-81% and 87%-95% movement duration). However, RFDT was greater with 90% load only during the early movement initiation phase (8%-15% movement duration) but was greater in the 0% load condition later in the eccentric phase (27%-38% movement duration). LT was longer during the early movement (12%-23% movement duration) but shorter in the late eccentric and early concentric phases (48%-55% movement duration) with 90% load. External loading positively influenced peak FT and RFDT but tendon strain appeared unaffected, suggesting no additive effect of external loading on patellar tendon lengthening during human jumping. Temporal analysis revealed that external loading resulted in a large initial RFDT that may have caused dynamic stiffening of the tendon and attenuated tendon strain throughout the movement. These results suggest that external loading influences tendon lengthening in both a load- and movement-dependent manner.
Carrying capacity of water resources in Bandung Basin
NASA Astrophysics Data System (ADS)
Marganingrum, D.
2018-02-01
The concept of carrying capacity is widely used in various sectors as a management tool for sustainable development processes. The idea has also been applied at the watershed or basin scale. The Bandung Basin is the upstream part of the Citarum watershed, known as one of the national strategic areas. The area has developed into a metropolitan region burdened with various environmental problems, so research related to environmental carrying capacity here is a strategic issue. However, research on environmental carrying capacity done so far in this area is still partial, addressing the water balance, land suitability, the ecological footprint, or the balance of supply and demand of resources in isolation. This paper describes the application of an integrated environmental carrying capacity concept to address increasingly complex and dynamic environmental problems, focusing on the issue of water resources. The approach combines the concept of maximum balance with system dynamics, where the proposed dynamics couple ecology and population, which cannot be separated from one another as a unity of the Bandung Basin ecosystem.
Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Hui; Shi, Yanjun
A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters are disclosed. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multilevel inverter modulator is provided, including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multilevel inverter, and a finite state machine (FSM) module coupled to the multi-channel comparator; the FSM module receives the multiplexed digitized ideal waveform and generates a pulse width modulated gate-drive signal for each switching device of the parallel multilevel inverter. The system and method provide for optimization of the output voltage spectrum without influencing the magnetic balancing.
Algorithm of dynamic regulation of a system of duct, for a high accuracy climatic system
NASA Astrophysics Data System (ADS)
Arbatskiy, A. A.; Afonina, G. N.; Glazov, V. S.
2017-11-01
Currently, most climatic systems operate at their design point only, while many modern industrial sites require constant or periodic changes in the technological process. As a result, for about 80% of the time an industrial site does not need the ventilation system at its design point, yet high-precision climatic parameters must still be maintained. When climatic systems serve several rooms in parallel and are not in constant use, balancing the duct system becomes a problem. To address this, an algorithm for quantity (flow) regulation with minimal changes was created. Dynamic duct system: a parallel control system of the air balance was developed that maintains high precision of the climatic parameters. The algorithm keeps the pressure in the main duct constant under varying air flows, so the terminal devices have only one control parameter: flap open area. The precision of regulation increases, and the climatic system maintains temperature and humidity to high precision (0.5 °C for temperature, 5% for relative humidity). Result: the research was carried out in the CFD code PHOENICS. Results for the air velocity and pressure in the duct for different operation modes were obtained. An equation for the air valve positions as a function of the required room climate parameters was obtained. The energy-saving potential of the dynamic duct system was calculated for different types of rooms.
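A minimal sketch of the constant-pressure strategy (the controller type is an assumption; the paper does not specify one): holding main-duct static pressure constant with a PI loop on the fan leaves flap open area as each terminal device's single local control parameter.

```python
# PI loop holding main-duct static pressure; setpoint and gains are
# placeholders chosen only for illustration.
def make_pi_controller(p_setpoint, kp=0.01, ki=0.002, dt=1.0):
    integral = 0.0
    def step(p_measured):
        nonlocal integral
        err = p_setpoint - p_measured
        integral += err * dt
        return min(1.0, max(0.0, kp * err + ki * integral))  # fan cmd 0..1
    return step

ctrl = make_pi_controller(p_setpoint=250.0)   # hold 250 Pa in the main duct
cmd = ctrl(230.0)                             # pressure low -> fan speeds up
```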
Effects of Demand Response on Retail and Wholesale Power Markets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chassin, David P.; Kalsi, Karanjit
2012-07-26
Demand response has grown to be part of the repertoire of resources used by utilities to manage the balance between generation and load. In recent years, advances in communications and control technology have enabled utilities to consider continuously controlling demand response to meet generation, rather than the other way around. This paper discusses the economic applications of a general method for load resource analysis that parallels the approach used to analyze generation resources and uses the method to examine the results of the US Department of Energy's Olympic Peninsula Demonstration Testbed. A market-based closed-loop system of controllable assets is discussed, with necessary and sufficient conditions on system controllability, observability, and stability derived.
New method to improve dynamic stiffness of electro-hydraulic servo systems
NASA Astrophysics Data System (ADS)
Bai, Yanhong; Quan, Long
2013-09-01
Most current research on improving stiffness focuses on the application of control theories. But the controller in a closed-loop hydraulic control system takes effect only after the controlled position has deviated, so the control action lags. Thus the dynamic performance against force disturbances, and hence the dynamic load stiffness, cannot be improved significantly by advanced control algorithms alone. In this paper, the elementary principle of maintaining the piston position unchanged under a sudden external load change by charging additional oil is analyzed. On this basis, the concept of raising the dynamic stiffness of an electro-hydraulic position servo system by flow feedforward compensation is put forward, and a scheme using double servo valves is presented, in which a second fast-response servo valve is added to the regular electro-hydraulic servo system specifically to compensate, in time, the oil volume compressed by the load impact. The two valves are arranged in parallel to control the cylinder jointly. Furthermore, the model of the flow compensation is derived, from which the product of the amplitude and width of the valve's pulse command signal can be calculated, and rules for determining the amplitude and width of the pulse signal are concluded from analysis and simulations. Using the proposed scheme, simulations and experiments at different positions with different force changes were conducted. The simulation and experimental results show that the system dynamic performance against load force impacts is largely improved, with decreased maximal dynamic position deviation and shortened settling time; that is, the system dynamic load stiffness is evidently raised. This paper proposes a new method which can effectively improve the dynamic stiffness of electro-hydraulic servo systems.
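The sizing of the compensation pulse follows from textbook compressibility relations (a hedged reconstruction, not the paper's exact model): the oil volume compressed by a load step must be replenished by the second valve.

```python
# dV = V_t * dP / beta_e with dP = dF / A: the compressed volume that the
# compensating valve must deliver; pulse amplitude x width ~ dV / K_q.
# All numeric values are placeholders.
A      = 2.0e-3    # piston area, m^2
V_t    = 4.0e-4    # trapped oil volume, m^3
beta_e = 7.0e8     # effective bulk modulus, Pa
K_q    = 1.0e-4    # valve flow gain, (m^3/s) per unit command

dF = 5.0e3                      # 5 kN load impact
dP = dF / A                     # pressure change needed to hold position
dV = V_t * dP / beta_e          # compressed volume to be replenished
pulse_area = dV / K_q           # command amplitude x width, unit*seconds
```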
Coupling between structure and liquids in a parallel stage space shuttle design
NASA Technical Reports Server (NTRS)
Kana, D. D.; Ko, W. L.; Francis, P. H.; Nagy, A.
1972-01-01
A study was conducted to determine the influence of liquid propellants on the dynamic loads for space shuttle vehicles. A parallel-stage configuration model was designed and tested to determine the influence of liquid propellants on coupled natural modes. A forty degree-of-freedom analytical model was also developed for predicting these modes. Currently available analytical models were used to represent the liquid contributions, even though coupled longitudinal and lateral motions are present in such a complex structure. Agreement between the results was found in the lower few modes.
Precision Measurement of Gear Lubricant Load-Carrying Capacity (Feasibility Study)
1981-11-01
ratio of sliding to rolling is much greater in the second group than in the first. This difference has a great influence on the characteristics which...largest angle, the direction of sliding is more parallel to the contact line than normal to it. This is the major difference in conditions of contact...flow to and over the gear teeth, are quite different matters. The effects of gear dynamics and lubricant flow dynamics on lubrication related
A Baseline Load Schedule for the Manual Calibration of a Force Balance
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Gisler, R.
2013-01-01
A baseline load schedule for the manual calibration of a force balance was developed that takes current capabilities at the NASA Ames Balance Calibration Laboratory into account. The load schedule consists of 18 load series with a total of 194 data points. It was designed to satisfy six requirements: (i) positive and negative loadings should be applied for each load component; (ii) at least three loadings should be applied between 0 % and 100 % load capacity; (iii) normal and side force loadings should be applied at the forward gage location, the aft gage location, and the balance moment center; (iv) the balance should be used in UP and DOWN orientation to get axial force loadings; (v) the constant normal and side force approaches should be used to get the rolling moment loadings; (vi) rolling moment loadings should be obtained for 0, 90, 180, and 270 degrees balance orientation. Three different approaches are also reviewed that may be used to independently estimate the natural zeros of the balance. These three approaches provide gage output differences that may be used to estimate the weight of both the metric and non-metric part of the balance. Manual calibration data of NASA's MK29A balance and machine calibration data of NASA's MC60D balance are used to illustrate and evaluate different aspects of the proposed baseline load schedule design.
NASA Technical Reports Server (NTRS)
Stone, Ralph W., Jr.; Hultz, Burton E.
1949-01-01
The characteristics of a cargo-dropping device having extensible rotating blades as load-carrying surfaces have been studied in simulated vertical descent in the Langley 20-foot free-spinning tunnel. The investigation included tests to determine the variation in vertical sinking speed with load. A study of the blade characteristics and of the test results indicated a method of dynamically balancing the blades to permit proper functioning of the device.
NASA Astrophysics Data System (ADS)
Gassmöller, Rene; Bangerth, Wolfgang
2016-04-01
Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a modern advection-field approach, and demonstrate under which conditions which method is more efficient. We implemented the presented methods in ASPECT (aspect.dealii.org), a freely available open-source community code for geodynamic simulations. The structure of the particle code is highly modular, and segregated from the PDE solver, and can thus be easily transferred to other programs, or adapted for various application cases.
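The per-step particle bookkeeping can be sketched as a re-binning pass that separates locally kept particles from those handed to neighbouring subdomains (schematic only; ASPECT's implementation is C++ on deal.II meshes):

```python
# After advection, particles are assigned to cells; those that left the
# local subdomain are grouped per neighbour rank for transfer over the
# existing mesh communication channels.
from collections import defaultdict

def rebin(particles, locate_cell, my_cells, cell_owner):
    local = defaultdict(list)       # cell id -> particles kept locally
    outgoing = defaultdict(list)    # neighbour rank -> particles to send
    for p in particles:
        cell = locate_cell(p['x'])
        if cell in my_cells:
            local[cell].append(p)
        else:
            outgoing[cell_owner[cell]].append(p)
    return local, outgoing

my_cells = {0, 1}
cell_owner = {2: 1, 3: 1}                    # cells 2, 3 live on rank 1
locate = lambda x: int(x * 4)                # 4 unit cells on [0, 1)
local, out = rebin([{'x': 0.1}, {'x': 0.9}], locate, my_cells, cell_owner)
```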
Waiting can be an optimal conservation strategy, even in a crisis discipline
Possingham, Hugh P.; Bode, Michael
2017-01-01
Biodiversity conservation projects confront immediate and escalating threats with limited funding. Conservation theory suggests that the best response to the species extinction crisis is to spend money as soon as it becomes available, and this is often an explicit constraint placed on funding. We use a general dynamic model of a conservation landscape to show that this decision to “front-load” project spending can be suboptimal if a delay allows managers to use resources more strategically. Our model demonstrates the existence of temporal efficiencies in conservation management, which parallel the spatial efficiencies identified by systematic conservation planning. The optimal timing of decisions balances the rate of biodiversity decline (e.g., the relaxation of extinction debts, or the progress of climate change) against the rate at which spending appreciates in value (e.g., through interest, learning, or capacity building). We contrast the benefits of acting and waiting in two ecosystems where restoration can mitigate forest bird extinction debts: South Australia’s Mount Lofty Ranges and Paraguay’s Atlantic Forest. In both cases, conservation outcomes cannot be maximized by front-loading spending, and the optimal solution recommends substantial delays before managers undertake conservation actions. Surprisingly, these delays allow superior conservation benefits to be achieved, in less time than front-loading. Our analyses provide an intuitive and mechanistic rationale for strategic delay, which contrasts with the orthodoxy of front-loaded spending for conservation actions. Our results illustrate the conservation efficiencies that could be achieved if decision makers choose when to spend their limited resources, as opposed to just where to spend them. PMID:28894004
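The core tradeoff can be caricatured in a two-rate toy model (illustrative numbers only, not the paper's model): waiting wins whenever the budget's appreciation rate exceeds the habitat decay rate.

```python
# Budget appreciates at rate r; unprotected habitat decays at rate d.
# Spending at time t protects budget*e^{r t} worth of habitat, of which
# a fraction e^{-d t} still survives at that time.
import math

def protected_area(t, budget=1.0, r=0.08, d=0.03):
    return budget * math.exp(r * t) * math.exp(-d * t)

print(protected_area(0.0))    # front-loading: 1.00
print(protected_area(10.0))   # waiting 10 yr: ~1.65, better here since r > d
```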
Life and dynamic capacity modeling for aircraft transmissions
NASA Technical Reports Server (NTRS)
Savage, Michael
1991-01-01
A computer program to simulate the dynamic capacity and life of parallel shaft aircraft transmissions is presented. Five basic configurations can be analyzed: single mesh, compound, parallel, reverted, and single plane reductions. In execution, the program prompts the user for the data file prefix name, takes input from an ASCII file, and writes its output to a second ASCII file with the same prefix name. The input data file includes the transmission configuration, the input shaft torque and speed, and descriptions of the transmission geometry and the component gears and bearings. The program output file describes the transmission, its components, their capabilities, locations, and loads. It also lists the dynamic capability, ninety percent reliability, and mean life of each component and the transmission as a system. Here, the program, its input and output files, and the theory behind the operation of the program are described.
A performance study of sparse Cholesky factorization on INTEL iPSC/860
NASA Technical Reports Server (NTRS)
Zubair, M.; Ghose, M.
1992-01-01
The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.
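One classic scheme in this design space is wrap (cyclic) mapping of matrix columns to processors; a trivial sketch, included only to fix ideas (the paper's proposed heuristic is more elaborate):

```python
# Cyclic column-to-processor mapping spreads the heavier trailing columns
# of the Cholesky factor across all nodes instead of piling them on one.
def wrap_map(ncols, nprocs):
    return {j: j % nprocs for j in range(ncols)}

owner = wrap_map(ncols=12, nprocs=4)   # col 0->P0, 1->P1, ..., 4->P0, ...
```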
Principles for problem aggregation and assignment in medium scale multiprocessors
NASA Technical Reports Server (NTRS)
Nicol, David M.; Saltz, Joel H.
1987-01-01
One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine-grained parallelism, and execution requirements that are either not predictable or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs between balanced workload and communication/synchronization costs are studied. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
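A minimal sketch of parameterized aggregation with uniform assignment, where the granularity g trades balance against communication (names and numbers are illustrative):

```python
# g fine-grained units are glued into one aggregate; aggregates are dealt
# to processors cyclically so neighbouring (similarly loaded) regions land
# on different processors. Larger g lowers communication overhead, smaller
# g improves load balance.
def aggregate_and_assign(nunits, g, nprocs):
    assign = {}
    for u in range(nunits):
        block = u // g                 # aggregate id
        assign[u] = block % nprocs     # uniform cyclic mapping of blocks
    return assign

coarse = aggregate_and_assign(nunits=1000, g=50, nprocs=8)   # 20 blocks
fine   = aggregate_and_assign(nunits=1000, g=10, nprocs=8)   # 100 blocks
```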
A Method of Data Aggregation for Wearable Sensor Systems
Shen, Bo; Fu, Jun-Song
2016-01-01
Data aggregation has been considered an effective way to decrease the data to be transferred in sensor networks. Particularly for wearable sensor systems, a smaller battery stores less energy, which makes energy conservation in data transmission more important. Nevertheless, wearable sensor systems usually feature frequent dynamic changes of topology and data over a large range, which current aggregation methods cannot accommodate. In this paper, we study systems composed of many wearable devices with sensors, such as the network of a tactical unit, and introduce an energy-consumption-balanced method of data aggregation, named LDA-RT. In the proposed method, we develop a query algorithm based on the idea of 'happened-before' to construct a dynamic, energy-balancing routing tree. We also present a distributed data aggregating and sorting algorithm to execute top-k queries and decrease the data that must be transferred among wearable devices. Combining these algorithms, LDA-RT tries to balance the energy consumption to prolong the lifetime of wearable sensor systems. Evaluation results indicate that LDA-RT performs well in constructing routing trees and balancing energy. It also outperforms the filter-based top-k monitoring approach in energy consumption, load balance, and network lifetime, especially for highly dynamic data sources. PMID:27347953
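The in-network half of such a top-k query is easy to sketch: each tree node merges its children's partial results with its own readings and forwards only k items, so the data volume shrinks at every hop (schematic, not the LDA-RT algorithm itself):

```python
# Each node forwards at most k items up the routing tree.
import heapq

def topk_at_node(own_readings, child_results, k):
    merged = list(own_readings)
    for r in child_results:
        merged.extend(r)
    return heapq.nlargest(k, merged)

leaf1 = topk_at_node([3, 9, 1], [], k=2)          # [9, 3]
leaf2 = topk_at_node([7, 2], [], k=2)             # [7, 2]
root  = topk_at_node([5], [leaf1, leaf2], k=2)    # [9, 7]
```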
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levine, Benjamin G., E-mail: ben.levine@temple.edu; Stone, John E., E-mail: johns@ks.uiuc.edu; Kohlmeyer, Axel, E-mail: akohlmey@temple.edu
2011-05-01
The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.
Stone, John E.; Kohlmeyer, Axel
2011-01-01
The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU’s memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 seconds per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis. PMID:21547007
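For reference, the rate-limiting histogramming step has a compact CPU expression (numpy, O(N*M) per frame; shell-volume normalization omitted), which is what the GPU kernels tile and parallelize:

```python
# Histogram all pair distances between two selections for one frame.
import numpy as np

def rdf_histogram(sel1, sel2, r_max, nbins):
    d = np.linalg.norm(sel1[:, None, :] - sel2[None, :, :], axis=-1)
    hist, edges = np.histogram(d, bins=nbins, range=(0.0, r_max))
    return hist, edges

rng = np.random.default_rng(1)
sel1, sel2 = rng.random((500, 3)), rng.random((400, 3))
hist, edges = rdf_histogram(sel1, sel2, r_max=1.0, nbins=100)
```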
NASA Astrophysics Data System (ADS)
Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.
2016-05-01
This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
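A serial stand-in for the multi-colour ordering idea on a structured 2-D grid: points of one colour have no data dependence on each other, so each colour sweep can be executed in parallel (illustrative only; the paper treats unstructured horizontal grids and 3-D geometries):

```python
# One red-black SOR iteration for the Poisson equation on a uniform grid.
import numpy as np

def red_black_sor_step(u, f, h, omega=1.7):
    for colour in (0, 1):
        for i in range(1, u.shape[0] - 1):
            for j in range(1, u.shape[1] - 1):
                if (i + j) % 2 == colour:
                    gs = 0.25 * (u[i-1, j] + u[i+1, j] + u[i, j-1]
                                 + u[i, j+1] - h * h * f[i, j])
                    u[i, j] += omega * (gs - u[i, j])
    return u

u, f = np.zeros((33, 33)), np.ones((33, 33))
for _ in range(100):
    u = red_black_sor_step(u, f, h=1.0 / 32)
```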
NASA Astrophysics Data System (ADS)
Esmaily, M.; Jofre, L.; Mani, A.; Iaccarino, G.
2018-03-01
A geometric multigrid algorithm is introduced for solving nonsymmetric linear systems resulting from the discretization of the variable density Navier-Stokes equations on nonuniform structured rectilinear grids and high-Reynolds number flows. The restriction operation is defined such that the resulting system on the coarser grids is symmetric, thereby allowing for the use of efficient smoother algorithms. To achieve an optimal rate of convergence, the sequence of interpolation and restriction operations are determined through a dynamic procedure. A parallel partitioning strategy is introduced to minimize communication while maintaining the load balance between all processors. To test the proposed algorithm, we consider two cases: 1) homogeneous isotropic turbulence discretized on uniform grids and 2) turbulent duct flow discretized on stretched grids. Testing the algorithm on systems with up to a billion unknowns shows that the cost varies linearly with the number of unknowns. This O(N) behavior confirms the robustness of the proposed multigrid method regarding ill-conditioning of large systems characteristic of multiscale high-Reynolds number turbulent flows. The robustness of our method to density variations is established by considering cases where density varies sharply in space by a factor of up to 10^4, showing its applicability to two-phase flow problems. Strong and weak scalability studies are carried out, employing up to 30,000 processors, to examine the parallel performance of our implementation. Excellent scalability of our solver is shown for a granularity as low as 10^4 to 10^5 unknowns per processor. At its tested peak throughput, it solves approximately 4 billion unknowns per second employing over 16,000 processors with a parallel efficiency higher than 50%.
Reconfigurable Model Execution in the OpenMDAO Framework
NASA Technical Reports Server (NTRS)
Hwang, John T.
2017-01-01
NASA's OpenMDAO framework facilitates constructing complex models and computing their derivatives for multidisciplinary design optimization. Decomposing a model into components that follow a prescribed interface enables OpenMDAO to assemble multidisciplinary derivatives from the component derivatives using what amounts to the adjoint method, direct method, chain rule, global sensitivity equations, or any combination thereof, using the MAUD architecture. OpenMDAO also handles the distribution of processors among the disciplines by hierarchically grouping the components, and it automates the data transfer between components that are on different processors. These features have made OpenMDAO useful for applications in aircraft design, satellite design, wind turbine design, and aircraft engine design, among others. This paper presents new algorithms for OpenMDAO that enable reconfigurable model execution. This concept refers to dynamically changing, during execution, one or more of: the variable sizes, solution algorithm, parallel load balancing, or set of variables, i.e., adding and removing components, perhaps to switch to a higher-fidelity sub-model. Any component can reconfigure at any point, even when running in parallel with other components, and the reconfiguration algorithm presented here performs the synchronized updates to all other components that are affected. A reconfigurable software framework for multidisciplinary design optimization enables new adaptive solvers, adaptive parallelization, and new applications such as gradient-based optimization with overset flow solvers and adaptive mesh refinement. Benchmarking results demonstrate the time savings for reconfiguration compared to setting up the model again from scratch, which can be significant in large-scale problems. Additionally, the new reconfigurability feature is applied to a mission profile optimization problem for commercial aircraft where both the parametrization of the mission profile and the time discretization are adaptively refined, resulting in computational savings of roughly 10% and the elimination of oscillations in the optimized altitude profile.
Soft switching resonant converter with duty-cycle control in DC micro-grid system
NASA Astrophysics Data System (ADS)
Lin, Bor-Ren
2018-01-01
The resonant converter has been widely used for the benefits of low switching losses and high circuit efficiency. However, wide frequency variation is the main drawback of the resonant converter. This paper studies a new modular resonant converter with duty-cycle control to overcome this problem and realise the advantages of low switching losses, no reverse recovery current loss, balanced input split voltages and constant frequency operation for a medium voltage direct current grid or system network. Series full-bridge (FB) converters are used in the studied circuit in order to reduce the voltage stresses and power rating on the power semiconductors. A flying capacitor is used between the two FB converters to balance the input split voltages. Two circuit modules are paralleled on the secondary side to lessen the current rating of the rectifier diodes and the size of the magnetic components. The resonant tank is operated as an inductive load circuit to help the power switches turn on at zero voltage over a wide load range. A pulse-width modulation scheme is used to regulate the output voltage. Experimental verifications are provided to show the performance of the proposed circuit.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu
2012-10-01
We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physicochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communication schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems, and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.
On the relationship between parallel computation and graph embedding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gupta, A.K.
1989-01-01
The problem of efficiently simulating an algorithm designed for an n-processor parallel machine G on an m-processor parallel machine H with n > m arises when parallel algorithms designed for an ideal size machine are simulated on existing machines which are of a fixed size. The author studies this problem when every processor of H takes over the function of a number of processors in G, and he phrases the simulation problem as a graph embedding problem. New embeddings presented address relevant issues arising from the parallel computation environment. The main focus centers around embedding complete binary trees into smaller-sized binary trees, butterflies, and hypercubes. He also considers simultaneous embeddings of r source machines into a single hypercube. Constant factors play a crucial role in his embeddings since they are not only important in practice but also lead to interesting theoretical problems. All of his embeddings minimize dilation and load, which are the conventional cost measures in graph embeddings and determine the maximum amount of time required to simulate one step of G on H. His embeddings also optimize a new cost measure called (α,β)-utilization which characterizes how evenly the processors of H are used by the processors of G. Ideally, the utilization should be balanced (i.e., every processor of H simulates at most (n/m) processors of G) and the (α,β)-utilization measures how far off from a balanced utilization the embedding is. He presents embeddings for the situation when some processors of G have different capabilities (e.g. memory or I/O) than others and the processors with different capabilities are to be distributed uniformly among the processors of H. Placing such conditions on an embedding results in an increase in some of the cost measures.
Inducer Hydrodynamic Forces in a Cavitating Environment
NASA Technical Reports Server (NTRS)
Skelley, Stephen E.
2004-01-01
Marshall Space Flight Center has developed and demonstrated a measurement device for sensing and resolving the hydrodynamic loads on fluid machinery. The device - a derivative of the six-component wind tunnel balance - senses the forces and moments on the rotating device through a weakened shaft section instrumented with a series of strain gauges. This rotating balance was designed to directly measure the steady and unsteady hydrodynamic loads on an inducer, thereby defining the amplitude and frequency content associated with operating in various cavitation modes. The rotating balance was calibrated statically using a dead-weight load system in order to generate the 6 x 12 calibration matrix later used to convert measured voltages to engineering units. Structural modeling suggested that the rotating assembly first bending mode would be significantly reduced with the balance's inclusion. This reduction in structural stiffness was later confirmed experimentally with a hammer-impact test. This effect, coupled with the relatively large damping associated with the rotating balance waterproofing material, limited the device's bandwidth to approximately 50 Hertz. Other pre-test validations included sensing the test article rotating assembly built-in imbalance for two configurations and directly measuring the assembly mass and buoyancy while submerged under water. Both tests matched predictions and confirmed the device's sensitivity while stationary and rotating. The rotating balance was then demonstrated in a water test of a full-scale Space Shuttle Main Engine high-pressure liquid oxygen pump inducer. Experimental data was collected at scaled operating conditions at three flow coefficients across a range of cavitation numbers for the single inducer geometry and radial clearance. Two distinct cavitation modes were observed: symmetric tip vortex cavitation and alternate-blade cavitation. Although previous experimental tests on the same inducer demonstrated two additional cavitation modes at lower inlet pressures, these conditions proved unreachable with the rotating balance installed due to the intense dynamic environment. The sensed radial load was less influenced by flow coefficient than by cavitation number or cavitation mode, although the flow coefficient range was relatively narrow. Transition from symmetric tip vortex to alternate-blade cavitation corresponded to changes in both radial load magnitude and radial load orientation relative to the inducer. Sensed moments indicated that the effective load center moved downstream during this change in cavitation mode. An occurrence of "higher-order cavitation" was also detected in both the stationary pressures and the rotating balance data, although the frequency of the phenomenon was well above the reliable bandwidth of the rotating balance. In summary, the experimental tests proved both the concept and the device's capability despite the limitations, and confirmed that hydrodynamically-induced forces and moments develop in response to the unbalanced pressure field, which is, in turn, a product of the cavitation environment.
NASA Astrophysics Data System (ADS)
Martin, M. A.; Winkelmann, R.; Haseloff, M.; Albrecht, T.; Bueler, E.; Khroulev, C.; Levermann, A.
2010-08-01
We present a dynamic equilibrium simulation of the ice sheet-shelf system on Antarctica with the Potsdam Parallel Ice Sheet Model (PISM-PIK). The simulation is initialized with present-day conditions for topography and ice thickness and then run to steady state with constant present-day surface mass balance. Surface temperature and basal melt distribution are parameterized. Grounding lines and calving fronts are free to evolve, and their modeled equilibrium state is compared to observational data. A physically-motivated dynamic calving law based on horizontal spreading rates allows for realistic calving fronts for various types of shelves. Steady-state dynamics including surface velocity and ice flux are analyzed for whole Antarctica and the Ronne-Filchner and Ross ice shelf areas in particular. The results show that the different flow regimes in sheet and shelves, and the transition zone between them, are captured reasonably well, supporting the approach of superposition of SIA and SSA for the representation of fast motion of grounded ice. This approach also leads to a natural emergence of streams in this new 3-D marine ice sheet model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-09-12
Mcqueuer is a simple tool that allows anyone from researchers to experienced developers to create multi-node/multi-core jobs by simply creating a file with a list of commands. Users combine tasks, which would otherwise each be their own job on the cluster, into a single file that is given to Mcqueuer. Mcqueuer then does the heavy lifting required to process the tasks in parallel in a single multi-node job. In addition, Mcqueuer provides load balancing, which frees the user from having to worry about complex memory and CPU considerations and lets them focus on the processing itself.
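On a single node, the core idea can be sketched in a few lines of Python using only the standard library (the file format and worker count here are assumptions for illustration; the real tool additionally spans multiple nodes):

    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    def _run(cmd):
        # Execute one command line from the task file.
        return subprocess.run(cmd, shell=True).returncode

    def run_command_file(path, workers=8):
        with open(path) as f:
            commands = [line.strip() for line in f if line.strip()]
        # The pool hands each idle worker the next pending command, a simple
        # form of dynamic load balancing across the task list.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(_run, commands))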
Robertson, Benjamin D; Vadakkeveedu, Siddarth; Sawicki, Gregory S
2017-05-24
We present a novel biorobotic framework comprised of a biological muscle-tendon unit (MTU) mechanically coupled to a feedback controlled robotic environment simulation that mimics in vivo inertial/gravitational loading and mechanical assistance from a parallel elastic exoskeleton. Using this system, we applied select combinations of biological muscle activation (modulated with rate-coded direct neural stimulation) and parallel elastic assistance (applied via closed-loop mechanical environment simulation) hypothesized to mimic human behavior based on previously published modeling studies. These conditions resulted in constant system-level force-length dynamics (i.e., stiffness), reduced biological loads, increased muscle excursion, and constant muscle average positive power output-all consistent with laboratory experiments on intact humans during exoskeleton assisted hopping. Mechanical assistance led to reduced estimated metabolic cost and MTU apparent efficiency, but increased apparent efficiency for the MTU+Exo system as a whole. Findings from this study suggest that the increased natural resonant frequency of the artificially stiffened MTU+Exo system, along with invariant movement frequencies, may underlie observed limits on the benefits of exoskeleton assistance. Our novel approach demonstrates that it is possible to capture the salient features of human locomotion with exoskeleton assistance in an isolated muscle-tendon preparation, and introduces a powerful new tool for detailed, direct examination of how assistive devices affect muscle-level neuromechanics and energetics. Copyright © 2017 Elsevier Ltd. All rights reserved.
Structures and Dynamics Division research and technology plans, FY 1982
NASA Technical Reports Server (NTRS)
Bales, K. S.
1982-01-01
Computational devices to improve efficiency for structural calculations are assessed. The potential of large arrays of microprocessors operating in parallel for finite element analysis is defined, and the impact of specialized computer hardware on static, dynamic, and thermal analysis in the optimization of structural analysis and design calculations is determined. General aviation aircraft crashworthiness and occupant survivability are also considered. Mechanics technology required for the design of fault-tolerant advanced composite aircraft components subject to combined loads, impact, postbuckling effects, and local discontinuities is developed.
A Baseline Load Schedule for the Manual Calibration of a Force Balance
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Gisler, R.
2013-01-01
A baseline load schedule for the manual calibration of a force balance is defined that takes current capabilities at the NASA Ames Balance Calibration Laboratory into account. The chosen load schedule consists of 18 load series with a total of 194 data points. It was designed to satisfy six requirements: (i) positive and negative loadings should be applied for each load component; (ii) at least three loadings should be applied between 0 % and 100 % load capacity; (iii) normal and side force loadings should be applied at the forward gage location, aft gage location, and the balance moment center; (iv) the balance should be used in "up" and "down" orientation to get positive and negative axial force loadings; (v) the constant normal and side force approaches should be used to get the rolling moment loadings; (vi) rolling moment loadings should be obtained for 0, 90, 180, and 270 degrees balance orientation. In addition, three different approaches are discussed in the paper that may be used to independently estimate the natural zeros, i.e., the gage outputs of the absolute load datum of the balance. These three approaches provide gage output differences that can be used to estimate the weight of both the metric and non-metric part of the balance. Data from the calibration of a six-component force balance will be used in the final manuscript of the paper to illustrate characteristics of the proposed baseline load schedule.
Task allocation in a distributed computing system
NASA Technical Reports Server (NTRS)
Seward, Walter D.
1987-01-01
A conceptual framework is examined for task allocation in distributed systems. Application and computing system parameters critical to task allocation decision processes are discussed. Task allocation techniques are addressed which focus on achieving a balance in the load distribution among the system's processors. Equalization of computing load among the processing elements is the goal. Examples of system performance are presented for specific applications. Both static and dynamic allocation of tasks are considered and system performance is evaluated using different task allocation methodologies.
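One classic static technique consistent with the load-equalization goal described above is the longest-processing-time-first heuristic, sketched here in Python (the cost estimates and data layout are illustrative assumptions, not from the paper):

    import heapq

    def lpt_assign(task_costs, n_processors):
        # Greedily place the most expensive remaining task on the currently
        # least-loaded processor; a min-heap keyed on load keeps this cheap.
        loads = [(0.0, p) for p in range(n_processors)]
        heapq.heapify(loads)
        assignment = {p: [] for p in range(n_processors)}
        for task, cost in sorted(enumerate(task_costs), key=lambda t: -t[1]):
            load, p = heapq.heappop(loads)
            assignment[p].append(task)
            heapq.heappush(loads, (load + cost, p))
        return assignment

Dynamic allocation, also discussed in the paper, would instead revisit such an assignment at runtime as measured loads drift from the estimates.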
NASA Astrophysics Data System (ADS)
Maaß, Heiko; Cakmak, Hüseyin Kemal; Bach, Felix; Mikut, Ralf; Harrabi, Aymen; Süß, Wolfgang; Jakob, Wilfried; Stucky, Karl-Uwe; Kühnapfel, Uwe G.; Hagenmeyer, Veit
2015-12-01
Power networks will change from a rigid hierarchic architecture to dynamic interconnected smart grids. In traditional power grids, the frequency is the controlled quantity used to maintain the balance between supply and load power. Thereby, high rotating mass inertia ensures stability. In the future, system stability will have to rely more on real-time measurements and sophisticated control, especially when integrating fluctuating renewable power sources or high-load consumers such as electric vehicles into the low-voltage distribution grid.
A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database
Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...
2013-01-01
Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in a 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.
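A serial toy version of building one ghost layer may clarify the idea (a hedged sketch assuming a global view of the mesh; the paper's algorithm is distributed and discovers the same copies through neighbourhood communication rather than a global loop):

    def one_ghost_layer(adjacency, part_of):
        # adjacency: {cell: set of neighbouring cells}
        # part_of:   {cell: owning partition id}
        ghosts = {}                      # partition id -> set of ghosted cells
        for cell, neighbours in adjacency.items():
            for nb in neighbours:
                if part_of[nb] != part_of[cell]:
                    # 'cell' borders a remote cell, so the remote partition
                    # needs a read-only ghost copy of it.
                    ghosts.setdefault(part_of[nb], set()).add(cell)
        return ghosts

Applying the same construction to the ghosted mesh again would produce the second layer, and so on for n layers.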
Efficient dynamic simulation for multiple chain robotic mechanisms
NASA Technical Reports Server (NTRS)
Lilly, Kathryn W.; Orin, David E.
1989-01-01
An efficient O(mN) algorithm for dynamic simulation of simple closed-chain robotic mechanisms is presented, where m is the number of chains, and N is the number of degrees of freedom for each chain. It is based on computation of the operational space inertia matrix (6 x 6) for each chain as seen by the body, load, or object. Also, computation of the chain dynamics, when opened at one end, is required, and the most efficient algorithm is used for this purpose. Parallel implementation of the dynamics for each chain results in an O(N) + O(log2(m+1)) algorithm.
NASA Astrophysics Data System (ADS)
Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto
2012-11-01
In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performances and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version shows near-linear speed-up when using only the CPU cores, and it executes more than 20 times faster when additionally using the GPU.
Parallelization of Nullspace Algorithm for the computation of metabolic pathways
Jevremović, Dimitrije; Trinh, Cong T.; Srienc, Friedrich; Sosa, Carlos P.; Boley, Daniel
2011-01-01
Elementary mode analysis is a useful metabolic pathway analysis tool in understanding and analyzing cellular metabolism, since elementary modes can represent metabolic pathways with unique and minimal sets of enzyme-catalyzed reactions of a metabolic network under steady state conditions. However, computation of the elementary modes of a genome-scale metabolic network with 100–1000 reactions is very expensive and sometimes not feasible with the commonly used serial Nullspace Algorithm. In this work, we develop a distributed memory parallelization of the Nullspace Algorithm to handle efficiently the computation of the elementary modes of a large metabolic network. We give an implementation in C++ language with the support of MPI library functions for the parallel communication. Our proposed algorithm is accompanied with an analysis of the complexity and identification of major bottlenecks during computation of all possible pathways of a large metabolic network. The algorithm includes methods to achieve load balancing among the compute-nodes and specific communication patterns to reduce the communication overhead and improve efficiency. PMID:22058581
Knee Joint Loading during Gait in Healthy Controls and Individuals with Knee Osteoarthritis
Kumar, Deepak; Manal, Kurt T.; Rudolph, Katherine S.
2013-01-01
Objective People with knee osteoarthritis (OA) are thought to walk with high loads at the knee which are yet to be quantified using modeling techniques that account for subject-specific EMG patterns, kinematics and kinetics. The objective was to estimate medial and lateral loading for people with knee OA and controls using an approach that is sensitive to subject-specific muscle activation patterns. Methods 16 OA and 12 control (C) subjects walked while kinematic, kinetic and EMG data were collected. Muscle forces were calculated using an EMG-driven model, and loading was calculated by balancing the external moments with internal muscle and contact forces. Results OA subjects walked slower and had greater laxity, static and dynamic varus alignment, less flexion and greater knee adduction moment (KAM). Loading (normalized to body weight) was no different between the groups, but OA subjects had greater absolute medial load than controls and maintained a greater percentage of the total load on the medial compartment. These patterns were associated with body mass, sagittal and frontal plane moments, and static alignment, and were close to significance for dynamic alignment. Lateral compartment unloading during mid-late stance was observed in 50% of OA subjects. Conclusions Loading for control subjects was similar to data from instrumented prostheses. Knee OA subjects had high medial contact loads in early stance and half of the OA cohort demonstrated lateral compartment lift-off. Results suggest that interventions aimed at reducing body weight and dynamic malalignment might be effective in reducing medial compartment loading and establishing normal medio-lateral load sharing patterns. PMID:23182814
Energy balance in a Z pinch with suppressed Rayleigh-Taylor instability
NASA Astrophysics Data System (ADS)
Baksht, R. B.; Oreshkin, V. I.; Rousskikh, A. G.; Zhigalin, A. S.
2018-03-01
At present, the Z pinch has evolved into a powerful plasma source of soft x-rays. This paper considers the energy balance in a radiating metallic gas-puff Z pinch. In this type of Z pinch, a power-law density distribution is realized, promoting suppression of the Rayleigh-Taylor (RT) instabilities that occur in the pinch plasma during compression. The energy coupled into the pinch plasma is determined as the difference between the total energy delivered to the load from the generator and the magnetic energy of the load inductance. A calibrated voltage divider and a Rogowski coil were used to determine the coupled energy and the load inductance. Time-gated optical imaging of the pinch plasma showed its stable compression up to the stagnation phase. The pinch implosion was simulated using a 1D two-temperature radiative magnetohydrodynamic code. Comparison of the experimental and simulation results has shown that the simulation adequately describes the pinch dynamics for conditions in which RT instability is suppressed. It has been found that the proportion of Ohmic heating in the energy balance of a Z pinch with suppressed RT instability is determined by the Spitzer resistance and amounts to no more than ten percent.
Magnetic suspension and balance systems (MSBSs)
NASA Technical Reports Server (NTRS)
Britcher, Colin P.; Kilgore, Robert A.
1987-01-01
The problems of wind tunnel testing are outlined, with attention given to the problems caused by mechanical support systems, such as support interference, dynamic-testing restrictions, and low productivity. The basic principles of magnetic suspension are highlighted, along with the history of magnetic suspension and balance systems. Roll control, size limitations, high angle of attack, reliability, position sensing, and calibration are discussed among the problems and limitations of the existing magnetic suspension and balance systems. Examples of the existing systems are presented, and design studies for future systems are outlined. Problems specific to large-scale magnetic suspension and balance systems, such as high model loads, requirements for high-power electromagnets, high-capacity power supplies, highly sophisticated control systems and position sensors, and high costs are assessed.
Zhou, Xiuze; Lin, Fan; Yang, Lvqing; Nie, Jing; Tan, Qian; Zeng, Wenhua; Zhang, Nian
2016-01-01
With the continuous expansion of the cloud computing platform scale and the rapid growth of users and applications, how to efficiently use system resources to improve the overall performance of cloud computing has become a crucial issue. To address this issue, this paper proposes a method that uses an analytic hierarchy process group decision (AHPGD) to evaluate the load state of server nodes. Training was carried out by using a hybrid hierarchical genetic algorithm (HHGA) for optimizing a radial basis function neural network (RBFNN). The AHPGD produces an aggregative load indicator for the virtual machines in the cloud, which serves as the input to the predictive RBFNN. This paper also proposes a new dynamic load balancing scheduling algorithm combined with a weighted round-robin algorithm, which uses the periodically predicted load values of the nodes, based on the AHPGD and the RBFNN optimized by the HHGA, to calculate the corresponding weight values of the nodes and update them continually. Meanwhile, it keeps the advantages and avoids the shortcomings of the static weighted round-robin algorithm.
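The scheduling side of such a scheme can be sketched compactly (a hedged Python illustration; the weight formula and node names are assumptions, and the load predictions would come from the trained prediction model rather than the literals used here):

    import itertools

    def make_schedule(predicted_load, scale=10):
        # Lower predicted load -> larger weight -> more requests per cycle.
        max_load = max(predicted_load.values())
        weights = {n: max(1, round(scale * (max_load - l + 1) / (max_load + 1)))
                   for n, l in predicted_load.items()}
        cycle = [n for n, w in weights.items() for _ in range(w)]
        return itertools.cycle(cycle)

    schedule = make_schedule({"node-1": 0.2, "node-2": 0.7, "node-3": 0.5})
    target = next(schedule)   # dispatch the next request to this node

Rebuilding the schedule whenever fresh predictions arrive gives the periodic weight update the paper describes.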
An implementation of a tree code on a SIMD, parallel computer
NASA Technical Reports Server (NTRS)
Olson, Kevin M.; Dorband, John E.
1994-01-01
We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,536 particles. We also simulate the formation of structure in an expanding model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
Hu, Weiming; Fan, Yabo; Xing, Junliang; Sun, Liang; Cai, Zhaoquan; Maybank, Stephen
2018-09-01
We construct a new efficient near duplicate image detection method using a hierarchical hash code learning neural network and load-balanced locality-sensitive hashing (LSH) indexing. We propose a deep constrained siamese hash coding neural network combined with deep feature learning. Our neural network is able to extract effective features for near duplicate image detection. The extracted features are used to construct a LSH-based index. We propose a load-balanced LSH method to produce load-balanced buckets in the hashing process. The load-balanced LSH significantly reduces the query time. Based on the proposed load-balanced LSH, we design an effective and feasible algorithm for near duplicate image detection. Extensive experiments on three benchmark data sets demonstrate the effectiveness of our deep siamese hash encoding network and load-balanced LSH.
Detection and Use of Load and Gage Output Repeats of Wind Tunnel Strain-Gage Balance Data
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2017-01-01
Criteria are discussed that may be used for the detection of load and gage output repeats of wind tunnel strain-gage balance data. First, empirical thresholds are introduced that help determine if the loads or electrical outputs of a pair of balance calibration or check load data points match. A threshold of 0.01 percent of the load capacity is suggested for the identification of matching loads. Similarly, a threshold of 0.1 microV/V is recommended for the identification of matching electrical outputs. Two examples for the use of load and output repeats are discussed to illustrate benefits of the implementation of a repeat point detection algorithm in a balance data analysis software package. The first example uses the suggested load threshold to identify repeat data points that may be used to compute pure errors of the balance loads. This type of analysis may reveal hidden data quality issues that could potentially be avoided by making calibration process improvements. The second example uses the electrical output threshold for the identification of balance fouling. Data from the calibration of a six-component force balance is used to illustrate the calculation of the pure error of the balance loads.
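The two thresholds translate directly into a simple test, sketched here in Python (function and argument names are illustrative, not from a NASA code):

    def loads_match(loads_a, loads_b, capacities, tol=0.0001):
        # tol = 0.01 percent of load capacity, expressed as a fraction
        return all(abs(a - b) <= tol * cap
                   for a, b, cap in zip(loads_a, loads_b, capacities))

    def outputs_match(outputs_a, outputs_b, tol_microvv=0.1):
        # gage outputs expressed in microV/V
        return all(abs(a - b) <= tol_microvv
                   for a, b in zip(outputs_a, outputs_b))

Point pairs whose loads match feed the pure-error estimate; pairs whose loads match while their outputs drift apart may flag balance fouling, in the spirit of the second example above.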
NASA Astrophysics Data System (ADS)
Rerucha, Simon; Sarbort, Martin; Hola, Miroslava; Cizek, Martin; Hucl, Vaclav; Cip, Ondrej; Lazar, Josef
2016-12-01
The homodyne detection with only a single detector represents a promising approach in interferometric applications, enabling a significant reduction of the optical system complexity while preserving the fundamental resolution and dynamic range of single frequency laser interferometers. We present the design, implementation and analysis of algorithmic methods for computational processing of the single-detector interference signal based on parallel pipelined processing suitable for real time implementation on a programmable hardware platform (e.g. the FPGA - Field Programmable Gate Arrays or the SoC - System on Chip). The algorithmic methods incorporate (a) the single detector signal (sine) scaling, filtering, demodulation and mixing necessary for the reconstruction of the second (cosine) quadrature signal, followed by a conic section projection in the Cartesian plane, as well as (b) the phase unwrapping together with the goniometric and linear transformations needed for the scale linearization and periodic error correction. The digital computing scheme was designed for bandwidths up to tens of megahertz, which would allow measuring displacements at velocities around half a metre per second. The algorithmic methods were tested in real-time operation with a PC-based reference implementation that exploited the advantages of pipelined processing by balancing the computational load among multiple processor cores. The results indicate that the algorithmic methods are suitable for a wide range of applications [3] and that they bring fringe counting interferometry closer to industrial applications due to its optical setup simplicity and robustness, computational stability, scalability and cost-effectiveness.
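The final phase-extraction stage is compact enough to sketch (assuming NumPy and that the cosine quadrature has already been reconstructed from the single-detector signal as described above; the names and the double-pass fringe factor are illustrative assumptions):

    import numpy as np

    def displacement_from_quadratures(sine, cosine, wavelength):
        # Fringe phase from the quadrature pair, unwrapped across 2*pi jumps.
        phase = np.unwrap(np.arctan2(sine, cosine))
        # One full 2*pi fringe corresponds to half a wavelength of motion in
        # a double-pass Michelson-type arrangement.
        return phase * wavelength / (4.0 * np.pi)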
Research on virtual network load balancing based on OpenFlow
NASA Astrophysics Data System (ADS)
Peng, Rong; Ding, Lei
2017-08-01
Networks based on OpenFlow technology separate the control module from the data forwarding module. Global deployment of a load balancing strategy through the network view of the control plane is fast and highly efficient. This paper proposes a weighted round-robin scheduling algorithm for virtual networks and a load balancing plan for server load based on OpenFlow, taking into account the load of the service nodes and the distribution algorithm for load balancing tasks.
Extending substructure based iterative solvers to multiple load and repeated analyses
NASA Technical Reports Server (NTRS)
Farhat, Charbel
1993-01-01
Direct solvers currently dominate commercial finite element structural software, but do not scale well in the fine granularity regime targeted by emerging parallel processors. Substructure based iterative solvers--often also called domain decomposition algorithms--lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well-suited for these problems because after the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. On the other hand, iterative solvers in general are ill-suited for these problems because they often must restart from scratch for every different right hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time-step beyond time-step 15 is solved in a single iteration and consumes 1.0 second on a 32 processor iPSC-860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
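One concrete way to exploit earlier right-hand sides, in the spirit of the minimization over K-orthogonal subspaces described above, is to project each new right-hand side onto stored K-conjugate search directions and use the result as the starting guess for the iterative solve. A hedged NumPy sketch of that seeding step (an illustration of the general idea, not the paper's exact scheme):

    import numpy as np

    def seeded_guess(K, b, directions):
        # directions: mutually K-conjugate vectors kept from earlier solves.
        x0 = np.zeros_like(b)
        for p in directions:
            x0 += (p @ b) / (p @ (K @ p)) * p  # Galerkin projection coefficient
        return x0   # hand x0 to the preconditioned conjugate gradient solver

When the stored subspace nearly contains the new solution, as in the time-stepping example above, the subsequent iteration count collapses, sometimes to a single iteration.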
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Brian B; Purba, Victor; Jafarpour, Saber
Given that next-generation infrastructures will contain large numbers of grid-connected inverters and these interfaces will be satisfying a growing fraction of system load, it is imperative to analyze the impacts of power electronics on such systems. However, since each inverter model has a relatively large number of dynamic states, it would be impractical to execute complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. That is, we show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as an individual inverter in the paralleled system. Numerical simulations validate the reduced-order models.
Improved postural control after dynamic balance training in older overweight women.
Bellafiore, Marianna; Battaglia, Giuseppe; Bianco, Antonino; Paoli, Antonio; Farina, Felicia; Palma, Antonio
2011-01-01
Many studies have reported a greater frequency of falls among older women than men in conditions which stress balance. Previously, we found an improvement in static balance in older women with an increased support surface area and equal load redistribution on both feet, in response to a dynamic balance training protocol. The aim of the present study was to examine whether the same training program and body composition would have effects on the postural control of older overweight women. Ten healthy women (68.67 ± 5.50 yrs; 28.17 ± 3.35 BMI) participated in a five-week physical activity program. This included dynamic balance exercises, such as heel-to-toe walking in different directions, putting their hands on their hips, eyes open (EO) or closed (EC), with a tablet on their heads, going up and down one step, and walking on a mat. Postural stability was assessed before and after training with an optoelectronic platform and a uni-pedal balance performance test. Body composition of the trunk, upper limbs and lower limbs was measured by bio-impedance analysis. The mean speed (MS), medial-lateral MS (MS-x), anterior-posterior MS (MS-y), sway path (SP) and ellipse surface area (ESA) of the pressure center was reduced after training in older women. However, only MS, MS-x, MS-y and SP significantly decreased in bipodalic conditions with EO and MS-y also with EC (p<0.05). Instead, in monopodalic conditions, we found a significant reduction in the ESA of both feet with EO and EC. These data were associated with a significant increase in the lean mass of lower limbs and a higher number of participants who improved their ability to maintain unipedal static balance. Our dynamic balance training protocol appears to be feasible, safe and repeatable for older overweight women and to have positive effects in improving their lateral and anterior-posterior postural control, mainly acting on the visual and skeletal muscle components of the balance control system.
Phase change energy storage for solar dynamic power systems
NASA Technical Reports Server (NTRS)
Chiaramonte, F. P.; Taylor, J. D.
1992-01-01
This paper presents the results of a transient computer simulation that was developed to study phase change energy storage techniques for Space Station Freedom (SSF) solar dynamic (SD) power systems. Such SD systems may be used in future growth SSF configurations. Two solar dynamic options are considered in this paper: Brayton and Rankine. The model consists of single-node receiver and concentrator elements and takes into account overall heat engine efficiency and power distribution characteristics. The simulation not only computes the energy stored in the receiver phase change material (PCM), but also the amount of the PCM required for various combinations of load demands and power system mission constraints. For a solar dynamic power system in low earth orbit, the amount of stored PCM energy is calculated by balancing the solar energy input and the energy consumed by the loads corrected by an overall system efficiency. The model assumes an average 75 kW SD power system load profile which is connected to user loads via dedicated power distribution channels. The model then calculates the stored energy in the receiver and subsequently estimates the quantity of PCM necessary to meet peaking and contingency requirements. The model can also be used to conduct trade studies on the performance of SD power systems using different storage materials.
Phase change energy storage for solar dynamic power systems
NASA Astrophysics Data System (ADS)
Chiaramonte, F. P.; Taylor, J. D.
This paper presents the results of a transient computer simulation that was developed to study phase change energy storage techniques for Space Station Freedom (SSF) solar dynamic (SD) power systems. Such SD systems may be used in future growth SSF configurations. Two solar dynamic options are considered in this paper: Brayton and Rankine. The model consists of single-node receiver and concentrator elements and takes into account overall heat engine efficiency and power distribution characteristics. The simulation not only computes the energy stored in the receiver phase change material (PCM), but also the amount of the PCM required for various combinations of load demands and power system mission constraints. For a solar dynamic power system in low earth orbit, the amount of stored PCM energy is calculated by balancing the solar energy input and the energy consumed by the loads corrected by an overall system efficiency. The model assumes an average 75 kW SD power system load profile which is connected to user loads via dedicated power distribution channels. The model then calculates the stored energy in the receiver and subsequently estimates the quantity of PCM necessary to meet peaking and contingency requirements. The model can also be used to conduct trade studies on the performance of SD power systems using different storage materials.
Pre-Test Assessment of the Upper Bound of the Drag Coefficient Repeatability of a Wind Tunnel Model
NASA Technical Reports Server (NTRS)
Ulbrich, N.; L'Esperance, A.
2017-01-01
A new method is presented that computes a pre-test estimate of the upper bound of the drag coefficient repeatability of a wind tunnel model. This upper bound is a conservative estimate of the precision error of the drag coefficient. For clarity, precision error contributions associated with the measurement of the dynamic pressure are analyzed separately from those that are associated with the measurement of the aerodynamic loads. The upper bound is computed by using information about the model, the tunnel conditions, and the balance in combination with an estimate of the expected output variations as input. The model information consists of the reference area and an assumed angle of attack. The tunnel conditions are described by the Mach number and the total pressure or unit Reynolds number. The balance inputs are the partial derivatives of the axial and normal force with respect to all balance outputs. Finally, an empirical output variation of 1.0 microV/V is used to relate both random instrumentation and angle measurement errors to the precision error of the drag coefficient. Results of the analysis are reported by plotting the upper bound of the precision error versus the tunnel conditions. The analysis shows that the influence of the dynamic pressure measurement error on the precision error of the drag coefficient is often small when compared with the influence of errors that are associated with the load measurements. Consequently, the sensitivities of the axial and normal force gages of the balance have a significant influence on the overall magnitude of the drag coefficient's precision error. Therefore, results of the error analysis can be used for balance selection purposes as the drag prediction characteristics of balances of similar size and capacities can objectively be compared. Data from two wind tunnel models and three balances are used to illustrate the assessment of the precision error of the drag coefficient.
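A rough sketch of the propagation described above (Python; all symbol names, the small-angle axis transformation, and the additive combination are illustrative assumptions rather than the paper's exact formulation):

    import numpy as np

    def drag_precision_bound(dA_dR, dN_dR, delta, q, S, alpha_rad, cd, dq_over_q):
        # dA_dR, dN_dR: partial derivatives of axial/normal force with respect
        # to the balance outputs; delta: assumed output variation (microV/V).
        dA = np.sum(np.abs(dA_dR)) * delta            # axial-force error bound
        dN = np.sum(np.abs(dN_dR)) * delta            # normal-force error bound
        dD = dA * np.cos(alpha_rad) + dN * np.sin(alpha_rad)  # drag-axis error
        cd_from_loads = dD / (q * S)                  # load-measurement part
        cd_from_q = abs(cd) * dq_over_q               # dynamic-pressure part
        return cd_from_loads + cd_from_q              # conservative upper bound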
Soto-Varela, Andrés; Gayoso-Diz, Pilar; Rossi-Izquierdo, Marcos; Faraldo-García, Ana; Vaamonde-Sánchez-Andrade, Isabel; del-Río-Valeiras, María; Lirola-Delgado, Antonio; Santos-Pérez, Sofía
2015-12-01
Objectives To evaluate the effectiveness of vestibular rehabilitation (VR) to improve balance in older people, assessed immediately afterwards; (a) to verify the maintenance of the improvement in balance achieved in the medium term (6-12 months); (b) to consider whether this improvement results in a reduction in the number of falls; (c) to compare among themselves the effectiveness of three different methods of VR in improving balance and to explore whether there are differences in achieving a reduction in the number of falls. Design Experimental study, single-centre, open, randomised (balanced blocks of patients) in four branches in parallel, in 220 elderly patients (over 64 years) with high risk of falls and a follow-up period of 12 months. Setting Department of Otolaryngology of the University Hospital of Santiago. Participants People over 64 years, fulfilling one of the following requirements: (a) at least one fall in the last year; (b) taking at least 16 s or requiring some support to perform the "timed up and go" test; (c) a percentage of average balance in the sensory organisation test (SOT) of the dynamic posturography (CDP) <68%; (d) at least one fall in any of the conditions in the SOT of the CDP. Interventions Three different protocols of VR. Main outcome measures The percentage of average balance in the SOT of the CDP. Secondary measures: time and supports in the "timed up and go" test, scores of the dynamic posturography and SwayStar system, and rate of falls.
The Potsdam Parallel Ice Sheet Model (PISM-PIK) - Part 1: Model description
NASA Astrophysics Data System (ADS)
Winkelmann, R.; Martin, M. A.; Haseloff, M.; Albrecht, T.; Bueler, E.; Khroulev, C.; Levermann, A.
2011-09-01
We present the Potsdam Parallel Ice Sheet Model (PISM-PIK), developed at the Potsdam Institute for Climate Impact Research to be used for simulations of large-scale ice sheet-shelf systems. It is derived from the Parallel Ice Sheet Model (Bueler and Brown, 2009). Velocities are calculated by superposition of two shallow stress balance approximations within the entire ice covered region: the shallow ice approximation (SIA) is dominant in grounded regions and accounts for shear deformation parallel to the geoid. The plug-flow type shallow shelf approximation (SSA) dominates the velocity field in ice shelf regions and serves as a basal sliding velocity in grounded regions. Ice streams can be identified diagnostically as regions with a significant contribution of membrane stresses to the local momentum balance. All lateral boundaries in PISM-PIK are free to evolve, including the grounding line and ice fronts. Ice shelf margins in particular are modeled using Neumann boundary conditions for the SSA equations, reflecting a hydrostatic stress imbalance along the vertical calving face. The ice front position is modeled using a subgrid-scale representation of calving front motion (Albrecht et al., 2011) and a physically-motivated calving law based on horizontal spreading rates. The model is tested in experiments from the Marine Ice Sheet Model Intercomparison Project (MISMIP). A dynamic equilibrium simulation of Antarctica under present-day conditions is presented in Martin et al. (2011).
A single-stage optical load-balanced switch for data centers.
Huang, Qirui; Yeo, Yong-Kee; Zhou, Luying
2012-10-22
Load balancing is an attractive technique to achieve maximum throughput and optimal resource utilization in large-scale switching systems. However, current electronic load-balanced switches suffer from severe problems in implementation cost, power consumption and scaling. To overcome these problems, in this paper we propose a single-stage optical load-balanced switch architecture based on an arrayed waveguide grating router (AWGR) in conjunction with fast tunable lasers. By reusing the fast tunable lasers, the switch achieves both the load balancing and switching functions through the AWGR. With this architecture, proof-of-concept experiments have been conducted to investigate the feasibility of the optical load-balanced switch and to examine its physical performance. Compared to three-stage load-balanced switches, the reported switch needs only half the optical devices, such as tunable lasers and AWGRs, which can provide a cost-effective solution for future data centers.
NASA Astrophysics Data System (ADS)
Schwing, Alan Michael
For computational fluid dynamics, the governing equations are solved on a discretized domain of nodes, faces, and cells. The quality of the grid or mesh can be a driving source for error in the results. While refinement studies can help guide the creation of a mesh, grid quality is largely determined by user expertise and understanding of the flow physics. Adaptive mesh refinement is a technique for enriching the mesh during a simulation based on metrics for error, impact on important parameters, or location of important flow features. This can offload from the user some of the difficult and ambiguous decisions necessary when discretizing the domain. This work explores the implementation of adaptive mesh refinement in an implicit, unstructured, finite-volume solver. Consideration is made for applying modern computational techniques in the presence of hanging nodes and refined cells. The approach is developed to be independent of the flow solver in order to provide a path for augmenting existing codes. It is designed to be applicable for unsteady simulations, and refinement and coarsening of the grid do not impact the conservatism of the underlying numerics. The effect on high-order numerical fluxes of fourth- and sixth-order is explored. Provided the criteria for refinement are appropriately selected, solutions obtained using adapted meshes have no additional error when compared to results obtained on traditional, unadapted meshes. In order to leverage large-scale computational resources common today, the methods are parallelized using MPI. Parallel performance is considered for several test problems in order to assess scalability of both adapted and unadapted grids. Dynamic repartitioning of the mesh during refinement is crucial for load balancing an evolving grid. Development of the methods outlined here depends on a dual-memory approach that is described in detail. Validation of the solver developed here against a number of motivating problems shows favorable comparisons across a range of regimes. Unsteady and steady applications are considered in both subsonic and supersonic flows. Inviscid and viscous simulations achieve similar results at a much reduced cost when employing dynamic mesh adaptation. Several techniques for guiding adaptation are compared. Detailed analysis of statistics from the instrumented solver enables understanding of the costs associated with adaptation. Adaptive mesh refinement shows promise for the test cases presented here. It can be considerably faster than using conventional grids and provides accurate results. The procedures for adapting the grid are light-weight enough to not require significant computational time and yield significant reductions in grid size.
Differential Draining of Parallel-Fed Propellant Tanks in Morpheus and Apollo Flight
NASA Technical Reports Server (NTRS)
Hurlbert, Eric; Guardado, Hector; Hernandez, Humberto; Desai, Pooja
2015-01-01
Parallel-fed propellant tanks are an advantageous configuration for many spacecraft. Parallel-fed tanks allow the center of gravity (cg) to be maintained over the engine(s), as opposed to serial-fed propellant tanks, which produce a cg shift as one tank drains before the tank opposite it. Parallel-fed tanks also allow for tank isolation if that is needed. Parallel tanks and feed systems have been used in several past vehicles, including the Apollo Lunar Module. The design of the feed system connecting the parallel tanks is critical to maintaining balance between the propellant tanks. The design must account for and minimize the effect of manufacturing variations that could cause delta-p or mass flow rate differences, which would lead to propellant imbalance. Other sources of differential draining will be discussed. Fortunately, physics provides some self-correcting behaviors that tend to equalize any initial imbalance. Whether active control of the propellant level in each tank is required, or can be avoided, is also an important question to answer. In order to provide flight data on parallel-fed tanks and differential draining of cryogenic propellants (as well as any other fluid), a vertical test bed (flying lander) for terrestrial use was employed. The Morpheus vertical test bed is a parallel-fed propellant tank system that uses a passive design to keep the propellant tanks balanced. The system is operated in blowdown. The Morpheus vehicle was instrumented with a capacitance level sensor in each propellant tank in order to measure the draining of propellants over 34 tethered and 12 free flights. Morpheus did experience an approximately 20 lbm imbalance in one pair of tanks; the cause of this imbalance will be discussed. This paper discusses the analysis, design, flight simulation vehicle dynamic modeling, and flight test of the Morpheus parallel-fed propellant system. The Apollo LEM data is also examined in this summary report of the flight data.
An Evaluation of the HVAC Load Potential for Providing Load Balancing Service
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Ning
This paper investigates the potential of providing aggregated intra-hour load balancing services using heating, ventilating, and air-conditioning (HVAC) systems. A direct-load control algorithm is presented. A temperature-priority-list method is used to dispatch the HVAC loads optimally to maintain consumer-desired indoor temperatures and load diversity. Realistic intra-hour load balancing signals were used to evaluate the operational characteristics of the HVAC load under different outdoor temperature profiles and different indoor temperature settings. The number of HVAC units needed is also investigated. Modeling results suggest that the number of HVACs needed to provide a ±1-MW load balancing service 24 hours a day varies significantly with baseline settings, high and low temperature settings, and the outdoor temperatures. The results demonstrate that the intra-hour load balancing service provided by HVAC loads meets the performance requirements and can become a major source of revenue for load-serving entities where the smart grid infrastructure enables direct load control over the HVAC loads.
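A temperature-priority list can be sketched in a few lines. The toy below is not the paper's algorithm verbatim; all unit data and power figures are invented. For a cooling fleet, it sorts units by deviation from setpoint and switches the warmest homes on until the aggregate tracks the requested balancing power.

```python
# Toy temperature-priority-list dispatch (cooling case, invented data): the
# warmest homes are switched on first until the aggregate power tracks the
# requested balancing signal.
def dispatch(units, target_kw):
    # Highest indoor-temperature deviation first: these most need cooling.
    units.sort(key=lambda u: u['indoor'] - u['setpoint'], reverse=True)
    total = 0.0
    for u in units:
        u['on'] = total < target_kw        # keep switching on until target met
        if u['on']:
            total += u['kw']
    return total

fleet = [{'indoor': 23 + 0.01 * i, 'setpoint': 22.5, 'on': False, 'kw': 5.0}
         for i in range(400)]
print(dispatch(fleet, target_kw=1000.0))   # tracks a 1-MW balancing request
```

Because the hottest rooms run first, comfort bands are respected while the fleet's total draw follows the intra-hour signal.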
Influence of Primary Gage Sensitivities on the Convergence of Balance Load Iterations
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred
2012-01-01
The connection between the convergence of wind tunnel balance load iterations and the existence of the primary gage sensitivities of a balance is discussed. First, basic elements of the two load iteration equations that the iterative method uses in combination with results of a calibration data analysis for the prediction of balance loads are reviewed. Then, the connection between the primary gage sensitivities, the load format, the gage output format, and the convergence characteristics of the load iteration equation choices is investigated. A new criterion is also introduced that may be used to objectively determine if the primary gage sensitivity of a balance gage exists. Then, it is shown that both load iteration equations will converge as long as a suitable regression model is used for the analysis of the balance calibration data, the combined influence of nonlinear terms of the regression model is very small, and the primary gage sensitivities of all balance gages exist. The last requirement is fulfilled, e.g., if force balance calibration data is analyzed in force balance format. Finally, it is demonstrated that only one of the two load iteration equation choices, i.e., the iteration equation used by the primary load iteration method, converges if one or more primary gage sensitivities are missing. This situation may occur, e.g., if force balance calibration data is analyzed in direct-read format using the original gage outputs. Data from the calibration of a six-component force balance is used to illustrate the connection between the convergence of the load iteration equation choices and the existence of the primary gage sensitivities.
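A generic numerical illustration of why the primary gage sensitivities matter: if the outputs are modeled as R = C1·F plus small nonlinear terms, the load iteration repeatedly inverts C1, so a missing primary sensitivity (a singular C1) breaks convergence. The sketch below is a hedged, simplified stand-in, not the exact iteration equations of the reference.

```python
# Simplified fixed-point sketch of a balance load iteration: gage outputs are
# R = C1 @ F + small nonlinear terms; convergence requires C1 to be invertible,
# i.e., every gage must have a primary sensitivity. Illustrative data only.
import numpy as np

def iterate_loads(R, C1, nonlinear, tol=1e-10, max_iter=100):
    C1_inv = np.linalg.inv(C1)        # fails if a primary sensitivity is missing
    F = C1_inv @ R                    # linear first guess
    for _ in range(max_iter):
        F_new = C1_inv @ (R - nonlinear(F))
        if np.max(np.abs(F_new - F)) < tol:
            return F_new
        F = F_new
    raise RuntimeError("load iteration did not converge")

C1 = np.diag([2.0, 1.5, 3.0])                 # primary gage sensitivities
nl = lambda F: 1e-3 * F ** 2                  # weak nonlinear response terms
F_true = np.array([1.0, 2.0, 0.5])
R = C1 @ F_true + nl(F_true)                  # simulated gage outputs
print(iterate_loads(R, C1, nl))               # recovers F_true
```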
Collectively loading programs in a multiple program multiple data environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
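A rough analogue of the class-route scheme can be written with mpi4py (an assumption for illustration; the disclosed technique targets collective networks rather than MPI): ranks needing the same program form a subcommunicator, one load leader "reads" the program, and a broadcast distributes it. Run under mpirun with mpi4py installed.

```python
# Hedged mpi4py analogue of collective program loading; the load description,
# program names, and "binary image" string are all hypothetical.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Hypothetical load description: even ranks run solverA, odd ranks solverB.
program = "solverA" if rank % 2 == 0 else "solverB"

# One subcommunicator (the analogue of a class route) per distinct program.
color = 0 if program == "solverA" else 1
group = comm.Split(color=color, key=rank)

image = None
if group.Get_rank() == 0:                     # load leader touches the file system
    image = f"<binary image of {program}>"    # stand-in for reading the executable
image = group.bcast(image, root=0)            # distribute within the class route
print(f"rank {rank}: loaded {program} ({len(image)} bytes)")
```

Only one process per program touches the file system, which is the point of the load-leader selection described above.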
NASA Astrophysics Data System (ADS)
Hager, P.; Czupalla, M.; Walter, U.
2010-11-01
In this paper we report on the development of a dynamic MATLAB SIMULINK® model for the water and electrolyte balance inside the human body. This model is part of an environmentally sensitive dynamic human model for the optimization and verification of environmental control and life support systems (ECLSS) in space flight applications. An ECLSS provides all vital supplies for supporting human life on board a spacecraft. As human space flight today focuses on medium- to long-term missions, the strategy in ECLSS is shifting to closed-loop systems. For these systems the dynamic stability and function over long durations are essential. However, the only evaluation and rating methods for ECLSS up to now are either expensive trial-and-error breadboarding strategies or static and semi-dynamic simulations. In order to overcome this mismatch the Exploration Group at Technische Universität München (TUM) is developing a dynamic environmental simulation, the "Virtual Habitat" (V-HAB). The central element of this simulation is the dynamic and environmentally sensitive human model. The water subsystem simulation of the human model discussed in this paper is of vital importance for the efficiency of possible ECLSS optimizations, as an over- or under-scaled water subsystem would have an adverse effect on the overall mass budget. On the other hand, water has a pivotal role in the human organism. Water accounts for about 60% of the total body mass and is a reactant and product of numerous metabolic reactions. It is a transport medium for solutes and, due to its high evaporation enthalpy, provides the most potent medium for heat load dissipation. In a system engineering approach the human water balance was worked out by simulating the human body's subsystems and their interactions. The body fluids were assumed to reside in three compartments: blood plasma, interstitial fluid, and intracellular fluid. In addition, the active and passive transport of water and solutes between those compartments was modeled dynamically. A kidney model regulates the electrolyte concentration in body fluids (osmolality) within narrow confines, and a thirst mechanism models the urge to ingest water. A controlled exchange of water and electrolytes with other human subsystems, as well as with the environment, is implemented. Finally, the changes in body composition due to muscle growth are accounted for. The outcome of this is a dynamic water and electrolyte balance, which is capable of representing body reactions like thirst and headaches, as well as heat stroke and collapse, as a response to workload and environment.
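A toy version of the compartment model helps fix ideas. The sketch below integrates three fluid compartments with first-order exchange and a thirst trigger; every rate constant and volume is invented for illustration and is not taken from V-HAB.

```python
# Toy three-compartment water balance (plasma, interstitial, intracellular)
# with first-order exchange and a crude thirst trigger; all numbers invented.
import numpy as np

REF = np.array([3.0, 12.0, 25.0])    # nominal compartment volumes in liters

def step(v, dt=1.0, k_pi=0.05, k_ic=0.02, urine=0.01):
    plasma, interstitial, intracellular = v
    intake = 0.5 if plasma < 0.95 * REF[0] else 0.0   # thirst response
    d_pi = k_pi * (interstitial / REF[1] - plasma / REF[0])         # into plasma
    d_ic = k_ic * (intracellular / REF[2] - interstitial / REF[1])  # into interstitium
    return v + dt * np.array([intake + d_pi - urine,
                              d_ic - d_pi,
                              -d_ic])

v = np.array([2.8, 12.0, 25.0])      # start slightly hypovolemic
for _ in range(100):
    v = step(v)
print(v)                             # plasma volume recovers toward its reference
```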
NASA Technical Reports Server (NTRS)
Schuster, David M.; Panda, Jayanta; Ross, James C.; Roozeboom, Nettie H.; Burnside, Nathan J.; Ngo, Christina L.; Kumagai, Hiro; Sellers, Marvin; Powell, Jessica M.; Sekula, Martin K.;
2016-01-01
This NESC assessment examined the accuracy of estimating buffet loads on in-line launch vehicles without booster attachments using sparse unsteady pressure measurements. The buffet loads computed using sparse sensor data were compared with estimates derived using measurements with much higher spatial resolution. The current method for estimating launch vehicle buffet loads is through wind tunnel testing of models with approximately 400 unsteady pressure transducers. Even with this relatively large number of sensors, the coverage can be insufficient to provide reliable integrated unsteady loads on vehicles. In general, sparse sensor spacing requires the use of coherence-length-based corrections in the azimuthal and axial directions to integrate the unsteady pressures and obtain reasonable estimates of the buffet loads. Coherence corrections have been used to estimate buffet loads for a variety of launch vehicles with the assumption that the methodology results in reasonably conservative loads. For the Space Launch System (SLS), the first estimates of buffet loads exceeded the limits of the vehicle structure, so additional tests with higher sensor density were conducted to better define the buffet loads and possibly avoid expensive modifications to the vehicle design. Without the additional tests and improvements to the coherence-length analysis methods, there would have been significant impacts to the vehicle weight, cost, and schedule. If the load estimates turn out to be too low, there is significant risk of structural failure of the vehicle. This assessment used a combination of unsteady pressure-sensitive paint (uPSP), unsteady pressure transducers, and a dynamic force and moment balance to investigate the integration schemes used with limited unsteady pressure data by comparing them with direct integration of extremely dense fluctuating pressure measurements. A further outcome of the assessment was an evaluation of the potential of using the emerging uPSP technique in a production test environment for future launch vehicles. The results show that modifications to the current technique can improve the accuracy of buffet estimates. More importantly, the uPSP worked remarkably well and, with improvements to the frequency response, sensitivity, and productivity, will provide an enhanced method for measuring wind tunnel buffet forcing functions (BFFs).
Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Brian B; Purba, Victor; Jafarpour, Saber
Next-generation power networks will contain large numbers of grid-connected inverters satisfying a significant fraction of system load. Since each inverter model has a relatively large number of dynamic states, it is impractical to analyze complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model with lumped parameters for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. We show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as any individual inverter in the system. Numerical simulations validate the reduced-order model.
Accelerating Subsurface Transport Simulation on Heterogeneous Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Gawande, Nitin A.; Tumeo, Antonino
Reactive transport numerical models simulate chemical and microbiological reactions that occur along a flowpath. These models have to compute reactions for a large number of locations. They solve the set of ordinary differential equations (ODEs) that describes the reaction for each location through the Newton-Raphson technique. This technique involves computing a Jacobian matrix and a residual vector for each set of equations, and then iteratively solving the linearized system by performing Gaussian elimination and LU decomposition until convergence. STOMP, a well-known subsurface flow simulation tool, employs matrices with sizes on the order of 100x100 elements and, for numerical accuracy, LU factorization with full pivoting instead of the faster partial pivoting. Modern high performance computing systems are heterogeneous machines whose nodes integrate both CPUs and GPUs, exposing unprecedented amounts of parallelism. To exploit all their computational power, applications must use both types of processing elements. For the case of subsurface flow simulation, this mainly requires implementing efficient batched LU-based solvers and identifying efficient solutions for enabling load balancing among the different processors of the system. In this paper we discuss two approaches that allow scaling STOMP's performance on heterogeneous clusters. We initially identify the challenges in implementing batched LU-based solvers for small matrices on GPUs, and propose an implementation that fulfills STOMP's requirements. We compare this implementation to other existing solutions. Then, we combine the batched GPU solver with an OpenMP-based CPU solver, and present an adaptive load balancer that dynamically distributes the linear systems to solve between the two components inside a node. We show how these approaches, integrated into the full application, provide speedups of 6 to 7 times on large problems, executed on up to 16 nodes of a cluster with two AMD Opteron 6272 CPUs and a Tesla M2090 GPU per node.
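The adaptive split can be illustrated with a small feedback rule: measure each worker's throughput on the current step and move the split point so both finish together. In the sketch below both "workers" are plain Python functions standing in for the CUDA and OpenMP solvers; only the rebalancing logic is the point, and all sizes are invented.

```python
# Hedged sketch of adaptive CPU/GPU load balancing for batches of small
# linear systems; the two workers here are identical stand-in functions.
import time
import numpy as np

def solve_batch(batch):                    # stand-in for a batched LU solver
    return [np.linalg.solve(A, b) for A, b in batch]

def balanced_step(systems, gpu_frac):
    n_gpu = int(len(systems) * gpu_frac)
    t0 = time.perf_counter()
    solve_batch(systems[:n_gpu])           # "GPU" share
    t_gpu = time.perf_counter() - t0
    t0 = time.perf_counter()
    solve_batch(systems[n_gpu:])           # "CPU" share
    t_cpu = time.perf_counter() - t0
    r_gpu = max(n_gpu, 1) / max(t_gpu, 1e-9)             # measured throughputs
    r_cpu = max(len(systems) - n_gpu, 1) / max(t_cpu, 1e-9)
    return r_gpu / (r_gpu + r_cpu)         # new share equalizes finish times

rng = np.random.default_rng(0)
systems = [(rng.random((100, 100)) + 100 * np.eye(100), rng.random(100))
           for _ in range(64)]
frac = 0.5
for _ in range(5):
    frac = balanced_step(systems, frac)    # settles near the true speed ratio
print(f"GPU share: {frac:.2f}")            # ~0.5 here, since both workers match
```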
Team-based work and work system balance in the context of agile manufacturing.
Yauch, Charlene A
2007-01-01
Manufacturing agility is the ability to prosper in an environment characterized by constant and unpredictable change. The purpose of this paper is to analyze team attributes necessary to facilitate agile manufacturing, and using Balance Theory as a framework, it evaluates the potential positive and negative impacts related to these team attributes that could alter the balance of work system elements and resulting "stress load" experienced by persons working on agile teams. Teams operating within the context of agile manufacturing are characterized as multifunctional, dynamic, cooperative, and virtual. A review of the literature relevant to each of these attributes is provided, as well as suggestions for future research.
Crashworthiness Design of the Shear Bolts for Light Collision Safety Devices
NASA Astrophysics Data System (ADS)
Kim, Jin Sung; Huh, Hoon; Kwon, Tae Soo
This paper introduces a jig set for crash testing and presents the crash test results of shear bolts designed to fail under train crash conditions. Tension and shear bolts are attached to Light Collision Safety Devices (LCSD) as mechanical fuses that fail when they reach their designed failure load. The kinetic energy due to the crash is absorbed by the secondary energy absorbing device after the LCSD is detached from the main body by the fracture of the shear bolts. A single shear bolt was designed to fail at a load of 250 kN. The jig set, designed to convert compressive loading into shear loading, was installed on the high-speed crash tester for dynamic shear tests. Two strain gauges were attached at the parallel section of the jig set to measure the load responses acting on the shear bolts. Crash tests were performed with a 250 kg carrier at an initial speed of 9 m/s. From the quasi-static and dynamic experiments as well as the numerical analysis, the capacity of the shear bolts was accurately predicted for the crashworthiness design.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji
A parallel computing framework has been developed for use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran 77, Fortran 90, or C. The module is largely independent of the radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA, and MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations from a saved checkpoint file. The checkpoint facility can be used in single-process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where interference from other users is possible.
A Parallel Pipelined Renderer for the Time-Varying Volume Data
NASA Technical Reports Server (NTRS)
Chiueh, Tzi-Cker; Ma, Kwan-Liu
1997-01-01
This paper presents a strategy for efficiently rendering time-varying volume data sets on a distributed-memory parallel computer. Time-varying volume data take large storage space, and visualizing them requires reading large files continuously or periodically throughout the course of the visualization process. Instead of using all the processors to collectively render one volume at a time, a pipelined rendering process is formed by partitioning processors into groups to render multiple volumes concurrently. In this way, the overall rendering time may be greatly reduced because the pipelined rendering tasks are overlapped with the I/O required to load each volume into a group of processors; moreover, parallelization overhead may be reduced as a result of partitioning the processors. We modify an existing parallel volume renderer to exploit various levels of rendering parallelism and to study how the partitioning of processors may lead to optimal rendering performance. Two factors which are important to the overall execution time are resource utilization efficiency and pipeline startup latency. The optimal partitioning configuration is the one that balances these two factors. Tests on Intel Paragon computers show that in general optimal partitionings do exist for a given rendering task and result in a 40-50% saving in overall rendering time.
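A toy cost model makes the trade-off concrete: splitting P processors into G groups reduces per-group parallelization overhead and overlaps I/O, at the cost of pipeline fill latency. All timing constants below are invented; the model is only meant to show that an interior G, rather than G=1 or G=P, minimizes total time.

```python
# Invented cost model for pipelined time-varying volume rendering: balance
# resource utilization (per_vol) against pipeline startup latency.
def total_time(P, G, n_vols, t_serial=100.0, t_io=8.0, overhead=0.5):
    # Render time per volume on P/G processors, plus parallelization overhead.
    per_vol = t_serial / (P / G) + overhead * (P / G)
    startup = (G - 1) * t_io                     # pipeline fill latency
    steady = max(per_vol, t_io) * (n_vols / G)   # groups work concurrently
    return startup + steady

P, n_vols = 64, 48
candidates = [G for G in range(1, P + 1) if P % G == 0]
best = min(candidates, key=lambda G: total_time(P, G, n_vols))
print(best, round(total_time(P, best, n_vols), 1))   # interior optimum, not 1 or P
```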
Visualization Co-Processing of a CFD Simulation
NASA Technical Reports Server (NTRS)
Vaziri, Arsi
1999-01-01
OVERFLOW, a widely used CFD simulation code, is combined with a visualization system, pV3, to experiment with an environment for simulation/visualization co-processing on an SGI Origin 2000 computer (O2K) system. The shared memory version of the solver is used with the O2K 'pfa' preprocessor invoked to automatically discover parallelism in the source code. No other explicit parallelism is enabled. In order to study the scaling and performance of the visualization co-processing system, sample runs are made with different processor groups in the range of 1 to 254 processors. The data exchange between the visualization system and the simulation system is rapid enough for user interactivity when the problem size is small. This shared memory version of OVERFLOW, with minimal parallelization, does not scale well to an increasing number of available processors. The visualization task takes about 18 to 30% of the total processing time and does not appear to be a major contributor to the poor scaling. Improper load balancing and inter-processor communication overhead are contributors to this poor performance. Work is in progress aimed at obtaining improved parallel performance of the solver and removing the limitations of serial data transfer to pV3 by examining various parallelization/communication strategies, including the use of explicit message passing.
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Lesoinne, Michel
1993-01-01
Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and of the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm is discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
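One classical member of this family of partitioners, recursive coordinate bisection, is sketched below (a generic textbook version, not necessarily any of the paper's specific algorithms): split the point set at the median of its widest coordinate and recurse until the requested number of subdomains is reached.

```python
# Generic recursive coordinate bisection (RCB) over mesh points; illustrative
# only, with random points standing in for element centroids.
import numpy as np

def rcb(points, ids, n_parts):
    if n_parts == 1:
        return [ids]
    axis = np.argmax(points.max(axis=0) - points.min(axis=0))  # widest extent
    order = np.argsort(points[:, axis])
    half = len(order) * (n_parts // 2) // n_parts              # balanced split
    lo, hi = order[:half], order[half:]
    return (rcb(points[lo], ids[lo], n_parts // 2) +
            rcb(points[hi], ids[hi], n_parts - n_parts // 2))

pts = np.random.default_rng(1).random((1000, 3))
parts = rcb(pts, np.arange(1000), 6)
print([len(p) for p in parts])   # near-equal element counts per subdomain
```

Near-equal part sizes address load balancing, while the geometric splits tend to keep subdomain boundaries, and hence communication, small.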
NASA's Functional Task Test: Providing Information for an Integrated Countermeasure System
NASA Technical Reports Server (NTRS)
Bloomberg, J. J.; Feiveson, A. H.; Laurie, S. S.; Lee, S. M. C.; Mulavara, A. P.; Peters, B. T.; Platts, S. H.; Ploutz-Snyder, L. L.; Reschke, M. F.; Ryder, J. W.;
2015-01-01
Exposure to the microgravity conditions of spaceflight causes astronauts to experience alterations in multiple physiological systems. These physiological changes include sensorimotor disturbances, cardiovascular deconditioning, and loss of muscle mass and strength. Some or all of these changes might affect the ability of crewmembers to perform critical mission tasks immediately after landing on a planetary surface. The goals of the Functional Task Test (FTT) study were to determine the effects of spaceflight on functional tests that are representative of critical exploration mission tasks and to identify the key physiological factors that contribute to decrements in performance. The FTT was comprised of seven functional tests and a corresponding set of interdisciplinary physiological measures targeting the sensorimotor, cardiovascular and muscular changes associated with exposure to spaceflight. Both Shuttle and ISS crewmembers participated in this study. Additionally, we conducted a supporting study using the FTT protocol on subjects before and after 70 days of 6° head-down bed rest. The bed rest analog allowed us to investigate the impact of body unloading in isolation on both functional tasks and on the underlying physiological factors that lead to decrements in performance, and then to compare them with the results obtained in our spaceflight study. Spaceflight data were collected on three sessions before flight, on landing day (Shuttle only) and 1, 6 and 30 days after landing. Bed rest subjects were tested three times before bed rest and immediately after getting up from bed rest as well as 1, 6, and 12 days after reambulation. We have shown that for Shuttle, ISS and bed rest subjects, functional tasks requiring a greater demand for dynamic control of postural equilibrium (i.e. fall recovery, seat egress/obstacle avoidance during walking, object translation, jump down) showed the greatest decrement in performance. Functional tests with reduced requirements for postural stability (i.e. hatch opening, ladder climb, manual manipulation of objects and tool use) showed little reduction in performance. These changes in functional performance were paralleled by similar decrements in sensorimotor tests designed to specifically assess postural equilibrium and dynamic gait control. Bed rest subjects experienced deficits in functional tests with balance challenges and in sensorimotor tests designed to evaluate postural and gait control similar to those of spaceflight subjects, indicating that body support unloading experienced during spaceflight plays a central role in post-flight alteration of functional task performance. To determine how differences in body-support loading experienced during in-flight treadmill exercise affect postflight functional performance, the loading history for each subject during in-flight treadmill (T2) exercise was correlated with postflight measures of performance. ISS crewmembers who walked on the treadmill with higher pull-down loads had enhanced post-flight performance on tests requiring mobility. Taken together, the spaceflight and bed rest data point to the importance of supplementing inflight exercise countermeasures with balance and sensorimotor adaptability training. These data also support the notion that inflight treadmill exercise performed with higher body loading provides sensorimotor benefits leading to improved performance on functional tasks that require dynamic postural stability and mobility.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas
The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations have not been possible before due to a more than tenfold shortfall in time-to-solution for completing one physics case within 5 days of wall-clock time. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous application interactions, dynamic repartitioning for balancing computational work in pushing particles and in grid-related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership-class computer.
A Unique Power System For The ISS Fluids And Combustion Facility
NASA Technical Reports Server (NTRS)
Fox, David A.; Poljak, Mark D.
2001-01-01
Unique power control technology has been incorporated into an electrical power control unit (EPCU) for the Fluids and Combustion Facility (FCF). The objective is to maximize science throughput by providing a flexible power system that is easily reconfigured by the science payload. Electrical power is at a premium on the International Space Station (ISS). The EPCU utilizes advanced power management techniques to maximize the power available to the FCF experiments. The EPCU architecture enables dynamic allocation of power from two ISS power channels for experiments. Because of the unique flexible remote power controller (FRPC) design, power channels can be paralleled while maintaining balanced load sharing between the channels. With an integrated and redundant architecture, the EPCU can tolerate multiple faults and still maintain FCF operation. It is important to take full advantage of the EPCU functionality. The EPCU acts as a buffer between the experimenter and the ISS power system with all its complex requirements. However, FCF science payload developers will still need to follow guidelines when designing the FCF payload power system. This is necessary to ensure power system stability, fault coordination, electromagnetic compatibility, and maximum use of available power for gathering scientific data.
NASA Astrophysics Data System (ADS)
Nardi, Albert; Idiart, Andrés; Trinchero, Paolo; de Vries, Luis Manuel; Molinero, Jorge
2014-08-01
This paper presents the development, verification, and application of an efficient interface, denoted iCP, which couples two standalone simulation programs: the general purpose finite element framework COMSOL Multiphysics® and the geochemical simulator PHREEQC. The main goal of the interface is to maximize the synergies between the aforementioned codes, providing a numerical platform that can efficiently simulate a wide range of multiphysics problems coupled with geochemistry. iCP is written in Java and uses the IPhreeqc C++ dynamic library and the COMSOL Java API. Given the large computational requirements of the aforementioned coupled models, special emphasis has been placed on numerical robustness and efficiency. To this end, the geochemical reactions are solved in parallel by balancing the computational load over multiple threads. First, a benchmark exercise is used to test the reliability of iCP regarding flow and reactive transport. Then, a large-scale thermo-hydro-chemical (THC) problem is solved to show the code capabilities. The results of the verification exercise are successfully compared with those obtained using PHREEQC, and the application case demonstrates the scalability of a large-scale model, at least up to 32 threads.
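The worker-pool balancing idea can be sketched generically in Python (illustrative only; iCP itself is Java and calls IPhreeqc): independent per-cell chemistry solves are farmed out to a pool, with the chunk size trading scheduling overhead against load balance.

```python
# Generic sketch of balancing per-cell chemistry solves over workers; the
# react_cell function is an invented stand-in for one PHREEQC cell solve.
from concurrent.futures import ProcessPoolExecutor
import math

def react_cell(state):
    conc, temperature = state
    return conc * math.exp(-0.01 * temperature)   # toy first-order reaction

if __name__ == "__main__":
    cells = [(1.0 + 0.001 * i, 25.0 + 0.05 * i) for i in range(10_000)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        # chunksize trades scheduling overhead against balance across workers
        results = list(pool.map(react_cell, cells, chunksize=256))
    print(results[0], results[-1])
```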
Assessment of the Uniqueness of Wind Tunnel Strain-Gage Balance Load Predictions
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2016-01-01
A new test was developed to assess the uniqueness of wind tunnel strain-gage balance load predictions that are obtained from regression models of calibration data. The test helps balance users to gain confidence in load predictions of non-traditional balance designs. It also makes it possible to better evaluate load predictions of traditional balances that are not used as originally intended. The test works for both the Iterative and Non-Iterative Methods that are used in the aerospace testing community for the prediction of balance loads. It is based on the hypothesis that the total number of independently applied balance load components must always match the total number of independently measured bridge outputs or bridge output combinations. This hypothesis is supported by a control volume analysis of the inputs and outputs of a strain-gage balance. It is concluded from the control volume analysis that the loads and bridge outputs of a balance calibration data set must separately be tested for linear independence because it cannot always be guaranteed that a linearly independent load component set will result in linearly independent bridge output measurements. Simple linear math models for the loads and bridge outputs in combination with the variance inflation factor are used to test for linear independence. A highly unique and reversible mapping between the applied load component set and the measured bridge output set is guaranteed to exist if the maximum variance inflation factor of both sets is less than the literature-recommended threshold of five. Data from the calibration of a six-component force balance is used to illustrate the application of the new test to real-world data.
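The variance-inflation-factor test is straightforward to compute. Below is a numpy sketch with synthetic calibration data (the threshold of five follows the text; the data and mixing matrix are invented): each column's VIF is 1/(1−R²) from regressing it on the remaining columns, and both the load set and the output set must stay below five.

```python
# VIF-based linear-independence check for load and output sets; synthetic data.
import numpy as np

def max_vif(X):
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize columns
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        A = np.delete(X, j, axis=1)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - np.sum((y - A @ coef) ** 2) / np.sum(y ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return max(vifs)

rng = np.random.default_rng(2)
loads = rng.normal(size=(200, 6))                    # independently applied loads
outputs = loads @ (np.eye(6) + 0.1 * rng.normal(size=(6, 6)))  # gage responses
print(max_vif(loads), max_vif(outputs))              # both well under five
print(max_vif(loads) < 5 and max_vif(outputs) < 5)   # mapping judged unique
```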
3-D modeling of ductile tearing using finite elements: Computational aspects and techniques
NASA Astrophysics Data System (ADS)
Gullerud, Arne Stewart
This research focuses on the development and application of computational tools to perform large-scale, 3-D modeling of ductile tearing in engineering components under quasi-static to mild loading rates. Two standard models for ductile tearing---the computational cell methodology and crack growth controlled by the crack tip opening angle (CTOA)---are described and their 3-D implementations are explored. For the computational cell methodology, quantification of the effects of several numerical issues---computational load step size, procedures for force release after cell deletion, and the porosity for cell deletion---enables construction of computational algorithms to remove the dependence of predicted crack growth on these issues. This work also describes two extensions of the CTOA approach into 3-D: a general 3-D method and a constant front technique. Analyses compare the characteristics of the extensions, and a validation study explores the ability of the constant front extension to predict crack growth in thin aluminum test specimens over a range of specimen geometries, absolute sizes, and levels of out-of-plane constraint. To provide a computational framework suitable for the solution of these problems, this work also describes the parallel implementation of a nonlinear, implicit finite element code. The implementation employs an explicit message-passing approach using the MPI standard to maintain portability, a domain decomposition of element data to provide parallel execution, and a master-worker organization of the computational processes to enhance future extensibility. A linear preconditioned conjugate gradient (LPCG) solver serves as the core of the solution process. The parallel LPCG solver utilizes an element-by-element (EBE) structure of the computations to permit a dual-level decomposition of the element data: domain decomposition of the mesh provides efficient coarse-grain parallel execution, while decomposition of the domains into blocks of similar elements (same type, constitutive model, etc.) provides fine-grain parallel computation on each processor. A major focus of the LPCG solver is a new implementation of the Hughes-Winget element-by-element (HW) preconditioner. The implementation employs a weighted dependency graph combined with a new coloring algorithm to provide load-balanced scheduling for the preconditioner and overlapped communication/computation. This approach enables efficient parallel application of the HW preconditioner for arbitrary unstructured meshes.
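The scheduling idea behind such element-by-element preconditioners, coloring a conflict graph so that each color class can be processed in parallel, can be shown in miniature. The sketch below is a generic greedy coloring, not the weighted algorithm of the thesis, and the tiny graph is invented.

```python
# Generic greedy graph coloring: elements sharing a node conflict; each color
# class is an independent set that can be updated concurrently.
def greedy_color(adj):
    color = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):   # high degree first
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color

# Four elements in a line: 0-1, 1-2, 2-3 share nodes (conflict edges).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(greedy_color(adj))   # two colors suffice: {1,3} and {0,2} run in parallel
```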
Static versus dynamic fracturing in shallow carbonate fault zones
NASA Astrophysics Data System (ADS)
Fondriest, M.; Doan, M. L.; Aben, F. M.; Fusseis, F.; Mitchell, T. M.; Di Toro, G.
2015-12-01
Moderate to large earthquakes often nucleate within and propagate through carbonates in the shallow crust; therefore, several recent field and experimental studies have aimed to constrain earthquake-related deformation processes within carbonate fault rocks. In particular, the occurrence of thick belts (10-100s m) of low-strain fault-related breccias (average size of rock fragments >1 cm), which is relatively common within carbonate damage zones, was generally interpreted as resulting from the quasi-static growth of fault zones rather than from the cumulative effect of multiple earthquake ruptures. Here we report the occurrence of up to hundreds of meters thick belts of intensely fragmented dolostones along the major transpressive Foiana Fault Zone (Italian Southern Alps), which was exhumed from <2 km depth. Such dolostones are reduced into fragments ranging from a few centimeters down to a few millimeters in size, with ultrafine-grained layers in proximity to the principal slip zones. Preservation of the original bedding indicates a lack of significant shear strain in the fragmented dolostones, which seem to have been shattered in situ. To investigate the origin of the in-situ shattered rocks, the host dolostones were deformed in uniaxial compression both under quasi-static loading (strain rate ~10^-3 s^-1) and dynamic loading (strain rate >50 s^-1). Dolostones deformed up to failure under low strain rate were affected by single to multiple discrete (i.e. not interconnected) extensional fractures sub-parallel to the loading direction. Dolostones deformed under high strain rate were shattered above a strain rate threshold of ~200 s^-1 (strain >1.2%), while they were split into a few fragments or remained macroscopically intact at lower strain rates. Experimentally shattered dolostones were reduced into a non-cohesive material with most rock fragments a few millimeters in size and elongated parallel to the loading direction. Fracture networks were investigated by X-ray microtomography, showing that low- and high-strain-rate damage patterns are different, with the latter being similar to that of natural in-situ shattered dolostones. In-situ shattered dolostones are thus interpreted as the product of off-fault dynamic stress wave loading and can potentially be used to constrain coseismic energy release in fault zones.
NASA Astrophysics Data System (ADS)
Effenberg, F.; Feng, Y.; Schmitz, O.; Frerichs, H.; Bozhenkov, S. A.; Hölbe, H.; König, R.; Krychowiak, M.; Pedersen, T. Sunn; Reiter, D.; Stephey, L.; W7-X Team
2017-03-01
The results of a first systematic assessment of plasma edge transport processes for the limiter startup configuration at Wendelstein 7-X are presented. This includes an investigation of transport from intrinsic and externally injected impurities and their impact on the power balance and limiter heat fluxes. The fully 3D coupled plasma fluid and kinetic neutral transport Monte Carlo code EMC3-EIRENE is used. The analysis of the magnetic topology shows that the poloidally and toroidally localized limiters cause a 3D helical scrape-off layer (SOL) consisting of magnetic flux tubes of three different connection lengths L_C. The transport in the helical SOL is governed by L_C as the topological scale length for the parallel plasma loss channel to the limiters. A clear modulation of the plasma pressure with L_C is seen. The helical flux tube topology results in counter-streaming sonic plasma flows. The heterogeneous SOL plasma structure yields an uneven limiter heat load distribution with localized peaking. Assuming spatially constant anomalous transport coefficients, increasing plasma density yields a reduction of the maximum peak heat loads from 12 MW m^-2 to 7.5 MW m^-2 and a broadening of the deposited heat fluxes. The impact of impurities on the limiter heat loads is studied by assuming intrinsic carbon impurities eroded from the limiter surfaces with a gross chemical sputtering yield of 2%. The resulting radiative losses account for less than 10% of the input power in the power balance, with marginal impact on the limiter heat loads. It is shown that a significant mitigation of peak heat loads, 40-50%, can be achieved with controlled impurity seeding with nitrogen and neon, which is a method of particular interest for the later island divertor phase.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boman, Erik G.; Catalyurek, Umit V.; Chevalier, Cedric
2015-01-16
This final progress report summarizes the work accomplished at the Combinatorial Scientific Computing and Petascale Simulations Institute. We developed Zoltan, a parallel mesh partitioning library that made use of accurate hypergraph models to provide load balancing in mesh-based computations. We developed several graph coloring algorithms for computing Jacobian and Hessian matrices and organized them into a software package called ColPack. We developed parallel algorithms for graph coloring and graph matching problems, and also designed multi-scale graph algorithms. Three PhD students graduated, six more are continuing their PhD studies, and four postdoctoral scholars were advised. Six of these students and Fellows have joined DOE Labs (Sandia, Berkeley), as staff scientists or as postdoctoral scientists. We also organized the SIAM Workshop on Combinatorial Scientific Computing (CSC) in 2007, 2009, and 2011 to continue to foster the CSC community.
A Comparison of Three Programming Models for Adaptive Applications
NASA Technical Reports Server (NTRS)
Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak; Kwak, Dochan (Technical Monitor)
2000-01-01
We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on state-of-the-art multiprocessor machines. The basic parallel algorithms needed for different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the fact of using explicit messages versus implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program orchestration level, which often leads to performance gains. However, it may also suffer from the poor spatial locality of physically distributed shared data on large numbers of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models, and generates a more balanced result for our application.
NASA Technical Reports Server (NTRS)
Mccormick, S.; Quinlan, D.
1989-01-01
The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids (global and local) to provide adaptive resolution and fast solution of PDEs. Like all such methods, it offers parallelism by using possibly many disconnected patches per level, but is hindered by the need to handle these levels sequentially. The finest levels must therefore wait for processing to be essentially completed on all the coarser ones. A recently developed asynchronous version of FAC, called AFAC, completely eliminates this bottleneck to parallelism. This paper describes timing results for AFAC, coupled with a simple load balancing scheme, applied to the solution of elliptic PDEs on an Intel iPSC hypercube. These tests include performance of certain processes necessary in adaptive methods, including moving grids and changing refinement. A companion paper reports on numerical and analytical results for estimating convergence factors of AFAC applied to very large scale examples.
NASA Technical Reports Server (NTRS)
Hailperin, M.
1993-01-01
This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that the author's techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. The author's method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive load-balancing methods. Results from a preliminary analysis of another system and from simulation with a synthetic load provide some evidence of more general applicability.
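A scalar Kalman filter gives the flavor of model-based load estimation (a generic stand-in assuming a random-walk load model with invented noise levels, not the thesis's exact stochastic models): noisy load samples are fused into a smoothed estimate whose error is well below that of the raw observations.

```python
# Generic Kalman-filter load estimator: random-walk load dynamics observed
# through noisy samples; all constants are illustrative.
import numpy as np

def kalman_load(obs, q=0.01, r=1.0):
    x, p = obs[0], 1.0                    # state estimate and its variance
    est = []
    for z in obs:
        p = p + q                         # predict: random-walk load model
        k = p / (p + r)                   # Kalman gain
        x = x + k * (z - x)               # update with the noisy sample
        p = (1.0 - k) * p
        est.append(x)
    return np.array(est)

rng = np.random.default_rng(3)
true_load = 10 + np.cumsum(rng.normal(0, 0.1, 500))   # slowly drifting load
obs = true_load + rng.normal(0, 1.0, 500)             # noisy local samples
est = kalman_load(obs)
print(np.mean((obs - true_load) ** 2))                # raw sample error ~1
print(np.mean((est - true_load) ** 2))                # filtered error far lower
```

A more accurate load estimate translates directly into fewer unnecessary object migrations, which is the benefit the thesis reports.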
Computational strategies for three-dimensional flow simulations on distributed computer systems
NASA Technical Reports Server (NTRS)
Sankar, Lakshmi N.; Weed, Richard A.
1995-01-01
This research effort is directed towards an examination of issues involved in porting large computational fluid dynamics codes in use within the industry to a distributed computing environment. This effort addresses strategies for implementing distributed computing in a device-independent fashion and for load balancing. A flow solver called TEAM, presently in use at Lockheed Aeronautical Systems Company, was acquired to start this effort. The following tasks were completed: (1) The TEAM code was ported to a number of distributed computing platforms, including a cluster of HP workstations located in the School of Aerospace Engineering at Georgia Tech; a cluster of DEC Alpha workstations in the graphics visualization lab at Georgia Tech; a cluster of SGI workstations located at NASA Ames Research Center; and an IBM SP-2 system located at NASA ARC. (2) A number of communication strategies were implemented; specifically, the manager-worker strategy and the worker-worker strategy were tested. (3) A variety of load balancing strategies were investigated; specifically, static load balancing, task queue balancing, and the Crutchfield algorithm were coded and evaluated. (4) The classical explicit Runge-Kutta scheme in the TEAM solver was replaced with an LU implicit scheme. And (5) the implicit TEAM-PVM solver was extensively validated through studies of unsteady transonic flow over an F-5 wing undergoing combined bending and torsional motion. These investigations are documented in extensive detail in the dissertation, 'Computational Strategies for Three-Dimensional Flow Simulations on Distributed Computing Systems', enclosed as an appendix.
Computational strategies for three-dimensional flow simulations on distributed computer systems
NASA Astrophysics Data System (ADS)
Sankar, Lakshmi N.; Weed, Richard A.
1995-08-01
This research effort is directed towards an examination of issues involved in porting large computational fluid dynamics codes in use within the industry to a distributed computing environment. This effort addresses strategies for implementing distributed computing in a device-independent fashion and for load balancing. A flow solver called TEAM, presently in use at Lockheed Aeronautical Systems Company, was acquired to start this effort. The following tasks were completed: (1) The TEAM code was ported to a number of distributed computing platforms, including a cluster of HP workstations located in the School of Aerospace Engineering at Georgia Tech; a cluster of DEC Alpha workstations in the graphics visualization lab at Georgia Tech; a cluster of SGI workstations located at NASA Ames Research Center; and an IBM SP-2 system located at NASA ARC. (2) A number of communication strategies were implemented; specifically, the manager-worker strategy and the worker-worker strategy were tested. (3) A variety of load balancing strategies were investigated; specifically, static load balancing, task queue balancing, and the Crutchfield algorithm were coded and evaluated. (4) The classical explicit Runge-Kutta scheme in the TEAM solver was replaced with an LU implicit scheme. And (5) the implicit TEAM-PVM solver was extensively validated through studies of unsteady transonic flow over an F-5 wing undergoing combined bending and torsional motion. These investigations are documented in extensive detail in the dissertation, 'Computational Strategies for Three-Dimensional Flow Simulations on Distributed Computing Systems', enclosed as an appendix.
Detection of Unexpected High Correlations between Balance Calibration Loads and Load Residuals
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Volden, T.
2014-01-01
An algorithm was developed for the assessment of strain-gage balance calibration data that makes it possible to systematically investigate potential sources of unexpected high correlations between calibration load residuals and applied calibration loads. The algorithm investigates correlations on a load series by load series basis. The linear correlation coefficient is used to quantify the correlations. It is computed for all possible pairs of calibration load residuals and applied calibration loads that can be constructed for the given balance calibration data set. An unexpected high correlation between a load residual and a load is detected if three conditions are met: (i) the absolute value of the correlation coefficient of a residual/load pair exceeds 0.95; (ii) the maximum of the absolute values of the residuals of a load series exceeds 0.25 % of the load capacity; (iii) the load component of the load series is intentionally applied. Data from a baseline calibration of a six-component force balance is used to illustrate the application of the detection algorithm to a real-world data set. This analysis also showed that the detection algorithm can identify load alignment errors as long as repeat load series are contained in the balance calibration data set that do not suffer from load alignment problems.
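The three detection conditions translate almost directly into code. The numpy sketch below (with invented example data) flags a residual/load pair when |r| exceeds 0.95, the peak residual exceeds 0.25% of capacity, and the load component was intentionally applied.

```python
# Direct transcription of the three detection conditions stated above;
# array names and example data are illustrative.
import numpy as np

def flag_series(residuals, loads, capacity, applied, r_lim=0.95, res_lim=0.0025):
    """True if the residual/load pair shows an unexpected high correlation."""
    r = np.corrcoef(residuals, loads)[0, 1]
    return (abs(r) > r_lim and
            np.max(np.abs(residuals)) > res_lim * capacity and
            applied)

rng = np.random.default_rng(4)
loads = np.linspace(0, 5000, 40)                       # one applied load series
residuals = 0.004 * loads + rng.normal(0, 0.5, 40)     # residual tracking the load
print(flag_series(residuals, loads, capacity=5000.0, applied=True))  # True
```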
Pollock, C L; Ivanova, T D; Hunt, M A; Garland, S J
2014-10-01
There is limited investigation of the interaction between motor unit recruitment and rate coding for modulating force during standing or responding to external perturbations. Fifty-seven motor units were recorded from the medial gastrocnemius muscle with intramuscular electrodes in response to external perturbations in standing. Anteriorly directed perturbations were generated by applying loads in 0.45-kg increments at the pelvis every 25-40 s until 2.25 kg was maintained. Motor unit firing rate was calculated for the initial recruitment load and all subsequent loads during two epochs: 1) dynamic response to perturbation directly following each load drop and 2) maintenance of steady state between perturbations. Joint kinematics and surface electromyography (EMG) from lower extremities and force platform measurements were assessed. Application of the external loads resulted in a significant forward progression of the anterior-posterior center of pressure (AP COP) that was accompanied by modest changes in joint angles (<3°). Surface EMG increased more in medial gastrocnemius than in the other recorded muscles. At initial recruitment, motor unit firing rate immediately after the load drop was significantly lower than during subsequent load drops or during the steady state at the same load. There was a modest increase in motor unit firing rate immediately after the load drop on subsequent load drops associated with regaining balance. There was no effect of maintaining balance with increased load and forward progression of the AP COP on steady-state motor unit firing rate. The medial gastrocnemius utilized primarily motor unit recruitment to achieve the increased levels of activation necessary to maintain standing in the presence of external loads.
Analysis of the thermal balance characteristics for multiple-connected piezoelectric transformers.
Park, Joung-Hu; Cho, Bo-Hyung; Choi, Sung-Jin; Lee, Sang-Min
2009-08-01
Because the amount of power that a piezoelectric transformer (PT) can handle is limited, multiple connections of PTs are necessary to improve the power capacity of PT applications. In such connections, thermal imbalance between the PTs should be prevented to avoid the thermal runaway of each PT. The thermal balance of the multiple-connected PTs is dominantly affected by the electrothermal characteristics of the individual PTs. In this paper, the thermal balance of both parallel-parallel and parallel-series connections is analyzed by electrical model parameters. For quantitative analysis, the thermal-balance effects are estimated by simulation of the mechanical loss ratio between the PTs. The analysis results show that with PTs of similar characteristics, the parallel-series connection has better thermal balance characteristics due to the reduced mechanical loss of the higher-temperature PT. For experimental verification of the analysis, a hardware-prototype test of a Cs-Lp type 40 W adapter system with radial-vibration mode PTs has been performed.
Scalable Domain Decomposed Monte Carlo Particle Transport
NASA Astrophysics Data System (ADS)
O'Brien, Matthew Joseph
In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation. The main algorithms we consider are:
• Domain decomposition of constructive solid geometry: enables extremely large calculations in which the background geometry is too large to fit in the memory of a single computational node.
• Load balancing: keeps the workload per processor as even as possible so the calculation runs efficiently.
• Global particle find: if particles are on the wrong processor, globally resolve their locations to the correct processor based on particle coordinate and background domain.
• Supporting operations: visualizing constructive solid geometry, sourcing particles, deciding that particle streaming communication is completed, and spatial redecomposition.
These algorithms are some of the most important parallel algorithms required for domain decomposed Monte Carlo particle transport. We demonstrate that our previous algorithms were not scalable, prove that our new algorithms are scalable, and run some of the algorithms on up to 2 million MPI processes on the Sequoia supercomputer.
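The global particle find can be sketched with mpi4py (an illustrative analogue; the dissertation's production code is not reproduced here): each rank computes the owning rank of every stray particle from its coordinate, then a single all-to-all exchange routes the particles home. Run under mpirun with mpi4py installed.

```python
# Hedged mpi4py sketch of a global particle find on a 1-D slab decomposition;
# particle coordinates are random stand-ins for misplaced particles.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Unit domain split into equal slabs; the owner of x is floor(x * size).
particles = np.random.default_rng(rank).random(1000)   # local particle coords
owner = np.minimum((particles * size).astype(int), size - 1)

outbox = [particles[owner == r].tolist() for r in range(size)]
inbox = comm.alltoall(outbox)                           # route everything home
mine = np.concatenate([np.asarray(chunk) for chunk in inbox])
print(f"rank {rank}: now owns {mine.size} particles")
```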
NASA Astrophysics Data System (ADS)
Park, George Ilhwan; Moin, Parviz
2016-01-01
This paper focuses on numerical and practical aspects associated with a parallel implementation of a two-layer zonal wall model for large-eddy simulation (LES) of compressible wall-bounded turbulent flows on unstructured meshes. A zonal wall model based on the solution of unsteady three-dimensional Reynolds-averaged Navier-Stokes (RANS) equations on a separate near-wall grid is implemented in an unstructured, cell-centered finite-volume LES solver. The main challenge in its implementation is to couple two parallel, unstructured flow solvers for efficient boundary data communication and simultaneous time integrations. A coupling strategy with good load balancing and low processor underutilization is identified. Face mapping and interpolation procedures at the coupling interface are explained in detail. The method of manufactured solutions is used for verifying the correct implementation of solver coupling, and parallel performance of the combined wall-modeled LES (WMLES) solver is investigated. The method has successfully been applied to several attached and separated flows, including a transitional flow over a flat plate and a separated flow over an airfoil at an angle of attack.
NASA Astrophysics Data System (ADS)
Gutzwiller, David; Gontier, Mathieu; Demeulenaere, Alain
2014-11-01
Multi-block structured solvers hold many advantages over their unstructured counterparts, such as a smaller memory footprint and efficient serial performance. Historically, multi-block structured solvers have not been easily adapted for use in a High Performance Computing (HPC) environment, and the recent trend towards hybrid GPU/CPU architectures has further complicated the situation. This paper will elaborate on developments and innovations applied to the NUMECA FINE/Turbo solver that have allowed near-linear scalability with real-world problems on over 250 hybrid GPU/CPU cluster nodes. Discussion will focus on the implementation of virtual partitioning and load balancing algorithms using a novel meta-block concept. This implementation is transparent to the user, allowing all pre- and post-processing steps to be performed using a simple, unpartitioned grid topology. Additional discussion will elaborate on developments that have improved parallel performance, including fully parallel I/O with the ADIOS API and the GPU porting of the computationally heavy CPUBooster convergence acceleration module.
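Largest-first greedy assignment is the simplest way to picture the load balancing step: blocks, or the sub-blocks that virtual partitioning creates, go to the currently least-loaded node. The sketch below is a generic bin-packing heuristic with invented workloads, not NUMECA's actual meta-block algorithm.

```python
# Generic largest-first bin packing of structured-block workloads onto nodes.
import heapq

def balance(block_cells, n_nodes):
    heap = [(0, node, []) for node in range(n_nodes)]   # (load, node id, blocks)
    heapq.heapify(heap)
    for b, cells in sorted(enumerate(block_cells), key=lambda x: -x[1]):
        load, node, blocks = heapq.heappop(heap)        # least-loaded node
        heapq.heappush(heap, (load + cells, node, blocks + [b]))
    return sorted(heap)

blocks = [900, 700, 640, 300, 280, 120, 90, 50]         # cells per block (invented)
for load, node, assigned in balance(blocks, 3):
    print(f"node {node}: {load} cells from blocks {assigned}")
```

Splitting oversized blocks into virtual partitions before packing is what keeps the largest block from dominating a node, which is the motivation for the meta-block concept described above.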
NASA Astrophysics Data System (ADS)
Ferrando, N.; Gosálvez, M. A.; Cerdá, J.; Gadea, R.; Sato, K.
2011-03-01
Presently, dynamic surface-based models are required to contain increasingly larger numbers of points and to propagate them over longer time periods. For large numbers of surface points, the octree data structure can be used as a balance between low memory occupation and relatively rapid access to the stored data. For evolution rules that depend on neighborhood states, extended simulation periods can be obtained by using simplified atomistic propagation models, such as the Cellular Automata (CA). This method, however, has an intrinsic parallel updating nature and the corresponding simulations are highly inefficient when performed on classical Central Processing Units (CPUs), which are designed for the sequential execution of tasks. In this paper, a series of guidelines is presented for the efficient adaptation of octree-based, CA simulations of complex, evolving surfaces into massively parallel computing hardware. A Graphics Processing Unit (GPU) is used as a cost-efficient example of the parallel architectures. For the actual simulations, we consider the surface propagation during anisotropic wet chemical etching of silicon as a computationally challenging process with a wide-spread use in microengineering applications. A continuous CA model that is intrinsically parallel in nature is used for the time evolution. Our study strongly indicates that parallel computations of dynamically evolving surfaces simulated using CA methods are significantly benefited by the incorporation of octrees as support data structures, substantially decreasing the overall computational time and memory usage.
Whole body vibration for older persons: an open randomized, multicentre, parallel, clinical trial
2011-01-01
Background Institutionalized older persons have a poor functional capacity. Including physical exercise in their routine activities decreases their frailty and improves their quality of life. Whole-body vibration (WBV) training is a type of exercise that seems beneficial in frail older persons to improve their functional mobility, but the evidence is inconclusive. This trial will compare the results of exercise with WBV and exercise without WBV in improving body balance, muscle performance and fall prevention in institutionalized older persons. Methods/Design An open, multicentre and parallel randomized clinical trial with blinded assessment. 160 nursing home residents aged over 65 years and of both sexes will be identified to participate in the study. Participants will be centrally randomised and allocated to interventions (vibration or exercise group) by telephone. The vibration group will perform static/dynamic exercises (balance and resistance training) on a vibratory platform (Frequency: 30-35 Hz; Amplitude: 2-4 mm) over a six-week training period (3 sessions/week). The exercise group will perform the same exercise protocol but without the vibration stimulus. The primary outcome measure is static/dynamic body balance. Secondary outcomes are muscle strength and the number of new falls. Follow-up measurements will be collected at 6 weeks and at 6 months after randomization. Efficacy will be analysed on an intention-to-treat (ITT) basis and 'per protocol'. The effects of the intervention will be evaluated using the "t" test, Mann-Whitney test, or Chi-square test, depending on the type of outcome. The final analysis will be performed 6 weeks and 6 months after randomization. Discussion This study will help to clarify whether WBV training improves body balance, gait mobility and muscle strength in frail older persons living in nursing homes. As far as we know, this will be the first study to evaluate the efficacy of WBV for the prevention of falls. Trial Registration ClinicalTrials.gov: NCT01375790 PMID:22192313
Jiang, Ailian; Zheng, Lihong
2018-03-29
Low cost, high reliability and easy maintenance are key criteria in the design of routing protocols for wireless sensor networks (WSNs). This paper investigates the existing ant colony optimization (ACO)-based WSN routing algorithms and the minimum hop count WSN routing algorithms by reviewing their strengths and weaknesses. We also consider the critical factors of WSNs, such as the energy constraints of sensor nodes, network load balancing and dynamic network topology. We then propose a hybrid routing algorithm that integrates ACO and a minimum hop count scheme. The proposed algorithm is able to find the optimal routing path with minimal total energy consumption and balanced energy consumption on each node. The algorithm shows clear advantages in searching for the optimal path, balancing the network load, and maintaining the network topology. The WSN model and the proposed algorithm have been implemented using C++. Extensive simulation results show that our algorithm outperforms several other WSN routing algorithms in aspects that include the rate of convergence, the success rate in searching for the global optimal solution, and the network lifetime.
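The flavor of such an ACO/minimum-hop hybrid can be sketched as a probabilistic next-hop rule whose heuristic term rewards residual energy and penalizes hop count; the weights and fields below are assumptions, not the paper's exact formulation:

    # Hedged sketch of an ACO-style next-hop choice for WSN routing.
    import random

    def choose_next_hop(neighbors, alpha=1.0, beta=2.0):
        # neighbors: dicts with pheromone "tau", residual "energy", "hops" to sink
        weights = []
        for n in neighbors:
            eta = n["energy"] / (1 + n["hops"])        # heuristic desirability
            weights.append((n["tau"] ** alpha) * (eta ** beta))
        total = sum(weights)
        r, acc = random.uniform(0, total), 0.0
        for n, w in zip(neighbors, weights):           # roulette-wheel selection
            acc += w
            if r <= acc:
                return n["id"]
        return neighbors[-1]["id"]

    nbrs = [{"id": 1, "tau": 0.5, "energy": 0.9, "hops": 3},
            {"id": 2, "tau": 0.8, "energy": 0.4, "hops": 2}]
    print(choose_next_hop(nbrs))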
2018-01-01
Low cost, high reliability and easy maintenance are key criteria in the design of routing protocols for wireless sensor networks (WSNs). This paper investigates the existing ant colony optimization (ACO)-based WSN routing algorithms and the minimum hop count WSN routing algorithms by reviewing their strengths and weaknesses. We also consider the critical factors of WSNs, such as the energy constraints of sensor nodes, network load balancing and dynamic network topology. We then propose a hybrid routing algorithm that integrates ACO and a minimum hop count scheme. The proposed algorithm is able to find the optimal routing path with minimal total energy consumption and balanced energy consumption on each node. The algorithm shows clear advantages in searching for the optimal path, balancing the network load, and maintaining the network topology. The WSN model and the proposed algorithm have been implemented using C++. Extensive simulation results show that our algorithm outperforms several other WSN routing algorithms in aspects that include the rate of convergence, the success rate in searching for the global optimal solution, and the network lifetime. PMID:29596336
Development of a pneumatic tensioning device for gap measurement during total knee arthroplasty.
Kwak, Dai-Soon; Kong, Chae-Gwan; Han, Seung-Ho; Kim, Dong-Hyun; In, Yong
2012-09-01
Despite the importance of soft tissue balancing during total knee arthroplasty (TKA), all estimating techniques are dependent on a surgeon's manual distraction force or subjective feeling based on experience. We developed a new device for dynamic gap balancing, which can offer constant load to the gap between the femur and tibia, using pneumatic pressure during range of motion. To determine the amount of distraction force for the new device, 3 experienced surgeons' manual distraction force was measured using a conventional spreader. A new device called the consistent load pneumatic tensor was developed on the basis of the biomechanical tests. Reliability testing for the new device was performed using 5 cadaveric knees by the same surgeons. Intraclass correlation coefficients (ICCs) were calculated. The distraction force applied to the new pneumatic tensioning device was determined to be 150 N. The interobserver reliability was very good for the newly tested spreader device with ICCs between 0.828 and 0.881. The new pneumatic tensioning device can enable us to properly evaluate the soft tissue balance throughout the range of motion during TKA with acceptable reproducibility.
Bifurcation Analysis of a DC-DC Bidirectional Power Converter Operating with Constant Power Loads
NASA Astrophysics Data System (ADS)
Cristiano, Rony; Pagano, Daniel J.; Benadero, Luis; Ponce, Enrique
Direct current (DC) microgrids (MGs) are an emergent option to satisfy new demands for power quality and integration of renewable resources in electrical distribution systems. This work addresses the large-signal stability analysis of a DC-DC bidirectional converter (DBC) connected to a storage device in an islanded MG. This converter is responsible for controlling the balance of power (load demand and generation) under constant power loads (CPLs). In order to control the DC bus voltage through a DBC, we propose a robust sliding mode control (SMC) based on a washout filter. Dynamical systems techniques are exploited to assess the quality of this switching control strategy. In this sense, a bifurcation analysis is performed to study the nonlinear stability of a reduced model of this system. The appearance of different bifurcations when load parameters and control gains are changed is studied in detail. In the specific case of the Teixeira Singularity (TS) bifurcation, some experimental results are provided, confirming the mathematical predictions. Both a deeper insight into the dynamic behavior of the controlled system and valuable design criteria are obtained.
NASA Astrophysics Data System (ADS)
Tan, D.; Erturk, A.
2018-03-01
For bio-inspired, fish-like robotic propulsion, the Macro-Fiber Composite (MFC) piezoelectric technology offers noiseless actuation with a balance between actuation force and velocity response. However, internal nonlinearities within the MFCs, such as piezoelectric softening, geometric hardening, inertial softening, and nonlinear dissipation, couple with the hydrodynamic loading on the structure from the surrounding fluid. In the present work, we explore nonlinear actuation of MFC cantilevers underwater and develop a mathematical framework for modeling and analysis. In vacuo resonant actuation experiments are conducted for a set of MFC cantilevers of varying length-to-width aspect ratios to validate the structural model in the absence of fluid loading. These MFC cantilevers are then subjected to underwater resonant actuation experiments, and model simulations are compared with nonlinear experimental frequency response functions. It is observed that semi-empirical hydrodynamic loads obtained from quasilinear experiments have to be modified to account for amplitude-dependent added mass, and additional nonlinear hydrodynamic effects might be present, yielding qualitative differences in the resulting underwater frequency response curves with increased excitation amplitude.
Dynamic response of phenolic resin and its carbon-nanotube composites to shock wave loading
Arman, B.; An, Q.; Luo, S. N.; ...
2011-01-04
We investigate with nonreactive molecular dynamics simulations the dynamic response of phenolic resin and its carbon-nanotube (CNT) composites to shock wave compression. For phenolic resin, our simulations yield shock states in agreement with experiments on similar polymers except the “phase change” observed in experiments, indicating that such phase change is chemical in nature. The elastic–plastic transition is characterized by shear stress relaxation and atomic-level slip, and phenolic resin shows strong strain hardening. Shock loading of the CNT-resin composites is applied parallel or perpendicular to the CNT axis, and the composites demonstrate anisotropy in wave propagation, yield and CNT deformation. The CNTs induce stress concentrations in the composites and may increase the yield strength. Our simulations indicate that the bulk shock response of the composites depends on the volume fraction, length ratio, impact cross-section, and geometry of the CNT components; the short CNTs in current simulations have insignificant effect on the bulk response of resin polymer.
Understanding traffic dynamics at a backbone POP
NASA Astrophysics Data System (ADS)
Taft, Nina; Bhattacharyya, Supratik; Jetcheva, Jorjeta; Diot, Christophe
2001-07-01
Spatial and temporal information about traffic dynamics is central to the design of effective traffic engineering practices for IP backbones. In this paper we study backbone traffic dynamics using data collected at a major POP on a tier-1 IP backbone. We develop a methodology that combines packet-level traces from access links in the POP and BGP routing information to build components of POP-to-POP traffic matrices. Our results show that there is wide disparity in the volume of traffic headed towards different egress POPs. At the same time, we find that current routing practices in the backbone tend to constrain traffic between ingress-egress POP pairs to a small number of paths. As a result, there is a wide variation in the utilization level of links in the backbone. Frequent capacity upgrades of the heavily used links are expensive; the need for such upgrades can be reduced by designing load balancing policies that will route more traffic over less utilized links. We identify traffic aggregates based on destination address prefixes and find that this set of criteria isolates a few aggregates that account for an overwhelmingly large portion of inter-POP traffic. We also demonstrate that these aggregates exhibit stability throughout the day on per-hour time scales, and thus they form a natural basis for splitting traffic over multiple paths in order to improve load balancing.
HOWELL, BRITTANY R.; SANCHEZ, MAR M.
2015-01-01
The mechanisms through which early life stress leads to psychopathology are thought to involve allostatic load, the “wear and tear” an organism is subjected to as a consequence of sustained elevated levels of glucocorticoids caused by repeated/prolonged stress activations. The allostatic load model described this phenomenon, but has been criticized as inadequate to explain alterations associated with early adverse experience in some systems, including behavior, which cannot be entirely explained from an energy balance perspective. The reactive scope model has been more recently proposed and focuses less on energy balance and more on dynamic ranges of physiological and behavioral mediators. In this review we examine the mechanisms underlying the behavioral consequences of early life stress in the context of both these models. We focus on adverse experiences that involve mother–infant relationship disruption, and dissect those mechanisms involving maternal care as a regulator of development of neural circuits that control emotional and social behaviors in the offspring. We also discuss the evolutionary purpose of the plasticity in behavioral development, which has a clear adaptive value in a changing environment. PMID:22018078
A strategy to load balancing for non-connectivity MapReduce job
NASA Astrophysics Data System (ADS)
Zhou, Huaping; Liu, Guangzong; Gui, Haixia
2017-09-01
MapReduce has been widely used on large-scale and complex datasets as a distributed programming model. The original hash partitioning function in MapReduce often causes data skew when the data distribution is uneven. To address this imbalance in data partitioning, we propose a strategy that changes the remaining partitioning index when data is skewed. In the Map phase, we count the amount of data that will be distributed to each reducer; the JobTracker then monitors the global partitioning information and dynamically modifies the original partitioning function according to the data skew model, so that the Partitioner can redirect partitions that would cause skew to reducers with less load in the next partitioning process, eventually balancing the load of each node. Finally, we experimentally compare our method with existing methods on both synthetic and real datasets; the results show our strategy handles data skew with better stability and efficiency than the hash method and the sampling method for non-connectivity MapReduce tasks.
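A minimal sketch of the idea follows, with assumed details (the threshold, the key counts, and a single-process stand-in for the JobTracker's global view; a process-stable hash would be needed in a real cluster):

    # Hedged sketch of a skew-aware partitioner: keys whose measured load would
    # overload their hash-assigned reducer are redirected to the least-loaded one.
    def build_partitioner(key_counts, nreduce, skew_factor=2.0):
        loads = [0] * nreduce
        override = {}
        avg = sum(key_counts.values()) / nreduce
        for key, cnt in sorted(key_counts.items(), key=lambda kv: -kv[1]):
            r = hash(key) % nreduce                   # default hash partitioning
            if loads[r] + cnt > skew_factor * avg:    # would skew this reducer
                r = loads.index(min(loads))           # redirect to least loaded
                override[key] = r
            loads[r] += cnt
        def partition(key):
            return override.get(key, hash(key) % nreduce)
        return partition

    part = build_partitioner({"a": 900, "b": 50, "c": 40, "d": 10}, 3)
    print(part("a"), part("b"))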
Qu, Xingda; Nussbaum, Maury A
2009-01-01
The purpose of this study was to identify the effects of external loads on balance control during upright stance, and to examine the ability of a new balance control model to predict these effects. External loads were applied to 12 young, healthy participants, and effects on balance control were characterized by center-of-pressure (COP) based measures. Several loading conditions were studied, involving combinations of load mass (10% and 20% of individual body mass) and height (at or 15% of stature above the whole-body COM). A balance control model based on an optimal control strategy was used to predict COP time series. It was assumed that a given individual would adopt the same neural optimal control mechanisms, identified in a no-load condition, under diverse external loading conditions. With the application of external loads, COP mean velocity in the anterior-posterior direction and RMS distance in the medial-lateral direction increased 8.1% and 10.4%, respectively. Predicted COP mean velocity and RMS distance in the anterior-posterior direction also increased with external loading, by 11.1% and 2.9%, respectively. Both experimental COP data and model-based predictions provided the same general conclusion, that application of larger external loads and loads more superior to the whole body center of mass lead to less effective postural control and perhaps a greater risk of loss of balance or falls. Thus, it can be concluded that the assumption about consistency in control mechanisms was partially supported, and it is the mechanical changes induced by external loads that primarily affect balance control.
Load Balancing in Structured P2P Networks
NASA Astrophysics Data System (ADS)
Zhu, Yingwu
In this chapter we start by addressing the importance and necessity of load balancing in structured P2P networks, due to three main reasons. First, structured P2P networks assume uniform peer capacities while peer capacities are heterogeneous in deployed P2P networks. Second, resorting to pseudo-uniformity of the hash function used to generate node IDs and data item keys leads to imbalanced overlay address space and item distribution. Lastly, placement of data items cannot be randomized in some applications (e.g., range searching). We then present an overview of load aggregation and dissemination techniques that are required by many load balancing algorithms. Two techniques are discussed including tree structure-based approach and gossip-based approach. They make different tradeoffs between estimate/aggregate accuracy and failure resilience. To address the issue of load imbalance, three main solutions are described: virtual server-based approach, power of two choices, and address-space and item balancing. While different in their designs, they all aim to improve balance on the address space and data item distribution. As a case study, the chapter discusses a virtual server-based load balancing algorithm that strives to ensure fair load distribution among nodes and minimize load balancing cost in bandwidth. Finally, the chapter concludes with future research and a summary.
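The power-of-two-choices technique mentioned above is easy to sketch: probing two random nodes and placing each item on the lighter one dramatically tightens the maximum load relative to a single random choice. The simulation below is illustrative, not a DHT implementation:

    # Sketch of "power of two choices" placement versus plain random placement.
    import random

    def place_items(nnodes, nitems, choices):
        load = [0] * nnodes
        for _ in range(nitems):
            probes = random.sample(range(nnodes), choices)
            best = min(probes, key=lambda n: load[n])  # lighter sampled node
            load[best] += 1
        return max(load)

    random.seed(1)
    print("d=1 max load:", place_items(1000, 100000, 1))  # one random choice
    print("d=2 max load:", place_items(1000, 100000, 2))  # two choices: tighter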
Chaouachi, Anis; Othman, Aymen Ben; Hammami, Raouf; Drinkwater, Eric J; Behm, David G
2014-02-01
Because balance is not fully developed in children and studies have shown functional improvements with balance-only training, a combination of plyometric and balance activities might enhance static balance, dynamic balance, and power. The objective of this study was to compare the effectiveness of plyometric only (PLYO) with balance and plyometric (COMBINED) training on balance and power measures in children. Before and after an 8-week training period, testing assessed lower-body strength (1 repetition maximum leg press), power (horizontal and vertical jumps, triple hop for distance, reactive strength, and leg stiffness), running speed (10-m and 30-m sprint), static and dynamic balance (Standing Stork Test and Star Excursion Balance Test), and agility (shuttle run). Subjects were randomly divided into 2 training groups (PLYO [n = 14] and COMBINED [n = 14]) and a control group (n = 12). Results based on magnitude-based inferences and precision of estimation indicated that the COMBINED training group was considered likely to be superior to the PLYO group in leg stiffness (d = 0.69, 91% likely), 10-m sprint (d = 0.57, 84% likely), and shuttle run (d = 0.52, 80% likely). The difference between the groups was unclear in 8 of the 11 dependent variables. COMBINED training enhanced activities such as 10-m sprints and shuttle runs to a greater degree. COMBINED training could be an important consideration for reducing the high velocity impacts of PLYO training. This reduction in stretch-shortening cycle stress on the neuromuscular system with the replacement of balance and landing exercises might help to alleviate the overtraining effects of excessive repetitive high load activities.
Scheduling based on a dynamic resource connection
NASA Astrophysics Data System (ADS)
Nagiyev, A. E.; Botygin, I. A.; Shersntneva, A. I.; Konyaev, P. A.
2017-02-01
The practical use of distributed computing systems is associated with many problems, including organizing effective interaction between the agents located at the nodes of the system, configuring each node to perform a certain task, distributing the available information and computational resources of the system effectively, and controlling the multithreading that implements the logic of the research problems being solved. This article describes a method of computational load balancing in distributed automatic systems oriented toward multi-agent and multi-threaded data processing. A scheme for controlling the processing of requests from terminal devices is offered, providing effective dynamic scaling of computing power under peak load. The results of model experiments on the developed load scheduling algorithm are set out; they show the effectiveness of the algorithm even as the number of connected nodes grows significantly and the architecture of the distributed computing system scales up.
NASA Astrophysics Data System (ADS)
Hu, Ling-Fei; Gao, Li-Dan; Li, Zhen-Jie; Wang, Shan-Feng; Sheng, Wei-Fan; Liu, Peng; Xu, Wei
2015-09-01
The high energy resolution monochromator (HRM) is widely used in inelastic scattering programs to detect phonons with energy resolution, down to the meV level. Although the large amount of heat from insertion devices can be reduced by a high heat-load monochromator, the unbalanced heat load on the inner pair of crystals in a nested HRM can affect its overall performance. Here, a theoretical analysis of the unbalanced heat load using dynamical diffraction theory and finite element analysis is presented. By utilizing the ray-tracing method, the performance of different HRM nesting configurations is simulated. It is suggested that the heat balance ratio, energy resolution, and overall spectral transmission efficiency are the figures of merit for evaluating the performance of nested HRMs. Although the present study is mainly focused on nested HRMs working at 57Fe nuclear resonant energy at 14.4 keV, it is feasible to extend this to other nested HRMs working at different energies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Etingov, Pavel V.; Makarov, Yuri V.; Wu, Di
The document describes a detailed uncertainty quantification (UQ) methodology developed by PNNL to estimate secure ranges of potential dynamic intra-hour interchange adjustments in the ISO-NE system and provides a description of the dynamic interchange adjustment (DINA) tool developed under the same contract. The overall system ramping up and down capability, spinning reserve requirements, interchange schedules, load variations and uncertainties from various sources that are relevant to the ISO-NE system are incorporated into the methodology and the tool. The DINA tool has been tested by PNNL and ISO-NE staff engineers using ISO-NE data.
Performance Optimization of Marine Science and Numerical Modeling on HPC Cluster
Yang, Dongdong; Yang, Hailong; Wang, Luming; Zhou, Yucong; Zhang, Zhiyuan; Wang, Rui; Liu, Yi
2017-01-01
Marine science and numerical modeling (MASNUM) is widely used in forecasting ocean wave movement by simulating the variation tendency of ocean waves. Although existing work has devoted effort to improving the performance of MASNUM from various aspects, there is still a large unexplored space for further performance improvement. In this paper, we aim at improving the performance of the propagation solver and data access during the simulation, in addition to the efficiency of output I/O and load balancing. Our optimizations include several effective techniques such as algorithm redesign, load distribution optimization, parallel I/O and data access optimization. The experimental results demonstrate that our approach achieves higher performance compared to the state-of-the-art work, about 3.5x speedup, without degrading the prediction accuracy. In addition, a parameter sensitivity analysis shows our optimizations are effective under various topography resolutions and output frequencies. PMID:28045972
Load-sharing through elastic micro-motion accelerates bone formation and interbody fusion.
Ledet, Eric H; Sanders, Glenn P; DiRisio, Darryl J; Glennon, Joseph C
2018-02-13
Achieving a successful spinal fusion requires the proper biological and biomechanical environment. Optimizing load-sharing in the interbody space can enhance bone formation. For anterior cervical discectomy and fusion (ACDF), loading and motion are largely dictated by the stiffness of the plate, which can facilitate a balance between stability and load-sharing. The advantages of load-sharing may be substantial for patients with comorbidities and in multilevel procedures where pseudarthrosis rates are significant. We aimed to evaluate the efficacy of a novel elastically deformable, continuously load-sharing anterior cervical spinal plate for promotion of bone formation and interbody fusion relative to a translationally dynamic plate. An in vivo animal model was used to evaluate the effects of an elastically deformable spinal plate on bone formation and spine fusion. Fourteen goats underwent an ACDF and received either a translationally dynamic or elastically deformable plate. Animals were followed up until 18 weeks and were evaluated by plain x-ray, computed tomography scan, and undecalcified histology to evaluate the rate and quality of bone formation and interbody fusion. Animals treated with the elastically deformable plate demonstrated statistically significantly superior early bone formation relative to the translationally dynamic plate. Trends in the data from 8 to 18 weeks postoperatively suggest that the elastically deformable implant enhanced bony bridging and fusion, but these enhancements were not statistically significant. Load-sharing through elastic micro-motion accelerates bone formation in the challenging goat ACDF model. The elastically deformable implant used in this study may promote early bony bridging and increased rates of fusion, but future studies will be necessary to comprehensively characterize the advantages of load-sharing through micro-motion. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Tom, Nathan; Yu, Yi-Hsiang; Wright, Alan; ...
2017-11-17
The focus of this paper is to balance power absorption against structural loading for a novel fixed-bottom oscillating surge wave energy converter in both regular and irregular wave environments. The power-to-load ratio will be evaluated using pseudospectral control (PSC) to determine the optimum power-takeoff (PTO) torque based on a multiterm objective function. This paper extends the pseudospectral optimal control problem to not just maximize the time-averaged absorbed power but also include measures for the surge-foundation force and PTO torque in the optimization. The objective function may now potentially include three competing terms that the optimizer must balance. Separate weighting factors are attached to the surge-foundation force and PTO control torque that can be used to tune the optimizer performance to emphasize either power absorption or load shedding. To correct the pitch equation of motion, derived from linear hydrodynamic theory, a quadratic-viscous-drag torque has been included in the system dynamics; however, to continue the use of quadratic programming solvers, an iteratively obtained linearized drag coefficient was utilized that provided good accuracy in the predicted pitch motion. Furthermore, the analysis considers the use of a nonideal PTO unit to more accurately evaluate controller performance. The PTO efficiency is not directly included in the objective function but rather the weighting factors are utilized to limit the PTO torque amplitudes, thereby reducing the losses resulting from the bidirectional energy flow through a nonideal PTO. Results from PSC show that shedding a portion of the available wave energy can lead to greater reductions in structural loads, peak-to-average power ratio, and reactive power requirement.
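To make the multiterm trade-off concrete, here is a toy version (all constants hypothetical, a linear one-degree-of-freedom pitch oscillator in a regular wave, and a simple grid search standing in for the pseudospectral optimizer) of an objective that weighs absorbed power against a structural-load proxy and PTO torque:

    # Toy illustration, assumptions throughout: J = mean(P) - w_f*mean(F^2) - w_t*mean(T^2)
    import math

    def evaluate(c_pto, w_f=1e-8, w_t=1e-6, dt=0.01, tmax=200.0):
        I, c_h, k = 5e4, 1e4, 2e5          # inertia, hydro damping, stiffness
        T0, M0 = 8.0, 5e4                  # wave period, excitation amplitude
        th = om = 0.0                      # pitch angle and rate
        P = F = T = n = 0.0
        t = 0.0
        while t < tmax:
            tau = -c_pto * om              # PTO torque: linear damper
            exc = M0 * math.sin(2 * math.pi * t / T0)
            om += (exc + tau - c_h * om - k * th) / I * dt
            th += om * dt
            t += dt
            load = k * th + c_h * om       # proxy for foundation reaction
            P += -tau * om                 # absorbed power (c_pto * om**2)
            F += load ** 2
            T += tau ** 2
            n += 1
        return P / n - w_f * F / n - w_t * T / n

    best = max((evaluate(c), c) for c in (1e3, 5e3, 1e4, 5e4, 1e5))
    print("best PTO damping:", best[1])

Raising w_f or w_t in this sketch sheds load at the expense of power, which is the qualitative behavior the abstract reports.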
Dynamic waste management (DWM): towards an evolutionary decision-making approach.
Rojo, Gabriel; Glaus, Mathias; Laforest, Valérie; Bourgois, Jacques; Hausler, Robert
2013-12-01
To guarantee sustainable and dynamic waste management, the dynamic waste management (DWM) approach proposes an evolutionary decision-making scheme that maintains a constant flow towards the most favourable waste treatment processes (facilities) within a system. To that end, DWM is based on the law of conservation of energy, which allows the balancing of a network while considering the constraints of incoming (h1) and outgoing (h2) loads, as well as the characteristics of the distribution network (ΔH). The developed approach relies on the identification of a prioritization index (PI) for waste generators (analogous to h1), a global allocation index for each of the treatment processes (analogous to h2) and a linear load-loss index (ΔH) associated with waste transport. To demonstrate the scope of DWM, we outline this approach and then present an example of its application. The case study shows that the variable monthly waste from the three considered sources is dynamically distributed in priority to the more favourable processes. Moreover, the reserve (stock) helps temporarily store waste in order to ease the global load of the network and favour constant feeding of the treatment processes. The DWM approach serves as a decision-making tool for evaluating new waste treatment processes, as well as their location and new means of transport for waste.
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
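A small numerical sketch of the connection (toy two-load, two-gage data; real balances use many more terms and strict quality metrics) fits the gage outputs from the loads and then approximates the load-prediction coefficients by inverting the fitted sensitivity matrix, which holds while the gage outputs behave highly linearly:

    # Hedged sketch: calibration-direction fit, then prediction-direction inverse.
    import numpy as np

    rng = np.random.default_rng(0)
    loads = rng.uniform(-1, 1, size=(50, 2))        # applied load pairs
    C_true = np.array([[2.0, 0.1], [0.2, 1.5]])     # gage sensitivity matrix
    gages = loads @ C_true + 1e-4 * rng.normal(size=(50, 2))

    # Calibration direction: regress gage outputs on applied loads.
    C_fit, *_ = np.linalg.lstsq(loads, gages, rcond=None)
    # Prediction direction: load-regression coefficients via inversion.
    L_fit = np.linalg.inv(C_fit)
    print(np.round(C_true @ L_fit, 3))              # ~ identity when link holds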
Quantum load balancing in ad hoc networks
NASA Astrophysics Data System (ADS)
Hasanpour, M.; Shariat, S.; Barnaghi, P.; Hoseinitabatabaei, S. A.; Vahid, S.; Tafazolli, R.
2017-06-01
This paper presents a novel approach to load balancing in ad hoc networks utilizing the properties of quantum game theory. This approach benefits from the instantaneous and information-less capability of entangled particles to synchronize the load balancing strategies in ad hoc networks. The quantum load balancing (QLB) algorithm proposed by this work is implemented on top of OLSR as the baseline routing protocol; its performance is analyzed against the baseline OLSR, and considerable gain is reported regarding some of the main QoS metrics such as delay and jitter. Furthermore, it is shown that the QLB algorithm provides a solid stability gain in terms of throughput, which stands as a proof of concept for the load balancing properties of the proposed theory.
Efficient implementation of a 3-dimensional ADI method on the iPSC/860
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van der Wijngaart, R.F.
1993-12-31
A comparison is made between several domain decomposition strategies for the solution of three-dimensional partial differential equations on a MIMD distributed memory parallel computer. The grids used are structured, and the numerical algorithm is ADI. Important implementation issues regarding load balancing, storage requirements, network latency, and overlap of computations and communications are discussed. Results of the solution of the three-dimensional heat equation on the Intel iPSC/860 are presented for the three most viable methods. It is found that the Bruno-Cappello decomposition delivers optimal computational speed through an almost complete elimination of processor idle time, while providing good memory efficiency.
Validation of Shear Wave Elastography in Skeletal Muscle
Eby, Sarah F.; Song, Pengfei; Chen, Shigao; Chen, Qingshan; Greenleaf, James F.; An, Kai-Nan
2013-01-01
Skeletal muscle is a very dynamic tissue, thus accurate quantification of skeletal muscle stiffness throughout its functional range is crucial to improve the physical functioning and independence following pathology. Shear wave elastography (SWE) is an ultrasound-based technique that characterizes tissue mechanical properties based on the propagation of remotely induced shear waves. The objective of this study is to validate SWE throughout the functional range of motion of skeletal muscle for three ultrasound transducer orientations. We hypothesized that combining traditional materials testing (MTS) techniques with SWE measurements will show increased stiffness measures with increasing tensile load, and will correlate well with each other for trials in which the transducer is parallel to underlying muscle fibers. To evaluate this hypothesis, we monitored the deformation throughout tensile loading of four porcine brachialis whole-muscle tissue specimens, while simultaneously making SWE measurements of the same specimen. We used regression to examine the correlation between Young's modulus from MTS and shear modulus from SWE for each of the transducer orientations. We applied a generalized linear model to account for repeated testing. Model parameters were estimated via generalized estimating equations. The regression coefficient was 0.1944, with a 95% confidence interval of (0.1463 – 0.2425) for parallel transducer trials. Shear waves did not propagate well for both the 45° and perpendicular transducer orientations. Both parallel SWE and MTS showed increased stiffness with increasing tensile load. This study provides the necessary first step for additional studies that can evaluate the distribution of stiffness throughout muscle. PMID:23953670
Utilization of Model Predictive Control to Balance Power Absorption Against Load Accumulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abbas, Nikhar; Tom, Nathan M
2017-06-03
Wave energy converter (WEC) control strategies have been primarily focused on maximizing power absorption. The use of model predictive control strategies allows for a finite-horizon, multiterm objective function to be solved. This work utilizes a multiterm objective function to maximize power absorption while minimizing the structural loads on the WEC system. Furthermore, a Kalman filter and autoregressive model were used to estimate and forecast the wave exciting force and predict the future dynamics of the WEC. The WEC's power-take-off time-averaged power and structural loads under a perfect forecast assumption in irregular waves were compared against results obtained from the Kalman filter and autoregressive model to evaluate model predictive control performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abbas, Nikhar; Tom, Nathan
Wave energy converter (WEC) control strategies have been primarily focused on maximizing power absorption. The use of model predictive control strategies allows for a finite-horizon, multiterm objective function to be solved. This work utilizes a multiterm objective function to maximize power absorption while minimizing the structural loads on the WEC system. Furthermore, a Kalman filter and autoregressive model were used to estimate and forecast the wave exciting force and predict the future dynamics of the WEC. The WEC's power-take-off time-averaged power and structural loads under a perfect forecast assumption in irregular waves were compared against results obtained from the Kalman filter and autoregressive model to evaluate model predictive control performance.
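The autoregressive forecasting ingredient in these records can be sketched as a least-squares AR fit to a wave-excitation history; the model order and synthetic data below are assumptions, not the report's configuration:

    # Hedged sketch: fit an AR(p) model by least squares and forecast ahead.
    import numpy as np

    def fit_ar(x, p):
        # Solve x[t] ~ a1*x[t-1] + ... + ap*x[t-p] in the least-squares sense.
        X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
        a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
        return a

    def forecast(x, a, steps):
        hist = list(x[-len(a):][::-1])     # most recent sample first
        out = []
        for _ in range(steps):
            nxt = float(np.dot(a, hist))   # one-step prediction
            out.append(nxt)
            hist = [nxt] + hist[:-1]       # shift the prediction into history
        return out

    t = np.arange(0, 60, 0.1)
    wave = np.sin(0.8 * t) + 0.3 * np.sin(2.1 * t)   # synthetic excitation
    a = fit_ar(wave, p=8)
    print(forecast(wave, a, steps=5))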
NASA Astrophysics Data System (ADS)
Vasilyan, Suren; Rivero, Michel; Schleichert, Jan; Halbedel, Bernd; Fröhlich, Thomas
2016-04-01
In this paper, we present an application for realizing high-precision, horizontally directed force measurements on the order of several tens of nN in combination with high dead loads of about 10 N. The set-up is developed on the basis of two identical state-of-the-art electromagnetic force compensation (EMFC) high-precision balances. The measurement resolution of horizontally directed single-axis quasi-dynamic forces is 20 nN over the working range of ±100 μN. The set-up operates in two different measurement modes: in the open-loop mode the mechanical deflection of the proportional lever is an indication of the acting force, whereas in the closed-loop mode it is the applied electric current to the coil inside the EMFC balance that compensates the deflection of the lever to the offset zero position. The estimated loading frequency (cutoff frequency) of the set-up is about 0.18 Hz in the open-loop mode and 0.7 Hz in the closed-loop mode. One practical application for which the set-up is suitable is flow rate measurement of low electrically conducting electrolytes by the contactless technique of Lorentz force velocimetry. Based on a previously developed set-up which uses a single EMFC balance, experimental, theoretical and numerical analyses of the thermo-mechanical properties of the supporting structure are presented.
Optimal dynamic remapping of parallel computations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Reynolds, Paul F., Jr.
1987-01-01
A large class of computations are characterized by a sequence of phases, with phase changes occurring unpredictably. The decision problem was considered regarding the remapping of workload to processors in a parallel computation when the utility of remapping and the future behavior of the workload are uncertain; workloads exhibit stable execution requirements during a given phase, but requirements may change radically between phases. For these problems a workload assignment generated for one phase may hinder performance during the next phase. This problem is treated formally for a probabilistic model of computation with at most two phases. The fundamental problem of balancing the expected remapping performance gain against the delay cost was addressed. Stochastic dynamic programming is used to show that the remapping decision policy minimizing the expected running time of the computation has an extremely simple structure. Because the gain may not be predictable, the performance of a heuristic policy that does not require estimation of the gain is examined. The heuristic method's feasibility is demonstrated by its use on an adaptive fluid dynamics code on a multiprocessor. The results suggest that except in extreme cases, the remapping decision problem is essentially that of dynamically determining whether gain can be achieved by remapping after a phase change. The results also suggest that this heuristic is applicable to computations with more than two phases.
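The essence of the remapping decision reduces to comparing the expected remaining gain from a better balance against the one-time remapping delay; a toy version with hypothetical rates and costs:

    # Hedged sketch: remap only when expected remaining gain exceeds the delay.
    def should_remap(t_step_now, t_step_balanced, steps_left, remap_cost):
        gain = (t_step_now - t_step_balanced) * steps_left
        return gain > remap_cost

    # E.g., 2 ms/step imbalance penalty, 500 steps left, 0.4 s to remap:
    print(should_remap(0.012, 0.010, 500, 0.4))   # True: expected gain is 1.0 s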
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.
1997-01-01
Applications are described of high-performance computing methods to the numerical simulation of complete jet engines. The methodology focuses on the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field elements. New partitioned analysis procedures to treat this coupled three-component problem were developed. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. The NASA-sponsored ENG10 program was used for the global steady state analysis of the whole engine. This program uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed as well as the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames.
Nonrecursive formulations of multibody dynamics and concurrent multiprocessing
NASA Technical Reports Server (NTRS)
Kurdila, Andrew J.; Menon, Ramesh
1993-01-01
Since the late 1980's, research in recursive formulations of multibody dynamics has flourished. Historically, much of this research can be traced to applications of low dimensionality in mechanism and vehicle dynamics. Indeed, there is little doubt that recursive order N methods are the method of choice for this class of systems. This approach has the advantage that a minimal number of coordinates are utilized, parallelism can be induced for certain system topologies, and the method is of order N computational cost for systems of N rigid bodies. Despite the fact that many authors have dismissed redundant coordinate formulations as being of order N(exp 3), and hence less attractive than recursive formulations, we present recent research that demonstrates that at least three distinct classes of redundant, nonrecursive multibody formulations consistently achieve order N computational cost for systems of rigid and/or flexible bodies. These formulations are as follows: (1) the preconditioned range space formulation; (2) penalty methods; and (3) augmented Lagrangian methods for nonlinear multibody dynamics. The first method can be traced to its foundation in equality constrained quadratic optimization, while the last two methods have been studied extensively in the context of coercive variational boundary value problems in computational mechanics. Until recently, however, they have not been investigated in the context of multibody simulation, and present theoretical questions unique to nonlinear dynamics. All of these nonrecursive methods have additional advantages with respect to recursive order N methods: (1) the formalisms retain the highly desirable order N computational cost; (2) the techniques are amenable to concurrent simulation strategies; (3) the approaches do not depend upon system topology to induce concurrency; and (4) the methods can be derived to balance the computational load automatically on concurrent multiprocessors. In addition to the presentation of the fundamental formulations, this paper presents new theoretical results regarding the rate of convergence of order N constraint stabilization schemes associated with the newly introduced class of methods.
A novel load balanced energy conservation approach in WSN using biogeography based optimization
NASA Astrophysics Data System (ADS)
Kaushik, Ajay; Indu, S.; Gupta, Daya
2017-09-01
Clustering sensor nodes is an effective technique to reduce the energy consumption of the sensor nodes and maximize the lifetime of wireless sensor networks. Balancing the load of the cluster heads is an important factor in the long-run operation of WSNs. In this paper we propose a novel load balancing approach using biogeography based optimization (LB-BBO). LB-BBO uses two separate fitness functions to perform load balancing of equal and unequal loads, respectively. The proposed method is simulated using MATLAB and compared with existing methods; it shows better performance than previous work on energy conservation in WSNs.
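One plausible fitness term for the equal-load case (illustrative only; the paper's two fitness functions are not reproduced here) is the variance of per-cluster load, which an optimizer such as BBO would then minimize:

    # Hedged sketch: evenly loaded cluster heads score best (lower is better).
    def load_balance_fitness(cluster_loads):
        mean = sum(cluster_loads) / len(cluster_loads)
        return sum((l - mean) ** 2 for l in cluster_loads) / len(cluster_loads)

    print(load_balance_fitness([10, 10, 10]))  # 0.0: perfectly balanced
    print(load_balance_fitness([25, 3, 2]))    # large: one overloaded head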
Efficient Synthesis of Graph Methods: a Dynamically Scheduled Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino
RDF databases naturally map to a graph representation and employ languages, such as SPARQL, that implement queries as graph pattern matching routines. Graph methods exhibit an irregular behavior: they present unpredictable, fine-grained data accesses, and are synchronization intensive. Graph data structures expose large amounts of dynamic parallelism, but are difficult to partition without generating load unbalance. In this paper, we present a novel architecture to improve the synthesis of graph methods. Our design addresses the issues of these algorithms with two components: a Dynamic Task Scheduler (DTS), which reduces load unbalance and maximizes resource utilization, and a Hierarchical Memory Interface controller (HMI), which provides support for concurrent memory operations on multi-ported/multi-banked shared memories. We evaluate our approach by generating the accelerators for a set of SPARQL queries from the Lehigh University Benchmark (LUBM). We first analyze the load unbalance of these queries, showing that execution time among tasks can differ even by orders of magnitude. We then synthesize the queries and compare the performance of the resulting accelerators against the current state of the art. Experimental results show that our solution provides a speedup over the serial implementation close to the theoretical maximum and a speedup up to 3.45 over a baseline parallel implementation. We conclude our study by exploring the design space to achieve maximum memory channel utilization.
Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM
NASA Astrophysics Data System (ADS)
de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl
2002-03-01
We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 10^5-10^6) in reasonable CPU times. The performances of our implementation of the parallel code on an Origin 2000 supercomputer are presented and discussed. They exhibit very good speedup behavior and low load unbalancing. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.
NASA Astrophysics Data System (ADS)
Martin, M. A.; Winkelmann, R.; Haseloff, M.; Albrecht, T.; Bueler, E.; Khroulev, C.; Levermann, A.
2011-09-01
We present a dynamic equilibrium simulation of the ice sheet-shelf system on Antarctica with the Potsdam Parallel Ice Sheet Model (PISM-PIK). The simulation is initialized with present-day conditions for bed topography and ice thickness and then run to steady state with constant present-day surface mass balance. Surface temperature and sub-shelf basal melt distribution are parameterized. Grounding lines and calving fronts are free to evolve, and their modeled equilibrium state is compared to observational data. A physically-motivated calving law based on horizontal spreading rates allows for realistic calving fronts for various types of shelves. Steady-state dynamics including surface velocity and ice flux are analyzed for whole Antarctica and the Ronne-Filchner and Ross ice shelf areas in particular. The results show that the different flow regimes in sheet and shelves, and the transition zone between them, are captured reasonably well, supporting the approach of superposition of SIA and SSA for the representation of fast motion of grounded ice. This approach also leads to a natural emergence of sliding-dominated flow in stream-like features in this new 3-D marine ice sheet model.
Physical load handling and listening comprehension effects on balance control.
Qu, Xingda
2010-12-01
The purpose of this study was to determine the physical load handling and listening comprehension effects on balance control. A total of 16 young and 16 elderly participants were recruited in this study. The physical load handling task required holding a 5-kg load in each hand with arms at sides. The listening comprehension task involved attentive listening to a short conversation. Three short questions were asked regarding the conversation right after the testing trial to test the participants' attentiveness during the experiment. Balance control was assessed by centre of pressure-based measures, which were calculated from the force platform data when the participants were quietly standing upright on a force platform. Results from this study showed that both physical load handling and listening comprehension adversely affected balance control. Physical load handling had a more deleterious effect on balance control under the listening comprehension condition vs. no-listening comprehension condition. Based on the findings from this study, interventions for the improvement of balance could be focused on avoiding exposures to physically demanding tasks and cognitively demanding tasks simultaneously. STATEMENT OF RELEVANCE: Findings from this study can aid in better understanding how humans maintain balance, especially when physical and cognitive loads are applied. Such information is useful for developing interventions to prevent fall incidents and injuries in occupational settings and daily activities.
Coelho, Daniel Boari; Bourlinova, Catarina; Teixeira, Luis Augusto
2016-12-01
In the present experiment, we aimed to evaluate the interactive effect of performing a cognitive task simultaneously with a manual task requiring either high or low steadiness on APRs. Young volunteers performed the task of recovering upright balance following a mechanical perturbation provoked by unanticipatedly releasing a load pulling the participant's body backwards. The postural task was performed while holding a cylinder steadily on a tray. One group performed that task under high (cylinder's round side down) and another one under low (cylinder's flat side down) manual steadiness constraint. Those tasks were evaluated both while concurrently performing a cognitive numeric subtraction task and with no cognitive task. Analysis showed that performance of the cognitive task led to increased body and tray displacement, associated with higher displacement at the hip and upper trunk, and lower magnitude of activation of the GM muscle in response to the perturbation. Conversely, high manual steadiness constraint led to reduced tray velocity in association with lower values of trunk displacement, and decreased rotation amplitude at the ankle and hip joints. We found no interactions between the effects of the cognitive and manual tasks on APRs, suggesting that they were processed in parallel in the generation of responses for balance recovery. Modulation of postural responses from the manual and cognitive tasks indicates participation of higher order neural structures in the generation of APRs, with postural responses being affected by multiple mental processes occurring in parallel. Copyright © 2016 Elsevier B.V. All rights reserved.
Collectively loading an application in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
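An mpi4py analogue of the broadcast step (an assumed illustration, not the patented system; the file name is hypothetical) would have the job leader read the application image and broadcast it collectively to the other nodes in the subset:

    # Illustrative sketch: job leader reads the application and broadcasts it.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    LEADER = 0                                # leader chosen by control system

    if rank == LEADER:
        with open("app_binary", "rb") as f:   # hypothetical application image
            image = f.read()
    else:
        image = None

    image = comm.bcast(image, root=LEADER)    # collective load to all nodes
    print(f"rank {rank} received {len(image)} bytes")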
NASA Technical Reports Server (NTRS)
Hailperin, Max
1993-01-01
This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that our techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. Our method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive methods.
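In the same spirit, a much simpler stand-in for the thesis's stochastic-process models is an exponentially weighted filter over noisy local load samples; the smoothing constant and data are illustrative:

    # Hedged sketch: smoothed estimate of average load from noisy samples.
    def smooth_loads(samples, alpha=0.2):
        est = samples[0]
        history = [est]
        for s in samples[1:]:
            est = alpha * s + (1 - alpha) * est   # blend new sample with prior
            history.append(est)
        return history

    noisy = [10, 14, 9, 11, 30, 12, 10, 13]       # 30 is a transient spike
    print([round(e, 1) for e in smooth_loads(noisy)])

A filter like this damps transient spikes that would otherwise trigger unnecessary object migrations, which is the qualitative benefit the thesis attributes to its model-based estimators.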
Design and implementation of streaming media server cluster based on FFMpeg.
Zhao, Hong; Zhou, Chun-long; Jin, Bao-zhao
2015-01-01
Poor performance and network congestion are commonly observed in single-server streaming media systems. This paper proposes a scheme to construct a streaming media server cluster system based on FFMpeg. In this scheme, different users are distributed to different servers according to their locations, and the balance among servers is maintained by a dynamic load-balancing algorithm based on active feedback. Furthermore, a service redirection algorithm is proposed to improve the transmission efficiency of streaming media data. The experimental results show that the server cluster system significantly alleviates network congestion and improves performance in comparison with the single-server system.
Design and Implementation of Streaming Media Server Cluster Based on FFMpeg
Zhao, Hong; Zhou, Chun-long; Jin, Bao-zhao
2015-01-01
Poor performance and network congestion are commonly observed in single-server streaming media systems. This paper proposes a scheme to construct a streaming media server cluster system based on FFMpeg. In this scheme, different users are distributed to different servers according to their locations, and the balance among servers is maintained by a dynamic load-balancing algorithm based on active feedback. Furthermore, a service redirection algorithm is proposed to improve the transmission efficiency of streaming media data. The experimental results show that the server cluster system significantly alleviates network congestion and improves performance in comparison with the single-server system. PMID:25734187
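A hedged sketch of the dispatch logic described in these records (field names and the overload threshold are assumptions, not the paper's algorithm): prefer servers in the client's region, choose by the feedback-reported load score, and redirect when the regional pool is overloaded:

    # Illustrative location-aware, feedback-driven server selection.
    def pick_server(servers, client_region, overload=0.8):
        local = [s for s in servers if s["region"] == client_region]
        pool = local or servers                   # fall back to any region
        best = min(pool, key=lambda s: s["load"]) # active-feedback load score
        if best["load"] > overload and pool is not servers:
            best = min(servers, key=lambda s: s["load"])  # service redirection
        return best["id"]

    servers = [{"id": "cn-1", "region": "cn", "load": 0.90},
               {"id": "cn-2", "region": "cn", "load": 0.85},
               {"id": "us-1", "region": "us", "load": 0.30}]
    print(pick_server(servers, "cn"))   # redirects to us-1 despite region match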
Coarse-graining to the meso and continuum scales with molecular-dynamics-like models
NASA Astrophysics Data System (ADS)
Plimpton, Steve
Many engineering-scale problems that industry or the national labs try to address with particle-based simulations occur at length and time scales well beyond the most optimistic hopes of traditional coarse-graining methods for molecular dynamics (MD), which typically start at the atomic scale and build upward. However, classical MD can be viewed as an engine for simulating particles at literally any length or time scale, depending on the models used for individual particles and their interactions. To illustrate, I'll highlight several coarse-grained (CG) materials models, some of which are likely familiar to molecular-scale modelers, but others probably not. These include models for water droplet freezing on surfaces, dissipative particle dynamics (DPD) models of explosives where particles have internal state, CG models of nano or colloidal particles in solution, models for aspherical particles, Peridynamics models for fracture, and models of granular materials at the scale of industrial processing. All of these can be implemented as MD-style models for either soft or hard materials; in fact, they are all part of our LAMMPS MD package, added either by our group or contributed by collaborators. Unlike most all-atom MD simulations, CG simulations at these scales often involve highly non-uniform particle densities, so I'll also discuss a load-balancing method we've implemented for these kinds of models, which can improve parallel efficiencies. From the physics point of view, these models may be viewed as non-traditional or ad hoc. But because they are MD-style simulations, there's an opportunity for physicists to add statistical mechanics rigor to individual models, or, in keeping with a theme of this session, to devise methods that more accurately bridge models from one scale to the next.
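The talk does not specify the balancing scheme; for highly non-uniform particle densities, one common approach is recursive coordinate bisection, sketched here generically (not necessarily what LAMMPS implements): each subdomain is split at the particle-count median of its longest axis, so every processor ends up with roughly equal work.

    import numpy as np

    def rcb(points, depth=3):
        """Recursive coordinate bisection: split at the median of the
        longest axis so both halves hold equal particle counts."""
        if depth == 0:
            return [points]
        axis = np.ptp(points, axis=0).argmax()   # axis with the longest extent
        order = points[:, axis].argsort()
        half = len(points) // 2
        return (rcb(points[order[:half]], depth - 1) +
                rcb(points[order[half:]], depth - 1))

    parts = rcb(np.random.rand(10000, 3))        # 2**3 = 8 balanced subdomains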
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2010 CFR
2010-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
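Using only the values visible in the excerpt, the assumed inertia load is simply the factor K times the surface weight W; for a vertical surface weighing, say, 100 lb: P = KW = 24 × 100 lb = 2400 lb.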
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2012 CFR
2012-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2011 CFR
2011-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2014 CFR
2014-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
14 CFR 25.393 - Loads parallel to hinge line.
Code of Federal Regulations, 2013 CFR
2013-01-01
... designed for inertia loads acting parallel to the hinge line. (b) In the absence of more rational data, the inertia loads may be assumed to be equal to KW, where— (1) K=24 for vertical surfaces; (2) K=12 for...
40 CFR Table 33 to Subpart G of... - Saturation Factors
Code of Federal Regulations, 2011 CFR
2011-07-01
... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...
40 CFR Table 33 to Subpart G of... - Saturation Factors
Code of Federal Regulations, 2014 CFR
2014-07-01
... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...
40 CFR Table 33 to Subpart G of... - Saturation Factors
Code of Federal Regulations, 2012 CFR
2012-07-01
... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...
40 CFR Table 33 to Subpart G of... - Saturation Factors
Code of Federal Regulations, 2010 CFR
2010-07-01
... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...
40 CFR Table 33 to Subpart G of... - Saturation Factors
Code of Federal Regulations, 2013 CFR
2013-07-01
... service 0.60 Submerged loading: dedicated vapor balance service 1.00 Splash loading of a clean cargo tank 1.45 Splash loading: dedicated normal service 1.45 Splash loading: dedicated vapor balance service 1...
A Theoretical Study of Cold Air Damming.
NASA Astrophysics Data System (ADS)
Xu, Qin
1990-12-01
The dynamics of cold air damming are examined analytically with a two-layer steady state model. The upper layer is a warm and saturated cross-mountain (easterly or southeasterly onshore) flow. The lower layer is a cold mountain-parallel (northerly) jet trapped on the windward (eastern) side of the mountain. The interface between the two layers represents a coastal front: a sloping inversion layer coupling the trapped cold dome with the warm onshore flow above through pressure continuity. An analytical expression is obtained for the inviscid upper-layer flow with hydrostatic and moist adiabatic approximations. Blackadar's PBL parameterization of eddy viscosity is used in the lower-layer equations. Solutions for the mountain-parallel jet and its associated secondary transverse circulation are obtained by expanding asymptotically upon a small parameter proportional to the square root of the inertial aspect ratio: the ratio between the mountain height and the radius of inertial oscillation. The geometric shape of the sloping interface is solved numerically from a differential-integral equation derived from the pressure continuity condition imposed at the interface. The observed flow structures and force balances of cold air damming events are produced qualitatively by the model. In the cold dome, the mountain-parallel jet is controlled by the competition between the mountain-parallel pressure gradient and friction: the jet is stronger with smoother surfaces, higher mountains, and faster mountain-normal geostrophic winds. In the mountain-normal direction, the vertically averaged force balance in the cold dome is nearly geostrophic and controls the geometric shape of the cold dome. The basic mountain-normal pressure gradient generated in the cold dome by the negative buoyancy distribution tends to flatten the sloping interface and expand the cold dome upstream against the mountain-normal pressure gradient (produced by the upper-layer onshore wind) and Coriolis force (induced by the lower-layer mountain-parallel jet). It is found that the interface slope increases and the cold dome shrinks as the Froude number and/or upstream mountain-parallel geostrophic wind increase, or as the Rossby number, upper-layer depth, and/or surface roughness length decrease, and vice versa. The cold dome will either vanish or not be in a steady state if the Froude number is large enough or the roughness length gets too small. The theoretical findings are explained physically based on detailed analyses of the force balance along the inversion interface.
Expert Systems on Multiprocessor Architectures. Volume 3. Technical Reports
1991-06-01
choice of load balancing vs. load sharing [114]. While load balancing strives to keep all sites equally loaded, load sharing merely tries to prevent ...unnecessary idleness. Load balancing is appropriate to object-oriented real-time systems because real-time systems need to prevent long waits for...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Easy, L., E-mail: le590@york.ac.uk; CCFE, Culham Science Centre, Abingdon OX14 3DB; Militello, F.
2016-01-15
The propagation of filaments in the Scrape Off Layer (SOL) of tokamaks largely determines the plasma profiles in the region. In a conduction limited SOL, parallel temperature gradients are expected, such that the resistance to parallel currents is greater at the target than further upstream. Since the perpendicular motion of an isolated filament is largely determined by the balance of currents that flow through it, this may be expected to affect filament transport. 3D simulations have thus been used to study the influence of enhanced parallel resistivity on the dynamics of filaments. Filaments with the smallest perpendicular length scales, which were inertially limited at low resistivity (meaning that polarization rather than parallel currents determines their radial velocities), were unaffected by resistivity. For larger filaments, faster velocities were produced at higher resistivities due to two mechanisms. First, parallel currents were reduced and polarization currents were enhanced, meaning that the inertial regime extended to larger filaments; second, a potential difference formed along the parallel direction, so that higher potentials were produced in the region of the filament for the same amount of current to flow into the sheath. These results indicate that broader SOL profiles could be produced at higher resistivities.
NASA Astrophysics Data System (ADS)
Wang, Guiji; Chen, Xuemiao; Cai, Jintao; Zhang, Xuping; Chong, Tao; Luo, Binqiang; Zhao, Jianheng; Sun, Chengwei; Tan, Fuli; Liu, Cangli; Wu, Gang
2016-06-01
A high current pulsed power generator, CQ-3-MMAF (Multi-Modules Assembly Facility, MMAF), was developed for material dynamics experiments under ramp wave and shock loadings at the Institute of Fluid Physics (IFP); it can deliver a 3 MA peak current to a strip-line load. The rise time of the current is 470 ns (10%-90%). Different from the previous CQ-4 at IFP, the CQ-3-MMAF energy is transmitted by hundreds of low-loss coaxial high voltage cables with a low impedance of 18.6 mΩ; these hundreds of cables are then reduced to tens of cables entering a vacuum chamber through a cable connector and connected to a pair of parallel metallic plates insulated by Kapton films. The generator is composed of 32 capacitor and switch modules in parallel. In short circuit, the electrical parameters are a capacitance of 19.2 μF, an inductance of 11.7 nH, and a resistance of 4.3 mΩ, with a working charging voltage of 60 kV-90 kV. It runs safely and stably when charged from 60 kV to 90 kV. The loading chamber can be evacuated to 10^-2 Pa, and the current waveforms can be shaped by discharging four groups of capacitor and switch modules in timed sequences. CQ-3-MMAF is an adaptable machine with low maintenance because of its modular design. The COMSOL Multiphysics® code is used to optimize the structure of some key components, such as gas switches and cable connectors, and to calculate their structural inductance for design purposes. Ramp wave loading experiments were conducted to check and examine the performance of CQ-3-MMAF. Two copper flyer plates were accelerated to about 3.5 km/s in one shot when the working voltage was charged to 70 kV, and the velocity histories agree very well. Dynamic experiments on some polymer bonded explosives and on the phase transition of tin under ramp wave loading were also conducted. The experimental data show that CQ-3-MMAF can be used for material dynamics experiments at a high shot rate and low cost per shot. Based on this design concept, the peak current of new generators can be increased to 5-6 MA, producing about 100 GPa ramp stress on metallic samples for high pressure physics; a conceptual design of CQ-5-MMAF is given.
NASA Astrophysics Data System (ADS)
Delgosha, Payam; Anantharam, Venkat
2018-03-01
Consider a simple locally finite hypergraph on a countable vertex set, where each edge represents one unit of load which should be distributed among the vertices defining the edge. An allocation of load is called balanced if load cannot be moved from a vertex to another that is carrying less load. We analyze the properties of balanced allocations of load. We extend the concept of balancedness from finite hypergraphs to their local weak limits in the sense of Benjamini and Schramm (Electron J Probab 6(23):13, 2001) and Aldous and Steele (in: Probability on discrete structures. Springer, Berlin, pp 1-72, 2004). To do this, we define a notion of unimodularity for hypergraphs which could be considered an extension of unimodularity in graphs. We give a variational formula for the balanced load distribution and, in particular, we characterize it in the special case of unimodular hypergraph Galton-Watson processes. Moreover, we prove the convergence of the maximum load under some conditions. Our work is an extension to hypergraphs of Anantharam and Salez (Ann Appl Probab 26(1):305-327, 2016), which considered load balancing in graphs, and is aimed at more comprehensively resolving conjectures of Hajek (IEEE Trans Inf Theory 36(6):1398-1414, 1990).
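As a toy finite illustration of the balance condition (not the paper's variational formula or local-weak-limit machinery), start from the uniform allocation on a small hypothetical hypergraph and repeatedly shift load within each edge from its most loaded to its least loaded vertex:

    import numpy as np

    edges = [(0, 1), (1, 2, 3), (0, 3)]          # hypothetical hypergraph on 4 vertices
    x = {e: np.full(len(e), 1.0 / len(e)) for e in edges}   # one unit of load per edge

    def loads(n=4):
        """Total load carried by each vertex under the current allocation."""
        L = np.zeros(n)
        for e, alloc in x.items():
            for v, a in zip(e, alloc):
                L[v] += a
        return L

    for _ in range(1000):
        for e, alloc in x.items():
            L = loads()
            # Donor: most loaded vertex still carrying edge load; receiver: least loaded.
            i = max((j for j in range(len(e)) if alloc[j] > 0), key=lambda j: L[e[j]])
            k = min(range(len(e)), key=lambda j: L[e[j]])
            d = min(alloc[i], (L[e[i]] - L[e[k]]) / 2)
            alloc[i] -= d
            alloc[k] += d
    # At convergence, no edge can move load to a strictly less loaded vertex.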
Simulation capability for dynamics of two-body flexible satellites
NASA Technical Reports Server (NTRS)
Austin, F.; Zetkov, G.
1973-01-01
An analysis and computer program were prepared to realistically simulate the dynamic behavior of a class of satellites consisting of two end bodies separated by a connecting structure. The shape and mass distribution of the flexible end bodies are arbitrary; the connecting structure is flexible but massless and is capable of deployment and retraction. Fluid flowing in a piping system and rigid moving masses, representing a cargo elevator or crew members, have been modeled. Connecting structure characteristics, control systems, and externally applied loads are modeled in easily replaced subroutines. Subroutines currently available include a telescopic beam-type connecting structure as well as attitude, deployment, spin and wobble control. In addition, a unique mass balance control system was developed to sense and balance mass shifts due to the motion of a cargo elevator. The mass of the cargo may vary through a large range. Numerical results are discussed for various types of runs.
NASA Astrophysics Data System (ADS)
Carroll, R. W. H.; Flickinger, A.; Warwick, J. J.; Schumer, R.
2015-12-01
A bioenergetic and mercury (Hg) mass balance (BioHg) model is developed for the Sacramento blackfish (Orthodon microlepidotus), a filter-feeding cyprinid found in northern California and Nevada. Attention focuses on the Lahontan Reservoir in northern Nevada, which receives a strongly time-varying load of dissolved methylmercury (DMeHg) from the Carson River. Hg loads are the result of contaminated bank erosion during high flows and diffusion from bottom sediments during low flows. Coupling of dynamic reservoir loading with periods of maximum plankton growth and maximum fish consumption rates is required to explain the largest body burdens observed in the planktivore; in contrast, the large body burdens cannot be achieved using average water column concentrations. The United States Bureau of Reclamation has produced future streamflow estimates for 2000-2099 using 112 CMIP3 climate projections and the Variable Infiltration Capacity (VIC) model. These are used to drive a fully dynamic Hg transport model to assess changes in contaminant loading to the reservoir and the implications for bioaccumulation in planktivores. Model results suggest that future loads of DMeHg entering the Lahontan Reservoir will decrease most significantly in the spring and summer, because channel widening and depth decreases in the Carson River reduce bank erosion over the century. The modeled concentrations of DMeHg in the reservoir are expected to increase during the summer because the decrease in reservoir volume affects the concentrations more than the decrease in loads, and the model results show that bioaccumulation levels may increase in the upstream sections of the reservoir while contamination levels remain above the federal action limit for human consumption in the lower reservoir.
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.
NASA Technical Reports Server (NTRS)
Wasynczuk, O.; Krause, P. C.; Biess, J. J.; Kapustka, R.
1990-01-01
A detailed computer simulation was used to illustrate the steady-state and dynamic operating characteristics of a 20-kHz resonant spacecraft power system. The simulated system consists of a parallel-connected set of DC-inductor resonant inverters (drivers), a 440-V cable, a node transformer, a 220-V cable, and a transformer-rectifier-filter (TRF) AC-to-DC receiver load. Also included in the system are a 1-kW 0.8-pf RL load and a double-LC filter connected at the receiving end of the 20-kHz AC system. The detailed computer simulation was used to illustrate the normal steady-state operating characteristics and the dynamic system performance following, for example, TRF startup. It is shown that without any filtering the given system exhibits harmonic resonances due to an interaction between the switching of the source and/or load converters and the AC system. However, the double-LC filter at the receiving-end of the AC system and harmonic traps connected in series with each of the drivers significantly reduce the harmonic distortion of the 20-kHz bus voltage. Significant additional improvement in the waveform quality can be achieved by including a double-LC filter with each driver.
Formalization, equivalence and generalization of basic resonance electrical circuits
NASA Astrophysics Data System (ADS)
Penev, Dimitar; Arnaudov, Dimitar; Hinov, Nikolay
2017-12-01
This work presents the basic resonant circuits used in resonant energy converters. The following resonant circuits are considered: series, series with a parallel-loaded capacitor, parallel, and parallel with a series-loaded inductance. For the circuits under consideration, expressions are derived for the natural oscillation frequencies and for the equivalence of the active power delivered to the load. The mathematical expressions are plotted and verified using computer simulations. The results obtained are used in the model-based design of resonant energy converters with DC or AC output, which guarantees the output characteristics of the power electronic devices.
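As a reference point, and assuming the classic series RLC cell (the paper derives the corresponding expressions for each of the four topologies), the undamped and damped natural frequencies are \omega_0 = 1/\sqrt{LC} and \omega_d = \sqrt{\omega_0^2 - (R/2L)^2}.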
Analysis and control of the vibration of doubly fed wind turbine
NASA Astrophysics Data System (ADS)
Yu, Manye; Lin, Ying
2017-01-01
The fault phenomena of severe vibration in a doubly-fed wind turbine were investigated comprehensively, and the dynamic characteristics, loads, and fault conditions of the system were discussed. First, a structural dynamics analysis of the wind turbine is made and a dynamic model is built. Second, vibration testing of the wind turbine is performed with the German BBM test and analysis system. Third, the signals are analyzed and processed. Based on the experiment, a spectrum analysis for motor dynamic balance can be made using the signal processing toolbox of the MATLAB software, and the analysis shows that the vibration of the wind turbine is caused by dynamic imbalance. The results show that integrating mechanical system dynamics theory with advanced test technology can solve the vibration problem successfully, which is important in the vibration diagnosis of mechanical equipment.
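The paper's MATLAB workflow is not shown; a minimal stand-in for the spectrum analysis step (sampling rate and rotation frequency assumed) locates the once-per-revolution peak that signals dynamic imbalance:

    import numpy as np

    fs = 1000.0                          # sampling rate, Hz (assumed)
    t = np.arange(0, 4, 1 / fs)
    # Synthetic vibration signal: 1x-order component at 12.5 Hz plus noise.
    signal = np.sin(2 * np.pi * 12.5 * t) + 0.1 * np.random.randn(t.size)

    amplitude = np.abs(np.fft.rfft(signal)) * 2 / t.size
    freqs = np.fft.rfftfreq(t.size, 1 / fs)
    print(freqs[amplitude.argmax()])     # 12.5 Hz: once-per-rev peak -> imbalance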
Lübken, Manfred; Wichern, Marc; Schlattmann, Markus; Gronauer, Andreas; Horn, Harald
2007-10-01
Knowledge of the net energy production of anaerobic fermenters is important for reliable modelling of the efficiency of anaerobic digestion processes. The Anaerobic Digestion Model No. 1 (ADM1) makes it possible to simulate biogas production and composition. This paper shows the application and modification of ADM1 to simulate the energy production of the digestion of cattle manure and renewable energy crops. The paper additionally presents an energy balance model, which enables the dynamic calculation of the net energy production. The model was applied to a pilot-scale biogas reactor. A simulation study found that continuous feeding and splitting the reactor feed into smaller batches do not generally have a positive effect on the net energy yield. The simulation study showed that the ratio of co-substrate to liquid manure in the inflow determines the net energy production when the inflow load is split into smaller batches. Mathematical equations are presented to calculate the increase of biogas and methane yield for the digestion of liquid manure and lipids at different feeding intervals. Calculations of the different kinds of energy losses for the pilot-scale digester showed large dynamic variations, demonstrating the significance of using a dynamic energy balance model.
Retrieval Property of Attractor Network with Synaptic Depression
NASA Astrophysics Data System (ADS)
Matsumoto, Narihisa; Ide, Daisuke; Watanabe, Masataka; Okada, Masato
2007-08-01
Synaptic connections are known to change dynamically. High-frequency presynaptic inputs induce a decrease of synaptic weights. This process is known as short-term synaptic depression. The synaptic depression controls a gain for presynaptic inputs. However, the functional roles of this gain control remain controversial. We propose a new hypothesis: one of the functional roles is to enlarge basins of attraction. To verify this hypothesis, we employ a binary discrete-time associative memory model consisting of excitatory and inhibitory neurons. It is known that the excitatory-inhibitory balance controls the overall activity of the network. The synaptic depression might incorporate an activity control mechanism. Using a mean-field theory and computer simulations, we find that the synaptic depression enlarges the basins at small loading rates, while the excitatory-inhibitory balance enlarges them at large loading rates. Furthermore, the synaptic depression does not affect the steady state of the network if the threshold is set at an appropriate value. These results suggest that the synaptic depression works in addition to the effect of the excitatory-inhibitory balance, and it might improve the error-correcting ability of cortical circuits.
Recent Developments at the NASA Langley Research Center National Transonic Facility
NASA Technical Reports Server (NTRS)
Paryz, Roman W.
2011-01-01
Several upgrade projects have been completed or are just getting started at the NASA Langley Research Center National Transonic Facility. These projects include a new high capacity semi-span balance, model dynamics damping system, semi-span model check load stand, data acquisition system upgrade, facility automation system upgrade and a facility reliability assessment. This presentation will give a brief synopsis of each of these efforts.
Design of a space shuttle structural dynamics model
NASA Technical Reports Server (NTRS)
1972-01-01
A 1/8-scale structural dynamics model of a parallel-burn space shuttle has been designed. The basic objectives were to represent the significant low-frequency structural dynamic characteristics while keeping fabrication costs low. The model was derived from the proposed Grumman Design 619 space shuttle. The design includes an orbiter, two solid rocket motors (SRMs), and an external tank (ET). The ET consists of a monocoque LO2 tank; an intertank skirt with three frames to accept SRM attachment members; an LH2 tank with 10 frames, of which 3 provide for orbiter attachment members; and an aft skirt with one frame to provide for aft SRM attachment members. The frames designed for the SRM attachments are fitted with transverse struts to take symmetric loads.
Visualizing Chemical Interaction Dynamics of Confined DNA Molecules
NASA Astrophysics Data System (ADS)
Henkin, Gilead; Berard, Daniel; Stabile, Frank; Leslie, Sabrina
We present a novel nanofluidic approach for controllably introducing reagent molecules to interact with confined biopolymers and visualizing the reaction dynamics in real time. By dynamically deforming a flow cell using CLiC (Convex Lens-induced Confinement) microscopy, we are able to tune reaction chamber dimensions from micrometer to nanometer scales. We apply this gentle deformation to load and extend DNA polymers within embedded nanotopographies and visualize their interactions with other molecules in solution. Quantifying the change in configuration of polymers within embedded nanotopographies in response to binding/unbinding of reagent molecules provides new insights into their consequent change in physical properties. CLiC technology enables an ultra-sensitive, massively parallel biochemical analysis platform which can access a broader range of interaction parameters than existing devices.
NASA Astrophysics Data System (ADS)
Aneri, Parikh; Sumathy, S.
2017-11-01
Cloud computing provides services over the internet, delivering application resources and data to users on demand. Cloud computing is based on a consumer-provider model: the cloud provider supplies resources, which consumers access in order to build their applications as needed. A cloud data center is a shared pool of resources for cloud users to access. Virtualization is the heart of the cloud computing model; it provisions virtual machines with application-specific configurations, and applications are free to choose their own configuration. On one hand there is a huge number of resources, and on the other hand a huge number of requests must be served effectively. Therefore, the resource allocation and scheduling policies play a very important role in allocating and managing resources in this cloud computing model. This paper proposes a load-balancing policy based on the Hungarian algorithm, which provides dynamic load balancing together with a monitor component. The monitor component helps increase cloud resource utilization by tracking the algorithm's state and altering it using artificial intelligence. CloudSim, used in this proposal, is an extensible toolkit that simulates the cloud computing environment.
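The paper's monitor component is not reproduced; the assignment step itself can be sketched with SciPy's implementation of the Hungarian method, here pairing four requests with four virtual machines under a hypothetical completion-time matrix:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Hypothetical cost matrix: estimated completion time (s) of request i on VM j.
    cost = np.array([[4., 2., 8., 5.],
                     [2., 3., 7., 6.],
                     [9., 5., 1., 4.],
                     [6., 7., 3., 2.]])
    reqs, vms = linear_sum_assignment(cost)             # Hungarian method
    print(list(zip(reqs, vms)), cost[reqs, vms].sum())  # optimal pairing, total cost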
NASA Astrophysics Data System (ADS)
Rossi, Francesco; Londrillo, Pasquale; Sgattoni, Andrea; Sinigardi, Stefano; Turchetti, Giorgio
2012-12-01
We present `jasmine', an implementation of a fully relativistic, 3D, electromagnetic Particle-In-Cell (PIC) code capable of running simulations in various laser plasma acceleration regimes on Graphics-Processing-Unit (GPU) HPC clusters. Standard energy/charge preserving FDTD-based algorithms have been implemented using double precision and quadratic (or arbitrary sized) shape functions for the particle weighting. When porting a PIC scheme to the GPU architecture (or, in general, a shared memory environment), the particle-to-grid operations (e.g. the evaluation of the current density) require special care to avoid memory inconsistencies and conflicts. Here we present a robust implementation of this operation that is efficient for any number of particles per cell and any particle shape function order. Our algorithm exploits the exposed GPU memory hierarchy and avoids the use of atomic operations, which can hurt performance, especially when many particles lie on the same cell. We show the code's multi-GPU scalability results and present a dynamic load-balancing algorithm. The code is written using a Python-based C++ meta-programming technique, which translates into a high level of modularity and allows for easy performance tuning and simple extension of the core algorithms to various simulation schemes.
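jasmine's GPU kernels are not reproduced here, but the atomic-free idea can be illustrated in serial form: instead of scattering each particle's contribution (which requires atomics under concurrency), reduce by cell index. Nearest-grid-point weighting is assumed, whereas the code itself uses quadratic or higher-order shapes:

    import numpy as np

    ncells = 64
    pos = np.random.rand(100_000) * ncells   # particle positions in cell units
    charge = np.ones(pos.size)

    # Conflict-free deposition: a reduction keyed by cell index, no scatter/atomics.
    rho = np.bincount(pos.astype(int), weights=charge, minlength=ncells)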
Wu, Zhen; Liu, Yong; Liang, Zhongyao; Wu, Sifeng; Guo, Huaicheng
2017-06-01
Lake eutrophication is associated with excessive anthropogenic nutrients (mainly nitrogen (N) and phosphorus (P)) and unobserved internal nutrient cycling. Despite the advances in understanding the role of external loadings, the contribution of internal nutrient cycling is still an open question. A dynamic mass-balance model was developed to simulate and measure the contributions of internal cycling and external loading. It was based on a temporal Bayesian Hierarchical Framework (BHM), with which we explored the seasonal patterns in the dynamics of nutrient cycling processes and the limitation of N and P on phytoplankton growth in hyper-eutrophic Lake Dianchi, China. The dynamic patterns of the five state variables (Chla, TP, ammonia, nitrate and organic N) were simulated with the model, and five parameters (algae growth rate, sediment exchange rates of N and P, nitrification rate and denitrification rate) were estimated with the BHM. The model provided a good fit to observations. Our model results highlighted the role of internal cycling of N and P in Lake Dianchi: the internal cycling processes contributed more than external loading to the N and P changes in the water column. Further insights from the nutrient limitation analysis indicated that the sediment exchange of P determined the P limitation. Allowing for the contribution of denitrification to N removal, N was the more limiting nutrient most of the time; however, P was the more important nutrient for eutrophication management. For Lake Dianchi, recovery would not be possible solely by reducing the external watershed nutrient load; the mechanisms of internal cycling should also be considered, as an approach to inhibit the release of nutrients from sediments and to enhance denitrification.
NASA Technical Reports Server (NTRS)
Duval, R. W.; Bahrami, M.
1985-01-01
The Rotor Systems Research Aircraft uses load cells to isolate the rotor/transmission system from the fuselage. A mathematical model relating applied rotor loads and inertial loads of the rotor/transmission system to the load cell response is required to allow the load cells to be used to estimate rotor loads from flight data. Such a model is derived analytically by applying a force and moment balance to the isolated rotor/transmission system. The model is tested by comparing its estimated values of applied rotor loads with measured values obtained from a ground-based shake test. Discrepancies in the comparison are used to isolate sources of unmodeled external loads. Once the structure of the mathematical model has been validated by comparison with experimental data, the parameters must be identified. Since the parameters may vary with flight condition, it is desirable to identify them directly from the flight data. A Maximum Likelihood identification algorithm is derived for this purpose and tested using a computer simulation of load cell data. The identification is found to converge within 10 samples. This rapid convergence facilitates tracking of the time-varying parameters of the load cell model in flight.
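A toy stand-in for the identification step (dimensions, noise level, and the reduction of Maximum Likelihood to least squares under Gaussian noise are all assumptions here): estimate a linear load-cell response matrix from simulated load/response pairs:

    import numpy as np

    rng = np.random.default_rng(1)
    H_true = rng.normal(size=(6, 6))                    # load-cell response matrix
    F = rng.normal(size=(20, 6))                        # applied rotor load samples
    Y = F @ H_true.T + 0.01 * rng.normal(size=(20, 6))  # noisy load-cell readings

    H_est, *_ = np.linalg.lstsq(F, Y, rcond=None)       # ML = least squares (Gaussian)
    print(np.abs(H_est.T - H_true).max())               # small after a few samples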
NASA Technical Reports Server (NTRS)
Barnett, Alan R.; Widrick, Timothy W.; Ludwiczak, Damian R.
1995-01-01
Solving for the displacements of free-free coupled systems acted upon by static loads is commonly performed throughout the aerospace industry. Many times, these problems are solved using static analysis with inertia relief. This solution technique allows for a free-free static analysis by balancing the applied loads with inertia loads generated by the applied loads. For some engineering applications, the displacements of the free-free coupled system induce additional static loads. Hence, the applied loads are equal to the original loads plus displacement-dependent loads. Solving for the final displacements of such systems is commonly performed using iterative solution techniques. Unfortunately, these techniques can be time-consuming and labor-intensive. Since the coupled system equations for free-free systems with displacement-dependent loads can be written in closed-form, it is advantageous to solve for the displacements in this manner. Implementing closed-form equations in static analysis with inertia relief is analogous to implementing transfer functions in dynamic analysis. Using a MSC/NASTRAN DMAP Alter, displacement-dependent loads have been included in static analysis with inertia relief. Such an Alter has been used successfully to solve efficiently a common aerospace problem typically solved using an iterative technique.
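A minimal numerical sketch of the closed-form idea, with assumed matrices and the inertia-relief bookkeeping omitted: writing the displacement-dependent loads as C u turns K u = F0 + C u into (K - C) u = F0, which is solved directly instead of iterating:

    import numpy as np

    rng = np.random.default_rng(2)
    K = np.diag(np.full(5, 10.0))          # assumed stiffness matrix
    C = 0.5 * rng.normal(size=(5, 5))      # displacement-dependent load coefficients
    F0 = rng.normal(size=5)                # original applied loads

    u_closed = np.linalg.solve(K - C, F0)  # closed-form solution

    u = np.zeros(5)                        # the iterative alternative
    for _ in range(100):
        u = np.linalg.solve(K, F0 + C @ u)
    print(np.allclose(u, u_closed))        # True when the iteration converges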
Scemama, Anthony; Caffarel, Michel; Oseret, Emmanuel; Jalby, William
2013-04-30
Various strategies for efficiently implementing quantum Monte Carlo (QMC) simulations for large chemical systems are presented. These include: (i) the introduction of an efficient algorithm to calculate the computationally expensive Slater matrices, a novel scheme based on the highly localized character of atomic Gaussian basis functions (not the molecular orbitals, as is usually done); (ii) the possibility of keeping the memory footprint minimal; (iii) the substantial enhancement of single-core performance when efficient optimization tools are used; and (iv) the definition of a universal, dynamic, fault-tolerant, and load-balanced framework adapted to all kinds of computational platforms (massively parallel machines, clusters, or distributed grids). These strategies have been implemented in the QMC=Chem code developed at Toulouse and illustrated with numerical applications on small peptides of increasing sizes (158, 434, 1056, and 1731 electrons). Using 10-80 k computing cores of the Curie machine (GENCI-TGCC-CEA, France), QMC=Chem has been shown to be capable of running at the petascale level, thus demonstrating that a large part of this machine's peak performance can be achieved. Implementation of large-scale QMC simulations for future exascale platforms with a comparable level of efficiency is expected to be feasible.
Knee Joint Kinetics in Relation to Commonly Prescribed Squat Loads and Depths
Cotter, Joshua A.; Chaudhari, Ait M.; Jamison, Steve T.; Devor, Steven T.
2014-01-01
Controversy exists regarding the safety and performance benefits of performing the squat exercise to depths beyond 90° of knee flexion. Our aim was to compare the net peak external knee flexion moments (pEKFM) experienced over typical ranges of squat loads and depths. Sixteen recreationally trained males (n = 16; 22.7 ± 1.1 yrs; 85.4 ± 2.1 kg; 177.6 ± 0.96 cm; mean ± SEM) with no previous lower limb surgeries or other orthopedic issues and at least one year of consistent resistance training experience with the squat exercise performed single-repetition squat trials in random order at squat depths of above parallel, parallel, and below parallel. Less than one week before testing, one-repetition maximum (1RM) values were found for each squat depth. Subsequent testing required subjects to perform squats at the three depths with three different loads: unloaded, 50% 1RM, and 85% 1RM (nine total trials). Force platform and kinematic data were collected to calculate pEKFM. To assess differences among loads and depths, a two-factor (load and depth) repeated-measures ANOVA with significance set at the P < 0.05 level was used. Squat 1RM significantly decreased 13.6% from the above-parallel to the parallel squat and another 3.6% from the parallel to the below-parallel squat (P < 0.05). Net peak external knee flexion moments significantly increased as both squat depth and load were increased (P ≤ 0.02). Slopes of pEKFM were greater from unloaded to 50% 1RM than from 50% to 85% 1RM (P < 0.001). The results suggest that typical decreases in squat loads used with increasing depths are not enough to offset increases in pEKFM. PMID:23085977